On 03/13/2012 08:55 AM, Luc Maisonobe wrote:
> On 13/03/2012 00:53, James Carman wrote:
>> A lot of bioinformaticians would love us if we added this!
I picked this topic up because I find it interesting myself, and I guess it would be a useful addition for many other people too. From what I have seen so far, though, bioinformaticians wouldn't necessarily be impressed by it ;-). Afaik they have pretty good tools, and there are special algorithms that compute suffix trees for really large strings on clusters or on disk, as they won't fit in memory anymore.

> In the same spirit, I know an implementation of the Myers difference
> algorithm that runs on any object implementing equals and also provides
> an API for browsing the "edit script" resulting from the comparison.
> This allows for example to retrieve only the shared elements, or only
> the ones in the first or the second sequence, or "running" the script,
> or whatever.
>
> If you consider this could be a good addition to [lang] or another
> component ([graph] ?) I can ask for a grant for this.

This would be a perfect companion for the longest common substring problem. The o.a.c.l.text package looks like a good fit for these things imho.

Thomas
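
P.S. To make the "edit script" idea quoted above concrete, here is a minimal sketch in Java. It is not the implementation Luc mentions and not an existing [lang] API: the class EditScriptSketch, the Visitor interface and the methods diff and shared are hypothetical names made up for illustration. For brevity it builds the script from a plain quadratic LCS table instead of the Myers algorithm, and it only assumes that the elements implement equals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch; none of these names come from an existing library. */
final class EditScriptSketch {

    /** Callback for browsing an edit script, one call per element. */
    interface Visitor<T> {
        void keep(T element);    // element shared by both sequences
        void delete(T element);  // element only in the first sequence
        void insert(T element);  // element only in the second sequence
    }

    /** Compares two lists via equals() and replays the edit script on the visitor. */
    static <T> void diff(List<T> a, List<T> b, Visitor<T> visitor) {
        int n = a.size();
        int m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        // (simple O(n*m) table, NOT the Myers algorithm)
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = Objects.equals(a.get(i), b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start and emit one command per element.
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (Objects.equals(a.get(i), b.get(j))) {
                visitor.keep(a.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                visitor.delete(a.get(i++));
            } else {
                visitor.insert(b.get(j++));
            }
        }
        while (i < n) {
            visitor.delete(a.get(i++));
        }
        while (j < m) {
            visitor.insert(b.get(j++));
        }
    }

    /** Example of browsing the script: retrieve only the shared elements. */
    static <T> List<T> shared(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>();
        diff(a, b, new Visitor<T>() {
            public void keep(T e) { out.add(e); }
            public void delete(T e) { }
            public void insert(T e) { }
        });
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", "b", "c", "d");
        List<String> second = Arrays.asList("b", "x", "d");
        System.out.println(shared(first, second)); // prints [b, d]
    }
}
```

A visitor that collects the kept and inserted elements "runs" the script and rebuilds the second sequence, while one that collects only the deleted elements yields what is unique to the first. A real implementation would likely replace the quadratic table with the Myers algorithm to keep memory use down on long inputs.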