A lot of bioinformaticians would love us if we added this! On Mar 12, 2012 4:20 PM, "Thomas Neidhart" <[email protected]> wrote:
> Hi, > > on the weekend, I started to work on issue LANG-680 > (https://issues.apache.org/jira/browse/LANG-680), which is about adding > support for finding the longest common substring of a set of Strings. > > Suffix Trees are a standard data structure to efficiently solve this > problem, and I created a variant of Ukkonen's algorithm to construct > such a tree in linear time. > > After some research to create a generalized version of it (to support > multiple strings), I found a very nice implementation from Alessandro > Bahgat (https://github.com/abahgat/suffixtree), which is very similar to > my own, and contacted the author if he is interested to put his code > under apache license (which he did already). > > So the question basically is how to proceed, as adding a data structure > like a Suffix Tree to commons-lang may be out-of-scope. On the other > hand there are several string related problems that can be efficiently > solved with a Suffix Tree: > > * longest common substring > * longest repeated substring > * exact string set matching > * palindrom finding > * finding frequent substrings > > Implementing a longest common substring for two strings is trivial > (using dynamic programming ~20 lines of code), but for a set of strings > a suffix tree would be needed. > > Henri already outlined a possible API as comment to the issue, which I > like, so what do other people think about adding a suffix tree to > commons-lang? > > Thomas > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
