A lot of bioinformaticians would love us if we added this!
On Mar 12, 2012 4:20 PM, "Thomas Neidhart" <thomas.neidh...@gmail.com>
wrote:

> Hi,
>
> on the weekend, I started to work on issue LANG-680
> (https://issues.apache.org/jira/browse/LANG-680), which is about adding
> support for finding the longest common substring of a set of Strings.
>
> Suffix Trees are a standard data structure to efficiently solve this
> problem, and I created a variant of Ukkonen's algorithm to construct
> such a tree in linear time.
>
> After some research to create a generalized version of it (to support
> multiple strings), I found a very nice implementation from Alessandro
> Bahgat (https://github.com/abahgat/suffixtree), which is very similar to
> my own, and contacted the author if he is interested to put his code
> under apache license (which he did already).
>
> So the question basically is how to proceed, as adding a data structure
> like a Suffix Tree to commons-lang may be out-of-scope. On the other
> hand there are several string related problems that can be efficiently
> solved with a Suffix Tree:
>
>  * longest common substring
>  * longest repeated substring
>  * exact string set matching
>  * palindrom finding
>  * finding frequent substrings
>
> Implementing a longest common substring for two strings is trivial
> (using dynamic programming ~20 lines of code), but for a set of strings
> a suffix tree would be needed.
>
> Henri already outlined a possible API as comment to the issue, which I
> like, so what do other people think about adding a suffix tree to
> commons-lang?
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

Reply via email to