Am 12. März 2012 21:20 schrieb Thomas Neidhart <thomas.neidh...@gmail.com>: > Hi, > > on the weekend, I started to work on issue LANG-680 > (https://issues.apache.org/jira/browse/LANG-680), which is about adding > support for finding the longest common substring of a set of Strings. > > Suffix Trees are a standard data structure to efficiently solve this > problem, and I created a variant of Ukkonen's algorithm to construct > such a tree in linear time. > > After some research to create a generalized version of it (to support > multiple strings), I found a very nice implementation from Alessandro > Bahgat (https://github.com/abahgat/suffixtree), which is very similar to > my own, and contacted the author if he is interested to put his code > under apache license (which he did already). > > So the question basically is how to proceed, as adding a data structure > like a Suffix Tree to commons-lang may be out-of-scope. On the other > hand there are several string related problems that can be efficiently > solved with a Suffix Tree: > > * longest common substring > * longest repeated substring > * exact string set matching > * palindrom finding > * finding frequent substrings >
Maybe it would make sense to create an new class "SubstringUtils" instead of adding all that logic to StringUtils (which already has a fair amount of funtionality). > Implementing a longest common substring for two strings is trivial > (using dynamic programming ~20 lines of code), but for a set of strings > a suffix tree would be needed. > > Henri already outlined a possible API as comment to the issue, which I > like, so what do other people think about adding a suffix tree to > commons-lang? > > Thomas > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org