Hi, on the weekend, I started to work on issue LANG-680 (https://issues.apache.org/jira/browse/LANG-680), which is about adding support for finding the longest common substring of a set of Strings.
Suffix Trees are a standard data structure to efficiently solve this problem, and I created a variant of Ukkonen's algorithm to construct such a tree in linear time. After some research to create a generalized version of it (to support multiple strings), I found a very nice implementation from Alessandro Bahgat (https://github.com/abahgat/suffixtree), which is very similar to my own, and contacted the author if he is interested to put his code under apache license (which he did already). So the question basically is how to proceed, as adding a data structure like a Suffix Tree to commons-lang may be out-of-scope. On the other hand there are several string related problems that can be efficiently solved with a Suffix Tree: * longest common substring * longest repeated substring * exact string set matching * palindrom finding * finding frequent substrings Implementing a longest common substring for two strings is trivial (using dynamic programming ~20 lines of code), but for a set of strings a suffix tree would be needed. Henri already outlined a possible API as comment to the issue, which I like, so what do other people think about adding a suffix tree to commons-lang? Thomas --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org