[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library
Bryan O'Sullivan wrote: ChrisK wrote: That is almost certainly because the algorithm expects the source string to have a unique character at its end. Chris is correct. I'll ensure that the docs make this clear. Apologies, I should have thought of this myself. Thanks. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library
Bryan O'Sullivan wrote: I just posted a library named suffixtree to Hackage. http://www.serpentine.com/software/suffixtree/ It implements Giegerich and Kurtz's lazy construction algorithm, with a few tweaks for better performance and resource usage. API docs: http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html I've tested it on multi-megabyte input strings. I think I found a bug: import qualified Data.SuffixTree as T T.countRepeats ab (T.construct abab) 1 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library
Gleb Alexeyev wrote: Bryan O'Sullivan wrote: I just posted a library named suffixtree to Hackage. http://www.serpentine.com/software/suffixtree/ It implements Giegerich and Kurtz's lazy construction algorithm, with a few tweaks for better performance and resource usage. API docs: http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html I've tested it on multi-megabyte input strings. I think I found a bug: import qualified Data.SuffixTree as T T.countRepeats ab (T.construct abab) 1 That is almost certainly because the algorithm expects the source string to have a unique character at its end. The library should either make that clear or add such a character on its own. Otherwise the ab suffix is a prefix of the abab suffix and the shorter one gets lost. If you end in $ then ab$ cannot merge with abab$ and there are no distinct suffixes a and b for which (isPrefixOf a b) is true. Example: *Data.SuffixTree countRepeats ab (construct abab) 1 *Data.SuffixTree countRepeats ab (construct abab$) 2 -- Chris ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library
ChrisK wrote: That is almost certainly because the algorithm expects the source string to have a unique character at its end. Chris is correct. I'll ensure that the docs make this clear. b ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe