[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library

2007-08-28 Thread Gleb Alexeyev

Bryan O'Sullivan wrote:

ChrisK wrote:

That is almost certainly because the algorithm expects the source 
string to have

a unique character at its end.


Chris is correct.  I'll ensure that the docs make this clear.


Apologies, I should have thought of this myself.

Thanks.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library

2007-08-27 Thread Gleb Alexeyev

Bryan O'Sullivan wrote:

I just posted a library named suffixtree to Hackage.

http://www.serpentine.com/software/suffixtree/

It implements Giegerich and Kurtz's lazy construction algorithm, with a 
few tweaks for better performance and resource usage.


API docs:

http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html

I've tested it on multi-megabyte input strings.


I think I found a bug:
import qualified Data.SuffixTree as T

 T.countRepeats ab (T.construct abab)
1

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library

2007-08-27 Thread ChrisK
Gleb Alexeyev wrote:
 Bryan O'Sullivan wrote:
 I just posted a library named suffixtree to Hackage.

 http://www.serpentine.com/software/suffixtree/

 It implements Giegerich and Kurtz's lazy construction algorithm, with
 a few tweaks for better performance and resource usage.

 API docs:

 http://darcs.serpentine.com/suffixtree/dist/doc/html/Data-SuffixTree.html

 I've tested it on multi-megabyte input strings.

 I think I found a bug:
 import qualified Data.SuffixTree as T
 
 T.countRepeats ab (T.construct abab)
 1


That is almost certainly because the algorithm expects the source string to have
a unique character at its end.

The library should either make that clear or add such a character on its own.

Otherwise the ab suffix is a prefix of the abab suffix and the shorter one
gets lost.  If you end in $ then ab$ cannot merge with abab$ and there are
no distinct suffixes a and b for which (isPrefixOf a b) is true.

Example:

 *Data.SuffixTree countRepeats ab (construct abab)
 1
 *Data.SuffixTree countRepeats ab (construct abab$)
 2

-- 
Chris

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Re: [ANN] An efficient lazy suffix tree library

2007-08-27 Thread Bryan O'Sullivan

ChrisK wrote:


That is almost certainly because the algorithm expects the source string to have
a unique character at its end.


Chris is correct.  I'll ensure that the docs make this clear.

b
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe