Re: [Haskell-cafe] ByteString search code available in easy-to-digest form

2007-11-07 Thread Justin Bailey
On Nov 7, 2007 2:21 PM, Bryan O'Sullivan <[EMAIL PROTECTED]> wrote:
> Chris mentioned that he did, but I haven't had time to write anything
> benchmarky yet.

I used the attached program to benchmark the various functions against
"endo.dna"[1], a 7 MB file that came with this year's ICFP contest. It
appends a pattern that occurs nowhere in the file to the end of that
file and then searches for it. Strict and lazy bytestring searches
using KMP are performed, plus a search using the existing bytestring
searches and using a List.

You'll have to change the import from Data.ByteString.KMP for it to
compile but otherwise it should work out of the box..

Justin

[1] http://www.icfpcontest.org/endo.zip
module Main

where

import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S
import qualified Data.ByteString.KMP as K
import Data.List (isInfixOf)

main =
  do
testStr <- readFile "endo.dna" >>= \s -> return $ s ++ searchStr
lazyTestStr <- L.readFile "endo.dna" >>= \s -> return $ L.append s lazySearchStr
strictTestStr <- S.readFile "endo.dna" >>= \s -> return $ S.append s strictSearchStr
putStrLn $ ("(kmpMatchLL): " ++ show ({-# SCC "kmpMatchLL" #-} K.kmpMatchLL lazySearchStr lazyTestStr))
putStrLn $ ("(kmpMatchSS): " ++ show ({-# SCC "main_kmpMatchSS" #-} K.kmpMatchSS strictSearchStr strictTestStr))
putStrLn $ ("(strict): " ++ show ({-# SCC "main_findStrict" #-} S.findSubstring strictSearchStr strictTestStr))
putStrLn $ ("(naive): " ++ show ({-# SCC "main_findSubstringLazy" #-} findSubstringLazy lazySearchStr lazyTestStr))
putStrLn $ ("(list) found: " ++ show ({-# SCC "main_findList" #-} searchStr `isInfixOf` testStr))
putStrLn "Done!"


searchStr = "IFPIFPIFPIFPIFPIFPIFPIFP"
lazySearchStr = toLazyBS searchStr
strictSearchStr = toStrictBS searchStr

toLazyBS = L.pack . map (toEnum . fromEnum) 
toStrictBS = S.pack . map (toEnum . fromEnum)

findSubstringLazy :: L.ByteString -> L.ByteString -> Maybe Int
findSubstringLazy !test !big = go big 0
where
go !s !n | test `L.isPrefixOf` s = Just n
 | L.null s  = Nothing
 | otherwise = go (L.tail s) (n+1)

  
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ByteString search code available in easy-to-digest form

2007-11-07 Thread Bryan O'Sullivan

Don Stewart wrote:


Do we have any benchmarks, for say, 1G files, versus linear, naive
(strict) search?


Chris mentioned that he did, but I haven't had time to write anything 
benchmarky yet.


http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ByteString search code available in easy-to-digest form

2007-11-07 Thread Don Stewart
bos:
> I've packaged up the fast Boyer-Moore and Knuth-Morris-Pratt code that 
> Chris Kuklewicz posted a few months ago:
> 
>   http://article.gmane.org/gmane.comp.lang.haskell.libraries/7363
> 
> The consensus at the time was that the code was not ready for rolling 
> into the bytestring package, but now it's easy to install and start 
> working with.
> 
> API docs:
> 
>   http://darcs.serpentine.com/stringsearch/dist/doc/html/stringsearch/
> 
> Patches against the darcs repo welcome:
> 
>   darcs get http://darcs.serpentine.com/stringsearch
> 
> Credit to Justin Bailey, Daniel Fischer, and Chris Kuklewicz for their 
> hard work.
> 
> (Currently only tested against GHC 6.6.1, FYI.)

Do we have any benchmarks, for say, 1G files, versus linear, naive
(strict) search?

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] ByteString search code available in easy-to-digest form

2007-11-07 Thread Bryan O'Sullivan
I've packaged up the fast Boyer-Moore and Knuth-Morris-Pratt code that 
Chris Kuklewicz posted a few months ago:


  http://article.gmane.org/gmane.comp.lang.haskell.libraries/7363

The consensus at the time was that the code was not ready for rolling 
into the bytestring package, but now it's easy to install and start 
working with.


API docs:

  http://darcs.serpentine.com/stringsearch/dist/doc/html/stringsearch/

Patches against the darcs repo welcome:

  darcs get http://darcs.serpentine.com/stringsearch

Credit to Justin Bailey, Daniel Fischer, and Chris Kuklewicz for their 
hard work.


(Currently only tested against GHC 6.6.1, FYI.)

http://www.haskell.org/mailman/listinfo/haskell-cafe