Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie: > From: "Branimir Maksimovic" <[EMAIL PROTECTED]> > > >To: [EMAIL PROTECTED] > >CC: Haskell-Cafe@haskell.org > >Subject: Re: [Haskell-cafe] Substring replacements > >Date: Thu, 15 Dec 2005 00:55:02 +0000 > > > >>From: Daniel Fischer <[EMAIL PROTECTED]> > >>To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >>CC: Haskell-Cafe@haskell.org > >>Subject: Re: [Haskell-cafe] Substring replacements > >>Date: Wed, 14 Dec 2005 20:40:06 +0100 > >> > >>Hi Bane, > >> > >>nice algorithm. Since comparing chars _is_ cheap, it is to be expected > >>that > >>all the hash-rotating is far more costly for short search patterns. The > >>longer the pattern, the better this gets, I think -- though nowhere near > >>KMP > >>(or would it?). > > > >Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore > >algorithm yet, though. But I think it would be > >difficult to implement it in Haskell efficiently as it searches backwards > >and jumps around, and we want memory savings. > >Though, I even didn't tried yet, but it is certainly very interesting. > > Forget what I've said. > Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes > forward, but when it finds last character in pattern, than starts to search > backwards. > This can be implemented easilly as Haskell lists naturaly reverse order > when putting from one list to other. > Heh, never say never :) > As I see from documents Boyer-Moore has best performance on average > and should be better than KMP. > > Greetings,Bane. > Well, I also thought that all the jumping around in Boyer-Moore wasn't too good (after each shift we must bite off a chunk from the remaining input, pushing that onto the stack, which costs something). But I gave it a try today and here's what I came up with:
import Data.List (tails) import Data.Map (Map) import qualified Data.Map as Map import Data.Array.Unboxed searchRep :: String -> String -> String -> String searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str where len = length src len1 = len-1 pat :: UArray Int Char pat = listArray (0,len1) src ch = pat!len1 badChar :: Map Char Int badChar = Map.fromList $ zip src [0 .. ] getBc c = case Map.lookup c badChar of Just n -> n Nothing -> -1 suffs :: UArray Int Int suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs where crs = reverse src pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys pr n _ _ = n bmGs0 :: UArray Int Int bmGs0 = array (0,len1) [(j,k) | (k,k') <- zip (tail $! help) help, j <- [k' .. k-1]] help = [k | k <- [0 .. len], k == len || suffs!k == len-k] bmGs :: UArray Int Int bmGs = bmGs0 // [(len1-suffs!k,k) | k <- [len1,len-2 .. 1]] run by "" = reverse by run by (c:cs) | c == ch = process (c:by) cs | otherwise = run (c:by) cs roll n xs ys | n <= 0 = (xs, ys) roll n xs (y:ys) = roll (n-1) (y:xs) ys roll _ xs "" = (xs, "") walk n "" = (n,"") walk n st@(c:cs) | n < 0 = (n,st) | c == pat!n = walk (n-1) cs | otherwise = (n,st) process con left | i < 0 = reverse pass ++ rp ++ run "" left | otherwise = {- bye ++ -} run ncon nleft where (i,pass) = walk len1 con d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass)) -- bye = reverse $! drop (len-d) con (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left it's not as fast as KMP for the tests, but not too bad. Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before a match (if any), we'd be better off relieving our memory with 'bye', I think. Any improvements are welcome, certainly some of you can do much better. Cheers, Daniel P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is somewhat fussy. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe