Re: [Haskell-cafe] Substring replacements
Hello Bulat, I'm not sure what your point is, let's try to enlighten me. Am Mittwoch, 21. Dezember 2005 16:30 schrieben Sie: > Hello Daniel, > > Wednesday, December 21, 2005, 5:20:18 PM, you wrote: > > DF> ordinarily, on my computer, your version of straightforward is 10-15% > faster DF> than KMP (search-patterns are now supplied on the command line > -- which has DF> no big impact; > > of course. the search pattern can be compiled into the search > algorithm just at the time of program execution. just for example, if > you have code > > main = do [x] <- getArgs > y <- return $ map (\n -> fac $ read x) [1..10^6] > > then `fac $ read x` will be computed just one time. in this time, if > your perform many searches with the same pattern - compiler must > initialize array just one time and use it for all searches > Errrh, what here? If I want to replace each occurence of a pattern within a String, of course I want the arrays built only once, destroying and rebuilding them would be deliberate stupidity, that can't be your point, certainly. And also if we interactively search and replace, as in an editor, we'd hold on to the arrays until we get the message 'no more of that, next pattern'. > > DF> searched string is " able sea..."; all compiled with -O2; > DF> NOINLINE makes no difference -- at least with optimisations on -- > > why? i think that is the right way to check speed of optimized program > without pre-compiling search pattern to search/replace algorithm at > runtime ? What has NOINLINE to do with precompiling the search pattern? BTW, I think, the KMP-algorithm is too large to be inlined anyway (maybe, if we explicitly asked for it), so I wouldn't expect any NOINLINE-effect in the first place. And about RunTimeCompilation, I don't quite understand the Wiki-page, what I imagined might be the point there is that, if I know the search pattern before, I might import the general algorithm, write a function searchReplacePattern = searchReplace "pattern" and the arrays might then be built at compile-time and included in the object code -- but I've no idea whether compilers are clever enough to do that (I believe it should be possible to write compilers which would do that, but is it worth the trouble -- or is that sort of optimisation easy and routinely done?) -- but why that would be called RunTimeCompilation, I cannot imagine. So then, the thing that might make a difference, would be not to give the search pattern at compile-time, which I did. Of course, since we search a long String, the time needed to build the arrays is minute in relation to the time needed to traverse the String, but in the more realistic situation of a shorter String (50 kB instead of 48 MB), it would be significant. > > DF> ; without optimisations, KMP suffers far worse than straightforward). > > this test is meaningless Well, it's not really a test for the algorithm, but for ghc's optimiser and I've forgotten the exact numbers, but KMP compiled without optimisation is much much, much slower than with -O2, so the optimiser does a really great job. > > >> KMP is O(m) while straightforward is O(m*n). > > DF> Where m is the length of the input and n is the length of the > searched-for DF> pattern, I think? > DF> But these are worst-case complexities, I believe, ordinarily, > straightforward DF> will be O(m), too. > > and have longer init time (and take more space), so on typical > searches it will be worser than trivial algorithm I believe you've misread here. I'm saying that while the worst case complexity for the straighforward (or trivial) algorithm is O(n*m) -- as witnessed by my test, which was deliberately designed to be bad for that --, usually, in most real situations, that will have O(m) complexity, too. And I -- like you -- believe that for typical searches the straightforward algorithm will be faster (at least with a clever implementation, like yours). Cheers, Daniel P.S.: I'd really appreciate another attempt at explaining RunTimeCompilation to me (might well be an URL of a paper or something). ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin <[EMAIL PROTECTED]> Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: haskell-cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Substring replacements Date: Fri, 23 Dec 2005 11:32:01 +0300 Hello Branimir, Wednesday, December 21, 2005, 10:18:43 AM, you wrote: >>try to add >> >>{-# NOINLINE replace #-} >> >>to both programs and repeat comparision BM> These are tests: BM> No optimisations (no -O): NOINLINE just prevents RunTimeCompilation (see wiki page for details), so this way you will test speed of "replace" on previously unknown string. disabling optimization says nothing about real speed of optimized program, which searches for the many different strings I got it. These tests were with NOINLINE in both cases but I didn;t saw any speed difference in results as actually replace (straight) and searchReplace (KMP) is just called for two differnet strings. Perhaps if I call that for long list of short patterns patterns on short string, test would display different results (INLINE wouldn't help). I'll try that next. Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Substring replacements
Hello Branimir, Wednesday, December 21, 2005, 10:18:43 AM, you wrote: >>try to add >> >>{-# NOINLINE replace #-} >> >>to both programs and repeat comparision BM> These are tests: BM> No optimisations (no -O): NOINLINE just prevents RunTimeCompilation (see wiki page for details), so this way you will test speed of "replace" on previously unknown string. disabling optimization says nothing about real speed of optimized program, which searches for the many different strings -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Am Mittwoch, 21. Dezember 2005 15:20 schrieb Daniel Fischer: > > >i'm 90% sure that straightforward method must be faster for one-time > > >searches. > > I'm far less sure of that. If you have a really short search-pattern, I > think that probably straightforward search is indeed faster (at least, if > the search-pattern isn't supplied at compile-time). But if you have a > pattern of length 100, say, and lots of relatively long partial matches, > chances are that KMP will move on something like 50 chars at once, which > would have to be re-checked by straightforward. I've no idea, when that > would outweigh the extra time of building the KMP-arrays, though. In my > test -- extremely unrealistic, but provides a more or less worst case > example -- > straightforward must make roughly half a million character comparisons to > pass each 'c', while KMP does 2000 (+/-2) (one way through the array and > back), so there are situations where KMP definitely is faster. But > ordinarily, on my computer, your version of straightforward is 10-15% > faster than KMP (search-patterns are now supplied on the command line -- > which has no big impact; searched string is " able sea..."; all compiled > with -O2; NOINLINE makes no difference -- at least with optimisations on > --; without optimisations, KMP suffers far worse than straightforward). > I've now tested with main = do args <- getArgs n <- case args of (ar:_) -> readIO ar `catch` (\_ -> return 400) _ -> return 500 let src = replicate n 'r' dst = " # " str = replicate (n-1) 'r' ++ 'c': replicate n 'r' k = if n < 200 then ceiling (5e6 / fromIntegral n) else ceiling (1e7 / fromIntegral n) out = searchReplace src dst $ concat $ replicate k str putStr "Working with " print n print $ length out putStrLn "Done" and KMP wins for n == 1 and n >= 12, also for " able seasea...", KMP wins for search-patterns of length 1 and patterns with no partial matches (and some others), but generally -- on my thing -- Bulat's version wins. Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Bulat Ziganshin <[EMAIL PROTECTED]>, Haskell-Cafe@haskell.org > KMP is O(m) while straightforward is O(m*n). Where m is the length of the input and n is the length of the searched-for pattern, I think? Yes. But these are worst-case complexities, I believe, ordinarily, straightforward will be O(m), too. Yes, those are worst cases for both algorithms. O(m) for KMP, O(m*n) for straightforward. > My test favors straightforward, in any other case KMP wins by order of > magnitude. Can you give example tests? Any example that has long search pattern say (many a's followed by b ) and searched string has many partial matches (many a's). Particularly, any example which exhibits O(m*n) or close to, case for straightforward search. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Substring replacements
Hello Daniel, Wednesday, December 21, 2005, 5:20:18 PM, you wrote: >> >BM> I've finally performed test on amd64 and result is a same as on intel. >> >BM> KMP always wins. So KMP is best suited for non indexed strings >> >BM> and I guess should be used in library as prefered search/replace >> >method. >> >BM> This test favors straightforward search. >> > >> >i'm 90% sure that straightforward method must be faster for one-time >> >searches. DF> I'm far less sure of that. If you have a really short search-pattern, I think DF> that probably straightforward search is indeed faster sorry, i mean exactly that, answering proposition to add this to library as a PREFFERED method. imho, typical usage of these routines is for short enough patterns and searched strings -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Am Mittwoch, 21. Dezember 2005 08:18 schrieb Branimir Maksimovic: > From: Bulat Ziganshin <[EMAIL PROTECTED]> > > >Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]> > >To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >CC: haskell-cafe@haskell.org > >Subject: Re: [Haskell-cafe] Substring replacements > >Date: Tue, 20 Dec 2005 23:55:22 +0300 > > > >Hello Branimir, > > > >Tuesday, December 20, 2005, 9:48:48 PM, you wrote: > > > >BM> I've finally performed test on amd64 and result is a same as on intel. > >BM> KMP always wins. So KMP is best suited for non indexed strings > >BM> and I guess should be used in library as prefered search/replace > >method. > >BM> This test favors straightforward search. > > > >i'm 90% sure that straightforward method must be faster for one-time > >searches. I'm far less sure of that. If you have a really short search-pattern, I think that probably straightforward search is indeed faster (at least, if the search-pattern isn't supplied at compile-time). But if you have a pattern of length 100, say, and lots of relatively long partial matches, chances are that KMP will move on something like 50 chars at once, which would have to be re-checked by straightforward. I've no idea, when that would outweigh the extra time of building the KMP-arrays, though. In my test -- extremely unrealistic, but provides a more or less worst case example -- straightforward must make roughly half a million character comparisons to pass each 'c', while KMP does 2000 (+/-2) (one way through the array and back), so there are situations where KMP definitely is faster. But ordinarily, on my computer, your version of straightforward is 10-15% faster than KMP (search-patterns are now supplied on the command line -- which has no big impact; searched string is " able sea..."; all compiled with -O2; NOINLINE makes no difference -- at least with optimisations on --; without optimisations, KMP suffers far worse than straightforward). > > KMP is O(m) while straightforward is O(m*n). Where m is the length of the input and n is the length of the searched-for pattern, I think? But these are worst-case complexities, I believe, ordinarily, straightforward will be O(m), too. > > your test may give better results with KMP algorithm just > > >because you repeat the same search many times and it was automatically > >"run-time compiled" as described on the wiki page about KMP My results seem to witness against that. > > My test favors straightforward, in any other case KMP wins by order of > magnitude. Can you give example tests? > I think that straightfoirward is better then KMP with optimisations > off is due more complex program. > > >try to add > > > >{-# NOINLINE replace #-} > > > >to both programs and repeat comparision > > Greetings, Bane. > Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin <[EMAIL PROTECTED]> Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Tue, 20 Dec 2005 23:55:22 +0300 Hello Branimir, Tuesday, December 20, 2005, 9:48:48 PM, you wrote: BM> I've finally performed test on amd64 and result is a same as on intel. BM> KMP always wins. So KMP is best suited for non indexed strings BM> and I guess should be used in library as prefered search/replace method. BM> This test favors straightforward search. i'm 90% sure that straightforward method must be faster for one-time searches. KMP is O(m) while straightforward is O(m*n). your test may give better results with KMP algorithm just because you repeat the same search many times and it was automatically "run-time compiled" as described on the wiki page about KMP My test favors straightforward, in any other case KMP wins by order of magnitude. I think that straightfoirward is better then KMP with optimisations off is due more complex program. try to add {-# NOINLINE replace #-} to both programs and repeat comparision These are tests: No optimisations (no -O): Intel hyperthreaded,windows $ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m34.766s user0m0.015s sys 0m0.000s [EMAIL PROTECTED] ~/tutorial $ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m14.719s user0m0.031s sys 0m0.000s AMD 64 bit: [EMAIL PROTECTED] myhaskell]$ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real1m58.066s user1m57.939s sys 0m0.128s [EMAIL PROTECTED] myhaskell]$ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m41.565s user0m41.527s sys 0m0.040s with optimisations (-O): Intel hyperthreaded,windows $ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m8.625s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.735s user0m0.015s sys 0m0.000s AMD 64 bit, linux: [EMAIL PROTECTED] myhaskell]$ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.546s user0m10.529s sys 0m0.016s [EMAIL PROTECTED] myhaskell]$ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.796s user0m11.785s sys 0m0.012s Greetings, Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Hello Branimir, Tuesday, December 20, 2005, 9:48:48 PM, you wrote: BM> I've finally performed test on amd64 and result is a same as on intel. BM> KMP always wins. So KMP is best suited for non indexed strings BM> and I guess should be used in library as prefered search/replace method. BM> This test favors straightforward search. i'm 90% sure that straightforward method must be faster for one-time searches. your test may give better results with KMP algorithm just because you repeat the same search many times and it was automatically "run-time compiled" as described on the wiki page about KMP try to add {-# NOINLINE replace #-} to both programs and repeat comparision -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Substring replacements
I've finally performed test on amd64 and result is a same as on intel. KMP always wins. So KMP is best suited for non indexed strings and I guess should be used in library as prefered search/replace method. This test favors straightforward search. [EMAIL PROTECTED] myhaskell]$ time ./KMP Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.783s user0m10.769s sys 0m0.016s [EMAIL PROTECTED] myhaskell]$ time ./straightforward Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.769s user0m11.741s sys 0m0.028s [EMAIL PROTECTED] myhaskell]$ uname -a Linux devel64.office.kom 2.6.14-skas3-v8.2 #2 Fri Nov 11 21:19:36 CET 2005 x86_64 x86_64 x86_64 GNU/Linux Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin <[EMAIL PROTECTED]> Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Substring replacements Date: Fri, 16 Dec 2005 16:51:32 +0300 Hello Branimir, Friday, December 16, 2005, 5:36:47 AM, you wrote: BM> I've also performed tests on dual Xeon linux box and results are just to let you know - GHC don't uses pentium4 hyperthreading, multiple cpus or multiple cores in these tests only way to make ghc using multiple processors is to use 6.5 beta version, compile with "-smp" and explicitly fork several threads You are right. I've double checked on linux there is just one thread executing and there is not such a big difference between KMP and straightforward search. Just about 10% KMP is faster with my test, but still faster. I've checked both SMP and non SMP linux (Intel). Hyperthreading effect is on windows only, I guess, as there are visible three threads per test process. I have one amd64 near (I'll check that one too, as soon as admin sets up account for me on that machine). Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: Re[2]: [Haskell-cafe] Substring replacements
From: Bulat Ziganshin <[EMAIL PROTECTED]> Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org Subject: Re[2]: [Haskell-cafe] Substring replacements Date: Fri, 16 Dec 2005 16:51:32 +0300 Hello Branimir, Friday, December 16, 2005, 5:36:47 AM, you wrote: BM> I've also performed tests on dual Xeon linux box and results are just to let you know - GHC don't uses pentium4 hyperthreading, multiple cpus or multiple cores in these tests Oh yes it does. I clearly see multiple threads that take unequal percentage on each virtual CPU. I guess that other thread is garbage collector thread. In case of hyperthreading, speed is gained by reduced memory latency by 30-60 % only way to make ghc using multiple processors is to use 6.5 beta version, compile with "-smp" and explicitly fork several threads This is not the case as I see. On windows search replace test programs spawn 3 threads and on linux I'm not sure, but I've checked program that calls Haskell from C++ and GHC spawns additional thread, which is not my thread, that also performs something constantly, and I didn't spawn any thread from Haskell. So hyperthreading helps as it helps to optimise when several threads accesses memory as is in this test case. I can't see any other explanation why KMP search is slower on AMD 20% , but faster on Intel 30%, then straightforward search with my test. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Substring replacements
Hello Branimir, Friday, December 16, 2005, 5:36:47 AM, you wrote: BM> I've also performed tests on dual Xeon linux box and results are just to let you know - GHC don't uses pentium4 hyperthreading, multiple cpus or multiple cores in these tests only way to make ghc using multiple processors is to use 6.5 beta version, compile with "-smp" and explicitly fork several threads -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Am Freitag, 16. Dezember 2005 03:36 schrieben Sie: > From: Daniel Fischer <[EMAIL PROTECTED]> > >Any improvements are welcome, certainly some of you can do much better. > > It is fast on my machine except that you are using Map to lookup > for badChar which is O(log n). > I;ve placed this instead: > badChar :: UArray Int Int > badChar = array (0,255) ([(i,-1) | i <- [0..255]] ++ proc src 0) > proc [] _ = [] > proc (s:st) i = (ord s,i):proc st (i+1) > getBc c = badChar ! ord c > > which gaved it significant boost, O(1) lookup. Yes, but Char has 1114112 values, and I'm not sure whether such a large array would still be better, especially since, presumably, the Map will usually not be deeper than five layers, say. But if we restrict ourselves to extended ASCII Strings, an array certainly is better. And maybe, instead of using two arrays, bmGs0 and bmGs, a mutable array (those are DiffArrays, I think -- I'll check that out) would also improve it. > Now it's faster then brute force method but 10% slower then KMP > with my test. > I've also performed tests on dual Xeon linux box and results are > proportionally > the same as on my intel windows box. > KMP wins again 10% better then BM and 20-30% better then straightforward > search, > which means that KMP is well suited for non indexed strings. > > >Cheers, > >Daniel > > > >P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is > >somewhat > >fussy. > > Yes, BM is for indexed structures. > > Greetings, Bane. > Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 21:07:11 +0100 Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie: > From: "Branimir Maksimovic" <[EMAIL PROTECTED]> > > >To: [EMAIL PROTECTED] > >CC: Haskell-Cafe@haskell.org > >Subject: Re: [Haskell-cafe] Substring replacements > >Date: Thu, 15 Dec 2005 00:55:02 + > > > >>From: Daniel Fischer <[EMAIL PROTECTED]> > >>To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >>CC: Haskell-Cafe@haskell.org > >>Subject: Re: [Haskell-cafe] Substring replacements > >>Date: Wed, 14 Dec 2005 20:40:06 +0100 > >> > >>Hi Bane, > >> > >>nice algorithm. Since comparing chars _is_ cheap, it is to be expected > >>that > >>all the hash-rotating is far more costly for short search patterns. The > >>longer the pattern, the better this gets, I think -- though nowhere near > >>KMP > >>(or would it?). > > > >Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore > >algorithm yet, though. But I think it would be > >difficult to implement it in Haskell efficiently as it searches backwards > >and jumps around, and we want memory savings. > >Though, I even didn't tried yet, but it is certainly very interesting. > > Forget what I've said. > Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes > forward, but when it finds last character in pattern, than starts to search > backwards. > This can be implemented easilly as Haskell lists naturaly reverse order > when putting from one list to other. > Heh, never say never :) > As I see from documents Boyer-Moore has best performance on average > and should be better than KMP. > > Greetings,Bane. > Well, I also thought that all the jumping around in Boyer-Moore wasn't too good (after each shift we must bite off a chunk from the remaining input, pushing that onto the stack, which costs something). But I gave it a try today and here's what I came up with: import Data.List (tails) import Data.Map (Map) import qualified Data.Map as Map import Data.Array.Unboxed searchRep :: String -> String -> String -> String searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str where len = length src len1 = len-1 pat :: UArray Int Char pat = listArray (0,len1) src ch = pat!len1 badChar :: Map Char Int badChar = Map.fromList $ zip src [0 .. ] getBc c = case Map.lookup c badChar of Just n -> n Nothing -> -1 suffs :: UArray Int Int suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs where crs = reverse src pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys pr n _ _ = n bmGs0 :: UArray Int Int bmGs0 = array (0,len1) [(j,k) | (k,k') <- zip (tail $! help) help, j <- [k' .. k-1]] help = [k | k <- [0 .. len], k == len || suffs!k == len-k] bmGs :: UArray Int Int bmGs = bmGs0 // [(len1-suffs!k,k) | k <- [len1,len-2 .. 1]] run by "" = reverse by run by (c:cs) | c == ch = process (c:by) cs | otherwise = run (c:by) cs roll n xs ys | n <= 0 = (xs, ys) roll n xs (y:ys) = roll (n-1) (y:xs) ys roll _ xs "" = (xs, "") walk n "" = (n,"") walk n st@(c:cs) | n < 0 = (n,st) | c == pat!n = walk (n-1) cs | otherwise = (n,st) process con left | i < 0 = reverse pass ++ rp ++ run "" left | otherwise = {- bye ++ -} run ncon nleft where (i,pass) = walk len1 con d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass)) -- bye = reverse $! drop (len-d) con (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left it's not as fast as KMP for the tests, but not too bad. Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before a match (if any), we'd be better off relieving our memory with 'bye', I think. Any improvements are welcome, certainly some of you can do much better. It is fast on my machine except that you are using Map to lookup for badChar which is O(log n). I;ve placed this instead: badChar :: UArray Int Int badChar = array (0,255) ([(i,-1) | i <- [0..255]] ++ proc src 0) proc [] _ = [] proc (s:st) i = (ord s,i):proc st (i+1) getBc c = badChar ! ord c which gaved it signi
Re: [Haskell-cafe] Substring replacements
Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie: > From: "Branimir Maksimovic" <[EMAIL PROTECTED]> > > >To: [EMAIL PROTECTED] > >CC: Haskell-Cafe@haskell.org > >Subject: Re: [Haskell-cafe] Substring replacements > >Date: Thu, 15 Dec 2005 00:55:02 + > > > >>From: Daniel Fischer <[EMAIL PROTECTED]> > >>To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >>CC: Haskell-Cafe@haskell.org > >>Subject: Re: [Haskell-cafe] Substring replacements > >>Date: Wed, 14 Dec 2005 20:40:06 +0100 > >> > >>Hi Bane, > >> > >>nice algorithm. Since comparing chars _is_ cheap, it is to be expected > >>that > >>all the hash-rotating is far more costly for short search patterns. The > >>longer the pattern, the better this gets, I think -- though nowhere near > >>KMP > >>(or would it?). > > > >Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore > >algorithm yet, though. But I think it would be > >difficult to implement it in Haskell efficiently as it searches backwards > >and jumps around, and we want memory savings. > >Though, I even didn't tried yet, but it is certainly very interesting. > > Forget what I've said. > Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes > forward, but when it finds last character in pattern, than starts to search > backwards. > This can be implemented easilly as Haskell lists naturaly reverse order > when putting from one list to other. > Heh, never say never :) > As I see from documents Boyer-Moore has best performance on average > and should be better than KMP. > > Greetings,Bane. > Well, I also thought that all the jumping around in Boyer-Moore wasn't too good (after each shift we must bite off a chunk from the remaining input, pushing that onto the stack, which costs something). But I gave it a try today and here's what I came up with: import Data.List (tails) import Data.Map (Map) import qualified Data.Map as Map import Data.Array.Unboxed searchRep :: String -> String -> String -> String searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str where len = length src len1 = len-1 pat :: UArray Int Char pat = listArray (0,len1) src ch = pat!len1 badChar :: Map Char Int badChar = Map.fromList $ zip src [0 .. ] getBc c = case Map.lookup c badChar of Just n -> n Nothing -> -1 suffs :: UArray Int Int suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs where crs = reverse src pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys pr n _ _ = n bmGs0 :: UArray Int Int bmGs0 = array (0,len1) [(j,k) | (k,k') <- zip (tail $! help) help, j <- [k' .. k-1]] help = [k | k <- [0 .. len], k == len || suffs!k == len-k] bmGs :: UArray Int Int bmGs = bmGs0 // [(len1-suffs!k,k) | k <- [len1,len-2 .. 1]] run by "" = reverse by run by (c:cs) | c == ch = process (c:by) cs | otherwise = run (c:by) cs roll n xs ys | n <= 0 = (xs, ys) roll n xs (y:ys) = roll (n-1) (y:xs) ys roll _ xs "" = (xs, "") walk n "" = (n,"") walk n st@(c:cs) | n < 0 = (n,st) | c == pat!n = walk (n-1) cs | otherwise = (n,st) process con left | i < 0 = reverse pass ++ rp ++ run "" left | otherwise = {- bye ++ -} run ncon nleft where (i,pass) = walk len1 con d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass)) -- bye = reverse $! drop (len-d) con (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left it's not as fast as KMP for the tests, but not too bad. Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before a match (if any), we'd be better off relieving our memory with 'bye', I think. Any improvements are welcome, certainly some of you can do much better. Cheers, Daniel P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is somewhat fussy. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
This is what I got for BM. Performance dissapoints as BM is really suited for indexed strings like arrays.It mainly operates on indexes. This is simple BM, as I didn't want to go for more complex variant,becauses takes and drops and recalculation of next position is too pricey for non indexed structure. So, clear winner is KMP for non indexed strings. There is also finite automaton algorithm but this works well if search strings are precompiled, so I'll implement it only for education purposes. I hope my Haskell improves as I've learned how to reduce number of paramaters. searchReplaceBM :: String -> String -> String -> String searchReplaceBM "" _ str = str searchReplaceBM sr rp str = searchReplace str where table :: UArray Int Int table = array (0,255) ([(i,0) | i <- [0..255]] ++ proc sr 1) proc [] _ = [] proc (s:st) i = (ord s,i):proc st (i+1) len = length sr rsrch = reverse sr searchReplace str | null remaining = if found then rp else passed |found = rp ++ searchReplace remaining | otherwise = passed ++ searchReplace remaining where (passed,remaining,found) = searchReplace' str searchReplace' str = if j == 0 then ("",drop len str,True) else failed where failed = case drop (j-1) str of [] -> (str,"",False) (c:_) -> (take sk str, drop sk str, False) where md = j - table ! ord c sk = if md > 0 then md else 1 j = srch rsrch (reverse $ take len str) len where srch "" "" _ = 0 srch _ "" l = l srch (s:str) (s':str') l | s == s' = srch str str' (l-1) | otherwise = l Greetings, Bane. From: "Branimir Maksimovic" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED], [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 01:39:57 + From: "Branimir Maksimovic" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:55:02 + From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting. Forget what I've said. Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes forward, but when it finds last character in pattern, than starts to search backwards. This can be implemented easilly as Haskell lists naturaly reverse order when putting from one list to other. Heh, never say never :) As I see from documents Boyer-Moore has best performance on average and should be better than KMP. Greetings,Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: [EMAIL PROTECTED] To: haskell-cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:25:19 -0500 G'day all. Quoting Branimir Maksimovic <[EMAIL PROTECTED]>: > After seeing that your program is fastest (I've also tried one from > http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not > that good in converting to search replace?) You probably did it right, but you could post your version to the list if you want me to take a look. Oh, here it is but just don;t laugh :) I've hacked with unsafePerformIO as I din't know how to remove IO from match any other way. searchReplaceKMP :: String->String->String -> String searchReplaceKMP sr rp s | not (null remaining) = found++rp ++ searchReplaceKMP sr rp remaining | otherwise = found where (found,remaining) = unsafePerformIO $ matchKMP sr s matchKMP :: (Monad m, Eq a) => [a] -> ([a] -> m ([a],[a])) matchKMP [] = error "Can't match empty list" matchKMP xs = matchfunc [] where matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs)) dofail = \ps xs -> case xs of [] -> fail "can't match" (y:ys) -> matchfunc (y:ps) ys type PartialMatchFunc m a = [a] -> [a] -> m ([a], [a]) makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] -> [(a, Int)] -> PartialMatchFunc m a makeMatchFunc prev [] = \ps xs -> return (reverse (drop ((length prev)-1) ps), xs) makeMatchFunc prev ((x,failstate):ms) = thisf where mf = makeMatchFunc (thisf:prev) ms failcont = prev !! (length prev - failstate - 1) thisf = \ps xs -> case xs of [] -> fail "can't match" (y:ys) -> if (x == y) then mf (y:ps) ys else failcont ps xs overlap :: (Eq a) => [a] -> [Int] overlap str = overlap' [0] str where overlap' prev [] = reverse prev overlap' prev (x:xs) = let get_o o | o <= 1 || str !! (o-2) == x = o | otherwise = get_o (1 + prev !! (length prev - o + 1)) in overlap' (get_o (head prev + 1):prev) xs -- These are timings (it's performance is about the same as Rabin-Karp): $ time searchr.exe Working:seasearch replace able seaseasearch baker seasearch charlie searchr.exe: user error (can't match) real0m22.187s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 searchr.hs --make -o searchr.exe Chasing modules from: searchr.hs Compiling Main ( searchr.hs, searchr.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working very long True Done real0m8.110s user0m0.031s sys 0m0.016s When I wrote the RunTimeCompilation code, it wasn't intended to be a shining example of efficiency, merely an illustration. Remember that it's doing TWO things: compiling the pattern to code, and then performing the search. The compilation phase is likely to be much slower than the search, so the speedup (if any!) would only be realised the SECOND time that you searched a string using the same pattern. (Assuming you re-used the compiled match code, of course!) Oh, that explaines it. Actually this has to be converted to searchReplace in order to be fast, but I don;t know how (yet) as your program is pretty complicated to my humble Haskell skills. I think that your technique can be usefull with Aho-Corasick algorithm as it first constructs finite automaton from tree, then performs search. So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run time compilation, but this is too advanced for me for now. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
G'day all. Quoting Branimir Maksimovic <[EMAIL PROTECTED]>: > After seeing that your program is fastest (I've also tried one from > http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not > that good in converting to search replace?) You probably did it right, but you could post your version to the list if you want me to take a look. When I wrote the RunTimeCompilation code, it wasn't intended to be a shining example of efficiency, merely an illustration. Remember that it's doing TWO things: compiling the pattern to code, and then performing the search. The compilation phase is likely to be much slower than the search, so the speedup (if any!) would only be realised the SECOND time that you searched a string using the same pattern. (Assuming you re-used the compiled match code, of course!) Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: "Branimir Maksimovic" <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Thu, 15 Dec 2005 00:55:02 + From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting. Forget what I've said. Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes forward, but when it finds last character in pattern, than starts to search backwards. This can be implemented easilly as Haskell lists naturaly reverse order when putting from one list to other. Heh, never say never :) As I see from documents Boyer-Moore has best performance on average and should be better than KMP. Greetings,Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 20:40:06 +0100 Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore algorithm yet, though. But I think it would be difficult to implement it in Haskell efficiently as it searches backwards and jumps around, and we want memory savings. Though, I even didn't tried yet, but it is certainly very interesting. However, I don't see how to (efficiently) do a multiple pattern search with KMP, so there -- if all patterns have the same length, otherwise I don't see -- Rabin-Karp would probably be the method of choice. Yes, this algorithm can search in parallel patterns of same length. Different search patterns have to be searched same way as with KMP. I tuned it up somewhat: import Data.List (isPrefixOf) import Data.Char (ord) -- using ord instead of fromEnum oddly makes it -- faster for my test, but slower for yours, but only a whiff. Wow, on my machine your version of Rabin-Karp gives 30% boost to my test. This helps me learn Haskell , too . Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Wed, 14 Dec 2005 17:10:20 +0100 > I think that's because on your machine Bulat's version have better > perfromance > with CPU cache. > I don;t know but now your version is 25% faster with my test on P4 > hyperthreaded. E, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with computers. I think that i've figure it now. Hyperthreading is hardware CPU feature that single CPU core can speed up execution of two running threads. For example if one thread uses integer unit and other FP unit CPU executes that in parallel. But that's not important or significant. What is interestenting is memory latency. If one thread peeks and pokes around memory for , say 1 unit of time, with usual CPU two thread will execute 2 units of time. Hyperthreaded (I'm talking about intel implementation) CPU will execute that in 1.4 points of time giving 60% boost in terms of speed. I've tested some assembler and C program that launches two threads each roaming over memory to anulate impact of cache. What is noticable is that two threads have 60% less memory latency constantly then single thread. That means if single thread for each out of cache memory access waits 300-400 CPU cycles, two threads wait 60% less. Now what has that to do with our programs as they are single threaded? I think it's garbage collection. Our programs run with garbage collector in background and you feel that burden by 20% as your program probably pushes garbage collector to work more than Bulat's version. On hyperthreaded CPU impact of garbage collection is reduced by a factor of 30-60 % resulting in your program being 30% faster on my machine. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Hi Bane, nice algorithm. Since comparing chars _is_ cheap, it is to be expected that all the hash-rotating is far more costly for short search patterns. The longer the pattern, the better this gets, I think -- though nowhere near KMP (or would it?). However, I don't see how to (efficiently) do a multiple pattern search with KMP, so there -- if all patterns have the same length, otherwise I don't see -- Rabin-Karp would probably be the method of choice. Am Mittwoch, 14. Dezember 2005 10:16 schrieben Sie: > From: Daniel Fischer <[EMAIL PROTECTED]> > > >To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >CC: Haskell-Cafe@haskell.org > >Subject: Re: [Haskell-cafe] Substring replacements > >Date: Tue, 13 Dec 2005 11:23:29 +0100 > > After seeing that your program is fastest (I've also tried one from > http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not > that good in converting to search replace?) I've decided to > try with Rabin-Karp algorithm. > This algorithm performs same operation as straightforward search, > but compares hashes instead of chars. > With ability to rotate hash (remove first, add next) characters > there is also optimisation, that hash is calculated only for single > next character rather again for whole substring. > Unfortunatelly on my machine it is very cheap to compare > characters so with my test hashing overweights character compare, > except in your test when hash searching is faster then straightforward > search. > > This is best I can write in terms of performance and readability. > I've tried with getFst that returns Maybe but it was slower so I decided > to return '\0' in case that argument is empty list, which renders '\0' > unusable, but then I really doubt that 0 will be used in strings. > > -- Rabin-Karp string search algorithm, it is very effective in searching of > set > -- of patterns of length n on same string > -- this program is for single pattern search, but can be crafted > -- for multiple patterns of length m > I tuned it up somewhat: import Data.List (isPrefixOf) import Data.Char (ord) -- using ord instead of fromEnum oddly makes it -- faster for my test, but slower for yours, but only a whiff. searchrep :: String -> String -> String -> String searchrep "" _ str = str-- or cycle rp, or error? searchrep sr rp xs = hSearchRep xs -- don't carry more around than necessary where len = length sr -- we don't want that to be recomputed hsrch = hash sr -- neither that hSearchRep "" = "" hSearchRep xs | null remaining = passed | otherwise = passed ++ rp ++ hSearchRep (drop len remaining) where (passed,remaining) = hSearch xs -- ' xs (hash $ take len xs) "" hSearch xs = hSearch' xs hcmp "" -- since hSearch will be optimised where -- away anyway, we might hcmp = hash $ take len xs -- as well eliminate it ourselves hSearch' "" _ got = (reverse got, "") hSearch' xxs@(x:xs) hcd got | hcd == hsrch && (sr `isPrefixOf` xxs) = (reverse got, xxs) | otherwise = searchAgain -- one test less where searchAgain = case drop len xxs of [] -> (reverse got ++ xxs, "") -- then we know we're done (y:_) -> hSearch' xs (hashRotate x y hcd) (x:got) -- no need for fancy getFst anymore -- making hashRotate local eliminates one argument, makes it faster hashRotate :: Char -> Char -> Int -> Int hashRotate cout cin hsh = 101*(hsh - 101^(len-1)*ord cout) + ord cin -- using foldl for hash is an enormous boost hash :: String -> Int hash = foldl ((. ord) . (+) . (*101)) 0 -- hash str = foldl (\n c -> 101*n+ord c) 0 str -- this is equally fast as the point-free version, easier to read, probably, -- but I like an occasional pointless pointfreeness. Now this beats everything but KMP on my test very clearly. [EMAIL PROTECTED]:~/Documents/haskell/Allotria/Search> time myhash; time myhash2 Working: seasearch replace able seaseaseasearch baker ssseasearch charlie True Done real0m50.401s user0m49.990s sys 0m0.060s Working very long True Done real0m15.747s user0m15.630s sys 0m0.020s Still poor on your test, though. Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Hi, Bane and all, Am Dienstag, 13. Dezember 2005 14:22 schrieben Sie: > > > In real world situation your KMP will always be fastest on average. > > > I like that we are not using C arrays as then we have advantage > > > of lazyness and save on memory usage. C++ program will be faster > > > on shorter strings but on this large strings will loose due memory > > > latency. and with your test, both programs are very fast. Yesterday, I did the unspeakable -- I wrote a C-version. Smashes Haskell-performance for short enough Strings (factor 10 for my test, factor 2.2 for Bane's), but once it starts swapping, we catch up, and for really large Strings I dare say we'd win far out. I also managed to get my KMP still faster, using take and drop instead of splitAt helps a lot (Bane, the use of 'break' in my transcript of yours was what slowed it down, I reintroduced searchr''' and both are equal). I'm not quite sure, whether that indeed helps, but I also chose to use listArray for the boxed array of borders. Now it's searchReplace :: String -> String -> String -> String searchReplace "" _ str = str searchReplace src@(c:cs) dst str = process 0 str "" where len = length src pat :: UArray Int Char pat = listArray (0,len-1) src bord = A.listArray (0,len) $ (-1):0: [getBord (pat!i) i + 1 | i <- [1 .. len-1]] getBord s n | m < 0 = m | s == pat!m = m | otherwise = getBord s m where m = bord A.! n bor :: UArray Int Int bor = listArray (0,len) $ A.elems bord getBor s n | m < 0 || s == pat!m = m | otherwise = getBor s m where m = bor!n process n str _ | n >= len = dst ++ process 0 str "" process _ "" mat = reverse mat process 0 (s:st) _ | s == c= process 1 st [s] | otherwise = s:process 0 st "" process n str@(s:st) mat | s == pat!n = process (n+1) st (s:mat) | otherwise = case getBor s n of -1 -> reverse mat ++ process 0 str "" 0 -> reverse mat ++ process 1 st [s] j -> reverse (drop j mat) ++ process (j+1) st (s:take j mat) gives a speedup of roughly 10% on my box versus yesterday's version. > > > > > > Greetings, Bane. > > > >On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about > >20% > >faster than my KMP on your test -- btw, I unboxed the pat array, which > > gave a > >bit of extra speed, but not much. > > I think that's because on your machine Bulat's version have better > perfromance > with CPU cache. > I don;t know but now your version is 25% faster with my test on P4 > hyperthreaded. E, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with computers. > > your new version: > $ time srchrep.exe > Working:seasearch replace able seaseasearch baker seasearch charlie > True > Done > > > real0m8.734s > user0m0.015s > sys 0m0.000s > > Bulat's version: > > [EMAIL PROTECTED] ~/tutorial > $ time replace1.exe > Working:seasearch replace able seaseasearch baker seasearch charlie > True > Done > > > real0m11.734s > user0m0.015s > sys 0m0.015s > > 3 secs difference now. > > >And apologies to Sebastian Sylvan, I also included an unboxed version of > >bord, > >built from the boxed version, and that sped things further up -- not much, > >again, but there it is. > > On my machine you got another 10-15% of boost with unboxed arrays. > > >I wonder about this difference, -10% on one system and +20% on another > >system, > >ist that normal? > > Different caching schemes on CPU's perhaps? different memory latencies? > hyperthreading helps your version? more code and data, perhaps because > of that it pays the price on your machine? > > Greetings, Bane. > Well, whatever. Upto now, on my box, Bulat's is still the fastest for your test -- though I've narrowed the gap quite a bit. Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Tue, 13 Dec 2005 11:23:29 +0100 After seeing that your program is fastest (I've also tried one from http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not that good in converting to search replace?) I've decided to try with Rabin-Karp algorithm. This algorithm performs same operation as straightforward search, but compares hashes instead of chars. With ability to rotate hash (remove first, add next) characters there is also optimisation, that hash is calculated only for single next character rather again for whole substring. Unfortunatelly on my machine it is very cheap to compare characters so with my test hashing overweights character compare, except in your test when hash searching is faster then straightforward search. This is best I can write in terms of performance and readability. I've tried with getFst that returns Maybe but it was slower so I decided to return '\0' in case that argument is empty list, which renders '\0' unusable, but then I really doubt that 0 will be used in strings. -- Rabin-Karp string search algorithm, it is very effective in searching of set -- of patterns of length n on same string -- this program is for single pattern search, but can be crafted -- for multiple patterns of length m hSearchReplace :: String -> String -> String -> String hSearchReplace sr rp xs | not (null remaining) = found ++ rp ++ hSearchReplace sr rp (drop (length sr) remaining) | otherwise = found where (found,remaining) = hSearch sr xs hSearch :: String -> String -> (String,String) hSearch sr xs = hSearch' sr xs hcmp "" where hsrch = hash sr hcmp = hash $ take ls xs cmp = take ls xs ls = length sr hSearch' [] xs _ _= (xs,[]) hSearch' sr [] _ fndFail = (reverse fndFail,[]) hSearch' srch xxs@(x:xs) hcmps fndFail = if hsrch == hcmps then if isPrefixOf srch xxs then (reverse fndFail,xxs) else searchAgain else searchAgain where searchAgain = hSearch' srch xs (hashRotate (getFst xxs) (getFst nextxxs) (ls-1) hcmps) (x:fndFail) nextxxs = drop ls xxs getFst :: String -> Char getFst [] = '\0' getFst (a:as) = a hash :: String -> Int hash str = hash' str (length str - 1) where hash' :: String -> Int -> Int hash' [] _ = 0 hash' (s:str) pow = (101 ^ pow) *(fromEnum s) + hash' str (pow-1) hashRotate :: Char -> Char -> Int -> Int -> Int hashRotate cout cin pow hsh = (hsh - ((101 ^ pow) * (fromEnum cout)))*101 + (fromEnum cin) Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Tue, 13 Dec 2005 11:23:29 +0100 Am Montag, 12. Dezember 2005 16:28 schrieben Sie: > From: Daniel Fischer <[EMAIL PROTECTED]> > > >To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >CC: Haskell-Cafe@haskell.org > >Subject: Re: [Haskell-cafe] Substring replacements > >Date: Mon, 12 Dec 2005 16:15:46 +0100 > > > >Earlier today: > > > Sorry, but > > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" > > > "abababaaba" > > > > > > I haven't analyzed the algorithm, so I don't know why exactly this > > > >fails. > > > > > I'll take a look sometime soon. > > > >I found the problem (one at least). > >Say the pattern to be replaced begins with 'a' and we have a sufficiently > >long > >match with the pattern starting at the first 'a' in the String. Upon > >encountering the second 'a', while the first pattern still matches, you > >start > >pushing onto the rollback-stack. But that isn't inspected anymore, so if > >the > >actual occurence of the pattern starts at the third (or fourth, n-th) > >occurence of 'a' and that is already pushed onto the rollback, you miss > > it. > > I've corrected this with adjusting rollback position. if rollBack is null > then > search for rollback starts at second character if not starts at same as > searhed > character because I skip what was searched. That's all. > Though I'm not so sure now when I read this. > Still not working: *New> searchReplace "abababc" "#" "ababababababc" "ababababababc" *New> searchReplace1 "abababc" "#" "ababababababc" "ababababababc" Yes, perhaps you've missed another post of mine. I've noticed that problem when pattern repeats more then 2 times and gave up because now whatever I do, your version is always fastest. > >So the question is, can we find a cheap test to decide whether to use KMP > >or > >Bulat's version? Just interleave string with search hits with one with no seacrh (that means partial too) hits, and your version will gain in speed. More partial matches and full search matches Bulat's version will gain in speed. Longer search strings, your version will have gains. > > In real world situation your KMP will always be fastest on average. > I like that we are not using C arrays as then we have advantage > of lazyness and save on memory usage. C++ program will be faster > on shorter strings but on this large strings will loose due memory > latency. and with your test, both programs are very fast. > > Greetings, Bane. > On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20% faster than my KMP on your test -- btw, I unboxed the pat array, which gave a bit of extra speed, but not much. I think that's because on your machine Bulat's version have better perfromance with CPU cache. I don;t know but now your version is 25% faster with my test on P4 hyperthreaded. your new version: $ time srchrep.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m8.734s user0m0.015s sys 0m0.000s Bulat's version: [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.734s user0m0.015s sys 0m0.015s 3 secs difference now. And apologies to Sebastian Sylvan, I also included an unboxed version of bord, built from the boxed version, and that sped things further up -- not much, again, but there it is. On my machine you got another 10-15% of boost with unboxed arrays. I wonder about this difference, -10% on one system and +20% on another system, ist that normal? Different caching schemes on CPU's perhaps? different memory latencies? hyperthreading helps your version? more code and data, perhaps because of that it pays the price on your machine? Greetings, Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Am Montag, 12. Dezember 2005 16:28 schrieben Sie: > From: Daniel Fischer <[EMAIL PROTECTED]> > > >To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >CC: Haskell-Cafe@haskell.org > >Subject: Re: [Haskell-cafe] Substring replacements > >Date: Mon, 12 Dec 2005 16:15:46 +0100 > > > >Earlier today: > > > Sorry, but > > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" > > > "abababaaba" > > > > > > I haven't analyzed the algorithm, so I don't know why exactly this > > > >fails. > > > > > I'll take a look sometime soon. > > > >I found the problem (one at least). > >Say the pattern to be replaced begins with 'a' and we have a sufficiently > >long > >match with the pattern starting at the first 'a' in the String. Upon > >encountering the second 'a', while the first pattern still matches, you > >start > >pushing onto the rollback-stack. But that isn't inspected anymore, so if > >the > >actual occurence of the pattern starts at the third (or fourth, n-th) > >occurence of 'a' and that is already pushed onto the rollback, you miss > > it. > > I've corrected this with adjusting rollback position. if rollBack is null > then > search for rollback starts at second character if not starts at same as > searhed > character because I skip what was searched. That's all. > Though I'm not so sure now when I read this. > Still not working: *New> searchReplace "abababc" "#" "ababababababc" "ababababababc" *New> searchReplace1 "abababc" "#" "ababababababc" "ababababababc" > >So the question is, can we find a cheap test to decide whether to use KMP > >or > >Bulat's version? > > In real world situation your KMP will always be fastest on average. > I like that we are not using C arrays as then we have advantage > of lazyness and save on memory usage. C++ program will be faster > on shorter strings but on this large strings will loose due memory > latency. and with your test, both programs are very fast. > > Greetings, Bane. > On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20% faster than my KMP on your test -- btw, I unboxed the pat array, which gave a bit of extra speed, but not much. And apologies to Sebastian Sylvan, I also included an unboxed version of bord, built from the boxed version, and that sped things further up -- not much, again, but there it is. I wonder about this difference, -10% on one system and +20% on another system, ist that normal? Cheers, Daniel -- Up-To-Date version of KMP: import Data.Array.Unboxed (UArray, listArray, (!)) import qualified Data.Array as A (array, (!), elems) searchReplace :: String -> String -> String -> String searchReplace "" _ str = str searchReplace src@(c:cs) dst str = process 0 str "" where len = {-# scc "len" #-} length src pat :: UArray Int Char pat = {-# scc "pat" #-} listArray (0,len-1) src bord ={-# scc "bord" #-} A.array (0,len) $ (0,-1):(1,0): [(i+1,getBord (pat!i) i + 1) | i <- [1 .. len-1]] getBord s n | m < 0 = m | s == pat!m = m | otherwise = getBord s m where m = bord A.! n bor :: UArray Int Int bor = listArray (0,len) $ A.elems bord getBor s n | m < 0 || s == pat!m = m | otherwise = getBor s m where m = bor!n process n str _ | n >= len = {-# scc "process" #-} dst ++ process 0 str "" process _ "" mat = {-# scc "process" #-} reverse mat process 0 (s:st) _ | s == c= {-# scc "process" #-} process 1 st [s] | otherwise = {-# scc "process" #-} s:process 0 st "" process n str@(s:st) mat | s == pat!n = {-# scc "process" #-} process (n+1) st (s:mat) | otherwise = {-# scc "process" #-} let j = getBor s n (ret,skip) = splitAt j mat in if j < 0 then reverse mat ++ process 0 str "" else reverse skip ++ process (j+1) st (s:ret) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 16:15:46 +0100 Earlier today: > Sorry, but > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" > "abababaaba" > > I haven't analyzed the algorithm, so I don't know why exactly this fails. > I'll take a look sometime soon. > I found the problem (one at least). Say the pattern to be replaced begins with 'a' and we have a sufficiently long match with the pattern starting at the first 'a' in the String. Upon encountering the second 'a', while the first pattern still matches, you start pushing onto the rollback-stack. But that isn't inspected anymore, so if the actual occurence of the pattern starts at the third (or fourth, n-th) occurence of 'a' and that is already pushed onto the rollback, you miss it. let src = concat (replicate n "abc") ++ "d" let str = concat (replicate (n+k) "abc") ++ "d" then searchReplace src "Success!" str will work correctly iff k is congruent to 0 or 1 modulo (n+1). Oh, yes this seems the problem for searchr :( I have to look for efficient way in order to circumvent repeated searches. But since your KMP is fastest of all now, I am considering if there is any point now to correct this. And searchr ugly version that I've posted has a bug (not present in MyBane pretty version) . should be : else if sr'/=x Greetings, Bane. _ FREE pop-up blocking with the new MSN Toolbar - get it now! http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 16:15:46 +0100 Earlier today: > Sorry, but > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" > "abababaaba" > > I haven't analyzed the algorithm, so I don't know why exactly this fails. > I'll take a look sometime soon. > I found the problem (one at least). Say the pattern to be replaced begins with 'a' and we have a sufficiently long match with the pattern starting at the first 'a' in the String. Upon encountering the second 'a', while the first pattern still matches, you start pushing onto the rollback-stack. But that isn't inspected anymore, so if the actual occurence of the pattern starts at the third (or fourth, n-th) occurence of 'a' and that is already pushed onto the rollback, you miss it. I've corrected this with adjusting rollback position. if rollBack is null then search for rollback starts at second character if not starts at same as searhed character because I skip what was searched. That's all. Though I'm not so sure now when I read this. So the question is, can we find a cheap test to decide whether to use KMP or Bulat's version? In real world situation your KMP will always be fastest on average. I like that we are not using C arrays as then we have advantage of lazyness and save on memory usage. C++ program will be faster on shorter strings but on this large strings will loose due memory latency. and with your test, both programs are very fast. Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: "Branimir Maksimovic" <[EMAIL PROTECTED]> CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 13:07:29 +0100 Sorry, but Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" "abababaaba" I haven't analyzed the algorithm, so I don't know why exactly this fails. I'll take a look sometime soon. It failed because I didn;t adjusted search string for rollBack when previous rollBack is not null. this is corrected version: (with your changes it looks much better) --- searchReplace :: String->String->String -> String searchReplace "" _ xs = xs searchReplace sr rp xs = searchr sr rp xs "" "" where searchr :: String->String->String->String->String -> String searchr _ _ "" _ _ = "" searchr sr rp xs retB rollB | found = rp ++ searchr sr rp rema ret roll | otherwise = reverse (proc ++ rollB) ++ searchr sr rp rema ret roll where (found, proc, rema, ret, roll) = searchr' sr sr (reverse retB ++ xs) "" rollB searchr' src@(s:sr) src'@(s':sr') xs soFar rollB = searchr'' (drop (length rollB) src) src' xs soFar (not (null rollB),"","") s searchr'' "" _ xs fnd _ _ = (True,fnd,xs,"","") searchr'' _ _ "" fnd (_,ret,roll) _ = (False,ret++roll++fnd,"","","") searchr'' src@(s:sr) src'@(s':sr') xxs@(x:xs) soFar (cnt,ret,roll) c | s == x = if s' == x && null ret && cnt then searchr'' sr sr' xs soFar (True, "", x:roll) c else if null ret && null roll then searchr'' sr src' xs (x:soFar) (True, "", "") c else searchr'' sr src' xs soFar (True, x:roll++ret, "") c | otherwise = if null roll && null ret then if c == x then (False, soFar, xxs, "", "") else let (from, pre) = break (==c) xs in (False, reverse from ++ x:soFar, pre, "", "") else if s'/=x then if null ret then (False, (x:roll) ++ soFar, xs,"","") else (False, soFar, xxs,ret,"") else if null ret then (False, soFar, xs, "", x:roll) else (False, soFar, xxs, ret, "") However it is significantly slower then previous ugly version: searchReplace :: String->String->String -> String searchReplace sr rp xs = searchr sr rp xs "" "" where searchr :: String->String->String->String->String -> String searchr [] _ xs _ _ = xs searchr _ _ [] _ _ = [] searchr sr rp xs retBack rollBack | isFound $ fnd rollBack = rp ++ searchr sr rp (remaining $ fnd rollBack ) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) | otherwise = reverse ((processed $ fnd rollBack) ++ rollBack) ++ searchr sr rp (remaining $ fnd rollBack) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) where fnd = searchr' sr sr (reverse retBack ++ xs) "" isFound = fst . fst remaining = snd . snd . fst getRollBack = snd . snd getRetBack = fst . snd processed = fst . snd . fst searchr' :: String->String->String->String->String -> ((Bool,(String,String)),(String,String)) searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack = searchr'' (drop (length rollBack) srch) srch' xs fndSoFar (not (isEmpty rollBack),"","") sr searchr'' :: String->String->String->String->(Bool,String,String)->Char -> ((Bool,
Re: [Haskell-cafe] Substring replacements
Earlier today: > Sorry, but > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" > "abababaaba" > > I haven't analyzed the algorithm, so I don't know why exactly this fails. > I'll take a look sometime soon. > I found the problem (one at least). Say the pattern to be replaced begins with 'a' and we have a sufficiently long match with the pattern starting at the first 'a' in the String. Upon encountering the second 'a', while the first pattern still matches, you start pushing onto the rollback-stack. But that isn't inspected anymore, so if the actual occurence of the pattern starts at the third (or fourth, n-th) occurence of 'a' and that is already pushed onto the rollback, you miss it. let src = concat (replicate n "abc") ++ "d" let str = concat (replicate (n+k) "abc") ++ "d" then searchReplace src "Success!" str will work correctly iff k is congruent to 0 or 1 modulo (n+1). Now to fix it, I see two possibilities 1. re-inspect the rollback, which brings you basically to my srchrep 2. introduce a hierarchy of rollback-stacks - but that would be rather horrible, I think, because you must keep count on which stack you have to push, how many matching patterns you currently have (and which ones they are) ... So the question is, can we find a cheap test to decide whether to use KMP or Bulat's version? Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] CC: Haskell-Cafe@haskell.org Subject: Re: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 10:31:49 +0100 Am Montag, 12. Dezember 2005 01:34 schrieben Sie: > On 12/12/05, Daniel Fischer <[EMAIL PROTECTED]> wrote: > > Okay, I have looked up KMP and implemented it. > > Seems to work -- my first use of QuickCheck, too. > > It's slower than Bulat's and Tomasz' for Branimir's test :-(, > > but really fast for my test. > > Undoubtedly, one can still tune it. > > Perhaps by using unboxed arrays... > > /S > > > -- > Sebastian Sylvan > +46(0)736-818655 > UIN: 44640862 I'm afraid, unboxed arrays are out of the question, because bord is incrementally produced :-( Working very long test2: <> No worrie your test is now fastest with both your and mine test. I;ve forgot to change working function in your test:0) mine test: your program is is srchrep.exe [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m14.344s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.672s > your program is almost 1.5 secs faster then Bulat's user0m0.015s sys 0m0.000s [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m12.016s user0m0.015s sys 0m0.015s now your test: [EMAIL PROTECTED] ~/tutorial $ time searchr.exe Working very long True Done real0m0.312s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working very long False Done real0m12.516s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working very long True Done real0m0.375s > yours is less then second as mine but is fastest in both tests user0m0.015s sys 0m0.015s I don;t know how you get lesser numbers with mine test, but on this machine your KMP algorithm performs best. Greetings ,Bane. _ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
Sorry, but Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba" "abababaaba" I haven't analyzed the algorithm, so I don't know why exactly this fails. I'll take a look sometime soon. As for times: a complete stat is attached, all compiled with -O2, as well as the modified KMP-version and my transcript of Branimir's new version. On my computer, KMP smashes everything else on my test (except Branimir's, only that doesn't yet work correctly), while Bulat's definitely is faster than anything for Branimir's test and faster than anything but KMP for mine. Branimir, isEmpty is the Prelude function 'null', so you needn't define it yourself. Cheers, Daniel Am Montag, 12. Dezember 2005 05:20 schrieben Sie: > From: Daniel Fischer <[EMAIL PROTECTED]> > > >To: Haskell-Cafe@haskell.org > >Subject: [Haskell-cafe] Substring replacements > >Date: Mon, 12 Dec 2005 01:14:37 +0100 > > > >Okay, I have looked up KMP and implemented it. > >Seems to work -- my first use of QuickCheck, too. > >It's slower than Bulat's and Tomasz' for Branimir's test :-(, > >but really fast for my test. > > Strange I got completelly different results: > > [EMAIL PROTECTED] ~/tutorial > $ time srchrep.exe > Working very long > True > Done > > real0m16.407s > user0m0.015s > sys 0m0.015s > > [EMAIL PROTECTED] ~/tutorial > $ ghc -fglasgow-exts -O2 srchrep.hs --make -o srchrep.exe > Chasing modules from: srchrep.hs > Compiling Main ( srchrep.hs, srchrep.o ) > Linking ... > > [EMAIL PROTECTED] ~/tutorial > $ time srchrep.exe > Working:seasearch replace able seaseasearch baker seasearch charlie > True > Done > > > real0m10.156s > user0m0.015s > sys 0m0.015s > > [EMAIL PROTECTED] ~/tutorial > $ time replace1.exe > Working:seasearch replace able seaseasearch baker seasearch charlie > True > Done > > > real0m11.672s > user0m0.015s > sys 0m0.015s > > Now your version is fastest according to my machine, but it is not faster > with your test it's slower in compariton to replace1. > > I've corrected my code so it is fastest with your test,still less then a > second, > but slowest with mine. > Checked with your fail tests and compared results of these 2 tests. > Now should be ok. > I maintan now two lists one for successes and other for failures. > I also prettified code a bit . > > searchReplace :: String->String->String -> String > searchReplace sr rp xs = searchr sr rp xs "" "" >where > searchr :: String->String->String->String->String -> String > searchr [] _ xs _ _ = xs > searchr _ _ [] _ _ = [] > searchr sr rp xs retBack rollBack > > | isFound $ fnd rollBack = rp > > ++ searchr sr rp (remaining $ fnd > rollBack ) >( getRetBack $ fnd > rollBack) >( getRollBack $ fnd > rollBack) > > | otherwise = reverse ((processed $ fnd rollBack) ++ > > rollBack) >++ searchr sr rp (remaining $ fnd rollBack) > ( getRetBack $ fnd > rollBack) ( getRollBack $ fnd rollBack) > where fnd = searchr' sr sr (reverse retBack ++ xs) "" > > isFound = fst . fst > remaining = snd . snd . fst > getRollBack = snd . snd > getRetBack = fst . snd > processed = fst . snd . fst > > searchr' :: String->String->String->String->String > -> ((Bool,(String,String)),(String,String)) > searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack = >searchr'' (drop (length rollBack) srch) srch' xs > fndSoFar > (False,"","") sr > > searchr'' :: String->String->String->String->(Bool,String,String)->Char > -> ((Bool,(String,String)),(String,String)) > searchr'' [] _ xs fnd _ _ = ((True,(fnd,xs)),("","")) > searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++ > rollBack ++ fnd,[])),("","")) > searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar > (cnt,retBack,rollBack) s > > | sr == x = if cnt && sr' == x && isEmpty retBack > >
Re: [Haskell-cafe] Substring replacements
Am Montag, 12. Dezember 2005 01:34 schrieben Sie: > On 12/12/05, Daniel Fischer <[EMAIL PROTECTED]> wrote: > > Okay, I have looked up KMP and implemented it. > > Seems to work -- my first use of QuickCheck, too. > > It's slower than Bulat's and Tomasz' for Branimir's test :-(, > > but really fast for my test. > > Undoubtedly, one can still tune it. > > Perhaps by using unboxed arrays... > > /S > > > -- > Sebastian Sylvan > +46(0)736-818655 > UIN: 44640862 I'm afraid, unboxed arrays are out of the question, because bord is incrementally produced :-( Working very long test2: <> Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Substring replacements
From: Daniel Fischer <[EMAIL PROTECTED]> To: Haskell-Cafe@haskell.org Subject: [Haskell-cafe] Substring replacements Date: Mon, 12 Dec 2005 01:14:37 +0100 Okay, I have looked up KMP and implemented it. Seems to work -- my first use of QuickCheck, too. It's slower than Bulat's and Tomasz' for Branimir's test :-(, but really fast for my test. Strange I got completelly different results: [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working very long True Done real0m16.407s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ ghc -fglasgow-exts -O2 srchrep.hs --make -o srchrep.exe Chasing modules from: srchrep.hs Compiling Main ( srchrep.hs, srchrep.o ) Linking ... [EMAIL PROTECTED] ~/tutorial $ time srchrep.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m10.156s user0m0.015s sys 0m0.015s [EMAIL PROTECTED] ~/tutorial $ time replace1.exe Working:seasearch replace able seaseasearch baker seasearch charlie True Done real0m11.672s user0m0.015s sys 0m0.015s Now your version is fastest according to my machine, but it is not faster with your test it's slower in compariton to replace1. I've corrected my code so it is fastest with your test,still less then a second, but slowest with mine. Checked with your fail tests and compared results of these 2 tests. Now should be ok. I maintan now two lists one for successes and other for failures. I also prettified code a bit . searchReplace :: String->String->String -> String searchReplace sr rp xs = searchr sr rp xs "" "" where searchr :: String->String->String->String->String -> String searchr [] _ xs _ _ = xs searchr _ _ [] _ _ = [] searchr sr rp xs retBack rollBack | isFound $ fnd rollBack = rp ++ searchr sr rp (remaining $ fnd rollBack ) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) | otherwise = reverse ((processed $ fnd rollBack) ++ rollBack) ++ searchr sr rp (remaining $ fnd rollBack) ( getRetBack $ fnd rollBack) ( getRollBack $ fnd rollBack) where fnd = searchr' sr sr (reverse retBack ++ xs) "" isFound = fst . fst remaining = snd . snd . fst getRollBack = snd . snd getRetBack = fst . snd processed = fst . snd . fst searchr' :: String->String->String->String->String -> ((Bool,(String,String)),(String,String)) searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack = searchr'' (drop (length rollBack) srch) srch' xs fndSoFar (False,"","") sr searchr'' :: String->String->String->String->(Bool,String,String)->Char -> ((Bool,(String,String)),(String,String)) searchr'' [] _ xs fnd _ _ = ((True,(fnd,xs)),("","")) searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++ rollBack ++ fnd,[])),("","")) searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar (cnt,retBack,rollBack) s | sr == x = if cnt && sr' == x && isEmpty retBack then searchr'' srs srs' xs fndSoFar (True,"",x:rollBack) s else if not (isEmpty retBack) || not (isEmpty rollBack) then searchr'' srs srch' xs fndSoFar (True,(x:rollBack) ++ retBack,"") s else searchr'' srs srch' xs (x:fndSoFar) (True,"","") s | otherwise = if isEmpty rollBack && isEmpty retBack then if s == x then ((False,(fndSoFar,xxs)),("","")) else ((False,searchr''' s xs (x:fndSoFar)),("","")) else if sr' == x && isEmpty retBack then ((False,(fndSoFar, xs)), (retBack,x:rollBack)) else ((False,(fndSoFar, xxs)), (retBack,rollBack)) searchr''' :: Char->String->String -> (String,String) searchr''' sr [] fndSoFar = (fndSoFar,[]) searchr''' sr xxs@(x:xs) fndSoFar | sr/=x = searchr''' sr xs (x:fndSoFar) | otherwise = (fndSoFar,xxs) isEmpty [] = True isEmpty (a:as) = False --- Greetings, Bane. _ Express yourself instantly with MSN Messenger! Download today it's FREE! http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Substring replacements
On 12/12/05, Daniel Fischer <[EMAIL PROTECTED]> wrote: > Okay, I have looked up KMP and implemented it. > Seems to work -- my first use of QuickCheck, too. > It's slower than Bulat's and Tomasz' for Branimir's test :-(, > but really fast for my test. > Undoubtedly, one can still tune it. Perhaps by using unboxed arrays... /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Substring replacements
Okay, I have looked up KMP and implemented it. Seems to work -- my first use of QuickCheck, too. It's slower than Bulat's and Tomasz' for Branimir's test :-(, but really fast for my test. Undoubtedly, one can still tune it. Here's the code: module KMP where import Data.Array searchReplace :: String -> String -> String -> String searchReplace "" _ str = str searchReplace src@(c:cs) dst str = process 0 str "" where len = length src pat = listArray (0,len-1) src bord = array (0,len) $ (0,-1):(1,0):[(i+1,boun i (bord!i)) | i <- [1 .. len-1]] boun i j | j < 0 = 0 | pat!i == pat!j = j+1 | otherwise = boun i (bord!j) getBord s n | m < 1 = m | s == pat!m = m | otherwise = getBord s m where m = bord!n process n str _ | n >= len = dst ++ process 0 str "" process _ "" mat = reverse mat process 0 (s:st) _ | s == c= process 1 st [s] | otherwise = s:process 0 st "" process n str@(s:st) mat | s == pat!n = process (n+1) st (s:mat) | otherwise = let j = getBord s n (ret,skip) = splitAt j mat in if j < 0 then reverse mat ++ process 0 str "" else reverse skip ++ process j str ret Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Substring replacements (was: Differences in optimisiation ...)
Am Samstag, 10. Dezember 2005 02:51 schrieben Sie: > From: Daniel Fischer <[EMAIL PROTECTED]> > > >To: "Branimir Maksimovic" <[EMAIL PROTECTED]> > >CC: Haskell-Cafe@haskell.org > >Subject: Re: Differences in optimisiation with interactive and compiled mo > >Date: Fri, 9 Dec 2005 23:27:00 +0100 > > > >Still doesn't work, though: > > > >*Main> searchr "hahal" "jupp" "hahahalala" > >"hahahalala" > > > >The problem is that the string to replace may contain a repeated pattern > >and the pattern that begins the actual occurence might be consumed before > > a failure is detected. > > Yes, I've corrected it. Now it is just 25% faster and that is only with -O2 > flag. > Here is whole thing, I hope there are no more bugs left :) : > None that sprang to my eyes. However, on my machine, yours is not faster than Lemmih's. Now, using the new Strings, I get the following times: -O2 -O1 no opt Lemmih's: 38.9 sec38.9 sec76.7 sec Yours : 40.1 sec 41.5 sec 131.1 sec Mine : 32.9 sec 33.1 sec82.8 sec. However, there's a problem with Lemmih's replace: *Main> searchr "ababcab" "###" "ababcababcabab" "###abcab" *Main> replace "ababcab" "###" "ababcababcabab" "ababc###ab" due to the fact that Lemmih's version scans the input from right to left (that's easily changed by a few reverses, though -- but costly for long inputs), more serious is Prelude Main> replace "ja" "aja" "jjja" "ajajajajajajaja". The fastest -- and nicely simple above -- that I could come up with is replace :: String -> String -> String -> String replace _ _ "" = "" replace "" _ str = str replace src dst inp = process inp where n = length src process "" = "" process st@(c:cs) | src `isPrefixOf` st = dst ++ process (drop n st) | otherwise = c:process cs It's roughly 10% faster than my other version on "seasearch" ... and if you try it on main2 :: IO () main2 = let src = replicate 1000 'r' dst = " # " str = replicate 999 'r' ++ 'c': replicate 1000 'r' out = replace src dst $ concat $ replicate 500 str out1 = replace src dst $ concat $ replicate 501 str in do putStrLn $ "Working very long" putStrLn $ show (out == out1) ++ "\nDone" you'll see a real difference. I'm not sure, why your algorithm pays a so much higher penalty, though. Maybe, it'll be faster if you make searchr' &c local functions? I'll try. Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe