Re: [Haskell-cafe] Substring replacements

2005-12-23 Thread Daniel Fischer
Hello Bulat,

I'm not sure what your point is, let's try to enlighten me.

Am Mittwoch, 21. Dezember 2005 16:30 schrieben Sie:
> Hello Daniel,
>
> Wednesday, December 21, 2005, 5:20:18 PM, you wrote:
>
> DF> ordinarily, on my computer, your version of straightforward is 10-15%
> faster DF> than KMP (search-patterns are now supplied on the command line
> -- which has DF> no big impact;
>
> of course. the search pattern can be compiled into the search
> algorithm just at the time of program execution. just for example, if
> you have code
>
> main = do [x] <- getArgs
>   y <- return $ map (\n -> fac $ read x) [1..10^6]
>
> then `fac $ read x` will be computed just one time. in this time, if
> your perform many searches with the same pattern - compiler must
> initialize array just one time and use it for all searches
>
Errrh, what here? If I want to replace each occurence of a pattern within a 
String, of course I want the arrays built only once, destroying and 
rebuilding them would be deliberate stupidity, that can't be your point, 
certainly. And also if we interactively search and replace, as in an editor, 
we'd hold on to the arrays until we get the message 'no more of that, next 
pattern'.

>
> DF> searched string is " able sea..."; all compiled with -O2;
> DF> NOINLINE makes no difference -- at least with optimisations on --
>
> why? i think that is the right way to check speed of optimized program
> without pre-compiling search pattern to search/replace algorithm at
> runtime
? What has NOINLINE to do with precompiling the search pattern?
BTW, I think, the KMP-algorithm is too large to be inlined anyway (maybe, if 
we explicitly asked for it), so I wouldn't expect any NOINLINE-effect in the 
first place.

And about RunTimeCompilation, I don't quite understand the Wiki-page, what I 
imagined might be the point there is that, if I know the search pattern 
before, I might import the general algorithm, write a function
searchReplacePattern = searchReplace "pattern"
and the arrays might then be built at compile-time and included in the object 
code -- but I've no idea whether compilers are clever enough to do that (I 
believe it should be possible to write compilers which would do that, but is 
it worth the trouble -- or is that sort of optimisation easy and routinely 
done?) -- but why that would be called RunTimeCompilation, I cannot imagine.
So then, the thing that might make a difference, would be not to give the 
search pattern at compile-time, which I did. Of course, since we search a 
long String, the time needed to build the arrays is minute in relation to the 
time needed to traverse the String, but in the more realistic situation of a 
shorter String (50 kB instead of 48 MB), it would be significant.
>
> DF> ; without optimisations, KMP suffers far worse than straightforward).
>
> this test is meaningless

Well, it's not really a test for the algorithm, but for ghc's optimiser and 
I've forgotten the exact numbers, but KMP compiled without optimisation is 
much much, much slower than with -O2, so the optimiser does a really great 
job.
>
> >> KMP is O(m) while straightforward is O(m*n).
>
> DF> Where m is the length of the input and n is the length of the
> searched-for DF> pattern, I think?
> DF> But these are worst-case complexities, I believe, ordinarily,
> straightforward DF> will be O(m), too.
>
> and have longer init time (and take more space), so on typical
> searches it will be worser than trivial algorithm

I believe you've misread here. I'm saying that while the worst case complexity 
for the straighforward (or trivial) algorithm is O(n*m) -- as witnessed by my 
test, which was deliberately designed to be bad for that --, usually, in 
most real situations, that will have O(m) complexity, too.
And I -- like you -- believe that for typical searches the straightforward 
algorithm will be faster (at least with a clever implementation, like yours).

Cheers,
Daniel

P.S.: I'd really appreciate another attempt at explaining RunTimeCompilation 
to me (might well be an URL of a paper or something).

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: Re[2]: [Haskell-cafe] Substring replacements

2005-12-23 Thread Branimir Maksimovic





From: Bulat Ziganshin <[EMAIL PROTECTED]>
Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: haskell-cafe@haskell.org
Subject: Re[2]: [Haskell-cafe] Substring replacements
Date: Fri, 23 Dec 2005 11:32:01 +0300

Hello Branimir,

Wednesday, December 21, 2005, 10:18:43 AM, you wrote:

>>try to add
>>
>>{-# NOINLINE replace #-}
>>
>>to both programs and repeat comparision

BM> These are tests:
BM> No optimisations (no -O):

NOINLINE just prevents RunTimeCompilation (see wiki page for details),
so this way you will test speed of "replace" on previously unknown
string. disabling optimization says nothing about real speed of
optimized program, which searches for the many different strings



I got it. These tests were with NOINLINE in both cases but I didn;t
saw any speed difference in results as actually replace (straight)
and searchReplace (KMP) is just called for two differnet strings.
Perhaps if I call that for long list of short patterns patterns on short 
string,

test would display different results (INLINE wouldn't help).
I'll try that next.

Greetings, Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Substring replacements

2005-12-23 Thread Bulat Ziganshin
Hello Branimir,

Wednesday, December 21, 2005, 10:18:43 AM, you wrote:

>>try to add
>>
>>{-# NOINLINE replace #-}
>>
>>to both programs and repeat comparision

BM> These are tests:
BM> No optimisations (no -O):

NOINLINE just prevents RunTimeCompilation (see wiki page for details),
so this way you will test speed of "replace" on previously unknown
string. disabling optimization says nothing about real speed of
optimized program, which searches for the many different strings

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-22 Thread Daniel Fischer
Am Mittwoch, 21. Dezember 2005 15:20 schrieb Daniel Fischer:
> > >i'm 90% sure that straightforward method must be faster for one-time
> > >searches.
>
> I'm far less sure of that. If you have a really short search-pattern, I
> think that probably straightforward search is indeed faster (at least, if
> the search-pattern isn't supplied at compile-time). But if you have a
> pattern of length 100, say, and lots of relatively long partial matches,
> chances are that KMP will move on something like 50 chars at once, which
> would have to be re-checked by straightforward. I've no idea, when that
> would outweigh the extra time of building the KMP-arrays, though. In my
> test -- extremely unrealistic, but provides a more or less worst case
> example --
> straightforward must make roughly half a million character comparisons to
> pass each 'c', while KMP does 2000 (+/-2) (one way through the array and
> back), so there are situations where KMP definitely is faster. But
> ordinarily, on my computer, your version of straightforward is 10-15%
> faster than KMP (search-patterns are now supplied on the command line --
> which has no big impact; searched string is " able sea..."; all compiled
> with -O2; NOINLINE makes no difference -- at least with optimisations on
> --; without optimisations, KMP suffers far worse than straightforward).
>

I've now tested with
main = do args <- getArgs
  n <- case args of
 (ar:_) -> readIO ar `catch` (\_ -> return 400)
 _  -> return 500
  let src = replicate n 'r'
  dst = " # "
  str = replicate (n-1) 'r' ++ 'c': replicate n 'r'
  k   = if n < 200 then ceiling (5e6 / fromIntegral n)
   else ceiling (1e7 / fromIntegral n)
  out = searchReplace src dst $ concat $ replicate k str
  putStr "Working with "
  print n
  print $ length out
  putStrLn "Done"

and KMP wins for n == 1 and n >= 12, also for " able seasea...", KMP wins for 
search-patterns of length 1 and patterns with no partial matches (and some 
others), but generally -- on my thing -- Bulat's version wins.

Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-21 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Bulat Ziganshin <[EMAIL PROTECTED]>, Haskell-Cafe@haskell.org
> KMP is O(m) while straightforward is O(m*n).

Where m is the length of the input and n is the length of the searched-for
pattern, I think?

Yes.
But these are worst-case complexities, I believe, ordinarily, 
straightforward

will be O(m), too.


Yes,  those are worst cases for both algorithms. O(m) for KMP,
O(m*n) for straightforward.



> My test favors straightforward, in any other case KMP wins by order of
> magnitude.
Can you give example tests?


Any example that has long search pattern say (many a's followed by b )
and searched string has many partial matches (many a's).
Particularly, any example which exhibits O(m*n) or close to, case for 
straightforward

search.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Substring replacements

2005-12-21 Thread Bulat Ziganshin
Hello Daniel,

Wednesday, December 21, 2005, 5:20:18 PM, you wrote:

>> >BM> I've finally performed test on amd64 and result is a same as on intel.
>> >BM> KMP always wins. So KMP is best suited for non indexed strings
>> >BM> and I guess should be used in library as prefered search/replace
>> >method.
>> >BM> This test favors straightforward search.
>> >
>> >i'm 90% sure that straightforward method must be faster for one-time
>> >searches.

DF> I'm far less sure of that. If you have a really short search-pattern, I 
think 
DF> that probably straightforward search is indeed faster

sorry, i mean exactly that, answering proposition to add this to
library as a PREFFERED method. imho, typical usage of these routines
is for short enough patterns and searched strings



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-21 Thread Daniel Fischer
Am Mittwoch, 21. Dezember 2005 08:18 schrieb Branimir Maksimovic:
> From: Bulat Ziganshin <[EMAIL PROTECTED]>
>
> >Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]>
> >To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
> >CC: haskell-cafe@haskell.org
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Tue, 20 Dec 2005 23:55:22 +0300
> >
> >Hello Branimir,
> >
> >Tuesday, December 20, 2005, 9:48:48 PM, you wrote:
> >
> >BM> I've finally performed test on amd64 and result is a same as on intel.
> >BM> KMP always wins. So KMP is best suited for non indexed strings
> >BM> and I guess should be used in library as prefered search/replace
> >method.
> >BM> This test favors straightforward search.
> >
> >i'm 90% sure that straightforward method must be faster for one-time
> >searches.

I'm far less sure of that. If you have a really short search-pattern, I think 
that probably straightforward search is indeed faster (at least, if the 
search-pattern isn't supplied at compile-time). But if you have a pattern of 
length 100, say, and lots of relatively long partial matches, chances are 
that KMP will move on something like 50 chars at once, which would have to be 
re-checked by straightforward. I've no idea, when that would outweigh the 
extra time of building the KMP-arrays, though. In my test -- extremely 
unrealistic, but provides a more or less worst case example -- 
straightforward must make roughly half a million character comparisons to 
pass each 'c', while KMP does 2000 (+/-2) (one way through the array and 
back), so there are situations where KMP definitely is faster. But 
ordinarily, on my computer, your version of straightforward is 10-15% faster 
than KMP (search-patterns are now supplied on the command line -- which has 
no big impact; searched string is " able sea..."; all compiled with -O2; 
NOINLINE makes no difference -- at least with optimisations on --; without 
optimisations, KMP suffers far worse than straightforward).

>
> KMP is O(m) while straightforward is O(m*n).

Where m is the length of the input and n is the length of the searched-for 
pattern, I think?
But these are worst-case complexities, I believe, ordinarily, straightforward 
will be O(m), too.

>
> your test may give better results with KMP algorithm just
>
> >because you repeat the same search many times and it was automatically
> >"run-time compiled" as described on the wiki page about KMP

My results seem to witness against that.

>
> My test favors straightforward, in any other case KMP wins by order of
> magnitude.
Can you give example tests?

> I think that straightfoirward is better then KMP with optimisations
> off is due more complex program.
>
> >try to add
> >
> >{-# NOINLINE replace #-}
> >
> >to both programs and repeat comparision
>
> Greetings, Bane.
>
Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-20 Thread Branimir Maksimovic





From: Bulat Ziganshin <[EMAIL PROTECTED]>
Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Tue, 20 Dec 2005 23:55:22 +0300

Hello Branimir,

Tuesday, December 20, 2005, 9:48:48 PM, you wrote:

BM> I've finally performed test on amd64 and result is a same as on intel.
BM> KMP always wins. So KMP is best suited for non indexed strings
BM> and I guess should be used in library as prefered search/replace 
method.

BM> This test favors straightforward search.

i'm 90% sure that straightforward method must be faster for one-time
searches.


KMP is O(m) while straightforward is O(m*n).

your test may give better results with KMP algorithm just

because you repeat the same search many times and it was automatically
"run-time compiled" as described on the wiki page about KMP


My test favors straightforward, in any other case KMP wins by order of
magnitude.
I think that straightfoirward is better then KMP with optimisations
off is due more complex program.


try to add

{-# NOINLINE replace #-}

to both programs and repeat comparision


These are tests:
No optimisations (no -O):
Intel hyperthreaded,windows
$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m34.766s
user0m0.015s
sys 0m0.000s

[EMAIL PROTECTED] ~/tutorial
$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m14.719s
user0m0.031s
sys 0m0.000s

AMD 64 bit:
[EMAIL PROTECTED] myhaskell]$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real1m58.066s
user1m57.939s
sys 0m0.128s
[EMAIL PROTECTED] myhaskell]$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m41.565s
user0m41.527s
sys 0m0.040s

with optimisations (-O):

Intel hyperthreaded,windows
$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m8.625s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.735s
user0m0.015s
sys 0m0.000s

AMD 64 bit, linux:
[EMAIL PROTECTED] myhaskell]$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.546s
user0m10.529s
sys 0m0.016s
[EMAIL PROTECTED] myhaskell]$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.796s
user0m11.785s
sys 0m0.012s

Greetings, Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-20 Thread Bulat Ziganshin
Hello Branimir,

Tuesday, December 20, 2005, 9:48:48 PM, you wrote:

BM> I've finally performed test on amd64 and result is a same as on intel.
BM> KMP always wins. So KMP is best suited for non indexed strings
BM> and I guess should be used in library as prefered search/replace method.
BM> This test favors straightforward search.

i'm 90% sure that straightforward method must be faster for one-time
searches. your test may give better results with KMP algorithm just
because you repeat the same search many times and it was automatically
"run-time compiled" as described on the wiki page about KMP

try to add

{-# NOINLINE replace #-}

to both programs and repeat comparision




-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Substring replacements

2005-12-20 Thread Branimir Maksimovic

I've finally performed test on amd64 and result is a same as on intel.
KMP always wins. So KMP is best suited for non indexed strings
and I guess should be used in library as prefered search/replace method.
This test favors straightforward search.

[EMAIL PROTECTED] myhaskell]$ time ./KMP
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.783s
user0m10.769s
sys 0m0.016s
[EMAIL PROTECTED] myhaskell]$ time ./straightforward
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.769s
user0m11.741s
sys 0m0.028s
[EMAIL PROTECTED] myhaskell]$ uname -a
Linux devel64.office.kom 2.6.14-skas3-v8.2 #2 Fri Nov 11 21:19:36 CET 2005 
x86_64 x86_64 x86_64 GNU/Linux


Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: Re[2]: [Haskell-cafe] Substring replacements

2005-12-18 Thread Branimir Maksimovic





From: Bulat Ziganshin <[EMAIL PROTECTED]>
Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org
Subject: Re[2]: [Haskell-cafe] Substring replacements
Date: Fri, 16 Dec 2005 16:51:32 +0300

Hello Branimir,

Friday, December 16, 2005, 5:36:47 AM, you wrote:
BM> I've also performed tests on dual Xeon linux box and results are

just to let you know - GHC don't uses pentium4 hyperthreading,
multiple cpus or multiple cores in these tests

only way to make ghc using multiple processors is to use 6.5 beta
version, compile with "-smp" and explicitly fork several threads



You are right. I've double checked on linux there is just one thread
executing and there is not such a big difference between KMP and
straightforward search.
Just about 10% KMP is faster with my test, but still faster.
I've checked both SMP and non SMP linux (Intel).
Hyperthreading effect is on windows only, I guess, as there
are visible three threads per test process.
I have one amd64 near (I'll check that one too, as soon as admin
sets up account for me on that machine).

Greetings, Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: Re[2]: [Haskell-cafe] Substring replacements

2005-12-17 Thread Branimir Maksimovic




From: Bulat Ziganshin <[EMAIL PROTECTED]>
Reply-To: Bulat Ziganshin <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED], Haskell-Cafe@haskell.org
Subject: Re[2]: [Haskell-cafe] Substring replacements
Date: Fri, 16 Dec 2005 16:51:32 +0300

Hello Branimir,

Friday, December 16, 2005, 5:36:47 AM, you wrote:
BM> I've also performed tests on dual Xeon linux box and results are

just to let you know - GHC don't uses pentium4 hyperthreading,
multiple cpus or multiple cores in these tests
Oh yes it does. I clearly see multiple threads that take unequal percentage 
on

each virtual CPU. I guess that other thread is garbage collector
thread. In case of hyperthreading, speed is gained by reduced memory
latency by 30-60 %



only way to make ghc using multiple processors is to use 6.5 beta
version, compile with "-smp" and explicitly fork several threads


This is not the case as I see. On windows search replace test programs
spawn 3 threads and on linux I'm not sure, but I've checked program that 
calls Haskell

from C++ and GHC spawns additional thread, which is not my thread, that
also performs something constantly, and I didn't spawn any thread
from Haskell.
So hyperthreading helps as it helps to optimise when several threads
accesses memory as is in this test case.
I can't see any other explanation why KMP search is
slower on AMD 20% , but faster on Intel 30%, then straightforward search
with my test.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Substring replacements

2005-12-16 Thread Bulat Ziganshin
Hello Branimir,

Friday, December 16, 2005, 5:36:47 AM, you wrote:
BM> I've also performed tests on dual Xeon linux box and results are

just to let you know - GHC don't uses pentium4 hyperthreading,
multiple cpus or multiple cores in these tests

only way to make ghc using multiple processors is to use 6.5 beta
version, compile with "-smp" and explicitly fork several threads


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-16 Thread Daniel Fischer
Am Freitag, 16. Dezember 2005 03:36 schrieben Sie:
> From: Daniel Fischer <[EMAIL PROTECTED]>
> >Any improvements are welcome, certainly some of you can do much better.
>
> It is fast on my machine except that you are using Map to lookup
> for badChar which is O(log n).
> I;ve placed this instead:
>   badChar :: UArray Int Int
>   badChar  = array (0,255) ([(i,-1) | i <- [0..255]] ++ proc src 0)
>   proc [] _ = []
>   proc (s:st) i = (ord s,i):proc  st (i+1)
>   getBc c = badChar ! ord c
>
> which gaved it significant boost, O(1) lookup.

Yes, but Char has 1114112 values, and I'm not sure whether such a large array 
would still be better, especially since, presumably, the Map will usually not 
be deeper than five layers, say. But if we restrict ourselves to extended 
ASCII Strings, an array certainly is better.
And maybe, instead of using two arrays, bmGs0 and bmGs, a mutable array (those 
are DiffArrays, I think -- I'll check that out) would also improve it.

> Now it's faster then brute force method but 10% slower then KMP
> with my test.
> I've also performed tests on dual Xeon linux box and results are
> proportionally
> the same as on my intel windows box.
> KMP wins again 10% better then BM and 20-30% better then straightforward
> search,
> which means that KMP is well suited for non indexed strings.
>
> >Cheers,
> >Daniel
> >
> >P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is
> >somewhat
> >fussy.
>
> Yes, BM is for indexed structures.
>
> Greetings, Bane.
>
Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-15 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 21:07:11 +0100

Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie:
> From: "Branimir Maksimovic" <[EMAIL PROTECTED]>
>
> >To: [EMAIL PROTECTED]
> >CC: Haskell-Cafe@haskell.org
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Thu, 15 Dec 2005 00:55:02 +
> >
> >>From: Daniel Fischer <[EMAIL PROTECTED]>
> >>To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
> >>CC: Haskell-Cafe@haskell.org
> >>Subject: Re: [Haskell-cafe] Substring replacements
> >>Date: Wed, 14 Dec 2005 20:40:06 +0100
> >>
> >>Hi Bane,
> >>
> >>nice algorithm. Since comparing chars _is_ cheap, it is to be expected
> >>that
> >>all the hash-rotating is far more costly for short search patterns. 
The
> >>longer the pattern, the better this gets, I think -- though nowhere 
near

> >>KMP
> >>(or would it?).
> >
> >Yes,KMP is superior in single pattern search. We didn't tried 
Boyer-Moore

> >algorithm yet, though. But I think it would be
> >difficult to implement it in Haskell efficiently as it searches 
backwards

> >and jumps around, and we want memory savings.
> >Though, I even didn't tried yet, but it is certainly very interesting.
>
> Forget what I've said.
> Boyer-Moore *can* be implemented efficiently, it is similar to KMP it 
goes
> forward, but when it finds last character in pattern, than starts to 
search

> backwards.
> This can be implemented easilly as Haskell lists naturaly reverse order
> when putting from one list to other.
> Heh, never say never :)
> As I see from documents Boyer-Moore has best performance on average
> and should be better than KMP.
>
> Greetings,Bane.
>
Well, I also thought that all the jumping around in Boyer-Moore wasn't too
good (after each shift we must bite off a chunk from the remaining input,
pushing that onto the stack, which costs something). But I gave it a try
today and here's what I came up with:

import Data.List (tails)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Array.Unboxed

searchRep :: String -> String -> String -> String
searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str
where
  len = length src
  len1 = len-1
  pat :: UArray Int Char
  pat = listArray (0,len1) src
  ch = pat!len1
  badChar :: Map Char Int
  badChar = Map.fromList $ zip src [0 .. ]
  getBc c = case Map.lookup c badChar of
   Just n  -> n
   Nothing -> -1
  suffs :: UArray Int Int
  suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs
  where
crs = reverse src
pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys
pr n _ _ = n
  bmGs0 :: UArray Int Int
  bmGs0 = array (0,len1) [(j,k) | (k,k') <- zip (tail $! help) help, j 
<-

[k' .. k-1]]
  help = [k | k <- [0 .. len], k == len || suffs!k == len-k]
  bmGs :: UArray Int Int
  bmGs = bmGs0 // [(len1-suffs!k,k) | k <- [len1,len-2 .. 1]]
  run by "" = reverse by
  run by (c:cs)
| c == ch   = process (c:by) cs
| otherwise = run (c:by) cs
  roll n xs ys | n <= 0 = (xs, ys)
  roll n xs (y:ys) = roll (n-1) (y:xs) ys
  roll _ xs "" = (xs, "")
  walk n "" = (n,"")
  walk n st@(c:cs)
| n < 0  = (n,st)
| c == pat!n = walk (n-1) cs
| otherwise  = (n,st)
  process con left
| i < 0 = reverse pass ++ rp ++ run "" left
| otherwise = {- bye ++ -} run ncon nleft
  where
 (i,pass) = walk len1 con
 d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass))
 -- bye = reverse $! drop (len-d) con
 (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left

it's not as fast as KMP for the tests, but not too bad.
Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before 
a
match (if any), we'd be better off relieving our memory with 'bye', I 
think.


Any improvements are welcome, certainly some of you can do much better.


It is fast on my machine except that you are using Map to lookup
for badChar which is O(log n).
I;ve placed this instead:
 badChar :: UArray Int Int
 badChar  = array (0,255) ([(i,-1) | i <- [0..255]] ++ proc src 0)
 proc [] _ = []
 proc (s:st) i = (ord s,i):proc  st (i+1)
 getBc c = badChar ! ord c

which gaved it signi

Re: [Haskell-cafe] Substring replacements

2005-12-15 Thread Daniel Fischer
Am Donnerstag, 15. Dezember 2005 02:39 schrieben Sie:
> From: "Branimir Maksimovic" <[EMAIL PROTECTED]>
>
> >To: [EMAIL PROTECTED]
> >CC: Haskell-Cafe@haskell.org
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Thu, 15 Dec 2005 00:55:02 +
> >
> >>From: Daniel Fischer <[EMAIL PROTECTED]>
> >>To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
> >>CC: Haskell-Cafe@haskell.org
> >>Subject: Re: [Haskell-cafe] Substring replacements
> >>Date: Wed, 14 Dec 2005 20:40:06 +0100
> >>
> >>Hi Bane,
> >>
> >>nice algorithm. Since comparing chars _is_ cheap, it is to be expected
> >>that
> >>all the hash-rotating is far more costly for short search patterns. The
> >>longer the pattern, the better this gets, I think -- though nowhere near
> >>KMP
> >>(or would it?).
> >
> >Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
> >algorithm yet, though. But I think it would be
> >difficult to implement it in Haskell efficiently as it searches backwards
> >and jumps around, and we want memory savings.
> >Though, I even didn't tried yet, but it is certainly very interesting.
>
> Forget what I've said.
> Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes
> forward, but when it finds last character in pattern, than starts to search
> backwards.
> This can be implemented easilly as Haskell lists naturaly reverse order
> when putting from one list to other.
> Heh, never say never :)
> As I see from documents Boyer-Moore has best performance on average
> and should be better than KMP.
>
> Greetings,Bane.
>
Well, I also thought that all the jumping around in Boyer-Moore wasn't too 
good (after each shift we must bite off a chunk from the remaining input, 
pushing that onto the stack, which costs something). But I gave it a try 
today and here's what I came up with:

import Data.List (tails)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Array.Unboxed

searchRep :: String -> String -> String -> String
searchRep src rp str = run (reverse $ take len1 str) $ drop len1 str
where
  len = length src
  len1 = len-1
  pat :: UArray Int Char
  pat = listArray (0,len1) src
  ch = pat!len1
  badChar :: Map Char Int
  badChar = Map.fromList $ zip src [0 .. ]
  getBc c = case Map.lookup c badChar of
   Just n  -> n
   Nothing -> -1
  suffs :: UArray Int Int
  suffs = listArray (0,len1) $! init $! map (pr 0 crs) $! tails crs
  where
crs = reverse src
pr n (x:xs) (y:ys) | x == y = pr (n+1) xs ys
pr n _ _ = n
  bmGs0 :: UArray Int Int
  bmGs0 = array (0,len1) [(j,k) | (k,k') <- zip (tail $! help) help, j <- 
[k' .. k-1]]
  help = [k | k <- [0 .. len], k == len || suffs!k == len-k]
  bmGs :: UArray Int Int
  bmGs = bmGs0 // [(len1-suffs!k,k) | k <- [len1,len-2 .. 1]]
  run by "" = reverse by
  run by (c:cs)
| c == ch   = process (c:by) cs
| otherwise = run (c:by) cs
  roll n xs ys | n <= 0 = (xs, ys)
  roll n xs (y:ys) = roll (n-1) (y:xs) ys
  roll _ xs "" = (xs, "")
  walk n "" = (n,"")
  walk n st@(c:cs)
| n < 0  = (n,st)
| c == pat!n = walk (n-1) cs
| otherwise  = (n,st)
  process con left
| i < 0 = reverse pass ++ rp ++ run "" left
| otherwise = {- bye ++ -} run ncon nleft
  where
 (i,pass) = walk len1 con
 d = if null pass then i+1 else max (bmGs!i) (i - getBc (head pass))
 -- bye = reverse $! drop (len-d) con
 (ncon,nleft) = roll (d-1) {- (take (len-d) con) -} con left

it's not as fast as KMP for the tests, but not too bad.
Commenting out 'bye' gives a bit of extra speed, but if it's _long_ before a 
match (if any), we'd be better off relieving our memory with 'bye', I think.

Any improvements are welcome, certainly some of you can do much better.

Cheers,
Daniel

P.S. As an algorithm, I prefer KMP, it's quite elegant whereas BM is somewhat 
fussy.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-15 Thread Branimir Maksimovic


This is what I got for BM. Performance dissapoints as BM is really
suited for indexed strings like arrays.It mainly operates on indexes.
This is simple BM, as I didn't want to go
for more complex variant,becauses takes and drops and recalculation of next 
position

is too pricey for non indexed structure. So, clear winner is KMP for
non indexed strings. There is also finite automaton algorithm
but this works well if search strings are precompiled, so I'll
implement it only for education purposes. I hope my Haskell improves
as I've learned how to reduce number of paramaters.

searchReplaceBM :: String -> String -> String -> String
searchReplaceBM "" _ str = str
searchReplaceBM sr rp str = searchReplace str
where
   table :: UArray Int Int
   table  = array (0,255) ([(i,0) | i <- [0..255]] ++ proc sr 1)
   proc [] _ = []
   proc (s:st) i = (ord s,i):proc  st (i+1)
   len = length sr
   rsrch = reverse sr
   searchReplace str
| null remaining = if found then rp
else passed
|found = rp ++ searchReplace remaining
| otherwise = passed ++ searchReplace remaining
where
   (passed,remaining,found) = searchReplace' str
   searchReplace' str
   = if j == 0
then ("",drop len str,True)
else failed
   where failed = case drop (j-1) str of
  [] -> (str,"",False)
  (c:_) -> (take sk str, drop sk str, False)
   where md = j - table ! ord c
 sk = if md > 0
  then md
  else 1

 j = srch rsrch (reverse $ take len str) len
   where srch "" "" _ = 0
 srch _ "" l = l
 srch (s:str) (s':str') l
   | s == s' = srch str str' (l-1)
   | otherwise = l


Greetings, Bane.


From: "Branimir Maksimovic" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 01:39:57 +





From: "Branimir Maksimovic" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 00:55:02 +





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 20:40:06 +0100

Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected 
that

all the hash-rotating is far more costly for short search patterns. The
longer the pattern, the better this gets, I think -- though nowhere near 
KMP

(or would it?).


Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
algorithm yet, though. But I think it would be
difficult to implement it in Haskell efficiently as it searches backwards
and jumps around, and we want memory savings.
Though, I even didn't tried yet, but it is certainly very interesting.



Forget what I've said.
Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes
forward, but when it finds last character in pattern, than starts to search 
backwards.

This can be implemented easilly as Haskell lists naturaly reverse order
when putting from one list to other.
Heh, never say never :)
As I see from documents Boyer-Moore has best performance on average
and should be better than KMP.

Greetings,Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/




_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: [EMAIL PROTECTED]
To: haskell-cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 00:25:19 -0500

G'day all.

Quoting Branimir Maksimovic <[EMAIL PROTECTED]>:

> After seeing that your program is fastest (I've also tried one from
> http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> that good in converting to search replace?)

You probably did it right, but you could post your version to the
list if you want me to take a look.


Oh, here it is but just don;t laugh :)
I've hacked with unsafePerformIO as I din't know
how to remove IO from match any other way.

searchReplaceKMP :: String->String->String -> String
searchReplaceKMP sr rp s
   | not (null remaining) = found++rp
++ searchReplaceKMP sr rp remaining
   | otherwise = found
   where
   (found,remaining) = unsafePerformIO $ matchKMP sr s

matchKMP :: (Monad m, Eq a) => [a] -> ([a] -> m ([a],[a]))
matchKMP []
   = error "Can't match empty list"
matchKMP xs
   = matchfunc []
 where
   matchfunc = makeMatchFunc [dofail] (zip xs (overlap xs))
   dofail = \ps xs -> case xs of
   [] -> fail "can't match"
   (y:ys) -> matchfunc (y:ps) ys

type PartialMatchFunc m a = [a] -> [a] -> m ([a], [a])

makeMatchFunc :: (Monad m, Eq a) => [PartialMatchFunc m a] -> [(a, Int)]
   -> PartialMatchFunc m a
makeMatchFunc prev []
   = \ps xs -> return (reverse (drop ((length prev)-1) ps), xs)
makeMatchFunc prev ((x,failstate):ms)
   = thisf
 where
   mf = makeMatchFunc (thisf:prev) ms
   failcont = prev !! (length prev - failstate - 1)
   thisf = \ps xs -> case xs of
   [] -> fail "can't match"
   (y:ys) -> if (x == y) then mf (y:ps) ys
   else failcont ps xs

overlap :: (Eq a) => [a] -> [Int]
overlap str
   = overlap' [0] str
 where
   overlap' prev []
 = reverse prev
   overlap' prev (x:xs)
 = let get_o o
| o <= 1 || str !! (o-2) == x = o
| otherwise = get_o (1 + prev !! (length prev - o + 1))
   in overlap' (get_o (head prev + 1):prev) xs

--
These are timings (it's performance is about the same as Rabin-Karp):

$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
searchr.exe: user error (can't match)


real0m22.187s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ ghc -fglasgow-exts  -O2 searchr.hs --make  -o searchr.exe
Chasing modules from: searchr.hs
Compiling Main ( searchr.hs, searchr.o )
Linking ...

[EMAIL PROTECTED] ~/tutorial
$ time searchr.exe
Working very long
True
Done

real0m8.110s
user0m0.031s
sys 0m0.016s



When I wrote the RunTimeCompilation code, it wasn't intended to be a
shining example of efficiency, merely an illustration.  Remember
that it's doing TWO things: compiling the pattern to code, and then
performing the search.  The compilation phase is likely to be much
slower than the search, so the speedup (if any!) would only be realised
the SECOND time that you searched a string using the same pattern.
(Assuming you re-used the compiled match code, of course!)


Oh, that explaines it. Actually this has to be converted to searchReplace
in order to be fast, but I don;t know how (yet) as your program
is pretty complicated to my humble Haskell skills.
I think that your technique can be usefull with Aho-Corasick algorithm
as it first constructs finite automaton from tree, then performs search.
So, I'll guess I'll try first Boyer-Moore, then Aho-Corasick, eventually run
time compilation, but this is too advanced for me for now.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread ajb
G'day all.

Quoting Branimir Maksimovic <[EMAIL PROTECTED]>:

> After seeing that your program is fastest (I've also tried one from
> http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> that good in converting to search replace?)

You probably did it right, but you could post your version to the
list if you want me to take a look.

When I wrote the RunTimeCompilation code, it wasn't intended to be a
shining example of efficiency, merely an illustration.  Remember
that it's doing TWO things: compiling the pattern to code, and then
performing the search.  The compilation phase is likely to be much
slower than the search, so the speedup (if any!) would only be realised
the SECOND time that you searched a string using the same pattern.
(Assuming you re-used the compiled match code, of course!)

Cheers,
Andrew Bromage
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: "Branimir Maksimovic" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Thu, 15 Dec 2005 00:55:02 +





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 20:40:06 +0100

Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected 
that

all the hash-rotating is far more costly for short search patterns. The
longer the pattern, the better this gets, I think -- though nowhere near 
KMP

(or would it?).


Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
algorithm yet, though. But I think it would be
difficult to implement it in Haskell efficiently as it searches backwards
and jumps around, and we want memory savings.
Though, I even didn't tried yet, but it is certainly very interesting.



Forget what I've said.
Boyer-Moore *can* be implemented efficiently, it is similar to KMP it goes
forward, but when it finds last character in pattern, than starts to search 
backwards.

This can be implemented easilly as Haskell lists naturaly reverse order
when putting from one list to other.
Heh, never say never :)
As I see from documents Boyer-Moore has best performance on average
and should be better than KMP.

Greetings,Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 20:40:06 +0100

Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected that
all the hash-rotating is far more costly for short search patterns. The
longer the pattern, the better this gets, I think -- though nowhere near 
KMP

(or would it?).


Yes,KMP is superior in single pattern search. We didn't tried Boyer-Moore
algorithm yet, though. But I think it would be
difficult to implement it in Haskell efficiently as it searches backwards
and jumps around, and we want memory savings.
Though, I even didn't tried yet, but it is certainly very interesting.

However, I don't see how to (efficiently) do a multiple

pattern search with KMP, so there -- if all patterns have the same length,
otherwise I don't see -- Rabin-Karp would probably be the method of choice.


Yes, this algorithm can search in parallel patterns of same length.
Different search patterns have to be searched same way as with KMP.



I tuned it up somewhat:
import Data.List (isPrefixOf)
import Data.Char (ord)  -- using ord instead of fromEnum oddly makes it
-- faster for my test, but slower for yours, but only a whiff.


Wow, on my machine your version of Rabin-Karp gives 30% boost to my test.
This helps me learn Haskell , too .

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Wed, 14 Dec 2005 17:10:20 +0100





> I think that's because on your machine Bulat's version have better
> perfromance
> with CPU cache.
> I don;t know but now your version is 25% faster with my test on P4
> hyperthreaded.

E, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with
computers.


I think that i've figure it now.
Hyperthreading is hardware CPU feature that single CPU core
can speed up execution of two running threads.
For example if one thread uses integer unit and other FP unit
CPU executes that in parallel. But that's not important
or significant. What is interestenting is memory latency.
If one thread peeks and pokes around memory for , say 1
unit of time, with usual CPU two thread will execute
2 units of time. Hyperthreaded (I'm talking about intel implementation)
CPU will execute that in 1.4 points of time giving 60% boost
in terms of speed. I've tested some assembler and C program
that launches two threads each roaming over memory
to anulate impact of cache.
What is noticable is that two threads have 60% less memory
latency constantly then single thread. That means if
single thread for each out of cache memory access waits
300-400 CPU cycles, two threads wait 60% less.
Now what has that to do with our programs as they are single
threaded? I think it's garbage collection.
Our programs run with garbage collector in background and
you feel that burden by 20% as your program probably pushes
garbage collector to work more than Bulat's version.
On hyperthreaded CPU impact of garbage collection is reduced
by a factor of 30-60 % resulting in your program being 30% faster
on my machine.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Daniel Fischer
Hi Bane,

nice algorithm. Since comparing chars _is_ cheap, it is to be expected that 
all the hash-rotating is far more costly for short search patterns. The 
longer the pattern, the better this gets, I think -- though nowhere near KMP 
(or would it?). However, I don't see how to (efficiently) do a multiple 
pattern search with KMP, so there -- if all patterns have the same length, 
otherwise I don't see -- Rabin-Karp would probably be the method of choice.

Am Mittwoch, 14. Dezember 2005 10:16 schrieben Sie:
> From: Daniel Fischer <[EMAIL PROTECTED]>
>
> >To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
> >CC: Haskell-Cafe@haskell.org
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Tue, 13 Dec 2005 11:23:29 +0100
>
> After seeing that your program is fastest (I've also tried one from
> http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
> that good in converting to search replace?) I've decided to
> try with Rabin-Karp algorithm.
> This algorithm performs same operation as straightforward search,
> but compares hashes instead of chars.
> With ability to rotate hash (remove first, add next) characters
> there is also optimisation, that hash is calculated only for single
> next character rather again for whole substring.
> Unfortunatelly on my machine it is very cheap to compare
> characters so with my test hashing overweights character compare,
> except in your test when hash searching is faster then straightforward
> search.
>
> This is best I can write in terms of performance and readability.
> I've tried with getFst that returns Maybe but it was slower so I decided
> to return '\0' in case that argument is empty list, which renders '\0'
> unusable, but then I really doubt that 0 will be used in strings.
>
> -- Rabin-Karp string search algorithm, it is very effective in searching of
> set
> -- of patterns of length n on same string
> -- this program is for single pattern search, but can be crafted
> -- for multiple patterns of length m
>

I tuned it up somewhat:
import Data.List (isPrefixOf)
import Data.Char (ord)  -- using ord instead of fromEnum oddly makes it
-- faster for my test, but slower for yours, but only a whiff.

searchrep :: String -> String -> String -> String
searchrep "" _ str = str-- or cycle rp, or error?
searchrep sr rp xs = hSearchRep xs  -- don't carry more around than necessary
where
   len = length sr   -- we don't want that to be recomputed 
   hsrch = hash sr -- neither that
   hSearchRep  "" = ""
   hSearchRep xs
   | null remaining = passed
   | otherwise  = passed ++ rp ++ hSearchRep (drop len remaining)
 where
   (passed,remaining) = hSearch xs  -- ' xs (hash $ take len xs) ""
   hSearch xs = hSearch' xs hcmp "" -- since hSearch will be optimised
   where -- away anyway, we might
  hcmp = hash $ take len xs   -- as well eliminate it ourselves
   hSearch' "" _ got = (reverse got, "")
   hSearch' xxs@(x:xs) hcd got
   | hcd == hsrch && (sr `isPrefixOf` xxs) = (reverse got, xxs)
   | otherwise = searchAgain -- one test 
less
 where
   searchAgain = case drop len xxs of
[] -> (reverse got ++ xxs, "")   -- then we know we're done
(y:_)  -> hSearch' xs (hashRotate x y hcd) (x:got)
-- no need for fancy getFst anymore
-- making hashRotate local eliminates one argument, makes it faster
   hashRotate :: Char -> Char -> Int -> Int
   hashRotate cout cin hsh = 101*(hsh - 101^(len-1)*ord cout) + ord cin
-- using foldl for hash is an enormous boost
   hash :: String -> Int
   hash = foldl ((. ord) . (+) . (*101)) 0
-- hash str = foldl (\n c -> 101*n+ord c) 0 str
-- this is equally fast as the point-free version, easier to read, probably,
-- but I like an occasional pointless pointfreeness.

Now this beats everything but KMP on my test very clearly.
[EMAIL PROTECTED]:~/Documents/haskell/Allotria/Search> time myhash; time myhash2
Working: seasearch replace  able seaseaseasearch baker ssseasearch charlie
True
Done


real0m50.401s
user0m49.990s
sys 0m0.060s
Working very long
True
Done

real0m15.747s
user0m15.630s
sys 0m0.020s

Still poor on your test, though.

Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Daniel Fischer
Hi, Bane and all,

Am Dienstag, 13. Dezember 2005 14:22 schrieben Sie:
> > > In real world situation your KMP will always be fastest on average.
> > > I like that we are not using C arrays as then we have advantage
> > > of lazyness and save on memory usage. C++ program will be faster
> > > on shorter strings but on this large strings will loose due memory
> > > latency. and with your test, both programs are very fast.

Yesterday, I did the unspeakable -- I wrote a C-version. Smashes 
Haskell-performance for short enough Strings (factor 10 for my test, factor 
2.2 for Bane's), but once it starts swapping, we catch up, and for really 
large Strings I dare say we'd win far out.

I also managed to get my KMP still faster, using take and drop instead of 
splitAt helps a lot (Bane, the use of 'break' in my transcript of yours was 
what slowed it down, I reintroduced searchr''' and both are equal).
I'm not quite sure, whether that indeed helps, but I also chose to use 
listArray for the boxed array of borders.

Now it's
searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
where
  len = length src
  pat :: UArray Int Char
  pat = listArray (0,len-1) src
  bord = A.listArray (0,len) $ (-1):0:
   [getBord (pat!i) i + 1 | i <- [1 .. len-1]]
  getBord s n
 | m < 0  = m
 | s == pat!m = m
 | otherwise  = getBord s m
   where
 m = bord A.! n
  bor :: UArray Int Int
  bor = listArray (0,len) $ A.elems bord
  getBor s n
 | m < 0 || s == pat!m = m
 | otherwise = getBor s m
   where
 m = bor!n
  process n str _ | n >= len = dst ++ process 0 str ""
  process _ "" mat = reverse mat
  process 0 (s:st) _
 | s == c= process 1 st [s]
 | otherwise = s:process 0 st ""
  process n str@(s:st) mat
 | s == pat!n = process (n+1) st (s:mat)
 | otherwise  =
case getBor s n of
  -1 -> reverse mat ++ process 0 str ""
  0  -> reverse mat ++ process 1 st [s]
  j  -> reverse (drop j mat) ++ process (j+1) st (s:take j mat)


gives a speedup of roughly 10% on my box versus yesterday's version.
> > >
> > > Greetings, Bane.
> >
> >On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about
> >20%
> >faster than my KMP on your test -- btw, I unboxed the pat array, which
> > gave a
> >bit of extra speed, but not much.
>
> I think that's because on your machine Bulat's version have better
> perfromance
> with CPU cache.
> I don;t know but now your version is 25% faster with my test on P4
> hyperthreaded.

E, what's 'hyperthreaded' ? Unfortunately, I'm completely useless with 
computers.

>
> your new version:
> $ time srchrep.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real0m8.734s
> user0m0.015s
> sys 0m0.000s
>
> Bulat's version:
>
> [EMAIL PROTECTED] ~/tutorial
> $ time replace1.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real0m11.734s
> user0m0.015s
> sys 0m0.015s
>
> 3 secs difference now.
>
> >And apologies to Sebastian Sylvan, I also included an unboxed version of
> >bord,
> >built from the boxed version, and that sped things further up -- not much,
> >again, but there it is.
>
> On my machine you got another 10-15% of boost with unboxed arrays.
>
> >I wonder about this difference, -10% on one system and +20% on another
> >system,
> >ist that normal?
>
> Different caching schemes on CPU's perhaps? different memory latencies?
> hyperthreading helps your version? more code and data, perhaps because
> of that it pays the price on your machine?
>
> Greetings, Bane.
>
Well, whatever. Upto now, on my box, Bulat's is still the fastest for your 
test -- though I've narrowed the gap quite a bit.

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-14 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Tue, 13 Dec 2005 11:23:29 +0100



After seeing that your program is fastest (I've also tried one from
http://haskell.org/hawiki/RunTimeCompilation but perhaps I'm not
that good in converting to search replace?) I've decided to
try with Rabin-Karp algorithm.
This algorithm performs same operation as straightforward search,
but compares hashes instead of chars.
With ability to rotate hash (remove first, add next) characters
there is also optimisation, that hash is calculated only for single
next character rather again for whole substring.
Unfortunatelly on my machine it is very cheap to compare
characters so with my test hashing overweights character compare,
except in your test when hash searching is faster then straightforward
search.

This is best I can write in terms of performance and readability.
I've tried with getFst that returns Maybe but it was slower so I decided
to return '\0' in case that argument is empty list, which renders '\0'
unusable, but then I really doubt that 0 will be used in strings.

-- Rabin-Karp string search algorithm, it is very effective in searching of 
set

-- of patterns of length n on same string
-- this program is for single pattern search, but can be crafted
-- for multiple patterns of length m

hSearchReplace :: String -> String -> String -> String
hSearchReplace sr rp xs
   | not (null remaining) = found ++ rp
++ hSearchReplace sr rp (drop (length sr) 
remaining)

   | otherwise = found
   where
   (found,remaining) = hSearch sr xs

hSearch :: String -> String -> (String,String)
hSearch sr xs = hSearch' sr xs hcmp ""
   where
   hsrch = hash sr
   hcmp = hash $ take ls xs
   cmp = take ls xs
   ls = length sr

   hSearch' [] xs _ _= (xs,[])
   hSearch' sr [] _ fndFail = (reverse fndFail,[])
   hSearch' srch xxs@(x:xs) hcmps fndFail
   = if hsrch == hcmps
then if isPrefixOf srch xxs
then (reverse fndFail,xxs)
else searchAgain
else searchAgain
   where
   searchAgain
= hSearch' srch xs
   (hashRotate (getFst xxs) (getFst nextxxs) (ls-1) 
hcmps)

   (x:fndFail)
   nextxxs = drop ls xxs

getFst :: String -> Char
getFst [] = '\0'
getFst (a:as) = a

hash :: String -> Int
hash str
   =  hash' str (length str - 1)
   where
   hash' :: String -> Int -> Int
   hash' [] _ = 0
   hash' (s:str) pow  = (101 ^ pow) *(fromEnum s)
+ hash' str (pow-1)

hashRotate :: Char -> Char -> Int -> Int -> Int
hashRotate cout cin pow hsh
   = (hsh - ((101 ^ pow) * (fromEnum cout)))*101
 + (fromEnum cin)



Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-13 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Tue, 13 Dec 2005 11:23:29 +0100

Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
> From: Daniel Fischer <[EMAIL PROTECTED]>
>
> >To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
> >CC: Haskell-Cafe@haskell.org
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Mon, 12 Dec 2005 16:15:46 +0100
> >
> >Earlier today:
> > > Sorry, but
> > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > > "abababaaba"
> > >
> > > I haven't analyzed the algorithm, so I don't know why exactly this
> >
> >fails.
> >
> > > I'll take a look sometime soon.
> >
> >I found the problem (one at least).
> >Say the pattern to be replaced begins with 'a' and we have a 
sufficiently

> >long
> >match with the pattern starting at the first 'a' in the String. Upon
> >encountering the second 'a', while the first pattern still matches, you
> >start
> >pushing onto the rollback-stack. But that isn't inspected anymore, so 
if

> >the
> >actual occurence of the pattern starts at the third (or fourth, n-th)
> >occurence of 'a' and that is already pushed onto the rollback, you miss
> > it.
>
> I've corrected this with adjusting rollback position. if rollBack is 
null

> then
> search for rollback starts at second character if not starts at same as
> searhed
> character because I skip what was searched. That's all.
> Though I'm not so sure now when I read this.
>
Still not working:

*New> searchReplace "abababc" "#" "ababababababc"
"ababababababc"
*New> searchReplace1 "abababc" "#" "ababababababc"
"ababababababc"



Yes, perhaps you've missed another post of mine. I've noticed
that problem when pattern repeats more then 2 times and gave up
because now whatever I do, your version is always fastest.



> >So the question is, can we find a cheap test to decide whether to use 
KMP

> >or
> >Bulat's version?


Just interleave string with  search hits with one with no seacrh (that means 
partial too)

hits, and your version will gain in speed.
More partial matches and full search matches Bulat's version will gain in
speed.
Longer search strings, your version will have gains.


>
> In real world situation your KMP will always be fastest on average.
> I like that we are not using C arrays as then we have advantage
> of lazyness and save on memory usage. C++ program will be faster
> on shorter strings but on this large strings will loose due memory
> latency. and with your test, both programs are very fast.
>
> Greetings, Bane.
>

On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 
20%
faster than my KMP on your test -- btw, I unboxed the pat array, which gave 
a

bit of extra speed, but not much.


I think that's because on your machine Bulat's version have better 
perfromance

with CPU cache.
I don;t know but now your version is 25% faster with my test on P4
hyperthreaded.

your new version:
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m8.734s
user0m0.015s
sys 0m0.000s

Bulat's version:

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.734s
user0m0.015s
sys 0m0.015s

3 secs difference now.

And apologies to Sebastian Sylvan, I also included an unboxed version of 
bord,

built from the boxed version, and that sped things further up -- not much,
again, but there it is.


On my machine you got another 10-15% of boost with unboxed arrays.

I wonder about this difference, -10% on one system and +20% on another 
system,

ist that normal?


Different caching schemes on CPU's perhaps? different memory latencies?
hyperthreading helps your version? more code and data, perhaps because
of that it pays the price on your machine?

Greetings, Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-13 Thread Daniel Fischer
Am Montag, 12. Dezember 2005 16:28 schrieben Sie:
> From: Daniel Fischer <[EMAIL PROTECTED]>
>
> >To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
> >CC: Haskell-Cafe@haskell.org
> >Subject: Re: [Haskell-cafe] Substring replacements
> >Date: Mon, 12 Dec 2005 16:15:46 +0100
> >
> >Earlier today:
> > > Sorry, but
> > > Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> > > "abababaaba"
> > >
> > > I haven't analyzed the algorithm, so I don't know why exactly this
> >
> >fails.
> >
> > > I'll take a look sometime soon.
> >
> >I found the problem (one at least).
> >Say the pattern to be replaced begins with 'a' and we have a sufficiently
> >long
> >match with the pattern starting at the first 'a' in the String. Upon
> >encountering the second 'a', while the first pattern still matches, you
> >start
> >pushing onto the rollback-stack. But that isn't inspected anymore, so if
> >the
> >actual occurence of the pattern starts at the third (or fourth, n-th)
> >occurence of 'a' and that is already pushed onto the rollback, you miss
> > it.
>
> I've corrected this with adjusting rollback position. if rollBack is null
> then
> search for rollback starts at second character if not starts at same as
> searhed
> character because I skip what was searched. That's all.
> Though I'm not so sure now when I read this.
>
Still not working:

*New> searchReplace "abababc" "#" "ababababababc"
"ababababababc"
*New> searchReplace1 "abababc" "#" "ababababababc"
"ababababababc"


> >So the question is, can we find a cheap test to decide whether to use KMP
> >or
> >Bulat's version?
>
> In real world situation your KMP will always be fastest on average.
> I like that we are not using C arrays as then we have advantage
> of lazyness and save on memory usage. C++ program will be faster
> on shorter strings but on this large strings will loose due memory
> latency. and with your test, both programs are very fast.
>
> Greetings, Bane.
>

On my 256MB RAM AMD Duron 1200 MHz, Bulat's version is consistently about 20% 
faster than my KMP on your test -- btw, I unboxed the pat array, which gave a 
bit of extra speed, but not much. 
And apologies to Sebastian Sylvan, I also included an unboxed version of bord, 
built from the boxed version, and that sped things further up -- not much, 
again, but there it is.
I wonder about this difference, -10% on one system and +20% on another system, 
ist that normal?

Cheers, Daniel
--
Up-To-Date version of KMP:

import Data.Array.Unboxed (UArray, listArray, (!))
import qualified Data.Array as A (array, (!), elems)

searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
where
  len = {-# scc "len" #-} length src
  pat :: UArray Int Char
  pat = {-# scc "pat" #-} listArray (0,len-1) src
  bord ={-# scc "bord" #-} A.array (0,len) $ (0,-1):(1,0):
 [(i+1,getBord (pat!i) i + 1) | i <- [1 .. len-1]]
  getBord s n
 | m < 0  = m
 | s == pat!m = m
 | otherwise  = getBord s m
   where
 m = bord A.! n
  bor :: UArray Int Int
  bor = listArray (0,len) $ A.elems bord
  getBor s n
 | m < 0 || s == pat!m = m
 | otherwise = getBor s m
   where
 m = bor!n
  process n str _ | n >= len = {-# scc "process" #-} dst ++ process 0 str 
""
  process _ "" mat = {-# scc "process" #-} reverse mat
  process 0 (s:st) _
 | s == c= {-# scc "process" #-} process 1 st [s]
 | otherwise = {-# scc "process" #-} s:process 0 st ""
  process n str@(s:st) mat
 | s == pat!n = {-# scc "process" #-} process (n+1) st (s:mat)
 | otherwise  = {-# scc "process" #-}
let j = getBor s n
(ret,skip) = splitAt j mat
in if j < 0 then reverse mat ++ process 0 str ""
   else reverse skip ++ process (j+1) st (s:ret)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 16:15:46 +0100

Earlier today:
> Sorry, but
> Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> "abababaaba"
>
> I haven't analyzed the algorithm, so I don't know why exactly this 
fails.

> I'll take a look sometime soon.
>

I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a sufficiently 
long

match with the pattern starting at the first 'a' in the String. Upon
encountering the second 'a', while the first pattern still matches, you 
start
pushing onto the rollback-stack. But that isn't inspected anymore, so if 
the

actual occurence of the pattern starts at the third (or fourth, n-th)
occurence of 'a' and that is already pushed onto the rollback, you miss it.

let src = concat (replicate n "abc") ++ "d"
let str = concat (replicate (n+k) "abc") ++ "d"
then
searchReplace src "Success!" str
will work correctly iff k is congruent to 0 or 1 modulo (n+1).



Oh, yes this seems the problem for searchr :(
I have to look for efficient way in order to circumvent repeated searches.
But since your KMP is fastest of all now, I am considering if there
is any point now to correct this.

And searchr ugly version that I've posted has a bug (not present in MyBane 
pretty

version) .
should be :
else if sr'/=x

Greetings, Bane.

_
FREE pop-up blocking with the new MSN Toolbar - get it now! 
http://toolbar.msn.click-url.com/go/onm00200415ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 16:15:46 +0100

Earlier today:
> Sorry, but
> Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> "abababaaba"
>
> I haven't analyzed the algorithm, so I don't know why exactly this 
fails.

> I'll take a look sometime soon.
>

I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a sufficiently 
long

match with the pattern starting at the first 'a' in the String. Upon
encountering the second 'a', while the first pattern still matches, you 
start
pushing onto the rollback-stack. But that isn't inspected anymore, so if 
the

actual occurence of the pattern starts at the third (or fourth, n-th)
occurence of 'a' and that is already pushed onto the rollback, you miss it.


I've corrected this with adjusting rollback position. if rollBack is null 
then
search for rollback starts at second character if not starts at same as 
searhed

character because I skip what was searched. That's all.
Though I'm not so sure now when I read this.



So the question is, can we find a cheap test to decide whether to use KMP 
or

Bulat's version?


In real world situation your KMP will always be fastest on average.
I like that we are not using C arrays as then we have advantage
of lazyness and save on memory usage. C++ program will be faster
on shorter strings but on this large strings will loose due memory
latency. and with your test, both programs are very fast.

Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 13:07:29 +0100

Sorry, but
Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
"abababaaba"

I haven't analyzed the algorithm, so I don't know why exactly this fails.
I'll take a look sometime soon.


It failed because I didn;t adjusted search string for rollBack when previous 
rollBack is not null.

this is corrected version: (with your changes it looks much better)
---
searchReplace :: String->String->String -> String
searchReplace "" _ xs  = xs
searchReplace sr rp xs = searchr sr rp xs "" ""
  where
searchr :: String->String->String->String->String -> String
searchr _ _ "" _ _ = ""
searchr sr rp xs retB rollB
   | found = rp ++ searchr sr rp rema ret roll
| otherwise = reverse (proc ++ rollB) ++
  searchr sr rp rema ret roll
   where
 (found, proc, rema, ret, roll)
= searchr' sr sr (reverse retB ++ xs) "" rollB

searchr' src@(s:sr) src'@(s':sr') xs soFar rollB
   = searchr'' (drop (length rollB) src) src' xs soFar (not (null 
rollB),"","") s


searchr'' "" _ xs fnd _ _ = (True,fnd,xs,"","")
searchr'' _ _ "" fnd (_,ret,roll) _ = (False,ret++roll++fnd,"","","")
searchr'' src@(s:sr) src'@(s':sr') xxs@(x:xs) soFar (cnt,ret,roll) c
   | s == x = if s' == x && null ret && cnt
  then searchr'' sr sr' xs soFar (True, "", x:roll) c
  else
if null ret && null roll
   then searchr'' sr src' xs (x:soFar) (True, "", "") c
   else searchr'' sr src' xs soFar (True, x:roll++ret, 
"") c
| otherwise = if null roll && null ret
 then
if c == x
  then (False, soFar, xxs, "", "")
  else let (from, pre) = break (==c) xs
   in (False, reverse from ++ x:soFar, pre, "", 
"")
  else
if s'/=x
 then if null ret
  then (False, (x:roll) ++ soFar, xs,"","")
  else (False, soFar, xxs,ret,"")
 else if null ret
   then (False, soFar, xs, "", x:roll)
   else (False, soFar, xxs, ret, "")


However it is significantly slower then previous ugly version:

searchReplace :: String->String->String -> String
searchReplace sr rp xs = searchr sr rp xs "" ""
  where
   searchr :: String->String->String->String->String -> String
   searchr [] _ xs _ _ =  xs
   searchr _ _ [] _ _  = []
   searchr sr rp xs retBack rollBack
| isFound $ fnd rollBack = rp
 ++ searchr sr rp (remaining $ fnd 
rollBack )
  ( getRetBack $ fnd 
rollBack)
  ( getRollBack $ fnd 
rollBack)
| otherwise = reverse ((processed $ fnd rollBack) ++ 
rollBack)

  ++ searchr sr rp (remaining $ fnd rollBack)
   ( getRetBack $ fnd rollBack)
   ( getRollBack $ fnd 
rollBack)

   where fnd  = searchr' sr sr (reverse retBack ++ xs) ""

   isFound = fst . fst
   remaining = snd . snd . fst
   getRollBack = snd . snd
   getRetBack = fst . snd
   processed = fst . snd . fst

   searchr' :: String->String->String->String->String
   -> ((Bool,(String,String)),(String,String))
   searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
  searchr'' (drop (length rollBack) srch) srch' xs 
fndSoFar

(not (isEmpty rollBack),"","") sr

   searchr'' :: String->String->String->String->(Bool,String,String)->Char
-> ((Bool,

Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Daniel Fischer
Earlier today:
> Sorry, but
> Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
> "abababaaba"
>
> I haven't analyzed the algorithm, so I don't know why exactly this fails.
> I'll take a look sometime soon.
>

I found the problem (one at least).
Say the pattern to be replaced begins with 'a' and we have a sufficiently long 
match with the pattern starting at the first 'a' in the String. Upon 
encountering the second 'a', while the first pattern still matches, you start 
pushing onto the rollback-stack. But that isn't inspected anymore, so if the 
actual occurence of the pattern starts at the third (or fourth, n-th) 
occurence of 'a' and that is already pushed onto the rollback, you miss it.

let src = concat (replicate n "abc") ++ "d"
let str = concat (replicate (n+k) "abc") ++ "d"
then
searchReplace src "Success!" str
will work correctly iff k is congruent to 0 or 1 modulo (n+1).

Now to fix it, I see two possibilities
1. re-inspect the rollback, which brings you basically to my srchrep
2. introduce a hierarchy of rollback-stacks - but that would be rather 
horrible, I think, because you must keep count on which stack you have to 
push, how many matching patterns you currently have (and which ones they are) 
...

So the question is, can we find a cheap test to decide whether to use KMP or 
Bulat's version?

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: Haskell-Cafe@haskell.org
Subject: Re: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 10:31:49 +0100

Am Montag, 12. Dezember 2005 01:34 schrieben Sie:
> On 12/12/05, Daniel Fischer <[EMAIL PROTECTED]> wrote:
> > Okay, I have looked up KMP and implemented it.
> > Seems to work -- my first use of QuickCheck, too.
> > It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> > but really fast for my test.
> > Undoubtedly, one can still tune it.
>
> Perhaps by using unboxed arrays...
>
> /S
>
>
> --
> Sebastian Sylvan
> +46(0)736-818655
> UIN: 44640862

I'm afraid, unboxed arrays are out of the question, because bord is
incrementally produced :-(


Working very long
test2: <>



No worrie your test is now fastest with both your and mine test.
I;ve forgot to change working function in your test:0)

mine test: your program is  is srchrep.exe
[EMAIL PROTECTED] ~/tutorial
$ time searchr.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m14.344s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.672s > your program is almost 1.5 secs faster then Bulat's
user0m0.015s
sys 0m0.000s

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m12.016s
user0m0.015s
sys 0m0.015s


now your test:

[EMAIL PROTECTED] ~/tutorial
$ time searchr.exe
Working very long
True
Done

real0m0.312s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working very long
False
Done

real0m12.516s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working very long
True
Done

real0m0.375s > yours is less then second as mine but is fastest in both 
tests

user0m0.015s
sys 0m0.015s

I don;t know how you get lesser numbers with mine test, but on
this machine your KMP algorithm performs best.


Greetings ,Bane.

_
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.click-url.com/go/onm00200636ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Daniel Fischer
Sorry, but
Prelude SearchRep> searchReplace "abaaba" "##" "abababaaba"
"abababaaba"

I haven't analyzed the algorithm, so I don't know why exactly this fails.
I'll take a look sometime soon.

As for times: a complete stat is attached, all compiled with -O2, as well as 
the modified KMP-version and my transcript of Branimir's new version.
On my computer, KMP smashes everything else on my test (except Branimir's, 
only that doesn't yet work correctly), while Bulat's definitely is faster 
than anything for Branimir's test and faster than anything but KMP for mine.

Branimir, isEmpty is the Prelude function 'null', so you needn't define it 
yourself.

Cheers,
Daniel

Am Montag, 12. Dezember 2005 05:20 schrieben Sie:
> From: Daniel Fischer <[EMAIL PROTECTED]>
>
> >To: Haskell-Cafe@haskell.org
> >Subject: [Haskell-cafe] Substring replacements
> >Date: Mon, 12 Dec 2005 01:14:37 +0100
> >
> >Okay, I have looked up KMP and implemented it.
> >Seems to work -- my first use of QuickCheck, too.
> >It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> >but really fast for my test.
>
> Strange I got completelly different results:
>
> [EMAIL PROTECTED] ~/tutorial
> $ time srchrep.exe
> Working very long
> True
> Done
>
> real0m16.407s
> user0m0.015s
> sys 0m0.015s
>
> [EMAIL PROTECTED] ~/tutorial
> $ ghc -fglasgow-exts  -O2 srchrep.hs --make -o srchrep.exe
> Chasing modules from: srchrep.hs
> Compiling Main ( srchrep.hs, srchrep.o )
> Linking ...
>
> [EMAIL PROTECTED] ~/tutorial
> $ time srchrep.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real0m10.156s
> user0m0.015s
> sys 0m0.015s
>
> [EMAIL PROTECTED] ~/tutorial
> $ time replace1.exe
> Working:seasearch replace  able seaseasearch baker seasearch charlie
> True
> Done
>
>
> real0m11.672s
> user0m0.015s
> sys 0m0.015s
>
> Now your version is fastest according to my machine, but it is not faster
> with your test it's slower in compariton to replace1.
>
> I've corrected my code so it is fastest with your test,still less then a
> second,
> but slowest with mine.
> Checked with your fail tests and compared results of these 2 tests.
> Now should be ok.
> I maintan now two lists one for successes and other for failures.
> I also prettified code a bit .
>
> searchReplace :: String->String->String -> String
> searchReplace sr rp xs = searchr sr rp xs "" ""
>where
> searchr :: String->String->String->String->String -> String
> searchr [] _ xs _ _ =  xs
> searchr _ _ [] _ _  = []
> searchr sr rp xs retBack rollBack
>
>  | isFound $ fnd rollBack = rp
>
>   ++ searchr sr rp (remaining $ fnd
> rollBack )
>( getRetBack $ fnd
> rollBack)
>( getRollBack $ fnd
> rollBack)
>
>  | otherwise = reverse ((processed $ fnd rollBack) ++
>
> rollBack)
>++ searchr sr rp (remaining $ fnd rollBack)
> ( getRetBack $ fnd
> rollBack) ( getRollBack $ fnd rollBack)
> where fnd  = searchr' sr sr (reverse retBack ++ xs) ""
>
> isFound = fst . fst
> remaining = snd . snd . fst
> getRollBack = snd . snd
> getRetBack = fst . snd
> processed = fst . snd . fst
>
> searchr' :: String->String->String->String->String
> -> ((Bool,(String,String)),(String,String))
> searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
>searchr'' (drop (length rollBack) srch) srch' xs
> fndSoFar
>  (False,"","") sr
>
> searchr'' :: String->String->String->String->(Bool,String,String)->Char
>  -> ((Bool,(String,String)),(String,String))
> searchr'' [] _ xs fnd _ _  = ((True,(fnd,xs)),("",""))
> searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++
> rollBack ++ fnd,[])),("",""))
> searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar
> (cnt,retBack,rollBack) s
>
>   | sr == x = if cnt && sr' == x && isEmpty retBack
>
> 

Re: [Haskell-cafe] Substring replacements

2005-12-12 Thread Daniel Fischer
Am Montag, 12. Dezember 2005 01:34 schrieben Sie:
> On 12/12/05, Daniel Fischer <[EMAIL PROTECTED]> wrote:
> > Okay, I have looked up KMP and implemented it.
> > Seems to work -- my first use of QuickCheck, too.
> > It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> > but really fast for my test.
> > Undoubtedly, one can still tune it.
>
> Perhaps by using unboxed arrays...
>
> /S
>
>
> --
> Sebastian Sylvan
> +46(0)736-818655
> UIN: 44640862

I'm afraid, unboxed arrays are out of the question, because bord is 
incrementally produced :-(


Working very long
test2: <>


Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


RE: [Haskell-cafe] Substring replacements

2005-12-11 Thread Branimir Maksimovic





From: Daniel Fischer <[EMAIL PROTECTED]>
To: Haskell-Cafe@haskell.org
Subject: [Haskell-cafe] Substring replacements
Date: Mon, 12 Dec 2005 01:14:37 +0100

Okay, I have looked up KMP and implemented it.
Seems to work -- my first use of QuickCheck, too.
It's slower than Bulat's and Tomasz' for Branimir's test :-(,
but really fast for my test.


Strange I got completelly different results:

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working very long
True
Done

real0m16.407s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ ghc -fglasgow-exts  -O2 srchrep.hs --make -o srchrep.exe
Chasing modules from: srchrep.hs
Compiling Main ( srchrep.hs, srchrep.o )
Linking ...

[EMAIL PROTECTED] ~/tutorial
$ time srchrep.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m10.156s
user0m0.015s
sys 0m0.015s

[EMAIL PROTECTED] ~/tutorial
$ time replace1.exe
Working:seasearch replace  able seaseasearch baker seasearch charlie
True
Done


real0m11.672s
user0m0.015s
sys 0m0.015s

Now your version is fastest according to my machine, but it is not faster
with your test it's slower in compariton to replace1.

I've corrected my code so it is fastest with your test,still less then a 
second,

but slowest with mine.
Checked with your fail tests and compared results of these 2 tests.
Now should be ok.
I maintan now two lists one for successes and other for failures.
I also prettified code a bit .

searchReplace :: String->String->String -> String
searchReplace sr rp xs = searchr sr rp xs "" ""
  where
   searchr :: String->String->String->String->String -> String
   searchr [] _ xs _ _ =  xs
   searchr _ _ [] _ _  = []
   searchr sr rp xs retBack rollBack
| isFound $ fnd rollBack = rp
 ++ searchr sr rp (remaining $ fnd 
rollBack )
  ( getRetBack $ fnd 
rollBack)
  ( getRollBack $ fnd 
rollBack)
| otherwise = reverse ((processed $ fnd rollBack) ++ 
rollBack)

  ++ searchr sr rp (remaining $ fnd rollBack)
   ( getRetBack $ fnd rollBack)
   ( getRollBack $ fnd 
rollBack)

   where fnd  = searchr' sr sr (reverse retBack ++ xs) ""

   isFound = fst . fst
   remaining = snd . snd . fst
   getRollBack = snd . snd
   getRetBack = fst . snd
   processed = fst . snd . fst

   searchr' :: String->String->String->String->String
   -> ((Bool,(String,String)),(String,String))
   searchr' srch@(sr:srs) srch'@(sr':srs') xs fndSoFar rollBack =
  searchr'' (drop (length rollBack) srch) srch' xs 
fndSoFar

(False,"","") sr

   searchr'' :: String->String->String->String->(Bool,String,String)->Char
-> ((Bool,(String,String)),(String,String))
   searchr'' [] _ xs fnd _ _  = ((True,(fnd,xs)),("",""))
   searchr'' _ _ [] fnd (_,retBack,rollBack) _ = ((False,(retBack ++ 
rollBack ++ fnd,[])),("",""))
   searchr'' srch@(sr:srs) srch'@(sr':srs') xxs@(x:xs) fndSoFar 
(cnt,retBack,rollBack) s

 | sr == x = if cnt && sr' == x && isEmpty retBack
then searchr'' srs srs' xs fndSoFar 
(True,"",x:rollBack) s
else if not (isEmpty retBack)  || not (isEmpty 
rollBack)

then searchr'' srs srch' xs fndSoFar
(True,(x:rollBack) ++ 
retBack,"") s
else searchr'' srs srch' xs (x:fndSoFar) 
(True,"","") s

 | otherwise = if isEmpty rollBack && isEmpty retBack
  then if s == x
  then ((False,(fndSoFar,xxs)),("",""))
  else ((False,searchr''' s xs 
(x:fndSoFar)),("",""))

  else if sr' == x && isEmpty retBack
  then ((False,(fndSoFar, xs)), 
(retBack,x:rollBack))
  else ((False,(fndSoFar, xxs)), 
(retBack,rollBack))



   searchr''' :: Char->String->String -> (String,String)
   searchr''' sr [] fndSoFar = (fndSoFar,[])
   searchr''' sr xxs@(x:xs) fndSoFar | sr/=x = searchr''' sr xs 
(x:fndSoFar)

| otherwise = (fndSoFar,xxs)
   isEmpty [] = True
   isEmpty (a:as) = False
---



Greetings, Bane.

_
Express yourself instantly with MSN Messenger! Download today it's FREE! 
http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Substring replacements

2005-12-11 Thread Sebastian Sylvan
On 12/12/05, Daniel Fischer <[EMAIL PROTECTED]> wrote:
> Okay, I have looked up KMP and implemented it.
> Seems to work -- my first use of QuickCheck, too.
> It's slower than Bulat's and Tomasz' for Branimir's test :-(,
> but really fast for my test.
> Undoubtedly, one can still tune it.

Perhaps by using unboxed arrays...

/S


--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Substring replacements

2005-12-11 Thread Daniel Fischer
Okay, I have looked up KMP and implemented it.
Seems to work -- my first use of QuickCheck, too.
It's slower than Bulat's and Tomasz' for Branimir's test :-(,
but really fast for my test.
Undoubtedly, one can still tune it.
Here's the code:

module KMP where

import Data.Array

searchReplace :: String -> String -> String -> String
searchReplace "" _ str = str
searchReplace src@(c:cs) dst str = process 0 str ""
where
  len = length src
  pat = listArray (0,len-1) src
  bord = array (0,len) $
(0,-1):(1,0):[(i+1,boun i (bord!i)) | i <- [1 .. len-1]]
  boun i j
 | j < 0  = 0
 | pat!i == pat!j = j+1
 | otherwise  = boun i (bord!j)
  getBord s n
 | m < 1  = m
 | s == pat!m = m
 | otherwise  = getBord s m
   where
 m = bord!n
  process n str _ | n >= len = dst ++ process 0 str ""
  process _ "" mat = reverse mat
  process 0 (s:st) _
 | s == c= process 1 st [s]
 | otherwise = s:process 0 st ""
  process n str@(s:st) mat
 | s == pat!n = process (n+1) st (s:mat)
 | otherwise  = let j = getBord s n
(ret,skip) = splitAt j mat
in if j < 0 then reverse mat ++ process 0 str ""
   else reverse skip ++ process j str ret

Cheers,
Daniel
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Substring replacements (was: Differences in optimisiation ...)

2005-12-10 Thread Daniel Fischer
Am Samstag, 10. Dezember 2005 02:51 schrieben Sie:
> From: Daniel Fischer <[EMAIL PROTECTED]>
>
> >To: "Branimir Maksimovic" <[EMAIL PROTECTED]>
> >CC: Haskell-Cafe@haskell.org
> >Subject: Re: Differences in optimisiation with interactive and compiled mo
> >Date: Fri, 9 Dec 2005 23:27:00 +0100
> >
> >Still doesn't work, though:
> >
> >*Main> searchr "hahal" "jupp" "hahahalala"
> >"hahahalala"
> >
> >The problem is that the string to replace may contain a repeated pattern
> >and the pattern that begins the actual occurence might be consumed before
> > a failure is detected.
>
> Yes, I've corrected it. Now it is just 25% faster and that is only with -O2
> flag.
> Here is whole thing, I hope there are no more bugs left :) :
>
None that sprang to my eyes. However, on my machine, yours is not faster than 
Lemmih's.
Now, using the new Strings, I get the following times:
 -O2   -O1   no opt
Lemmih's: 38.9 sec38.9 sec76.7 sec
Yours : 40.1 sec 41.5 sec  131.1 sec
Mine   : 32.9 sec 33.1 sec82.8 sec.

However, there's a problem with Lemmih's replace:

*Main> searchr "ababcab" "###" "ababcababcabab"
"###abcab"
*Main> replace "ababcab" "###" "ababcababcabab"
"ababc###ab"

due to the fact that Lemmih's version scans the input from right to left
(that's easily changed by a few reverses, though -- but costly for long 
inputs), more serious is

Prelude Main> replace "ja" "aja" "jjja"
"ajajajajajajaja".


The fastest -- and nicely simple above -- that I could come up with is

replace :: String -> String -> String -> String
replace _ _ "" = ""
replace "" _ str = str
replace src dst inp
= process inp
  where
n = length src
process "" = ""
process st@(c:cs)
  | src `isPrefixOf` st = dst ++ process (drop n st)
  | otherwise   = c:process cs

It's roughly 10% faster than my other version on "seasearch" ...
and if you try it on

main2 :: IO ()
main2 = let src = replicate 1000 'r'
dst = " # "
str = replicate 999 'r' ++ 'c': replicate 1000 'r'
out = replace src dst $ concat $ replicate 500 str
out1 = replace src dst $ concat $ replicate 501 str
in do putStrLn $ "Working very long"
  putStrLn $ show (out == out1) ++ "\nDone"

you'll see a real difference. I'm not sure, why your algorithm pays a so much 
higher penalty, though. Maybe, it'll be faster if you make searchr' &c local 
functions? I'll try.

Cheers,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe