Re: More suggestions for find()

2016-06-21 Thread Wyatt via Digitalmars-d

On Monday, 20 June 2016 at 16:09:21 UTC, qznc wrote:

We cannot port it directly since it is GPL code.


Would it work to port from Musl instead?

-Wyatt


Re: More suggestions for find()

2016-06-21 Thread Guillaume Chatelet via Digitalmars-d

On Monday, 20 June 2016 at 16:09:21 UTC, qznc wrote:

On Monday, 20 June 2016 at 13:27:59 UTC, Jack Stouffer wrote:

On Monday, 20 June 2016 at 12:34:55 UTC, qznc wrote:

On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei 
Alexandrescu wrote:

[...]


Compare with memmem. That is 4x faster than the current 
stuff. I guess vector instructions are key. There is a 
branch in my repo.


More like 2x after looking again


Cool :)

What are the chances of getting this in Phobos?


Low.

It requires the GNU libc to link against. We don't want that 
dependency.


We cannot port it directly since it is GPL code.

It is even more of a special case, since it only works for the 
== predicate.


I'm not sure about the vector instructions it requires.

What we need is a clean room implementation of the two way 
string matching algorithm with vector instructions?


Maybe there's some inspiration to get from the source code:
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/memmem.c;hb=HEAD
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/str-two-way.h;hb=HEAD


Re: More suggestions for find()

2016-06-20 Thread qznc via Digitalmars-d

On Monday, 20 June 2016 at 13:27:59 UTC, Jack Stouffer wrote:

On Monday, 20 June 2016 at 12:34:55 UTC, qznc wrote:

On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei 
Alexandrescu wrote:
Got this link from the reddit discussion around the blog 
article: http://effbot.org/zone/stringlib.htm. The 
Bloom-filter-style trick looks quite cool. Anyone interested 
in running some more experiments? Thx! -- Andrei


Compare with memmem. That is 4x faster than the current 
stuff. I guess vector instructions are key. There is a branch 
in my repo.


More like 2x after looking again


Cool :)

What are the chances of getting this in Phobos?


Low.

It requires the GNU libc to link against. We don't want that 
dependency.


We cannot port it directly since it is GPL code.

It is even more of a special case, since it only works for the == 
predicate.


I'm not sure about the vector instructions it requires.

What we need is a clean room implementation of the two way string 
matching algorithm with vector instructions?


Re: More suggestions for find()

2016-06-20 Thread Andrei Alexandrescu via Digitalmars-d

On 6/20/16 8:34 AM, qznc wrote:

On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:

On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu wrote:

Got this link from the reddit discussion around the blog article:
http://effbot.org/zone/stringlib.htm. The Bloom-filter-style trick
looks quite cool. Anyone interested in running some more experiments?
Thx! -- Andrei


Compare with memmem. That is 4x faster than the current stuff. I guess
vector instructions are key. There is a branch in my repo.



[snip]

Super cool! One thought - the current algorithm
computes a "skip". In the case of Alice in Wonderland, that skip is 
relatively small, I recall like 4. If the skip is large enough, the 
current algorithm is faster than memmem. So it would be best to compute 
the skip and then switch to memmem if the skip falls below a cutoff. -- 
Andrei


Re: More suggestions for find()

2016-06-20 Thread Jack Stouffer via Digitalmars-d

On Monday, 20 June 2016 at 12:34:55 UTC, qznc wrote:

On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu 
wrote:
Got this link from the reddit discussion around the blog 
article: http://effbot.org/zone/stringlib.htm. The 
Bloom-filter-style trick looks quite cool. Anyone interested 
in running some more experiments? Thx! -- Andrei


Compare with memmem. That is 4x faster than the current stuff. 
I guess vector instructions are key. There is a branch in my 
repo.


More like 2x after looking again


Cool :)

What are the chances of getting this in Phobos?



Re: More suggestions for find()

2016-06-20 Thread qznc via Digitalmars-d

On Sunday, 19 June 2016 at 10:38:27 UTC, qznc wrote:
On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu 
wrote:
Got this link from the reddit discussion around the blog 
article: http://effbot.org/zone/stringlib.htm. The 
Bloom-filter-style trick looks quite cool. Anyone interested 
in running some more experiments? Thx! -- Andrei


Compare with memmem. That is 4x faster than the current stuff. 
I guess vector instructions are key. There is a branch in my 
repo.


More like 2x after looking again. For some more concrete numbers:

LDC:
Search in Alice in Wonderland
   std: 438 ±12
manual: 315 ±8
  A2Phobos: 254 ±7
 Chris: 325 ±9
Andrei: 434 ±11
   Andrei2: 259 ±6
memmem: 100 ±0
Search in random short strings
   std: 127 ±14
manual: 124 ±12
  A2Phobos: 107 ±8
 Chris: 130 ±15
Andrei: 110 ±7
   Andrei2: 106 ±7
memmem: 108 ±8
Mismatch in random long strings
   std: 534 ±432
manual: 587 ±428
  A2Phobos: 353 ±274
 Chris: 605 ±448
Andrei: 550 ±399
   Andrei2: 360 ±280
memmem: 136 ±50
Search random haystack with random needle
   std: 293 ±211
manual: 265 ±141
  A2Phobos: 212 ±163
 Chris: 271 ±133
Andrei: 285 ±212
   Andrei2: 215 ±164
memmem: 118 ±20
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

DMD:
Search in Alice in Wonderland
   std: 554 ±17
manual: 382 ±11
  A2Phobos: 346 ±10
 Chris: 501 ±18
Andrei: 498 ±14
   Andrei2: 342 ±9
memmem: 100 ±0
Search in random short strings
   std: 153 ±31
manual: 131 ±19
  A2Phobos: 108 ±9
 Chris: 146 ±29
Andrei: 109 ±10
   Andrei2: 106 ±7
memmem: 108 ±8
Mismatch in random long strings
   std: 674 ±542
manual: 651 ±469
  A2Phobos: 445 ±362
 Chris: 821 ±604
Andrei: 618 ±459
   Andrei2: 436 ±354
memmem: 134 ±50
Search random haystack with random needle
   std: 379 ±307
manual: 284 ±175
  A2Phobos: 233 ±177
 Chris: 370 ±245
Andrei: 299 ±238
   Andrei2: 238 ±188
memmem: 110 ±14
 (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz




Re: More suggestions for find()

2016-06-19 Thread deadalnix via Digitalmars-d
On Sunday, 19 June 2016 at 11:49:06 UTC, Andrei Alexandrescu 
wrote:

On 06/19/2016 06:38 AM, qznc wrote:
Unfortunately, Im currently traveling and lost my bag with 
laptop at the

airport.


Bummer. I hope you find it! -- Andrei


Not in D, but:

http://clang.llvm.org/doxygen/Lexer_8cpp_source.html

Line 2284, function Lexer::SkipBlockComment.


Re: More suggestions for find()

2016-06-19 Thread Andrei Alexandrescu via Digitalmars-d

On 06/19/2016 06:38 AM, qznc wrote:

Unfortunately, Im currently traveling and lost my bag with laptop at the
airport.


Bummer. I hope you find it! -- Andrei



Re: More suggestions for find()

2016-06-19 Thread qznc via Digitalmars-d
On Saturday, 18 June 2016 at 18:54:28 UTC, Andrei Alexandrescu 
wrote:
Got this link from the reddit discussion around the blog 
article: http://effbot.org/zone/stringlib.htm. The 
Bloom-filter-style trick looks quite cool. Anyone interested in 
running some more experiments? Thx! -- Andrei


Compare with memmem. That is 4x faster than the current stuff. I 
guess vector instructions are key. There is a branch in my repo.


Unfortunately, Im currently traveling and lost my bag with laptop 
at the airport. It will probably take a while before I can look 
into it again.


More suggestions for find()

2016-06-18 Thread Andrei Alexandrescu via Digitalmars-d
Got this link from the reddit discussion around the blog article: 
http://effbot.org/zone/stringlib.htm. The Bloom-filter-style trick looks 
quite cool. Anyone interested in running some more experiments? Thx! -- 
Andrei