Re: Fast string search

2010-12-16 Thread Leonid Volnitsky
bearophile Wrote:
 Look better at the code,

Oh, now I see.  Then this is not correct translation to D.   C++ returns 
correct results.   I've fixed bug, but not the one that you see.  You can 
verify this by running C++ version. 


Re: Fast string search

2010-12-15 Thread Leonid Volnitsky
Thanks for bug report!  Code is very new.
Bug manifests itself when substring is in 0 position.
Fix:

diff --git a/volnitsky.h b/volnitsky.h
index a7dbc1f..62d686d 100644
-- a/volnitsky.h
+++ b/volnitsky.h
@@ -26,7 +26,7 @@ const char* volnitsky(const char*  S,  const char* Se,
const char* SS,  con
 // step through text
-for (const char* p = S+step;   p = Se-W_size;  p+=step)  {
+for (const char* p = S+step-W_size;   p = Se-W_size;  p+=step)  {


Re: Fast string search

2010-12-15 Thread bearophile
Leonid Volnitsky:

 Thanks for bug report!  Code is very new.

You are welcome :-)


 Bug manifests itself when substring is in 0 position.
 Fix:

I may not understand the purpose of the function, but after your fix this code:

void main() {
string s1 = hello, how are you? I'm well, thanks;
writeln(s1.length);
string s2 = wello;
auto ptr = volnitsky(s1, s2);
writeln(ptr - s1.ptr);
writeln(s1[ptr - s1.ptr .. $]);
}

Prints:

36
24
well, thanks

But wello is not present in the s1 string, so returning 24 is not what I 
desire...

Bye,
bearophile


Re: Fast string search

2010-12-15 Thread Leonid Volnitsky
36 - is correct.   When not found volnitsky() return pointer to next byte after
last byte of s1  (same as std::search()).
24 - I don't know D to tell what it is (use of ptr past end of s1?)


Re: Fast string search

2010-12-15 Thread bearophile
Leonid Volnitsky:

 36 - is correct.   When not found volnitsky() return pointer to next byte 
 after
 last byte of s1  (same as std::search()).

Look better at the code, the main contains three writeln(). The first 36 it 
prints is the length of the s1 string, that's obviously correct. The other two 
writeln show a wrong result (it may be just a C++==D translation error of 
mine).

Bye,
bearophile


Fast string search

2010-12-12 Thread bearophile
Found through Reddit, it may be useful for Phobos:
http://volnitsky.com/project/str_search/

A possible partial D2 translation:


const(char)* volnitsky(string haystack, string needle) {
enum size_t wSize = ushort.sizeof;
enum size_t hSize = 64 * 1024;

immutable char* S = haystack.ptr;
immutable char* Se = haystack[$ - 1];
immutable char* SS = needle.ptr;
immutable char* SSe = needle[$ - 1];

immutable size_t SS_size = SSe - SS; // needle length
immutable size_t step = SS_size - wSize + 1;

// if S is too small or SS is too big, fallback to std::search()
if (step  wSize || SS_size = (1  8) - 1) {
throw new Exception(search!);
//return std::search(S, Se, SS, SSe);
}

// make SS hash
//auto H = new ubyte[hSize]; // 64 KB
ubyte[hSize] H; // 64 KB
for (size_t SSi = 0; SSi  (SS_size - wSize + 1); SSi++)  {
size_t h = *cast(ushort*)(SS + SSi) % hSize;
while (H[h])
h = (h + 1) % hSize; // find free cell
H[h] = cast(ubyte)(SSi + 1); // can't overflow!
}

// step through text
for (char* p = cast(char*)(S + step); p = Se - wSize; p += step) {
size_t h = *cast(ushort*)p % hSize;
if (H[h]) {
TRY_HASH_CELL:
int SSi = H[h] - 1;
const char* R = p - SSi;
for (size_t i = 0; i  SS_size; i++) {
if (R[i] != SS[i]) {
size_t h_next = (h + 1) % hSize;
if (H[h_next]) {
h = h_next;
goto TRY_HASH_CELL;
} else
goto NEXT_STEP;
}
}
return R;
}
NEXT_STEP:;
}
return Se;
}

//-

import std.stdio: writeln;

void main() {
string s1 = hello, how are you? I'm well, thanks;
string s2 = wello; // doesn't work with strings of odd lenght!
auto ptr = volnitsky(s1, s2);
writeln(ptr - s1.ptr);
writeln(s1[ptr - s1.ptr]);
}


But as the example shows it doesn't seem to work with strings of odd length. 
Also it lacks the call to std::search() still.

Bye,
bearophile