Re: Fast string search
bearophile Wrote: Look better at the code, Oh, now I see. Then this is not correct translation to D. C++ returns correct results. I've fixed bug, but not the one that you see. You can verify this by running C++ version.
Re: Fast string search
Thanks for bug report! Code is very new. Bug manifests itself when substring is in 0 position. Fix: diff --git a/volnitsky.h b/volnitsky.h index a7dbc1f..62d686d 100644 -- a/volnitsky.h +++ b/volnitsky.h @@ -26,7 +26,7 @@ const char* volnitsky(const char* S, const char* Se, const char* SS, con // step through text -for (const char* p = S+step; p = Se-W_size; p+=step) { +for (const char* p = S+step-W_size; p = Se-W_size; p+=step) {
Re: Fast string search
Leonid Volnitsky: Thanks for bug report! Code is very new. You are welcome :-) Bug manifests itself when substring is in 0 position. Fix: I may not understand the purpose of the function, but after your fix this code: void main() { string s1 = hello, how are you? I'm well, thanks; writeln(s1.length); string s2 = wello; auto ptr = volnitsky(s1, s2); writeln(ptr - s1.ptr); writeln(s1[ptr - s1.ptr .. $]); } Prints: 36 24 well, thanks But wello is not present in the s1 string, so returning 24 is not what I desire... Bye, bearophile
Re: Fast string search
36 - is correct. When not found volnitsky() return pointer to next byte after last byte of s1 (same as std::search()). 24 - I don't know D to tell what it is (use of ptr past end of s1?)
Re: Fast string search
Leonid Volnitsky: 36 - is correct. When not found volnitsky() return pointer to next byte after last byte of s1 (same as std::search()). Look better at the code, the main contains three writeln(). The first 36 it prints is the length of the s1 string, that's obviously correct. The other two writeln show a wrong result (it may be just a C++==D translation error of mine). Bye, bearophile
Fast string search
Found through Reddit, it may be useful for Phobos: http://volnitsky.com/project/str_search/ A possible partial D2 translation: const(char)* volnitsky(string haystack, string needle) { enum size_t wSize = ushort.sizeof; enum size_t hSize = 64 * 1024; immutable char* S = haystack.ptr; immutable char* Se = haystack[$ - 1]; immutable char* SS = needle.ptr; immutable char* SSe = needle[$ - 1]; immutable size_t SS_size = SSe - SS; // needle length immutable size_t step = SS_size - wSize + 1; // if S is too small or SS is too big, fallback to std::search() if (step wSize || SS_size = (1 8) - 1) { throw new Exception(search!); //return std::search(S, Se, SS, SSe); } // make SS hash //auto H = new ubyte[hSize]; // 64 KB ubyte[hSize] H; // 64 KB for (size_t SSi = 0; SSi (SS_size - wSize + 1); SSi++) { size_t h = *cast(ushort*)(SS + SSi) % hSize; while (H[h]) h = (h + 1) % hSize; // find free cell H[h] = cast(ubyte)(SSi + 1); // can't overflow! } // step through text for (char* p = cast(char*)(S + step); p = Se - wSize; p += step) { size_t h = *cast(ushort*)p % hSize; if (H[h]) { TRY_HASH_CELL: int SSi = H[h] - 1; const char* R = p - SSi; for (size_t i = 0; i SS_size; i++) { if (R[i] != SS[i]) { size_t h_next = (h + 1) % hSize; if (H[h_next]) { h = h_next; goto TRY_HASH_CELL; } else goto NEXT_STEP; } } return R; } NEXT_STEP:; } return Se; } //- import std.stdio: writeln; void main() { string s1 = hello, how are you? I'm well, thanks; string s2 = wello; // doesn't work with strings of odd lenght! auto ptr = volnitsky(s1, s2); writeln(ptr - s1.ptr); writeln(s1[ptr - s1.ptr]); } But as the example shows it doesn't seem to work with strings of odd length. Also it lacks the call to std::search() still. Bye, bearophile