Oskar Sandberg <oskar at freenetproject.org> wrote: > On Fri, Jun 22, 2001 at 03:52:06AM +0100, Theodore Hong wrote: > > Rolling hash value? I don't get what that has to do with splitting. > > I was told that the correct way to parse a text for a stopper line (like > you do when splitting a mine multipart) is to run a hash value over the > text that works so that you pop in every new value, and then pop out the > value at stopper.length back. When The rolling value matches the hash of > the stopper with the same function, you have found the stop code. > > I don't remember who told me this, I sort of thought it was you.
Nope, wasn't me. Although now that I think about it, the way I would do it is to use the Boyer-Moore string searching algorithm. (Not that I actually did it this way last night, but now I think I ought to.) You precompute a table of the locations of characters in your stopper line. Then (this is the clever part) instead of comparing forward byte-by-byte, you look ahead stopper.length bytes and compare backward. If the first character you read is not in stopper at all, you know right away that stopper cannot begin in the next stopper.length bytes, so you can jump right over that whole range without doing any more compares. If it's in the middle of stopper somewhere, you use your precomputed table of locations to determine the set of places where stopper could begin, and jump ahead to the first one. If it's at the end of stopper, in the worst case you compare backwards stopper.length bytes to confirm that you found it, which is the same number of compares as searching forwards. theo -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 174 bytes Desc: not available URL: <https://emu.freenetproject.org/pipermail/devl/attachments/20010622/4e46ce7b/attachment.pgp>
