Hi devs,

As mentioned in [1], I've created two branches to try out two different approaches for the diff optimizations of prefix/suffix scanning.
The first one, diff-optimizations-bytes, has a working implementation of the optimization. It still has some open todo items, but it basically works.

The second one, diff-optimizations-tokens, takes a more high-level approach by working in the "token handling layer". It takes the extracted lines as a whole and compares them to scan for identical prefix and suffix. I preferred this "new" approach because it seemed more elegant (and it works for both diff_file and diff_memory (property diffs)). However, although the token-based prefix scanning works adequately, I'm now stuck on the suffix scanning.

I am now considering abandoning the tokens approach, for the following reasons:

1) There is still a lot of work. Scanning for identical suffix is quite difficult, because we now have to extract tokens (lines) in reverse. I've put in place a stub for that function (datasource_get_previous_token), but it still needs to be implemented, and that's the hardest part, IMHO. Not only that, but I just realized that I'll also have to implement a reverse variant of util.c#svn_diff__normalize_buffer (which contains the encouraging comment "It only took me forever to get this routine right,..." (added by ehu in r866123)), and maybe also of token_compare (not sure).

2) I'm beginning to see that token-based suffix scanning will not be as fast as byte-based suffix scanning, simply because in the byte-based case we can completely ignore line structure. We never have to compare characters with \n or \r; we just keep reading bytes and comparing them. So token-based suffix scanning carries extra overhead.

So, unless someone can convince me otherwise, I'm probably going to stop with the token approach. Because of 2), I don't think it's worth spending the effort needed for 1), especially because the byte-based approach already works.

Any thoughts?

Cheers,
--
Johan

[1] http://svn.haxx.se/dev/archive-2010-11/0416.shtml
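
P.S. To make point 2) a bit more concrete, here is a rough sketch of what byte-based prefix/suffix scanning boils down to when both inputs are already in memory. This is not the code on the diff-optimizations-bytes branch: the function names common_prefix_len and common_suffix_len are made up for illustration, and the sketch leaves out chunked file reading and any adjustment of the result to line boundaries. It only shows that the inner loops compare raw bytes and never look for \n or \r.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the branch): count the leading bytes
 * that are identical in buffers a and b. */
static size_t
common_prefix_len(const char *a, size_t a_len,
                  const char *b, size_t b_len)
{
  size_t min_len = a_len < b_len ? a_len : b_len;
  size_t i = 0;

  /* Plain byte comparison; line endings get no special treatment. */
  while (i < min_len && a[i] == b[i])
    i++;

  return i;
}

/* Hypothetical helper (not from the branch): count the trailing bytes
 * that are identical in a and b, without running into the first
 * prefix_len bytes that were already matched as common prefix. */
static size_t
common_suffix_len(const char *a, size_t a_len,
                  const char *b, size_t b_len,
                  size_t prefix_len)
{
  size_t min_len = a_len < b_len ? a_len : b_len;
  size_t max_suffix;
  size_t i = 0;

  /* The prefix can never be longer than the shorter buffer. */
  assert(prefix_len <= min_len);
  max_suffix = min_len - prefix_len;

  /* Walk backwards from the end of both buffers, byte by byte. */
  while (i < max_suffix && a[a_len - 1 - i] == b[b_len - 1 - i])
    i++;

  return i;
}
```

A token-based variant would have to find the start of each preceding line and normalize it before it could compare anything, which is the extra overhead I mean in 2).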