On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
<stefanfuhrm...@alice-dsl.de> wrote:
> On 03.01.2011 02:14, Johan Corveleyn wrote:
>>
>> It would be interesting to see where the biggest gains
>> are coming from (I'm guessing from the "per-machine-word"
>> reading/comparing; I'd like to try that first, maybe together with the
>> allocating of the file[] on the stack; I'd like to avoid
>> special-casing file_len==2, splitting up functions in *_slow and
>> *_fast, because it makes it a lot more difficult imho (so I'd only
>> like to do that if it's a significant contributor)). But if you want
>> to leave that as an exercise for the reader, that's fine as well :-).
>
> Exercise is certainly not a bad thing ;)
>
> But I think, the stack variable is certainly helpful
> and easy to do. While "chunky" operation gives
> a *big* boost, it is much more difficult to code if
> you need to compare multiple sources. It should
> be possible, though.
Ok, I finished the exercise :-).

- File info structures on the stack instead of on the heap: committed
  in r1060625. Gives a 10% boost.

- Chunky operation for multiple sources: committed in r1062532. Gives
  a ~250% boost on my machine.

For suffix scanning, I made a couple of changes to the logic you sent
in your patch (as you remember, there was a problem with the
implementation of suffix scanning [1]). It was (in scan_backward_fast):

[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+      if (   (file[0].curp - sizeof(apr_size_t) < file[0].curp)
+          && (file[1].curp - sizeof(apr_size_t) < file[1].curp))
+        {
+          do
+            {
+              file[0].curp -= sizeof(apr_size_t);
+              file[1].curp -= sizeof(apr_size_t);
+            }
+          while (   (file[0].curp >= min_curp0)
+                 && (file[1].curp >= min_curp1)
+                 && (   *(const apr_size_t*)(file[0].curp + 1)
+                     == *(const apr_size_t*)(file[1].curp + 1)));
+
+          file[0].curp += sizeof(apr_size_t);
+          file[1].curp += sizeof(apr_size_t);
+        }
]]]

I changed this to (the N-file equivalent of):

[[[
+#if SVN_UNALIGNED_ACCESS_IS_OK
+
+      if (   (file[0].curp - sizeof(apr_size_t) >= min_curp0)
+          && (file[1].curp - sizeof(apr_size_t) >= min_curp1))
+        {
+          do
+            {
+              file[0].curp -= sizeof(apr_size_t);
+              file[1].curp -= sizeof(apr_size_t);
+            }
+          while (   (file[0].curp - sizeof(apr_size_t) >= min_curp0)
+                 && (file[1].curp - sizeof(apr_size_t) >= min_curp1)
+                 && (   *(const apr_size_t*)(file[0].curp + 1)
+                     == *(const apr_size_t*)(file[1].curp + 1)));
+
+          file[0].curp += sizeof(apr_size_t);
+          file[1].curp += sizeof(apr_size_t);
+        }
]]]

That is: I changed the if-comparisons before the do-while loop to make
sure there is room to subtract a sizeof(apr_size_t) (the original
comparison didn't really make sense to me), and I changed the
comparisons with min_curpX in the while condition to the same check
(is there still room to subtract a sizeof(apr_size_t)?). After that,
all tests passed.

Now, you suggested in a previous post in this thread that hardcoding
file_len to 2 should speed it up by up to 50%.
And you are correct: on my machine, it's sped up by ~40% if I replace
all occurrences of file_len in find_identical_* with 2. So that leads
me to conclude that I should probably throw away the generic code and
replace it with three variants: find_identical_*2, find_identical_*3
and find_identical_*4 (once I write *2, the rest should be mostly
copy-and-paste). As you said, this should also vastly simplify the
code (drop the for loops, simpler conditionals, ...).

Cheers,
-- 
Johan

[1] http://svn.haxx.se/dev/archive-2011-01/0022.shtml