On Wed, Jan 19, 2011 at 1:19 AM, Stefan Fuhrmann <stefanfuhrm...@alice-dsl.de> wrote: > On 18.01.2011 12:56, Johan Corveleyn wrote: >> >> On Mon, Jan 17, 2011 at 3:12 AM, Johan Corveleyn<jcor...@gmail.com> >> wrote: >>> >>> On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann >>> <stefanfuhrm...@alice-dsl.de> wrote: >> >> [ ... snip ... ] >> >>>> But I think, the stack variable is certainly helpful >>>> and easy to do. >> >> Ok, I've done this (locally, still have to clean up a little and then >> commit). It gives a nice 10% speedup of prefix/suffix scanning, which >> is great. >> >> I have a couple of small questions though, about other small >> optimizations you did: >> >>> Index: subversion/libsvn_diff/diff_file.c >>> =================================================================== >>> --- subversion/libsvn_diff/diff_file.c (revision 1054044) >>> +++ subversion/libsvn_diff/diff_file.c (working copy) >> >> ... >>> >>> @@ -258,10 +259,10 @@ >>> >>> \ >>> for (i = 0; i< files_len; i++) >>> \ >>> { >>> \ >>> - if (all_files[i]->curp< all_files[i]->endp - 1) >>> \ >>> - all_files[i]->curp++; >>> \ >>> + if (all_files[i].curp + 1< all_files[i].endp) >>> \ >>> + all_files[i].curp++; >>> \ >> >> Just curious: why did you change that condition (from 'X< Y - 1' to >> 'X + 1< Y')? >> >> You mention in your log message: "minor re-arragements in arithmetic >> expressions to maximize reuse of results (e.g. in >> INCREMENT_POINTERS)". >> >> Why does this help, and what's the potential impact? > > There is a "hidden" common sub-expression. > Variant 1 in pseudo-code: > > lhs = all_files[i].curp; > temp1 = all_files[i].endp; > rhs = temp1 - 1; > if (lhs < rhs) > { > temp2 = lhs + 1; > all_files[i].curp = temp2; > } > > Variant 2: > > lhs = all_files[i].curp; > temp = lhs + 1; > rhs = all_files[i].endp; > if (lhs < rhs) > all_files[i].curp = temp; > > The problem is that the compiler cannot figure that out > on its own because (x+1) and (y-1) don't overflow / wrap > around for the same values. In particular 0 < (0-1) is true > and (0+1) < 0 is false in unsigned arithmetic.
Thanks for the explanation. So, I understand why this should be more efficient, except ... that it's not, for some reason. At least not after testing it on my x86 machine. It's about 1% slower, tested with two test files generated with the gen-big-files.py script (sent in the other thread), with 1,000,000 prefix and suffix lines, and 500 mismatching lines in the middle: $ ./gen-big-files.py -1 big1.txt -2 big2.txt -p 1000000 -s 1000000 So for now, I didn't commit this change. Maybe it's better on other platforms, so it may still make sense to do it, but then again, at around 1% it's probably not that important ... >> >> Second question: >> >>> +find_identical_prefix_slow(svn_boolean_t *reached_one_eof, >>> + apr_off_t *prefix_lines, >>> + struct file_info file[], apr_size_t file_len, >>> + apr_pool_t *pool) >>> +{ >>> + svn_boolean_t had_cr = FALSE; >>> svn_boolean_t is_match, reached_all_eof; >>> apr_size_t i; >>> + apr_off_t lines = 0; >>> >>> *prefix_lines = 0; >>> for (i = 1, is_match = TRUE; i< file_len; i++) >>> - is_match = is_match&& *file[0]->curp == *file[i]->curp; >>> + is_match = is_match&& *file[0].curp == *file[i].curp; >>> + >>> while (is_match) >>> { >>> /* ### TODO: see if we can take advantage of >>> diff options like ignore_eol_style or ignore_space. */ >>> /* check for eol, and count */ >>> - if (*file[0]->curp == '\r') >>> + if (*file[0].curp == '\r') >>> { >>> - (*prefix_lines)++; >>> + lines++; >> >> Here you added a local variable 'lines', which is incremented in the >> function, and copied to *prefix_lines at the end (you do the same in >> fine_identical_prefix_fast). The original code just sets and >> increments *prefix_lines directly. So, out of curiousness: why does >> this help, and how much? > > Incrementing the temporary variable requires one > indirection less than the original code and uses only > one instead of two registers (in typical cases). > > The 'how much?' part depends on compiler / optimizer > and even more on the target platform (x86 has very > few registers to keep data in fly; x64 has about twice > as many). Ok, this apparently sped it up for around 1% on my machine. So I committed this in r1062275. Thanks. Cheers, -- Johan