Re: My take on the diff-optimizations-bytes branch

Johan Corveleyn Sat, 22 Jan 2011 13:20:39 -0800

On Wed, Jan 19, 2011 at 1:19 AM, Stefan Fuhrmann
<[email protected]> wrote:
> On 18.01.2011 12:56, Johan Corveleyn wrote:
>>
>> On Mon, Jan 17, 2011 at 3:12 AM, Johan Corveleyn<[email protected]>
>>  wrote:
>>>
>>> On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
>>> <[email protected]>  wrote:
>>
>> [ ... snip ... ]
>>
>>>> But I think, the stack variable is certainly helpful
>>>> and easy to do.
>>
>> Ok, I've done this (locally, still have to clean up a little and then
>> commit). It gives a nice 10% speedup of prefix/suffix scanning, which
>> is great.
>>
>> I have a couple of small questions though, about other small
>> optimizations you did:
>>
>>> Index: subversion/libsvn_diff/diff_file.c
>>> ===================================================================
>>> --- subversion/libsvn_diff/diff_file.c  (revision 1054044)
>>> +++ subversion/libsvn_diff/diff_file.c  (working copy)
>>
>> ...
>>>
>>> @@ -258,10 +259,10 @@
>>>
>>>     \
>>>      for (i = 0; i<  files_len; i++)
>>>      \
>>>      {
>>>      \
>>> -      if (all_files[i]->curp<  all_files[i]->endp - 1)
>>>     \
>>> -        all_files[i]->curp++;
>>>      \
>>> +      if (all_files[i].curp + 1<  all_files[i].endp)
>>>     \
>>> +        all_files[i].curp++;
>>>     \
>>
>> Just curious: why did you change that condition (from 'X<  Y - 1' to
>> 'X + 1<  Y')?
>>
>> You mention in your log message: "minor re-arragements in arithmetic
>> expressions to maximize reuse of results (e.g. in
>> INCREMENT_POINTERS)".
>>
>> Why does this help, and what's the potential impact?
>
> There is a "hidden" common sub-expression.
> Variant 1 in pseudo-code:
>
> lhs = all_files[i].curp;
> temp1 = all_files[i].endp;
> rhs = temp1 - 1;
> if (lhs < rhs)
> {
>  temp2 = lhs + 1;
>  all_files[i].curp = temp2;
> }
>
> Variant 2:
>
> lhs = all_files[i].curp;
> temp = lhs + 1;
> rhs = all_files[i].endp;
> if (lhs < rhs)
>  all_files[i].curp = temp;
>
> The problem is that the compiler cannot figure that out
> on its own because (x+1) and (y-1) don't overflow / wrap
> around for the same values. In particular 0 < (0-1) is true
> and (0+1) < 0 is false in unsigned arithmetic.


Thanks for the explanation. So, I understand why this should be more
efficient, except ... that it's not, for some reason. At least not
after testing it on my x86 machine.

It's about 1% slower, tested with two test files generated with the
gen-big-files.py script (sent in the other thread), with 1,000,000
prefix and suffix lines, and 500 mismatching lines in the middle:

    $ ./gen-big-files.py -1 big1.txt -2 big2.txt -p 1000000 -s 1000000

So for now, I didn't commit this change. Maybe it's better on other
platforms, so it may still make sense to do it, but then again, at
around 1% it's probably not that important ...

>>
>> Second question:
>>
>>> +find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
>>> +                           apr_off_t *prefix_lines,
>>> +                           struct file_info file[], apr_size_t file_len,
>>> +                           apr_pool_t *pool)
>>> +{
>>> +  svn_boolean_t had_cr = FALSE;
>>>    svn_boolean_t is_match, reached_all_eof;
>>>    apr_size_t i;
>>> +  apr_off_t lines = 0;
>>>
>>>    *prefix_lines = 0;
>>>    for (i = 1, is_match = TRUE; i<  file_len; i++)
>>> -    is_match = is_match&&  *file[0]->curp == *file[i]->curp;
>>> +    is_match = is_match&&  *file[0].curp == *file[i].curp;
>>> +
>>>    while (is_match)
>>>      {
>>>        /* ### TODO: see if we can take advantage of
>>>           diff options like ignore_eol_style or ignore_space. */
>>>        /* check for eol, and count */
>>> -      if (*file[0]->curp == '\r')
>>> +      if (*file[0].curp == '\r')
>>>          {
>>> -          (*prefix_lines)++;
>>> +          lines++;
>>
>> Here you added a local variable 'lines', which is incremented in the
>> function, and copied to *prefix_lines at the end (you do the same in
>> fine_identical_prefix_fast). The original code just sets and
>> increments *prefix_lines directly. So, out of curiousness: why does
>> this help, and how much?
>
> Incrementing the temporary variable requires one
> indirection less than the original code and uses only
> one instead of two registers (in typical cases).
>
> The 'how much?' part depends on compiler / optimizer
> and even more on the target platform (x86 has very
> few registers to keep data in fly; x64 has about twice
> as many).

Ok, this apparently sped it up for around 1% on my machine. So I
committed this in r1062275.

Thanks.
Cheers,
-- 
Johan

Re: My take on the diff-optimizations-bytes branch

Reply via email to