On 11/17/2016 07:53 PM, Gregory Szorc wrote:
On Thu, Nov 17, 2016 at 9:16 AM, Mads Kiilerich <m...@kiilerich.com> wrote:

    # HG changeset patch
    # User Mads Kiilerich <mad...@unity3d.com>
    # Date 1479321935 -3600
    #      Wed Nov 16 19:45:35 2016 +0100
    # Node ID 7f39bccc1c96bffc83f3c6e774da57ecd38982a7
    # Parent  fe6a3576b863955a6c40ca46bd1d6c8e5384dedf
    bdiff: early pruning of common prefix before doing expensive computations

    It seems quite common that files don't change completely. New lines are
    often pretty much appended, and modifications will often only change a
    small section of the file, which on average will be in the middle.

    There can thus be a big win by pruning a common prefix before starting
    the more expensive search for longest common substrings.

    Worst case, it will scan through a long sequence of identical bytes
    without encountering a newline. Splitlines will then have to do the
    same again ... twice for each side. If common lines are found and
    pruned, splitlines is saved the double iteration and hashing of those
    lines ... plus there will be fewer lines to find common substrings in.

    This change might in some cases make the algorithm pick shorter or
    less optimal common substrings. We can't have our cake and eat it too.

    This makes hg --time bundle --base null -r 4.0 go from 14.5 to 15 s -
    a 3% increase.

    On mozilla-unified:
    perfbdiff -m 3041e4d59df2
    ! wall 0.053088 comb 0.060000 user 0.060000 sys 0.000000 (best of 100) to
    ! wall 0.024618 comb 0.020000 user 0.020000 sys 0.000000 (best of 116)
    perfbdiff 0e9928989e9c --alldata --count 10
    ! wall 0.702075 comb 0.700000 user 0.700000 sys 0.000000 (best of 15) to
    ! wall 0.579235 comb 0.580000 user 0.580000 sys 0.000000 (best of 18)


\o/

Thank you for implementing this! This is likely the single biggest "easy" perf win in bdiff to be found.


    diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
    --- a/mercurial/bdiff_module.c
    +++ b/mercurial/bdiff_module.c
    @@ -61,12 +61,12 @@ nomem:

     static PyObject *bdiff(PyObject *self, PyObject *args)


Implementing this in the Python module means that CFFI/PyPy won't be able to realize the perf wins :(

How do you feel about implementing this in bdiff.c?

I guess other logic should also move from bdiff_module.c to bdiff.c? This was just the easy way to hook in before the two sides get split into lines.

     {
    -       char *sa, *sb, *rb;
    +       char *sa, *sb, *rb, *ia, *ib;
            PyObject *result = NULL;
            struct bdiff_line *al, *bl;
            struct bdiff_hunk l, *h;
            int an, bn, count;
    -       Py_ssize_t len = 0, la, lb;
    +       Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lmax;
            PyThreadState *_save;

            l.next = NULL;
    @@ -80,8 +80,17 @@ static PyObject *bdiff(PyObject *self, P
            }

            _save = PyEval_SaveThread();
    -       an = bdiff_splitlines(sa, la, &al);
    -       bn = bdiff_splitlines(sb, lb, &bl);
    +
    +       lmax = la > lb ? lb : la;
    +       for (ia = sa, ib = sb;
    +            li < lmax && *ia == *ib;
    +            ++li, ++ia, ++ib)
    +               if (*ia == '\n')
    +                       lcommon = li + 1;
    +       /* we can almost add: if (li == lmax) lcommon = li; */
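
The quoted hunk stops here; presumably the rest of the patch feeds the pruned offset into the existing splitlines calls, along the lines of this sketch (the byte offsets emitted in the output hunks would then also have to account for the skipped lcommon bytes):

        an = bdiff_splitlines(sa + lcommon, la - lcommon, &al);
        bn = bdiff_splitlines(sb + lcommon, lb - lcommon, &bl);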


I can't get this to apply cleanly locally against the committed repo, so I can't look at the assembly. But my guess is there are still some performance optimizations to be found.

Absolutely. This is the "simple" way of doing it. With this as a baseline, we can try to optimize it.


If we compare words instead of bytes, this will require fewer loop iterations and fewer executed instructions.

My work with bdiff_splitlines() tells me that the \n check inside the tight loop is likely adding a bit of overhead. It might be worth backtracking after the first mismatch to find the last newline. Alternatively, if you iterate on word boundaries, you can XOR each word against 0x0a0a0a0a..., test the result for a zero byte, and store the last known word offset containing a newline. On 64-bit machines, the cost of that newline check is reduced by 8x.
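
For illustration only, here is a rough sketch of that word-at-a-time scan (the function name and shape are invented for this example; a real version would still need a byte-wise pass over the final, partially matching word):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical sketch, not committed code: compare 8 bytes at a
     * time, remembering the offset of the last fully-equal word that
     * contained a '\n'. Newlines in the partially matching tail still
     * need a byte-wise follow-up pass. */
    static size_t common_prefix_newline(const char *a, const char *b,
                                        size_t lmax)
    {
            const uint64_t nl = 0x0a0a0a0a0a0a0a0aULL;
            size_t i, lastnl = (size_t)-1, lcommon = 0;

            for (i = 0; i + 8 <= lmax; i += 8) {
                    uint64_t wa, wb, x;
                    memcpy(&wa, a + i, 8); /* memcpy sidesteps alignment */
                    memcpy(&wb, b + i, 8);
                    if (wa != wb)
                            break;
                    x = wa ^ nl; /* zero byte in x <=> '\n' byte in wa */
                    if ((x - 0x0101010101010101ULL) & ~x &
                        0x8080808080808080ULL)
                            lastnl = i;
            }
            if (lastnl != (size_t)-1) {
                    size_t j;
                    for (j = lastnl; j < lastnl + 8; j++)
                            if (a[j] == '\n')
                                    lcommon = j + 1;
            }
            return lcommon; /* prune everything up to the last '\n' */
    }

Since only whole-byte equality matters here, byte order doesn't affect the result.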

Yes, those sound like good ideas.

I guess the biggest win is to first find the common prefix on words while ignoring newlines, then adjust and backtrack.

When backtracking and looking for newlines, we could perhaps also use something along the lines of this draft:

    x = w ^ 0x0a0a0a0a
    x = x | x >> 4
    x = x | x >> 2
    x = x | x >> 1
    x = x & x >> 16
    x = x & x >> 8

If x & 1 == 0, then we have a 0x0a in one of the bytes (of a 32-bit word). But I have no idea how efficient all this bit shuffling would be compared to a simpler approach ;-)
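
For comparison, the "simpler approach" most people reach for is the classic zero-byte test, which flags a newline-bearing word in four ALU operations; a sketch for the same 32-bit word w as in the draft above:

    #include <stdint.h>

    /* Sketch: nonzero iff some byte of w equals 0x0a ('\n') - the
     * classic zero-byte test applied to w ^ 0x0a0a0a0a. */
    static inline uint32_t word_has_newline(uint32_t w)
    {
            uint32_t x = w ^ 0x0a0a0a0aU;
            return (x - 0x01010101U) & ~x & 0x80808080U;
    }

Like the shift cascade, this only says whether the word contains a newline somewhere; pinning down which byte still takes a short byte-wise scan.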

But also, we could probably win a lot by having different handling for "source code" and "binary files". The backtracking cost is probably not relevant for "source code". And splitting on newline is quite arbitrary and rarely optimal for "binary files".

(Somewhat related: Mercurial sometimes determines binary files by looking for 0 bytes. Sometimes, it might be a more convenient heuristic to check that the first N lines are shorter than a few hundred characters. That would be a better match for this line splitting.)
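
A minimal sketch of that heuristic, with the line count and length cap as made-up placeholders rather than anything Mercurial does today:

    #include <stddef.h>

    /* Hypothetical heuristic, not current Mercurial behavior: treat
     * the data as text if each of the first nlines lines stays under
     * maxlen bytes. */
    static int looks_like_text(const char *s, size_t len,
                               int nlines, size_t maxlen)
    {
            size_t linelen = 0, i;

            for (i = 0; i < len && nlines > 0; i++) {
                    if (s[i] == '\n') {
                            nlines--;
                            linelen = 0;
                    } else if (++linelen >= maxlen) {
                            return 0; /* overlong line: likely binary */
                    }
            }
            return 1;
    }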


/Mads
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
