On 11/17/2016 07:53 PM, Gregory Szorc wrote:
On Thu, Nov 17, 2016 at 9:16 AM, Mads Kiilerich <m...@kiilerich.com> wrote:
# HG changeset patch
# User Mads Kiilerich <mad...@unity3d.com>
# Date 1479321935 -3600
# Wed Nov 16 19:45:35 2016 +0100
# Node ID 7f39bccc1c96bffc83f3c6e774da57ecd38982a7
# Parent fe6a3576b863955a6c40ca46bd1d6c8e5384dedf
bdiff: early pruning of common prefix before doing expensive computations
It seems quite common that files don't change completely. New lines are often pretty much appended, and modifications will often only change a small section of the file, which on average will be in the middle.
There can thus be a big win by pruning a common prefix before starting the more expensive search for longest common substrings.
Worst case, it will scan through a long sequence of similar bytes without encountering a newline. Splitlines will then have to do the same again ... twice for each side. If similar lines are found, splitlines will save the double iteration and hashing of the lines ... plus there will be fewer lines to find common substrings in.
This change might in some cases make the algorithm pick shorter or less optimal common substrings. We can't have our cake and eat it too.

This makes hg --time bundle --base null -r 4.0 go from 14.5 to 15 s - a 3% increase.
On mozilla-unified:
perfbdiff -m 3041e4d59df2
! wall 0.053088 comb 0.060000 user 0.060000 sys 0.000000 (best of 100) to
! wall 0.024618 comb 0.020000 user 0.020000 sys 0.000000 (best of 116)
perfbdiff 0e9928989e9c --alldata --count 10
! wall 0.702075 comb 0.700000 user 0.700000 sys 0.000000 (best of 15) to
! wall 0.579235 comb 0.580000 user 0.580000 sys 0.000000 (best of 18)
\o/
Thank you for implementing this! This is likely the single biggest
"easy" perf win in bdiff to be found.
diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
--- a/mercurial/bdiff_module.c
+++ b/mercurial/bdiff_module.c
@@ -61,12 +61,12 @@ nomem:
static PyObject *bdiff(PyObject *self, PyObject *args)
Implementing this in the Python module means that CFFI/PyPy won't be
able to realize the perf wins :(
How do you feel about implementing this in bdiff.c?
I guess other logic should also move from bdiff_module to bdiff.c? This was just the easy way to hook in before the two sides got split into lines.
{
- char *sa, *sb, *rb;
+ char *sa, *sb, *rb, *ia, *ib;
PyObject *result = NULL;
struct bdiff_line *al, *bl;
struct bdiff_hunk l, *h;
int an, bn, count;
- Py_ssize_t len = 0, la, lb;
+ Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lmax;
PyThreadState *_save;
l.next = NULL;
@@ -80,8 +80,17 @@ static PyObject *bdiff(PyObject *self, P
}
_save = PyEval_SaveThread();
- an = bdiff_splitlines(sa, la, &al);
- bn = bdiff_splitlines(sb, lb, &bl);
+
+ lmax = la > lb ? lb : la;
+ for (ia = sa, ib = sb;
+ li < lmax && *ia == *ib;
+ ++li, ++ia, ++ib)
+ if (*ia == '\n')
+ lcommon = li + 1;
+ /* we can almost add: if (li == lmax) lcommon = li; */
I can't get this to apply cleanly locally against the committed repo,
so I can't look at the assembly. But, my guess is there are still some
performance optimizations to be found.
Absolutely. This is the "simple" way of doing it. With this as baseline,
we can try to optimize it.
If we compare words instead of bytes, this will require fewer loop
iterations and fewer executed instructions.
My work with bdiff_splitlines() tells me that the \n check inside the tight loop is likely adding a bit of overhead. It might be worth backtracking after the first mismatch to find the last newline.
Alternatively, if you iterate on word boundaries, you can do a bitwise
AND against 0x0a0a0a0a... and compare against 0x0 and store the last
known word offset containing a newline. On 64 bit machines, the cost
of that newline check is reduced by 8x.
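For reference, an exact word-at-a-time variant of this check is the classic SWAR zero-byte test: a byte of w equals '\n' iff the corresponding byte of w ^ 0x0a0a... is zero. A minimal sketch (the helper name is invented, not from the patch):

```c
#include <stdint.h>

/* Hypothetical helper: return nonzero iff some byte of w is 0x0a.
 * After the XOR, a byte is zero iff it held '\n'; the classic
 * zero-byte test (v - 0x01..) & ~v & 0x80.. then flags any zero byte
 * by setting its high bit. */
static int word_has_newline(uint64_t w)
{
	uint64_t v = w ^ 0x0a0a0a0a0a0a0a0aULL;
	return ((v - 0x0101010101010101ULL) & ~v &
	        0x8080808080808080ULL) != 0;
}
```

This keeps the per-word cost to a handful of ALU operations and, unlike a plain AND against 0x0a0a..., never reports a false positive.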
Yes, those sound like good ideas.
I guess the biggest win is to first find the common prefix on words while ignoring newlines, then adjust and backtrack.
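That scheme could look roughly like the following sketch (function name hypothetical, not from the patch): compare word-sized chunks while they match, finish byte-by-byte, then backtrack so the pruned prefix ends right after a newline.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: length of the common prefix of a and b,
 * truncated back to the last newline.  Words are compared first;
 * memcpy keeps the unaligned loads portable. */
static size_t common_prefix_lines(const char *a, const char *b, size_t len)
{
	size_t i = 0;

	/* word-at-a-time: skip 8 bytes per iteration while they match */
	while (i + sizeof(uint64_t) <= len) {
		uint64_t wa, wb;
		memcpy(&wa, a + i, sizeof(wa));
		memcpy(&wb, b + i, sizeof(wb));
		if (wa != wb)
			break;
		i += sizeof(uint64_t);
	}

	/* byte-at-a-time to the exact first mismatch */
	while (i < len && a[i] == b[i])
		i++;

	/* backtrack so the pruned prefix ends after a newline */
	while (i > 0 && a[i - 1] != '\n')
		i--;
	return i;
}
```

The backtracking loop is where the "source code" assumption matters: on text it walks back at most one line, but on newline-free binary data it can undo the whole scan.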
When backtracking and looking for newlines, we could perhaps also use
something along the lines of this draft:
x = w ^ 0x0a0a0a0a
x = x | x >> 4
x = x | x >> 2
x = x | x >> 1
x = x & x >> 16
x = x & x >> 8
if x&1==0, then we have a 0a in one of the bytes. But I have no idea how
efficient all this bit shuffling would be compared to a simpler approach ;-)
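For what it's worth, the draft above appears to be exact on 32-bit words: the three ORs fold the OR of each byte's bits into that byte's lowest bit, and the two ANDs combine the four low bits, so bit 0 ends up clear iff some byte of w ^ 0x0a0a0a0a was zero. A direct translation to C (helper name invented):

```c
#include <stdint.h>

/* Direct translation of the draft above.  After the XOR, a byte is
 * zero iff it held '\n'.  Bits 0, 8, 16 and 24 each become the OR of
 * exactly one byte's bits; ANDing them leaves bit 0 set only when
 * every byte was nonzero. */
static int draft_has_newline(uint32_t w)
{
	uint32_t x = w ^ 0x0a0a0a0aU;
	x |= x >> 4;
	x |= x >> 2;
	x |= x >> 1;
	x &= x >> 16;
	x &= x >> 8;
	return (x & 1) == 0;
}
```

Whether this beats the simpler subtract-and-mask zero-byte test is an open question; both are a handful of ALU operations per word.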
But also, we could probably win a lot by having different handling for
"source code" and "binary files". The backtracking cost is probably not
relevant for "source code". And splitting on newline is quite arbitrary
and rarely optimal for "binary files".
(Somewhat related: Mercurial sometimes determines binary files by looking for 0 bytes. Sometimes, it might be a more convenient heuristic to check that the first N lines are shorter than a few hundred characters. That would be a better match for this line splitting.)
/Mads
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel