Re: [PATCH 1 of 2] bdiff: early pruning of common prefix before doing expensive computations

2016-11-27 Thread Augie Fackler

> On Nov 24, 2016, at 1:26 PM, Pierre-Yves David wrote:
> 
   diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
   --- a/mercurial/bdiff_module.c
   +++ b/mercurial/bdiff_module.c
   @@ -61,12 +61,12 @@ nomem:
 
static PyObject *bdiff(PyObject *self, PyObject *args)
 
 
 Implementing this in the Python module means that CFFI/PyPy won't be able
 to realize the perf wins :(
 
 How do you feel about implementing this in bdiff.c?
>>> 
>>> I guess other logic should also move from bdiff_module to bdiff.c? This was
>>> just the easy way to hook in before the two sides got split into lines.
>> 
>> If you're willing, I'd be a big fan of this change happening in such a
>> way that pypy gets the wins too. What say you?
> 
> So, what is the status of this? Should we expect a V2 with the code living in 
> bdiff.c?

I’m happy for it to land as-is, but was hoping we could get a version that is 
factored out for pypy use.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 2] bdiff: early pruning of common prefix before doing expensive computations

2016-11-24 Thread Pierre-Yves David



On 11/17/2016 10:30 PM, Augie Fackler wrote:

On Thu, Nov 17, 2016 at 08:29:46PM +0100, Mads Kiilerich wrote:

On 11/17/2016 07:53 PM, Gregory Szorc wrote:

On Thu, Nov 17, 2016 at 9:16 AM, Mads Kiilerich wrote:

   # HG changeset patch
   # User Mads Kiilerich
   # Date 1479321935 -3600
   #  Wed Nov 16 19:45:35 2016 +0100
   # Node ID 7f39bccc1c96bffc83f3c6e774da57ecd38982a7
   # Parent  fe6a3576b863955a6c40ca46bd1d6c8e5384dedf
   bdiff: early pruning of common prefix before doing expensive computations


[...]


   diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
   --- a/mercurial/bdiff_module.c
   +++ b/mercurial/bdiff_module.c
   @@ -61,12 +61,12 @@ nomem:

static PyObject *bdiff(PyObject *self, PyObject *args)


Implementing this in the Python module means that CFFI/PyPy won't be able
to realize the perf wins :(

How do you feel about implementing this in bdiff.c?


I guess other logic should also move from bdiff_module to bdiff.c? This was
just the easy way to hook in before the two sides got split into lines.


If you're willing, I'd be a big fan of this change happening in such a
way that pypy gets the wins too. What say you?


So, what is the status of this? Should we expect a V2 with the code 
living in bdiff.c?


Cheers,

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 2] bdiff: early pruning of common prefix before doing expensive computations

2016-11-17 Thread Mads Kiilerich

On 11/17/2016 07:53 PM, Gregory Szorc wrote:
On Thu, Nov 17, 2016 at 9:16 AM, Mads Kiilerich wrote:


# HG changeset patch
# User Mads Kiilerich
# Date 1479321935 -3600
#  Wed Nov 16 19:45:35 2016 +0100
# Node ID 7f39bccc1c96bffc83f3c6e774da57ecd38982a7
# Parent  fe6a3576b863955a6c40ca46bd1d6c8e5384dedf
bdiff: early pruning of common prefix before doing expensive computations

It seems quite common that files don't change completely. New lines are often
pretty much appended, and modifications will often only change a small section
of the file, which on average will be in the middle.

There can thus be a big win by pruning a common prefix before starting the
more expensive search for longest common substrings.

Worst case, it will scan through a long sequence of similar bytes without
encountering a newline. Splitlines will then have to do the same again ...
twice for each side. If similar lines are found, splitlines will save the
double iteration and hashing of the lines ... plus there will be fewer lines
to find common substrings in.

This change might in some cases make the algorithm pick shorter or less
optimal common substrings. We can't have our cake and eat it too.

This makes hg --time bundle --base null -r 4.0 go from 14.5 to 15 s - a 3%
increase.

On mozilla-unified:
perfbdiff -m 3041e4d59df2
! wall 0.053088 comb 0.060000 user 0.060000 sys 0.000000 (best of 100) to
! wall 0.024618 comb 0.020000 user 0.020000 sys 0.000000 (best of 116)
perfbdiff 0e9928989e9c --alldata --count 10
! wall 0.702075 comb 0.700000 user 0.700000 sys 0.000000 (best of 15) to
! wall 0.579235 comb 0.580000 user 0.580000 sys 0.000000 (best of 18)


\o/

Thank you for implementing this! This is likely the single biggest 
"easy" perf win in bdiff to be found.



diff --git a/mercurial/bdiff_module.c b/mercurial/bdiff_module.c
--- a/mercurial/bdiff_module.c
+++ b/mercurial/bdiff_module.c
@@ -61,12 +61,12 @@ nomem:

 static PyObject *bdiff(PyObject *self, PyObject *args)


Implementing this in the Python module means that CFFI/PyPy won't be 
able to realize the perf wins :(


How do you feel about implementing this in bdiff.c?


I guess other logic should also move from bdiff_module to bdiff.c? This
was just the easy way to hook in before the two sides got split into lines.



 {
-   char *sa, *sb, *rb;
+   char *sa, *sb, *rb, *ia, *ib;
    PyObject *result = NULL;
    struct bdiff_line *al, *bl;
    struct bdiff_hunk l, *h;
    int an, bn, count;
-   Py_ssize_t len = 0, la, lb;
+   Py_ssize_t len = 0, la, lb, li = 0, lcommon = 0, lmax;
    PyThreadState *_save;

    l.next = NULL;
@@ -80,8 +80,17 @@ static PyObject *bdiff(PyObject *self, P
    }

    _save = PyEval_SaveThread();
-   an = bdiff_splitlines(sa, la, &al);
-   bn = bdiff_splitlines(sb, lb, &bl);
+
+   lmax = la > lb ? lb : la;
+   for (ia = sa, ib = sb;
+        li < lmax && *ia == *ib;
+        ++li, ++ia, ++ib)
+           if (*ia == '\n')
+               lcommon = li + 1;
+   /* we can almost add: if (li == lmax) lcommon = li; */


I can't get this to apply cleanly locally against the committed repo,
so I can't look at the assembly. But my guess is there are still some
performance optimizations to be found.


Absolutely. This is the "simple" way of doing it. With this as baseline, 
we can try to optimize it.
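
To make the discussion concrete, the pruning step could be factored out of
bdiff_module.c into plain C roughly as below, so the CFFI/PyPy side could
call it too. This is only an untested sketch; the helper name and signature
are made up here, not the committed API:

    #include <stddef.h>

    /* Return the length of the common prefix of a and b, truncated
     * back to just after the last newline, so that both sides still
     * start on a line boundary (same logic as the loop in the patch). */
    static size_t bdiff_prune_prefix(const char *a, size_t la,
                                     const char *b, size_t lb)
    {
        size_t i, lcommon = 0;
        size_t lmax = la < lb ? la : lb;

        for (i = 0; i < lmax && a[i] == b[i]; i++)
            if (a[i] == '\n')
                lcommon = i + 1; /* cut just after the newline */
        return lcommon;
    }

The module glue would then call bdiff_splitlines(sa + lcommon, la - lcommon,
&al), and likewise for the b side, exactly as the patch above does.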




If we compare words instead of bytes, this will require fewer loop 
iterations and fewer executed instructions.


My work with bdiff_splitlines() tells me that the \n check inside the tight
loop is likely adding a bit of overhead. It might be worth backtracking
after the first mismatch to find the last newline. Alternatively, if you
iterate on word boundaries, you can do a bitwise AND against 0x0a0a0a0a...,
compare against 0x0, and store the last known word offset containing a
newline. On 64-bit machines, the cost of that newline check is reduced by 8x.


Yes, those sound like good ideas.

I guess the biggest win is to first find the common prefix on words while
ignoring newlines, then adjust and backtrack.
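
Something along these lines, say (an untested sketch combining the two
suggestions; the function name and the memcpy-based unaligned loads are
just illustration):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Phase 1: compare 8 bytes per iteration, ignoring newlines.
     * Phase 2: finish the mismatching word (or short tail) byte-wise.
     * Phase 3: backtrack so the prefix ends just after a newline. */
    static size_t prune_prefix_words(const char *a, const char *b,
                                     size_t lmax)
    {
        size_t i = 0, aligned = lmax / sizeof(uint64_t) * sizeof(uint64_t);
        uint64_t wa, wb;

        while (i < aligned) {
            memcpy(&wa, a + i, sizeof(wa)); /* alignment-safe loads */
            memcpy(&wb, b + i, sizeof(wb));
            if (wa != wb)
                break;
            i += sizeof(uint64_t);
        }
        while (i < lmax && a[i] == b[i]) /* byte-wise finish */
            i++;
        while (i > 0 && a[i - 1] != '\n') /* adjust and backtrack */
            i--;
        return i;
    }

The backtracking in the last phase can itself walk a long way when newlines
are rare, which is where remembering the last word offset known to contain
a newline, as suggested above, would help.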


When backtracking and looking for newlines, we could perhaps also use 
something along the lines of this draft:


x = w ^ 0x0a0a0a0a
x = x | x >> 4
x = x | x >> 2
x = x | x >> 1
x = x & x >> 16
x = x & x >> 8

If x&1 == 0, then we have a 0x0a in one of the bytes. But I have no idea how
efficient all this bit shuffling would be compared to a simpler approach ;-)
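
Transcribed into C (untested, 32-bit as in the draft; the function name is
made up for illustration):

    #include <stdint.h>

    /* After the or-folding, bit 0 of each byte is 1 iff that byte of
     * x is nonzero, i.e. iff the corresponding byte of w is not 0x0a.
     * The and-folding then combines the four per-byte flags into bit 0. */
    static int word_has_newline(uint32_t w)
    {
        uint32_t x = w ^ 0x0a0a0a0aU;

        x |= x >> 4;
        x |= x >> 2;
        x |= x >> 1;
        x &= x >> 16;
        x &= x >> 8;
        return (x & 1) == 0; /* some byte of w is '\n' */
    }

The better-known SWAR alternative is the haszero() trick: with
x = w ^ 0x0a0a0a0a, the expression ((x - 0x01010101) & ~x & 0x80808080) != 0
tests the same condition with fewer operations.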


But also, we could probably win a lot by having different handling for 
"source code" and "binary files". The backtracking cost is probably not