Re: [PATCH 1/2] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well

2014-02-18 Thread Junio C Hamano
Kirill Smelkov k...@navytux.spb.ru writes:

  2) alloca(), for small arrays, is used for the same reason - if we change
  it to xmalloc()/free() the timings get worse
 
 Do you see any use of it outside compat/?
 
 I thought we specifically avoid alloca() for portability.  Also we
 do not use variable-length-arrays on the stack either, I think.

 No, no usage outside compat/ and I knew alloca and VLAs are not used in
 Git codebase for portability, and I understand alloca will be
 criticized, but wanted to start the discussion rolling.

 I've actually started without alloca, and used xmalloc/free for
 [nparent] vectors, but the impact was measurable, so it just had to be
 changed to something more optimal.

 For me, personally, alloca is ok, but I understand there could be
 portability issues (by the way, what compiler/system Git cares about
 does not have working alloca?). Thats why I propose we do the following

 1. at configure time, determine, do we have working alloca, and define

 #define HAVE_ALLOCA

if yes.

 2. in code

 #ifdef HAVE_ALLOCA
 # define xalloca(size)  (alloca(size))
 # define xalloca_free(p)do {} while(0)
 #else
 # define xalloca(size)  (xmalloc(size))
 # define xalloca_free(p)(free(p))
 #endif

and use it like

func() {
p = xalloca(size);
...

xalloca_free(p);
}

 This way, for systems, where alloca is available, we'll have optimal
 on-stack allocations with fast executions. On the other hand, on
 systems, where alloca is not available, this gracefully fallbacks to
 xmalloc/free.

 Please tell me what you think.

I guess the above is clean enough, and we cannot do better than that,
if we want to use alloca() on platforms where we can.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well

2014-02-16 Thread Kirill Smelkov
On Fri, Feb 14, 2014 at 09:37:00AM -0800, Junio C Hamano wrote:
 Kirill Smelkov k...@mns.spb.ru writes:
 
  Previously diff_tree(), which is now named __diff_tree_sha1(), was
 
 That name with two leading underscores is a rather unfortunate,
 especially for a function that is not a file scope static.  No way
 to rename this to something more sensible?

I agree. In preparatory patches I thought this will go away, but it
stayed. I'll try to come up with something reasonable.


  That impedance mismatch *hurts* *performance* *badly* for generating
  combined diffs - in c839f1bd (combine-diff: optimize combine_diff_path
 
 Please avoid referring to a commit that is not in 'master' by its
 object name.  It can be reworked later and get a different name.

I agree, this makes sense. Is it ok to refer to nearby commits, by say
HEAD~3, if we know we are referring to 3-times previous commit?


  That slowness comes from the fact that currently, while generating
  combined diff, a lot of time is spent computing diff(commit,commit^2)
  just to only then intersect that huge diff to almost small set of files
  from diff(commit,commit^1).
 
 Good observation.
 
 |a|   |b|a  b   -  a ∉ B   -   D(A,B) +=  +aa↓
 |-|   |-|a  b   -  b ∉ A   -   D(A,B) +=  -bb↓
 | |   | |a = b   -  investigate δ(a,b)a↓ b↓
 
 In both the n-parallel and diff-tree, when an entry 'a' is a
 tree, I take this D(A,B) += +a to mean (recursively) adding all
 the paths within 'a' to the result as addition.  Sounds sensible.

Correct.


  D(A,B)
 
  is by definition the same as combined diff
 
  D(A,[B]),
 
  so if we could rework the code for common case and make it be not slower
  for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
  multiparent diff tree-walker would greatly benefit generating
  combine-diff.
 
 OK.

Thanks. My first goal was to demonstrate it is doable - i.e. we could
join two diff tree-walkers into generalized one, and that approach would
be sound and have chances to be accepted.


  What we do is as follows:
 
  1) diff tree-walker __diff_tree_sha1() is internally reworked to be
 a paths generator (new name diff_tree_paths()), with each generated path
 being `struct combine_diff_path` with info for path, new sha1,mode and 
  for
 every parent which sha1,mode it was in it.
 
  2) From that info, we can still generate usual diff queue with
 struct diff_filepairs, via exporting generated
 combine_diff_path, if we know we run for nparent=1 case.
 (see emit_diff() which is now named emit_diff_p0only())
 
 s/p0/first_parent_/; perhaps?

Ok.


  3) In order for diff_can_quit_early(), which checks
 
 DIFF_OPT_TST(opt, HAS_CHANGES))
 
 to work, that exporting have to be happening not in bulk, but
 incrementally, one diff path at a time.
 
 Good thinking.

Yes. This requirement made the diff-paths producer be real generator,
which emits paths incrementally, which is imho, could be a good design
for making the component more reusable.


  Some notes(*):
 
  1) For loops,
 
   i = 0; do { ... } while (++i  nparent);
 
  is used instead of
 
   for (i = 0; i  nparent; ++i)
   ...
 
  because for the former case, the compiler have to emit additional
  prologue code which checks for i = nparent case before entering the
  loop.
 
  As we require nparent must be 0, that additional overhead
  conflicts with the runs not slower for nparent=1 case than before
  goal.
 
 Unfortunate.  I'd rather see us stick to more readable and familiar
 form for maintainability if this were not measurable.

The most effect on performance were avoiding mallocs and reduce register
pressure. This too, was measurable, but of lower impact. If giving away
some part of percent for nparent=1 case is ok, this could be back to
usual for loops.

By the way, I find it a bit unfortunate, we don't have some for-loop
form with post-conditions in C, only do-while. Another observation, is
that it would be good to say to compiler e.g.

assume nparent  0;

and then in e.g.

for (i = 0; i  nparent; i++)
...

the compiler could know it should not do pre-conditions checks before
entering the loop.

I've verified, that such behaviour, at least with gcc, could be achieved
with

if (nparent = 0)
return;

in function prologue - then the compiler deduces, at least with O2, that
after return point, nparent is 0, but this adds slight overhead on each
entry to function, and we have as many calls as there would be
recursions.

To me, the do-while is still readable, but if fors are preferred, I'll
re-measure what it costs and come back.


  2) alloca(), for small arrays, is used for the same reason - if we change
  it to xmalloc()/free() the timings get worse
 
 Do you see any use of it outside compat/?
 
 I thought we specifically avoid alloca() for portability.  Also we
 do not use variable-length-arrays on the stack either, I think.

No, no usage outside 

Re: [PATCH 1/2] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well

2014-02-14 Thread Kirill Smelkov
On Thu, Feb 13, 2014 at 11:51:19AM -0800, Junio C Hamano wrote:
 Kirill Smelkov k...@mns.spb.ru writes:
 
  +   /* until we go to it next round, .next holds how many bytes we
  +* allocated (for faster realloc - we don't need copying old 
  data).
  +*/
  +   p-next = (struct combine_diff_path *)alloclen;
 
 I am getting this here:
 
 tree-diff.c: In function '__path_appendnew':
 tree-diff.c:140:13: error: cast to pointer from integer of different size 
 [-Werror=int-to-pointer-cast]

Ah, sorry, I only tested on 32 bits, and this indeed could be a valid
warning on systems where sizeof(ptr) != sizeof(int).

In our case, alloclen is small enough that the code should be valid,
and we can avoid the warning, via proper usage of intptr_t.

Please find corrected patch below.

Thanks,
Kirill

 8 
From: Kirill Smelkov k...@mns.spb.ru
Subject: [PATCH v2 1/2] tree-diff: rework diff_tree() to generate diffs for
 multiparent cases as well
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Previously diff_tree(), which is now named __diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.

In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.

For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .

There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .

That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in c839f1bd (combine-diff: optimize combine_diff_path
sets intersection) I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.

That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).

That's because at present, to compute combine-diff, for first finding
paths, that every parent touches, we use the following combine-diff
property/definition:

D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)  (w.r.t. paths)

where

D(A,P1...Pn) is combined diff between commit A, and parents Pi

and

D(A,Pi) is usual two-tree diff Pi..A

So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.

And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.

The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.

Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:

D(A,X1...Xn) calculation scheme
---

D(A,X1...Xn) = D(A,X1) ^ ... ^ D(A,Xn)   (regarding resulting paths set)

 D(A,Xj) - diff between A..Xj
 D(A,X1...Xn)- combined diff from A to parents X1,...,Xn

We start from all trees, which are sorted, and compare their entries in
lock-step:

  A X1   Xn
  - --
 |a|   |x1| |xn|
 |-|   |--| ... |--|  i = argmin(x1...xn)
 | |   |  | |  |
 |-|   |--| |--|
 |.|   |. | |. |
  . ..
  . ..

at any time there could be 3 cases:

 1)  a  xi;
 2)  a  xi;
 3)  a = xi.

Schematic deduction of what every case means, and what to do, follows:

1)  a  xi  -  ∀j a ∉ Xj  -  +a ∈ D(A,Xj)  -  D += 

Re: [PATCH 1/2] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well

2014-02-14 Thread Junio C Hamano
Kirill Smelkov k...@mns.spb.ru writes:

  8 
 From: Kirill Smelkov k...@mns.spb.ru
 Subject: [PATCH v2 1/2] tree-diff: rework diff_tree() to generate diffs for
  multiparent cases as well
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit

The last three do not belong here.  Only From, Date and Subject are
relevant and taken as in-body headers.

For that matter, as the From and Subject above are exactly the same
as what you have on the e-mail header, you do not want any of them.

I'll strip them out here, so no need to resend.  Thanks.

 Previously diff_tree(), which is now named __diff_tree_sha1(), was

That name with two leading underscores is a rather unfortunate,
especially for a function that is not a file scope static.  No way
to rename this to something more sensible?

 That impedance mismatch *hurts* *performance* *badly* for generating
 combined diffs - in c839f1bd (combine-diff: optimize combine_diff_path

Please avoid referring to a commit that is not in 'master' by its
object name.  It can be reworked later and get a different name.

 That slowness comes from the fact that currently, while generating
 combined diff, a lot of time is spent computing diff(commit,commit^2)
 just to only then intersect that huge diff to almost small set of files
 from diff(commit,commit^1).

Good observation.

|a|   |b|a  b   -  a ∉ B   -   D(A,B) +=  +aa↓
|-|   |-|a  b   -  b ∉ A   -   D(A,B) +=  -bb↓
| |   | |a = b   -  investigate δ(a,b)a↓ b↓

In both the n-parallel and diff-tree, when an entry 'a' is a
tree, I take this D(A,B) += +a to mean (recursively) adding all
the paths within 'a' to the result as addition.  Sounds sensible.

 D(A,B)

 is by definition the same as combined diff

 D(A,[B]),

 so if we could rework the code for common case and make it be not slower
 for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
 multiparent diff tree-walker would greatly benefit generating
 combine-diff.

OK.

 What we do is as follows:

 1) diff tree-walker __diff_tree_sha1() is internally reworked to be
a paths generator (new name diff_tree_paths()), with each generated path
being `struct combine_diff_path` with info for path, new sha1,mode and for
every parent which sha1,mode it was in it.

 2) From that info, we can still generate usual diff queue with
struct diff_filepairs, via exporting generated
combine_diff_path, if we know we run for nparent=1 case.
(see emit_diff() which is now named emit_diff_p0only())

s/p0/first_parent_/; perhaps?

 3) In order for diff_can_quit_early(), which checks

DIFF_OPT_TST(opt, HAS_CHANGES))

to work, that exporting have to be happening not in bulk, but
incrementally, one diff path at a time.

Good thinking.

 Some notes(*):

 1) For loops,

  i = 0; do { ... } while (++i  nparent);

 is used instead of

  for (i = 0; i  nparent; ++i)
  ...

 because for the former case, the compiler have to emit additional
 prologue code which checks for i = nparent case before entering the
 loop.

 As we require nparent must be 0, that additional overhead
 conflicts with the runs not slower for nparent=1 case than before
 goal.

Unfortunate.  I'd rather see us stick to more readable and familiar
form for maintainability if this were not measurable.

 2) alloca(), for small arrays, is used for the same reason - if we change
 it to xmalloc()/free() the timings get worse

Do you see any use of it outside compat/?

I thought we specifically avoid alloca() for portability.  Also we
do not use variable-length-arrays on the stack either, I think.

 3) For every parent tree, we need to keep a tag, whether entry from that
 parent equals to entry from minimal parent. For performance reasons I'm
 keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.

Unfortunate, but I do not see another place to keep this
information offhand (nor implement this approach without keeping
that piece of information).

 P.S. and combined diff is not some exotic/for-play-only stuff - for

No need to convince us about that ;-)

 example for a program I write to represent Git archives as readonly
 filesystem, there is initial scan with

 `git log --reverse --raw --no-abbrev --no-renames -c`

 to extract log of what was created/changed when, as a result building a
 map

 {}  sha1-  in which commit (and date) a content was added

 that `-c` means also show combined diff for merges, and without them, if
 a merge is non-trivial (merges changes from two parents with both having
 separate changes to a file), or an evil one, the map will not be full,
 i.e. some valid sha1 would be absent from it.

 That case was my initial motivation for combined diffs speedup.

I wonder if this machinery can be reused for log -m as well (or
perhaps you do that already?).  After all, by performing a single
parallel scan, you are gathering all the necessary 

[PATCH 1/2] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well

2014-02-13 Thread Kirill Smelkov
Previously diff_tree(), which is now named __diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.

In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.

For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .

There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .

That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in c839f1bd (combine-diff: optimize combine_diff_path
sets intersection) I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.

That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).

That's because at present, to compute combine-diff, for first finding
paths, that every parent touches, we use the following combine-diff
property/definition:

D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)  (w.r.t. paths)

where

D(A,P1...Pn) is combined diff between commit A, and parents Pi

and

D(A,Pi) is usual two-tree diff Pi..A

So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.

And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.

The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.

Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:

D(A,X1...Xn) calculation scheme
---

D(A,X1...Xn) = D(A,X1) ^ ... ^ D(A,Xn)   (regarding resulting paths set)

 D(A,Xj) - diff between A..Xj
 D(A,X1...Xn)- combined diff from A to parents X1,...,Xn

We start from all trees, which are sorted, and compare their entries in
lock-step:

  A X1   Xn
  - --
 |a|   |x1| |xn|
 |-|   |--| ... |--|  i = argmin(x1...xn)
 | |   |  | |  |
 |-|   |--| |--|
 |.|   |. | |. |
  . ..
  . ..

at any time there could be 3 cases:

 1)  a  xi;
 2)  a  xi;
 3)  a = xi.

Schematic deduction of what every case means, and what to do, follows:

1)  a  xi  -  ∀j a ∉ Xj  -  +a ∈ D(A,Xj)  -  D += +a;  a↓

2)  a  xi

2.1) ∃j: xj  xi  -  -xi ∉ D(A,Xj)  -  D += ø;  ∀ xk=xi  xk↓
2.2) ∀j  xj = xi  -  xj ∉ A  -  -xj ∈ D(A,Xj)  -  D += -xi;  ∀j xj↓

3)  a = xi

3.1) ∃j: xj  xi  -  +a ∈ D(A,Xj)  -  only xk=xi remains to investigate
3.2) xj = xi  -  investigate δ(a,xj)
 |
 |
 v

3.1+3.2) looking at δ(a,xk) ∀k: xk=xi - if all != ø  -

  ⎧δ(a,xk)  - if xk=xi
 -  D += ⎨
  ⎩+a - if xkxi

in any case a↓  ∀ xk=xi  xk↓

~

For comparison, here is how diff_tree() works:

D(A,B) calculation scheme
-

A B
- -
   |a|   |b|a  b   -  a ∉ B   -   D(A,B) +=  +aa↓
   |-|   |-|a  b   -  b ∉ A   -   D(A,B) +=  -bb↓
   | |   | |a = b   -  investigate δ(a,b)a↓ b↓
   |-|   |-|
   |.|   |.|
. .
. .



This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization 

Re: [PATCH 1/2] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well

2014-02-13 Thread Junio C Hamano
Kirill Smelkov k...@mns.spb.ru writes:

 + /* until we go to it next round, .next holds how many bytes we
 +  * allocated (for faster realloc - we don't need copying old 
 data).
 +  */
 + p-next = (struct combine_diff_path *)alloclen;

I am getting this here:

tree-diff.c: In function '__path_appendnew':
tree-diff.c:140:13: error: cast to pointer from integer of different size 
[-Werror=int-to-pointer-cast]
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html