On 06/05/2013 23:36, Junio C Hamano wrote:
Kevin Bracey <ke...@bracey.fi> writes:
+# ,---E--. *H----------. * marks !TREESAME parent paths
+# / \ / \*
+# *A--*B---D--*F-*G---------K-*L-*M
+# \ /* \ /
+# `-C-' `-*I-*J
+#
+# A creates "file", B and F change it.
+# Odd merge G takes the old version from B.
+# I changes it, but J reverts it.
+# H and L both change it, and M merges those changes.
+ ...
+check_outcome failure 'M H L J I G F B A' --simplify-merges -- file # drops G
Even though E simplifies down to B, making G a merge between B and F (the
former is an ancestor of the latter), G is kept, because...?
Because the path at B and F are different?
G's simplified parent B is marked for deletion by the redundant parent scan; it would be redundant in any
normal merge scenario. But here, because of the odd merge, it gets a reprieve from the "don't drop all
TREESAME parents" rule. So merge G remains with 2 parents: B (its "content" parent) and F (its
"topological" parent).
I have to wonder what should happen if D makes path different from B, and then
F makes path the same as B. B and F should still be kept.
That would happen. G would be TREESAME to F, so the "keep 1 TREESAME" rule
wouldn't need to kick in, parent B would be dropped from G, and G would simplify to F.
+# Check that odd merge G remains shown when F is the bottom.
I did not have a chance to say this when you responded in the
previous round with that "bottom" thing before you sent this new round,
but I am not sure if it is a good idea to pay special attention to the
"bottom"-ness. Very often, instead of running "git log -- path" and
stopping when seeing enough by killing the pager, people run
"git log HEAD~200.. -- path" to pick "recent enough"
commits, but the number 200 is very much arbitrary.
True, the bottom can be arbitrary. But that actually turns out to be the key argument in
favour of what I'm doing. The reason for treating the bottom "specially" is
actually to prevent side branches that are immediately above the bottom from being
treated specially. In early tests of the side branch logic, it became clear that whether
a side branch was displayed or not depended on whether it was exactly adjacent to the
bottom. The behaviour was arbitrary and annoying.
Considering your arbitrary bottom example, observe what happens if there's an irrelevant
topic merge coming in at HEAD~199, and you try various bottoms around that point, without
treating the bottom "specially". You get a transient simplification failure as
the bottom passes:
* Bottom=HEAD~300 - below the topic branch point:
Normal simplification will work fine for the topic branch; it is unaffected by
limiting, so the entire topic branch and merge is simplified away. This worked
fine before this patch series (assuming a normal merge).
Now we move forward to put the bottom between the topic branch point and merge.
* Bottom=HEAD~201: (B=bottom, o=other UNINTERESTING,
x=INTERESTING, *=!TREESAME)
B--*x---x--*x simplifies to B--*x---x--*x
/* /*
o---x---x o
New rules reduce to B--*x--*x (1 TREESAME INTERESTING parent,
so ignore UNINTERESTING side)
* Move bottom up one to HEAD~200:
B---x--*x simplifies to B---x--*x
/* /*
o---x---x o
(Merge can't be dropped -
2 UNINTERESTING parents that differ)
If I can't treat bottoms specially, then I can't distinguish between the
merge's two UNINTERESTING parents, so I have to show the merge.
* Move bottom up one again to HEAD~199, and the branch is excluded by the
path-spec anyway:
B--*x
So the net effect there is that we have a side branch merge that is shown for
_exactly_ one choice of bottom commit on the mainline. It's not shown for
'HEAD~5000..' to 'HEAD~201..', and it's not shown for 'HEAD~199..' to 'HEAD..'.
But it pops up in 'HEAD~200..'. Something's wrong with that picture, and
treating the bottom as interesting removes the effect. Fuller discussion below.
+check_result 'M L H' F..M -- file
OK. It is still one possible history to explain what we have in the path at
the end.
Yes, but I don't really like it much. It's not clear that we came from an ancestor of F,
not F itself. But there's nothing you can do about that in this most basic of commands,
and at least adding any option does reveal the truth that we're not based on F. Even just
"--parents" (implied in gitk) reveals that H came from B.
+check_result 'M L K J I H G' F..M --full-history --parents -- file
OK.
It almost makes me suspect that the --full-history option should flip
revs->rewrite_parents on (but not revs->print_parents, unless the --parents
option asks us to) when the option is parsed.
I don't think so. Why do we want to show K unless we have to? It would just be a lot of
noise in a normal "git log" - this tree isn't typical. In a normal tree, most
merges will be more like K - so when requesting a log of a specific file, the mainline
won't have modified the relevant file during the life of most irrelevant topic branches,
so those branches' merges will be fully TREESAME and can be omitted.
I assume --simplify-merges had to be invented to get rid of the noise of all the
pointless fully-TREESAME merges from "--full-history --parents", which keeps
all merges. But without --parents, a file-specific --full-history isn't currently that
bad.
+check_result 'M L J I H' E..M --ancestry-path -- file
Because F is outside our graph with --ancestry-path, G does not bring
anything new to path relative to its remaining parent E and should not be
shown. OK.
Yep. And this works because we treat bottom E as a parent of G for TREESAME, despite it
being UNINTERESTING. If we didn't, then E and F would be "equals", and G would
have to be shown.
+check_outcome failure 'M L K J I H' E..M --ancestry-path --parents --
+file
This fails in what way? G is shown, because the !treesame with now irrelevant
F comes into play?
Yes, that's right. Sorry, missing comment.
+check_outcome failure 'M H L J I' E..M --ancestry-path
+--simplify-merges -- file # includes G
Makes me wonder how "--ancestry-path --full-history" would work with this range
and with/without the pathspec.
--ancestry-path implies full-history, as far as I can see.
+check_outcome failure 'M L H' G..M -- file # includes J I
+check_outcome failure 'M L H' G..M --parents -- file # includes J I
I am not sure if it should be a failure or your expectation is wrong.
G is outside the graph, so as far as the remainder of the graph is concerned,
J is the sole remaining parent of K and I and J did change the path in question.
What makes you think I and J should be excluded in these cases?
Because it's the simplest answer to the question "what happened in M since G", which is
what "G..M" is supposed to mean. And because I J _would_ have been excluded and we would
have got 'M L H' if we added an extra null commit G1 between G and K. That suggests an instability
in our logic. Why should adding a totally irrelevant commit affect whether I-J is hidden? I J is
hidden in every case apart from when you specify K^ as bottom. Why is K^ a special bottom?
Graph theory time!
This all comes about because the formal graph definition doesn't match the user
interface. The question "B..C" currently generates a graph of all commits in C
since B, and the connections between those commits. It turns out to be problematic that
the graph doesn't include the connection to B itself. It would be fine if only worrying
about nodes in the graph. But it's not fine when you start doing graph operations that
care about edge connections to parents.
Thinking linearly, consider:
A..C: o--o--B--o--o--C
The two queries "A..B" and "B..C" should, between them, cover "A..C", surely?
But with the current logic, they don't:
A..B: o--o--B
B..C: o--o--C
Wherever you place B, the following edge gets "deleted" from consideration. Not
a problem for A..B, because we never consider graph commits' children. But it's a problem
for B..C - it affects the graph operations on the first commit of the graph. Which is
what triggers this G..M problem here. In all normal circumstances, the default graph
following would always choose to take K's first TREESAME parent, and ignore IJ. But when
the bottom is specifically K^, it breaks, because that mainline TREESAME edge is removed
from consideration. K is handled differently because it's still on the graph, but it's
lost its mainline parent.
Looking at the graph above, with my proposed addition of G1:
K~3..M:
*G---G1--<-K-*L-*M
\ /
`-*I-*J
K~2..M:
G1--<-K-*L-*M
/
*I-*J
K^..M:
K-*L-*M - whoops. Get diverted onto IJ
/
*I-*J
K..M:
L-*M
So for exactly one position of bottom along the mainline, we "forget" the path
of our mainline, and get diverted, because we're not including the bottom in the graph.
This is just like the HEAD~200 example - different graph optimisation, but same symptom
and root cause. Include the bottom in the graph, but unprinted, and it works:
K~3..M:
(F)---*G---G1--<-K-*L-*M
\ /
`-*I-*J
K~2..M:
(G)---G1--<-K-*L-*M
\ /
`-*I-*J
K^..M:
(G1)-<-K-*L-*M
/
*I-*J
K..M:
L-*M
What I'm effectively doing is extending the graph to actually include the
unshown bottom. I think it just makes more sense. We connect up the linear
history:
A..B: (A)--o--o--B
B..C: (B)--o--o--C
The two are now connected, with every commit and edge considered once between them. The A..B and B..C graphs
are now really connected at B. In the second range, we don't print B, but do treat it as if it was part of
the graph for graph-edge-based operations that want to distinguish between relevant and irrelevant parents.
Which then means that the choice of split point B on the mainline between A and C doesn't perturb the side
branch logic, because we always consider the same edges, regardless of where B is. B doesn't
"delete" its following edge. And "B..C" now really is "how did we get from B to
C" - it includes that first step.
(In a non-linear graph, in this scheme, A..B and B..C aren't fully connected - all other commit
boundaries between the sets are disjoint, and they only connect at B. But that's what makes the
"side branch" logic work out - we can tell which path the user cares about at the
bottom boundary, and hence which paths are the sides. Anything UNINTERESTING && !BOTTOM
will be an ancestor of a bottom, and thus can be candidate for side-branch simplification, which
is what we want to happen with 'HEAD~200..' above).
Shouldn't the same logic as "--ancestry-path [EF]..M" we saw earlier apply here?
That is, "-a-p E..M" makes F the sole remaining parent of G and G does change
the
path from F so it should be shown, while "-a-p F..M" makes E the sole parent of
G,
and G does not change the path from E, so it should not be shown.
That's muddled. I assume you mean:
That is, "-a-p F..M" makes F the sole remaining parent of G and G does change
the
path from F so it should be shown, while "-a-p E..M" makes E the sole parent of
G,
and G does not change the path from E, so it should not be shown.
Which is the way the logic works - we treat F and E as interesting/priority
parents when they're specified as a bottom in each case. Without doing that, G
would have 2 differing and equally (un)important parents in each case, and thus
would be shown in both cases.
In this case, the same logic says that G is treated as an interesting parent of
K because it is the specified bottom. Which then enables the default following
to follow that path direct to G, rather than having to go down the IJ path
(which leads to G anyway).
Kevin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html