Thomas Rast <tr...@student.ethz.ch> writes:

>> Why not turn the behavior on its head:
>>
>> * Change the default behavior to be something well-defined, easy to
>> document, and convenient for humans, such as "topological order with
>> ties broken by timestamp" or "approximate timestamp order, but
>> respecting dependencies".
>>
>> * Add a new option, --arbitrary-order, that explicitly chooses
>> efficiency instead of a defined order.
>
> I think that would be a rather bad decision, largely because (taking my
> git.git as an example):
>
>   $ time git log | head -1
>   $ time git log --date-order | head -1

You are correct to point out that introducing "--arbitrary-order"
and forcing a sort by default is stupid for one reason, but you
forgot to stress the other, equally important, reason, I think.
Even though you came close to it here:

> ... if you just use 'git log' to quickly see the last
> few commits.  At least to me, the optimization makes perfect sense.

you may not have fully internalized that other reason yourself, I
suspect, for the reason I mention in my last two paragraphs below.

When you run "git log", you are asking only to see "the last few"
commits.  The size of "few" actually depends on the occasion and the
user, but the important thing to notice is that the definition of
"the last" is fuzzy.  In such a request over a history
like this:

      ---A---B---C---D
                      \
    ---1---2---3---4---* = HEAD

the user does not care about the exact order, as long as the ones "closer"
(again, a fuzzy definition) to HEAD come out earlier than the ones
"farther" (and all of them have to come out eventually but that goes
without saying).

In the case of the above sample history, even with clock skews, the
ones labeled with letters appear in the expected order among
themselves, and the ones labeled with numbers also do, with or
without sorting.  All three orders (the default, topo and date)
match the use case of "git log" to view "the last few" commits just
fine.  When all of them give us output suitable for the purpose of
showing "the last few", picking the one with the least latency is
the right thing to do.

In other words, latency is important, but a short-latency solution
is acceptable and preferred only if it gives a reasonable result.
Giving useless output with small latency is not what we are aiming
to do.
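
To make the latency point concrete, here is a minimal sketch of a
newest-first walk over a toy version of the history above.  It is
not git's actual revision walker; the names (toy_history,
walk_newest_first, touched) and the timestamps are made up for
illustration.  The walk keeps a priority queue keyed by commit
timestamp and reads parents only when it needs them, so asking for
"the last few" touches only a handful of commits:

```python
import heapq
from itertools import islice


def toy_history(n_old=1000):
    """Hypothetical history: name -> (commit timestamp, parent names)."""
    history = {f"old{i}": (i, [f"old{i - 1}"] if i else [])
               for i in range(n_old)}
    base = f"old{n_old - 1}"
    # Two side branches with interleaved timestamps, merged at HEAD.
    history.update({
        "1": (n_old + 1, [base]), "A": (n_old + 2, [base]),
        "2": (n_old + 3, ["1"]),  "B": (n_old + 4, ["A"]),
        "3": (n_old + 5, ["2"]),  "C": (n_old + 6, ["B"]),
        "4": (n_old + 7, ["3"]),  "D": (n_old + 8, ["C"]),
        "HEAD": (n_old + 9, ["4", "D"]),
    })
    return history


def walk_newest_first(history, tip, touched):
    """Yield commits by descending timestamp, loading parents lazily.

    `touched` records every commit the walk had to look at.
    """
    touched.add(tip)
    heap = [(-history[tip][0], tip)]
    while heap:
        _, name = heapq.heappop(heap)
        yield name
        for parent in history[name][1]:
            if parent not in touched:
                touched.add(parent)
                heapq.heappush(heap, (-history[parent][0], parent))


history = toy_history()
touched = set()
first_five = list(islice(walk_newest_first(history, "HEAD", touched), 5))
print(first_five, "touched", len(touched), "of", len(history), "commits")
```

The lettered commits come out in order among themselves, and so do
the numbered ones, while the thousand older commits are never
looked at.  An order that has to be fully sorted before the first
commit can be shown would need to read all of them first.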

> The right fix would be to dig up Peff's work on generation number
> caching, and modify the algorithm to take generation numbers into
> account.

I think you are totally wrong here, unless you are talking about a
generation number that is different from what I recall from the
older discussion.  Think of the sample history above, and imagine
that the numbered ones are based on the current 'master', but that
the lettered ones are based on an ancient maintenance release that
is 1000 generations behind (think of me running the command after
finishing the day's integration cycle, sitting at the tip of 'pu',
where the last topic merged is meant to be eventually merged to
maint-1.7.9).  All of the commits depicted in the picture will have
the commit timestamps in the past few hours.  Ancestors of A and 1,
not drawn in the picture, were made yesterday or before.

The current "default" output will give all of the depicted commits
before showing the older commits that are not in the picture, and
that is absolutely the right thing to do when you ask "git log" to
view "the last few" commits.  Imagine what you would see if you used
generation numbers instead of the commit timestamp.  You would see
the commits on the maintenance topic, which can later be merged to
an older codebase, only after you have seen all the development
history on the master branch since the 1.7.9 release.  I do not
think we want to call such an output "the right fix".