Re: [PATCH 3 of 3 V3] changelog: disable delta chains

2016-10-13 Thread Pierre-Yves David

On 10/13/2016 09:44 PM, Gregory Szorc wrote:

# HG changeset patch
# User Gregory Szorc 
# Date 1476355827 -7200
#  Thu Oct 13 12:50:27 2016 +0200
# Node ID 231e6b5206857a809198f5535fac32a004686bf1
# Parent  aed87a8bed99b373eec5fb09dd2f76d166af59e8
changelog: disable delta chains


These are pushed, thanks.

Patch 1 inspired a small change from me that should make you happy.

Cheers

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 3 of 3 V3] changelog: disable delta chains

2016-10-13 Thread Gregory Szorc
# HG changeset patch
# User Gregory Szorc 
# Date 1476355827 -7200
#  Thu Oct 13 12:50:27 2016 +0200
# Node ID 231e6b5206857a809198f5535fac32a004686bf1
# Parent  aed87a8bed99b373eec5fb09dd2f76d166af59e8
changelog: disable delta chains

This patch disables delta chains on changelogs. After this patch, new
entries on changelogs - including existing changelogs - will be stored
as the fulltext of that data (likely compressed). No delta computation
will be performed.

An overview of delta chains and data justifying this change follows.

Revlogs try to store entries as a delta against a previous entry (either
a parent revision in the case of generaldelta or the previous physical
revision when not using generaldelta). Most of the time this is the
correct thing to do: it frequently results in less CPU usage and smaller
storage.
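
To make the mechanism concrete, here is a toy model of resolving a
fulltext through a delta chain. It is only a sketch: the `Entry` class,
the hunk format and the function names are made up for illustration and
are not Mercurial's actual revlog code or on-disk format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical delta format: each hunk (start, end, data) replaces
# base[start:end] with data. Hunks are sorted and do not overlap.
Delta = List[Tuple[int, int, bytes]]


@dataclass
class Entry:
    base: Optional[int]  # revision this entry is a delta against; None = fulltext
    fulltext: bytes = b""  # only meaningful when base is None
    delta: Delta = field(default_factory=list)  # only meaningful otherwise


def apply_delta(text: bytes, delta: Delta) -> bytes:
    """Apply one delta to the fulltext of its base revision."""
    out = []
    pos = 0
    for start, end, data in delta:
        out.append(text[pos:start])  # unchanged bytes before the hunk
        out.append(data)             # replacement bytes
        pos = end
    out.append(text[pos:])           # unchanged tail
    return b"".join(out)


def resolve(entries: List[Entry], rev: int) -> bytes:
    """Walk back to the nearest fulltext, then replay deltas forward."""
    chain = []
    cur = rev
    while entries[cur].base is not None:
        chain.append(entries[cur].delta)
        cur = entries[cur].base
    text = entries[cur].fulltext
    for delta in reversed(chain):
        text = apply_delta(text, delta)
    return text


entries = [
    Entry(base=None, fulltext=b"a\nb\nc\n"),
    Entry(base=0, delta=[(2, 4, b"B\n")]),   # change the second line
    Entry(base=1, delta=[(6, 6, b"d\n")]),   # append a line
]
print(resolve(entries, 2))
```

The cost of `resolve` grows with the number of deltas between the
requested revision and its fulltext base, which is why the similarity of
neighbouring entries matters so much.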

Delta chains are most effective when the base revision being deltaed
against is similar to the current data. This tends to occur naturally
for manifests and file data, since only small parts of each tend to
change with each revision. Changelogs, however, are a different story.

Changelog entries represent changesets/commits. And unless commits in a
repository are homogeneous (same author, changing same files, similar
commit messages, etc.), a delta from one entry to the next tends to be
relatively large compared to the size of the entry. This means that
delta chains tend to be short. How short? Here is the full vs delta
revision breakdown on some real-world repos:

Repo              % Full   % Delta   Max Length
hg                  45.8      54.2            6
mozilla-central     42.4      57.6            8
mozilla-unified     42.5      57.5           17
pypy                46.1      53.9            6
python-zstandard    46.1      53.9            3

(I threw in python-zstandard as an example of a repo that is homogeneous.
It contains a small Python project with changes all from the same
author.)
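
For readers who want to see how figures like these can be derived, here
is a small sketch that computes the same three columns from a list of
delta base pointers. The input representation and the `chain_stats` name
are assumptions for illustration, and chain length is taken to mean the
number of deltas applied on top of a fulltext. This message does not say
which tool produced the table above.

```python
from typing import List, Optional, Tuple


def chain_stats(bases: List[Optional[int]]) -> Tuple[float, float, int]:
    """Return (% full, % delta, max chain length) for a toy revlog.

    bases[rev] is the revision that rev is stored as a delta against,
    or None when rev is stored as a fulltext. A base always comes
    before the revision that refers to it.
    """
    if not bases:
        raise ValueError("need at least one revision")
    lengths: List[int] = []
    for base in bases:
        if base is None:
            lengths.append(0)                  # fulltext starts a new chain
        else:
            lengths.append(lengths[base] + 1)  # one more delta to apply
    full = sum(1 for base in bases if base is None)
    pct_full = 100.0 * full / len(bases)
    return pct_full, 100.0 - pct_full, max(lengths)


# Six revisions: three fulltexts, chains of at most two deltas.
print(chain_stats([None, 0, 1, None, 3, None]))
```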

Contrast this with the manifest revlog for these repos, where 99+% of
revisions are deltas and delta chains run into the thousands.

So delta chains aren't as useful on changelogs. But even a short delta
chain may provide benefits. Let's measure that.

Delta chains may require less CPU to read revisions if the CPU time
spent reading smaller deltas is less than the CPU time used to
decompress larger individual entries. We can measure this via
`hg perfrevlog -c -d 1` to iterate a revlog to resolve each revision's
fulltext. Here are the results of that command on a repo using delta
chains in its changelog (first line of each pair) and on a repo without
delta chains (second line):

hg (forward)
! wall 0.407008 comb 0.41 user 0.41 sys 0.00 (best of 25)
! wall 0.390061 comb 0.39 user 0.39 sys 0.00 (best of 26)

hg (reverse)
! wall 0.515221 comb 0.52 user 0.52 sys 0.00 (best of 19)
! wall 0.400018 comb 0.40 user 0.39 sys 0.01 (best of 25)

mozilla-central (forward)
! wall 4.508296 comb 4.49 user 4.49 sys 0.00 (best of 3)
! wall 4.370222 comb 4.37 user 4.35 sys 0.02 (best of 3)

mozilla-central (reverse)
! wall 5.758995 comb 5.76 user 5.72 sys 0.04 (best of 3)
! wall 4.346503 comb 4.34 user 4.32 sys 0.02 (best of 3)

mozilla-unified (forward)
! wall 4.957088 comb 4.95 user 4.94 sys 0.01 (best of 3)
! wall 4.660528 comb 4.65 user 4.63 sys 0.02 (best of 3)

mozilla-unified (reverse)
! wall 6.119827 comb 6.11 user 6.09 sys 0.02 (best of 3)
! wall 4.675136 comb 4.67 user 4.67 sys 0.00 (best of 3)

pypy (forward)
! wall 1.231122 comb 1.24 user 1.23 sys 0.01 (best of 8)
! wall 1.164896 comb 1.16 user 1.16 sys 0.00 (best of 9)

pypy (reverse)
! wall 1.467049 comb 1.46 user 1.46 sys 0.00 (best of 7)
! wall 1.160200 comb 1.17 user 1.16 sys 0.01 (best of 9)

The data clearly shows that it takes less wall and CPU time to resolve
revisions when there are no delta chains in the changelogs, regardless
of the direction of traversal. Furthermore, not using a delta chain
means that fulltext resolution in reverse is as fast as iterating
forward. So not using delta chains on the changelog is a clear CPU win
for reading operations.
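
To put rough numbers on the difference, the following sketch computes
the relative wall time reduction for each pair of `perfrevlog` results
listed above. It assumes the first line of each pair is the run with
delta chains and the second the run without, which matches the order in
which the two repos are introduced. The dictionary and function names
are made up for this example.

```python
# Wall times in seconds copied from the perfrevlog output above:
# (with delta chains, without delta chains). The ordering is an assumption.
WALL_SECONDS = {
    ("hg", "forward"): (0.407008, 0.390061),
    ("hg", "reverse"): (0.515221, 0.400018),
    ("mozilla-central", "forward"): (4.508296, 4.370222),
    ("mozilla-central", "reverse"): (5.758995, 4.346503),
    ("mozilla-unified", "forward"): (4.957088, 4.660528),
    ("mozilla-unified", "reverse"): (6.119827, 4.675136),
    ("pypy", "forward"): (1.231122, 1.164896),
    ("pypy", "reverse"): (1.467049, 1.160200),
}


def reduction_percent(with_chains: float, without_chains: float) -> float:
    """Percentage of wall time saved by dropping delta chains."""
    return 100.0 * (with_chains - without_chains) / with_chains


for (repo, direction), (with_chains, without_chains) in WALL_SECONDS.items():
    saved = reduction_percent(with_chains, without_chains)
    print(f"{repo:16} {direction:8} {saved:5.1f}% less wall time")
```

The reverse traversals benefit the most, consistent with the observation
that reverse iteration becomes as fast as forward iteration once no
chain has to be replayed.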

An example of a user-visible operation showing this speed-up is revset
evaluation. Here are results for
`hg perfrevset 'author(gps) or author(mpm)'`:

hg
! wall 1.655506 comb 1.66 user 1.65 sys 0.01 (best of 6)
! wall 1.612723 comb 1.61 user 1.60 sys 0.01 (best of 7)

mozilla-central
! wall 17.629826 comb 17.64 user 17.60 sys 0.04 (best of 3)
! wall 17.311033 comb 17.30 user 17.26 sys 0.04 (best of 3)

What about 00changelog.i size?

Repo              Delta Chains   No Delta Chains
hg                   7,033,250         6,976,771
mozilla-central     82,978,748        81,574,623
mozilla-unified     88,112,349        86,702,162
pypy                20,740,699        20,659,741

The data shows