Re: [PATCH] discovery: slowly increase sampling size

Martin von Zweigbergk via Mercurial-devel Tue, 21 May 2019 09:09:29 -0700

On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:


> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.da...@octobus.net>
> # Date 1558436902 -7200
> #      Tue May 21 13:08:22 2019 +0200
> # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
> # Parent  86f17fc31aa8a0c26c11db6a532293463ee6428a
> # EXP-Topic discovery
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 123267b121ea
> discovery: slowly increase sampling size
>
> Some pathological discovery runs can requires many roundtrip. When this
> happens
> things can get very slow.
>
> To make the algorithm more resilience again such pathological case. We
> slowly
> increase the sample size with each roundtrip (+5%). This will have a
> negligible
> impact on "normal" discovery with few roundtrips, but a large positive
> impact of
> case with many roundtrips. Asking more question per roundtrip helps to
> reduce
> the undecided set faster. Instead of reducing the undecided set a linear
> speed
> (in the worst case), we reduce it as a guaranteed (small) exponential
> rate. The
> data below show this slow ramp up in sample size:
>
> round trip    |    1 |     5 |    10 |    20 |     50 |     100 |
>  130 |
> sample size   |  200 |   254 |   321 |   517 |  2 199 |  25 123 |   108
> 549 |
> covered nodes |  200 | 1 357 | 2 821 | 7 031 | 42 658 | 524 530 | 2 276
> 755 |
>
> To be a bit more concrete, lets take a very pathological case as an
> example. We
> are doing discovery from a copy of Mozilla-try to a more recent version of
> mozilla-unified. Mozilla-unified heads are unknown to the mozilla-try repo
> and
> there are over 1 million "missing" changesets. (the discovery is "local" to
> avoid network interference)
>

What makes it slow in that case? If the 1 million changesets are mostly
linear, it should still be fast, right? So I assume they're spread out
across branches. Is it slow because the remote side has many branches that
the local side does not, or the other way around?


> Without this change, the discovery:
> - last 1858 seconds (31 minutes),
> - does 1700 round trip,
> - asking about 340 000 nodes.
>
> With this change, the discovery:
> - last 218 seconds (3 minutes, 38 seconds a -88% improvement),
> - does 94 round trip (-94%),
> - asking about 344 211 nodes (+1%).
>
> Of course, this is an extreme case (and 3 minutes is still slow). However
> this
> give a good example of how this sample size increase act as a safety net
> catching any bad situations.
>
> We could image a steeper increase than 5%. For example 10% would give the
> following number:
>
> round trip    |    1 |     5 |    10 |     20 |     50  |        75 |
>   100 |
> sample size   |  200 |   321 |   514 |  1 326 |  23 060 |   249 812 |  2
> 706 594 |
> covered nodes |  200 | 1 541 | 3 690 | 12 671 | 251 871 | 2 746 254 | 29
> 770 966 |
>
> In parallel, it is useful to understand these pathological cases and
> improve
> them. However the current change provides a general purpose safety net to
> smooth
> the impact of pathological cases.
>
> To avoid issue with older http server, the increase in sample size only
> occurs
> if the protocol has not limit on command argument size.
>
> diff --git a/mercurial/setdiscovery.py b/mercurial/setdiscovery.py
> --- a/mercurial/setdiscovery.py
> +++ b/mercurial/setdiscovery.py
> @@ -256,7 +256,8 @@ def findcommonheads(ui, local, remote,
>                      initialsamplesize=100,
>                      fullsamplesize=200,
>                      abortwhenunrelated=True,
> -                    ancestorsof=None):
> +                    ancestorsof=None,
> +                    samplegrowth=1.05):
>      '''Return a tuple (common, anyincoming, remoteheads) used to identify
>      missing nodes from or in remote.
>      '''
> @@ -389,6 +390,8 @@ def findcommonheads(ui, local, remote,
>                  ui.debug("taking initial sample\n")
>              samplefunc = disco.takefullsample
>              targetsize = fullsamplesize
> +            if not remote.limitedarguments:
> +                fullsamplesize = int(fullsamplesize * samplegrowth)
>          else:
>              # use even cheaper initial sample
>              ui.debug("taking quick initial sample\n")
> diff --git a/tests/test-setdiscovery.t b/tests/test-setdiscovery.t
> --- a/tests/test-setdiscovery.t
> +++ b/tests/test-setdiscovery.t
> @@ -980,10 +980,10 @@ One with >200 heads. We now switch to se
>    query 3; still undecided: 980, sample size is: 200
>    sampling from both directions
>    searching: 4 queries
> -  query 4; still undecided: \d+, sample size is: 200 (re)
> +  query 4; still undecided: 435, sample size is: 210
>    sampling from both directions
>    searching: 5 queries
> -  query 5; still undecided: 195, sample size is: 195
> +  query 5; still undecided: 185, sample size is: 185
>    5 total queries in *.????s (glob)
>    elapsed time:  * seconds (glob)
>    heads summary:
>

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: [PATCH] discovery: slowly increase sampling size

Reply via email to