Re: [PATCH] discovery: slowly increase sampling size

Pierre-Yves David Tue, 21 May 2019 16:14:57 -0700


On 5/21/19 10:40 PM, Martin von Zweigbergk wrote:

On Tue, May 21, 2019 at 1:24 PM Pierre-Yves David<pierre-yves.da...@ens-lyon.org <mailto:pierre-yves.da...@ens-lyon.org>>wrote:




    On 5/21/19 10:00 PM, Martin von Zweigbergk wrote:
     >
     >
     > On Tue, May 21, 2019 at 11:10 AM Martin von Zweigbergk
     > <martinv...@google.com <mailto:martinv...@google.com>
    <mailto:martinv...@google.com <mailto:martinv...@google.com>>> wrote:
     >
     >
     >
     >     On Tue, May 21, 2019 at 10:09 AM Pierre-Yves David
     >     <pierre-yves.da...@ens-lyon.org
    <mailto:pierre-yves.da...@ens-lyon.org>
     >     <mailto:pierre-yves.da...@ens-lyon.org
    <mailto:pierre-yves.da...@ens-lyon.org>>> wrote:
     >
     >
     >
     >         On 5/21/19 6:05 PM, Martin von Zweigbergk wrote:
     >          >
     >          >
     >          > On Tue, May 21, 2019 at 4:22 AM Pierre-Yves David
     >          > <pierre-yves.da...@ens-lyon.org
    <mailto:pierre-yves.da...@ens-lyon.org>
     >         <mailto:pierre-yves.da...@ens-lyon.org
    <mailto:pierre-yves.da...@ens-lyon.org>>
     >         <mailto:pierre-yves.da...@ens-lyon.org
    <mailto:pierre-yves.da...@ens-lyon.org>
     >         <mailto:pierre-yves.da...@ens-lyon.org
    <mailto:pierre-yves.da...@ens-lyon.org>>>>
     >          > wrote:
     >          >
     >          >     # HG changeset patch
     >          >     # User Pierre-Yves David
    <pierre-yves.da...@octobus.net <mailto:pierre-yves.da...@octobus.net>
     >         <mailto:pierre-yves.da...@octobus.net
    <mailto:pierre-yves.da...@octobus.net>>
     >          >     <mailto:pierre-yves.da...@octobus.net
    <mailto:pierre-yves.da...@octobus.net>
     >         <mailto:pierre-yves.da...@octobus.net
    <mailto:pierre-yves.da...@octobus.net>>>>
     >          >     # Date 1558436902 -7200
     >          >     #      Tue May 21 13:08:22 2019 +0200
     >          >     # Node ID 123267b121ea268b3bc488c6dde68a8d93ee4f4c
     >          >     # Parent  86f17fc31aa8a0c26c11db6a532293463ee6428a
     >          >     # EXP-Topic discovery
     >          >     # Available At
    https://bitbucket.org/octobus/mercurial-devel/
     >          >     #              hg pull
     >          > https://bitbucket.org/octobus/mercurial-devel/ -r
    123267b121ea
     >          >     discovery: slowly increase sampling size
     >          >
     >          >     Some pathological discovery runs can requires many
     >         roundtrip. When
     >          >     this happens
     >          >     things can get very slow.
     >          >
     >          >     To make the algorithm more resilience again such
     >         pathological case.
     >          >     We slowly
     >          >     increase the sample size with each roundtrip
    (+5%). This
     >         will have a
     >          >     negligible
     >          >     impact on "normal" discovery with few roundtrips,
    but a large
     >          >     positive impact of
     >          >     case with many roundtrips. Asking more question per
     >         roundtrip helps
     >          >     to reduce
     >          >     the undecided set faster. Instead of reducing the
     >         undecided set a
     >          >     linear speed
     >          >     (in the worst case), we reduce it as a guaranteed
    (small)
     >          >     exponential rate. The
     >          >     data below show this slow ramp up in sample size:
     >          >

> > round trip | 1 | 5 | 10 | 20 | 50 |

     >             100 |
     >          >         130 |
     >          >     sample size   |  200 |   254 |   321 |   517 |  2
    199 |
     >         25 123 |
     >          >       108 549 |
     >          >     covered nodes |  200 | 1 357 | 2 821 | 7 031 | 42
    658 |
     >         524 530 | 2
     >          >     276 755 |
     >          >
     >          >     To be a bit more concrete, lets take a very
    pathological
     >         case as an
     >          >     example. We
     >          >     are doing discovery from a copy of Mozilla-try to
    a more
     >         recent
     >          >     version of
     >          >     mozilla-unified. Mozilla-unified heads are unknown
    to the
     >          >     mozilla-try repo and
     >          >     there are over 1 million "missing" changesets. (the
     >         discovery is
     >          >     "local" to
     >          >     avoid network interference)
     >          >
     >          >
     >          > What makes it slow in that case? If the 1 million
    changesets
     >         are mostly
     >          > linear, it should still be fast, right? So I assume
    they're
     >         spread out
     >          > across branches. Is it slow because the remote side
    has many
     >         branches
     >          > that the local side does not, or the other way around?
     >
     >         The million changeset are on many different branches. (the
     >         discovery is
     >         run from mozilla-try (many changesets, tree like) toward
     >         mozilla-unified
     >         (mostly linear).
     >
     >
     >     I think that means that there are many local heads. Maybe
    it's the
     >     case I tried to address with
    https://phab.mercurial-scm.org/D2647. I
     >     just re-opened that. I'll see if can get timing numbers for the
     >     repos you mentioned here.

    I had a look at this commit. The idea in your commit message is
    interesting, but I see a different one implemented in the code. See my
    comment for details.

     > That's now done. I've added you as reviewer. I feel like that's a
    better
     > solution for this particular case. It results in significantly more
     > bytes transferred (+25%), but it's also much faster.

    As said before in the scope of this series, I am not interest in any
    better solution any particular case. I am working on a general purpose
    safety net that improves the upper bound round trip complexity for
    every
    pathological cases.

    The good point of this safety net is that it is not mutually exclusive
    with improving for particular cases. For example, you change and my
    change could peacefully coexist.

    Can we move forward with this patch before discussing more specific
    change?

I'm not sure what other pathological cases remain after that one isfixed, but sure, I'll queue this anyway.

The nice thing with this patch, is that it does not matters, whateverother pathological cases lurk in the shadow got better ;-)


--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: [PATCH] discovery: slowly increase sampling size

Reply via email to