On Tue, Oct 11, 2016 at 09:34:28PM -0400, Jeff King wrote:
> > Ok, time to present data... Let's assume a degenerate case first:
> > "up-to-date with all remotes" because that is easy to reproduce.
> >
> > I have 14 remotes currently:
> >
> > $ time git fetch --all
> > real 0m18.016s
> > user 0m2.027s
> > sys 0m1.235s
> >
> > $ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs -P 14 -I % git fetch %
> > real 0m5.168s
> > user 0m2.312s
> > sys 0m1.167s
>
> So first, thank you (and Ævar) for providing real numbers. It's clear
> that I was talking nonsense.
>
> Second, I wonder where all that time is going. Clearly there's an
> end-to-end latency issue, but I'm not sure where it is. Is it startup
> time for git-fetch? Is it in getting and processing the ref
> advertisement from the other side? What I'm wondering is if there are
> opportunities to speed up the serial case (but nobody really cared
> before because it doesn't matter unless you're doing 14 of them back to
> back).
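
The parallel pipeline quoted above fetches by URL. Below is a minimal sketch of a variant that iterates over remote names instead. Fetching by name also updates refs/remotes/<name>/*, while fetching a bare URL with no refspec only writes FETCH_HEAD, so this variant is closer to what `git fetch --all` does. The function name fetch_remotes_parallel is hypothetical. The sketch assumes bash, assumes remote names contain no whitespace, and relies on `xargs -P`, which is a GNU/BSD extension rather than POSIX.

```bash
# Hypothetical helper (bash): fetch every configured remote of the current
# repository, running at most $1 (default 14) fetches at once.
# Assumes remote names contain no whitespace; `xargs -P` is GNU/BSD, not POSIX.
fetch_remotes_parallel() {
    local jobs=${1:-14}
    # `git remote` prints one remote name per line; `-n 1` makes xargs run
    # one `git fetch --quiet <name>` per remote, up to $jobs at a time.
    git remote | xargs -P "$jobs" -n 1 git fetch --quiet
}
```

Usage, from inside the repository: `time fetch_remotes_parallel 14`.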

Hmm. I think it might really just be network latency. Here's my fetch
time:

$ git config remote.origin.url
git://github.com/gitster/git.git
$ time git fetch origin
real 0m0.183s
user 0m0.072s
sys 0m0.008s

14 of those in a row shouldn't take more than about 2.6 seconds
(14 × 0.183s), which is still twice as fast as your parallel case. So
what's going on? I can think of two things.

One is that I live about a hundred miles from GitHub's data center, and
my ping time there is ~13ms. From the other side of the country, let
alone from Europe, each fetch is going to be noticeably slower just
for the TCP handshake.
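
One way to check whether a few distant remotes dominate the 18 seconds of the serial case is to time each fetch individually. If latency is the cause, the per-remote numbers should track how far away each host is. Below is a minimal sketch that assumes bash, since it relies on the `time` keyword and the TIMEFORMAT variable. The function name time_each_remote is hypothetical.

```bash
# Hypothetical helper (bash only): fetch each configured remote in turn and
# report each fetch's wall-clock time on stderr, in the form "<remote>: <secs> s".
time_each_remote() {
    local r
    local TIMEFORMAT='%R s'   # make the `time` keyword print real seconds only
    for r in $(git remote); do
        printf '%s: ' "$r" >&2
        time git fetch --quiet "$r"
    done
}
```

Summing the printed times should roughly reproduce the serial `git fetch --all` figure. Any single remote that is much slower than the rest points at that host or its network path rather than at git itself.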

The second is that git:// is a really cheap and simple protocol.
Git-over-ssh is more than twice as slow:

$ time git fetch git@github.com:gitster/git
...
real 0m0.432s
user 0m0.100s
sys 0m0.032s

HTTP fares better than I would have thought, but is still slower:

$ time git fetch https://github.com/gitster/git
...
real 0m0.258s
user 0m0.080s
sys 0m0.032s
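
A rough way to compare transports without doing a full fetch is to time `git ls-remote` against each URL form. It only connects and lists the remote's refs without transferring objects, so it approximates the fixed per-connection cost that each of the 14 serial fetches pays. Below is a minimal sketch that assumes bash. The function name time_transports is hypothetical.

```bash
# Hypothetical helper (bash only): time `git ls-remote` for each URL given,
# printing "<url>: <seconds> s" on stderr. ls-remote connects and reads the
# ref listing but transfers no objects.
time_transports() {
    local url
    local TIMEFORMAT='%R s'   # make the `time` keyword print real seconds only
    for url in "$@"; do
        printf '%s: ' "$url" >&2
        time git ls-remote "$url" >/dev/null
    done
}
```

Usage: `time_transports git://github.com/gitster/git.git git@github.com:gitster/git https://github.com/gitster/git`. GitHub has since disabled the unauthenticated git:// protocol, so the first URL form now needs a host that still serves git://.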
-Peff