Hi Jonathan,

On Sun, Jan 04, 2026 at 09:45:18PM +0000, Jonathan Wakely wrote:
> On Sun, 4 Jan 2026, 11:54 Mark Wielaard, <[email protected]> wrote:
> > We switched off the "smart protocol" for http(s) and enabled the "dumb
> > protocol". The dumb git protocol works, but is somewhat inefficient and
> > for really big repos like gcc.git requires thousands of fetches (and
> > if you start fresh any failure like you reported will result in you
> > having to start from scratch again).
> >
> > The git:// or ssh:// protocols still use the smart protocol and so
> > should be more reliable.
> >
> > The mirrors on https://forge.sourceware.org/gcc/gcc-mirror and
> > https://git.sr.ht/~sourceware/gcc are up to date and could also be used.
> >
> > It seems the bots have lost interest so maybe we can reduce the anubis
> > paranoia and re-enable the smart protocol again. I'll try, but if the
> > bots return we might have to disable it again, so using a different
> > protocol or a mirror if you have to use https might be a good idea for
> > now.
> 
> Maybe we should update the web page about git access to advise against
> fetching over https, since it's slow and inefficient (and might not work at
> all).

I hope it isn't that bad. The new year seems to have less aggressive AI
scraper bots so we dialed down the anubis "protections" (no more
javascript needed) and re-enabled the "smart protocol":
https://inbox.sourceware.org/[email protected]
https://fosstodon.org/@sourceware

> The download pages request people to use mirrors to reduce load on the main
> server, it makes sense to give similar advice for obtaining the sources
> over git. We could give clear instructions for fetching from a different
> source and then switching the remote to point to gcc.gnu.org after the
> initial fetch.

But that isn't a bad idea in general. The
https://forge.sourceware.org/gcc/gcc-mirror and
https://git.sr.ht/~sourceware/gcc mirrors are normally monitored and
should be up to date with a delay of ~10 minutes.
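
A rough sketch of what such instructions could boil down to: clone from
a mirror, repoint origin at gcc.gnu.org, then fetch whatever the mirror
was missing. The mirror URL is the one mentioned above; the upstream
git:// URL, the local directory name "gcc" and the helper names are
assumptions for illustration only. By default it just prints the
commands instead of running them.

```python
import subprocess

# The mirror URL comes from this thread. The upstream git:// URL and the
# local directory name are assumptions; adjust them to whatever the git
# access page actually recommends.
MIRROR_URL = "https://forge.sourceware.org/gcc/gcc-mirror"
UPSTREAM_URL = "git://gcc.gnu.org/git/gcc.git"


def mirror_clone_commands(mirror=MIRROR_URL, upstream=UPSTREAM_URL, dest="gcc"):
    """Return the git commands for a mirror clone followed by a remote switch."""
    return [
        # Initial (large) download comes from the mirror.
        ["git", "clone", mirror, dest],
        # Point origin at the main server for all later fetches.
        ["git", "-C", dest, "remote", "set-url", "origin", upstream],
        # Pick up anything the mirror had not synced yet.
        ["git", "-C", dest, "fetch", "origin"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(mirror_clone_commands())
```

Calling run(mirror_clone_commands(), dry_run=False) would actually
perform the clone. After the switch only the small incremental fetches
hit the main server.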

In general gcc.git is just really, really big, which makes all this
slightly awkward (it doesn't help that git-http-backend seems to
try to create an optimal pack for each fetch instead of serving
something generically cached). At 2.5G it is several times bigger
than anything else out there: binutils-gdb.git is 725M,
glibc.git is 330M and most others are < 100M.

We might want to explore offering something that just contains the
history from gcc-5 plus all release branches since then, which should
be doable in ~750M. Or maybe even just the supported release branches
(from gcc-13 up), which would get a normal development git down to
~250M.

Cheers,

Mark
