Hi Jonathan,

On Sun, Jan 04, 2026 at 09:45:18PM +0000, Jonathan Wakely wrote:
> On Sun, 4 Jan 2026, 11:54 Mark Wielaard, <[email protected]> wrote:
> > We switched off the "smart protocol" for http(s) and enabled the "dumb
> > protocol". The dumb git protocol works, but is somewhat inefficient and
> > for really big repos like gcc.git requires thousands of fetches (and
> > if you start fresh, any failure like the one you reported will force you
> > to start from scratch again).
> >
> > The git:// or ssh:// protocols still use the smart protocol and so
> > should be more reliable.
> >
> > The mirrors on https://forge.sourceware.org/gcc/gcc-mirror and
> > https://git.sr.ht/~sourceware/gcc are up to date and could also be used.
> >
> > It seems the bots have lost interest, so maybe we can reduce the anubis
> > paranoia and re-enable the smart protocol again. I'll try, but if the
> > bots return we might have to disable it again, so using a different
> > protocol or a mirror if you have to use https might be a good idea for
> > now.
>
> Maybe we should update the web page about git access to advise against
> fetching over https, since it's slow and inefficient (and might not work at
> all).
I hope it isn't that bad. The new year seems to have brought less aggressive
AI scraper bots, so we dialed down the anubis "protections" (no more
JavaScript needed) and re-enabled the "smart protocol":
https://inbox.sourceware.org/[email protected]
https://fosstodon.org/@sourceware

> The download pages request people to use mirrors to reduce load on the main
> server, it makes sense to give similar advice for obtaining the sources
> over git. We could give clear instructions for fetching from a different
> source and then switching the remote to point to gcc.gnu.org after the
> initial fetch.

But that isn't a bad idea in general. The
https://forge.sourceware.org/gcc/gcc-mirror and
https://git.sr.ht/~sourceware/gcc mirrors are normally monitored and
should be up to date with a delay of about 10 minutes.

In general gcc.git is just really, really big, which makes all this
slightly awkward (it doesn't help that git-http-backend seems to try to
create an optimal pack for each fetch instead of serving something
generically cached). At 2.5G it is several times bigger than anything
else out there: binutils-gdb.git is 725M, glibc.git is 330M and most
others are < 100M.

We might want to explore offering something that contains only the
history from gcc-5 plus all release branches since then, which should be
doable in ~750M. Or even just the supported release branches (from gcc-13
up), which gets a normal development git down to ~250M?

Cheers,

Mark
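Jonathan's suggestion of doing the initial fetch from a mirror and then switching the remote to gcc.gnu.org could look roughly like the following. This is a minimal sketch, not tested advice. It assumes that git is on PATH, that the two mirror URLs quoted above clone as-is, and that git://gcc.gnu.org/git/gcc.git is the upstream URL (check the GCC git access page). The helper names `mirror_then_switch` and `run` are hypothetical. By default the script only prints the commands.

```python
"""Sketch: clone gcc.git from a mirror, then point "origin" at gcc.gnu.org.

Assumption: UPSTREAM below is the URL documented on GCC's git access page;
verify it before use. Nothing is executed unless run() is called.
"""
import subprocess
from typing import List

# Mirror URLs quoted in the message.
MIRRORS = [
    "https://forge.sourceware.org/gcc/gcc-mirror",
    "https://git.sr.ht/~sourceware/gcc",
]
# Assumed canonical upstream; git:// still uses the smart protocol.
UPSTREAM = "git://gcc.gnu.org/git/gcc.git"


def mirror_then_switch(mirror: str, upstream: str, directory: str) -> List[List[str]]:
    """Return the git invocations, in order, as argv lists (hypothetical helper)."""
    return [
        # 1. Do the expensive initial clone against a mirror.
        ["git", "clone", mirror, directory],
        # 2. Repoint the default remote at the main server.
        ["git", "-C", directory, "remote", "set-url", "origin", upstream],
        # 3. Fetch whatever the mirror did not have yet (usually small).
        ["git", "-C", directory, "fetch", "origin"],
    ]


def run(commands: List[List[str]]) -> None:
    """Execute the commands, stopping at the first one that fails."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in mirror_then_switch(MIRRORS[0], UPSTREAM, "gcc"):
        print(" ".join(argv))
```

With this approach, the one expensive operation never touches the main server. The final fetch against gcc.gnu.org should then transfer only the few minutes of commits the mirror had not yet picked up.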
