@Rich: if I understand the process correctly, the same commits are pushed to infra and GitHub by the CI bot?

I ask because, before the GitHub incident, I didn't have signature verification enabled (I hadn't read about it, and it hadn't even occurred to me). My plan was to switch to the Gentoo git repo while GitHub was being sorted out, enable verification there, and, once I'd seen it working (I'd seen emails on this list from people having trouble getting signing keys working), perhaps switch back to GitHub to put less strain on the Gentoo servers.

So if the same commits are just pushed to two remotes (Gentoo and GitHub), then I should (in theory) be able to change my repos.conf settings, point the git remote in /usr/portage at the other server, and switch seamlessly from Gentoo to GitHub? Alternatively, I could start with a clean /usr/portage again, once I'm happy that I have signature verification working on my machine.
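
A rough sketch of that switch, assuming both servers really do carry identical commits, that the remote in the checkout is called "origin", and that the repository's settings live in a single file with a `sync-uri` line. The function name `switch_sync_source` and the file path in the usage note are hypothetical, and `sed -i` assumes GNU sed:

```shell
#!/bin/sh
# Hypothetical helper: point an existing Portage git checkout and its
# repos.conf entry at a different mirror of the same history.
switch_sync_source() {
    repo_dir=$1    # the git checkout, e.g. /usr/portage
    conf_file=$2   # the repos.conf file holding the sync-uri line (assumed)
    new_uri=$3     # URL of the mirror to switch to

    # Change where fetches in the checkout go.
    git -C "$repo_dir" remote set-url origin "$new_uri" || return 1

    # Keep Portage's own setting in step with the git remote.
    sed -i "s|^sync-uri[[:space:]]*=.*|sync-uri = $new_uri|" "$conf_file"
}
```

It would be called along the lines of `switch_sync_source /usr/portage /etc/portage/repos.conf/gentoo.conf <mirror URL>`. If the two servers did not share commit hashes, the next fetch would not fast-forward, and starting from a clean checkout would be the safer route.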

I do sync frequently (I'm a bit of an update enthusiast) -- at least once a week, though I prefer more often, as I find that the longer I leave between syncs and world-updates, the more effort it takes to overcome issues (few though they are). So git is a better fit for me, I think.

-d

------ Original Message ------
From: "Rich Freeman" <ri...@gentoo.org>
To: gentoo-user@lists.gentoo.org
Sent: 2018-07-06 13:47:11
Subject: Re: Re[2]: [gentoo-user] Re: Portage, git and shallow cloning

On Fri, Jul 6, 2018 at 4:34 AM Davyd McColl <dav...@gmail.com> wrote:

I understand that git history will build over time -- I'm less
concerned with (eventual) disk usage than I am with the speed of
`emerge --sync`, which (and perhaps I'm sorely mistaken) appeared to be
faster using git than rsync -- hence my choice of git over rsync (the
discussion at https://forums.gentoo.org/viewtopic-t-1009562.html shows
I'm not alone in this experience).


From what I've generally seen and heard, git is much more efficient as
long as you sync frequently.

rsync has the advantage that it only transfers the minimum necessary
to get you from the tree you have now to the tree that is current.  To
do this it has to stat every file (using default settings - you can
make it even slower if you want to), which is a lot of file I/O.

git has the advantage that it can just read the current HEAD and from
that know exactly what commits are missing, so there is way less
effort spent figuring out what changed.  It has the disadvantage that
it sends everything that happened since your last sync, which could
include files that were created and subsequently removed.  If you sync
often there won't be much of that, but if you're syncing monthly or
even less frequently then you probably will spend a lot of time
transmitting churn.
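
To illustrate the point about git knowing exactly which commits are missing, here is a minimal sketch that counts them. It assumes the remote is named "origin" and the branch is "master" unless told otherwise; the function name `pending_commits` is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: count the commits the remote has that the local
# checkout lacks. Assumes a remote named "origin".
pending_commits() {
    repo_dir=$1
    branch=${2:-master}

    # Download the new commits without touching the working tree.
    git -C "$repo_dir" fetch --quiet origin "$branch" || return 1

    # FETCH_HEAD is the tip just downloaded, so HEAD..FETCH_HEAD is
    # exactly the set of commits a sync would add.
    git -C "$repo_dir" rev-list --count HEAD..FETCH_HEAD
}
```

A large count after a long gap is where the churn comes from: the fetch carries the objects for every one of those commits, including files that were added and later removed, even though the final tree no longer contains them.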

It is possible to trim down a repository, and as long as nobody is
doing force pushes on the main repo you should still be able to sync.
However, that is not something that just involves a git one-liner.
Personally I don't mind the space tradeoff, especially in exchange for
the IO tradeoff.  A sync is always a VERY fast operation.

I'll also note that the stable branch (which is always free of obvious
issues caused by devs not running repoman) is only available via git.
There is no reason that couldn't be replicated via rsync, but right
now we only have one set of mirrors.

I'm still syncing from GitHub after enabling signature checking.
There is a patch that will make that more secure, but in the meantime
my scripts keep an eye on the exit status when I sync.  IMO signature
checking is more important than where you sync from - as long as gpg
says I'm good it really doesn't matter who has the ability to play
with the data en route.  But it certainly doesn't hurt to sync from
infra (I do have concerns about whether infra could handle everybody
doing it, though - GitHub is MS's problem to worry about).
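
A sketch of what watching the exit status might look like. This is not the actual script mentioned above. The function name `checked_sync` is hypothetical, and the extra `git verify-commit` step is an assumption that only works if the signing key is in the keyring of the user running it:

```shell
#!/bin/sh
# Hypothetical wrapper: run a sync command and refuse to continue if
# the sync, or the signature check on the resulting HEAD, fails.
checked_sync() {
    repo_dir=$1
    shift

    # "$@" is the sync command itself, e.g. emerge --sync
    if ! "$@"; then
        echo "sync failed; not updating" >&2
        return 1
    fi

    # Extra check (an assumption, not part of Portage): ask git whether
    # HEAD carries a signature that gpg accepts.
    if ! git -C "$repo_dir" verify-commit HEAD; then
        echo "HEAD does not carry a valid signature; not updating" >&2
        return 1
    fi
}
```

Something like `checked_sync /usr/portage emerge --sync && emerge -uDN @world` would then only start the world update when both checks pass.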

--
Rich


