Re[4]: [gentoo-user] Re: Portage, git and shallow cloning

Davyd McColl Fri, 06 Jul 2018 04:58:12 -0700

@Rich: if I understand the process correctly, the same commits are pushed to infra and GitHub by the CI bot?

I ask because prior to the GitHub incident, I didn't have signature verification enabled (I hadn't read about it and it didn't even occur to me). So my plan was to (whilst GitHub was being sorted out) switch to the gentoo git repo and enable verification and, once I'd seen that that was working (because I'd also seen intermediate emails on this list from people having issues getting signing keys working), perhaps switch back to GitHub to put less strain on the Gentoo servers.

So if the same commits are just pushed to two remotes (gentoo and GitHub), then I should (in theory) be able to change my repos.conf settings, fiddle the remote in /usr/portage, and switch seamlessly from gentoo to GitHub? Alternatively, I could start with a clean /usr/portage again, once I'm happy that I have signature verification working on my machine.

I do sync frequently (I'm a bit of an update enthusiast) -- at least once a week, though I prefer more often as I find that the longer I leave between syncs and world-updates, the more effort I have to overcome issues (few though they are). So git is a better fit for me, I think.


-d

------ Original Message ------
From: "Rich Freeman" <ri...@gentoo.org>
To: gentoo-user@lists.gentoo.org
Sent: 2018-07-06 13:47:11
Subject: Re: Re[2]: [gentoo-user] Re: Portage, git and shallow cloning

On Fri, Jul 6, 2018 at 4:34 AM Davyd McColl <dav...@gmail.com> wrote:

I understand that git history will build over time -- I'm less concerned with (eventual) disk usage than I am with the speed of `emerge --sync`, which (and perhaps I'm sorely mistaken) appeared to be faster using git
than rsync -- hence my choice of git over rsync (the discussion at
https://forums.gentoo.org/viewtopic-t-1009562.html shows me to not be
alone in this experience).


From what I've generally seen/heard git is much more efficient as long
as you sync frequently.

rsync has the advantage that it only transfers the minimum necessary
to get you from the tree you have now to the tree that is current.  To
do this it has to stat every file (using default settings - you can
make it even slower if you want to), which is a lot of file I/O.

git has the advantage that it can just read the current HEAD and from
that know exactly what commits are missing, so there is way less
effort spent figuring out what changed.  It has the disadvantage that
it sends everything that happened since your last sync, which could
include files that were created and subsequently removed.  If you sync
often there won't be much of that, but if you're syncing monthly or
even less frequently then you probably will spend a lot of time
transmitting churn.
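
[Editorial illustration, not part of the original message.] The trade-off
described above can be sketched with a toy model in Python. This is not how
rsync or git are implemented; the function names and the dict-based "tree"
are hypothetical and only count how many files each approach would have to
examine and send.

```python
# Toy model only: a tree is a dict of path -> content, and a commit is a
# dict of path -> new content (None means the file was deleted).
# All names are hypothetical; real rsync and git work very differently.

def apply_commits(tree, commits):
    """Return a new tree with the commits applied in order."""
    result = dict(tree)
    for commit in commits:
        for path, content in commit.items():
            if content is None:
                result.pop(path, None)
            else:
                result[path] = content
    return result


def rsync_like_cost(old_tree, new_tree):
    """Examine every path, but send only the net difference."""
    files_examined = len(set(old_tree) | set(new_tree))
    files_sent = sum(
        1 for path, content in new_tree.items()
        if old_tree.get(path) != content
    )
    return files_examined, files_sent


def git_like_cost(commits):
    """Examine no files up front, but send every file version recorded
    since the last sync, including files that were later removed."""
    versions_sent = sum(
        1 for commit in commits
        for content in commit.values()
        if content is not None
    )
    return 0, versions_sent


old_tree = {"a": "1", "b": "1", "c": "1", "d": "1", "e": "1"}
# A file is created, another is modified, then the first is removed again.
commits = [{"tmp": "x"}, {"a": "2"}, {"tmp": None}]
new_tree = apply_commits(old_tree, commits)

print(rsync_like_cost(old_tree, new_tree))
print(git_like_cost(commits))
```

In this model the rsync-like sync looks at every file but sends only the one
that ends up different, while the git-like sync looks at nothing but also
sends the short-lived "tmp" file. The longer the gap between syncs, the more
such churn accumulates.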

It is possible to trim down a repository, and as long as nobody is
doing force pushes on the main repo you should still be able to sync.
However, that is not something that just involves a git one-liner.
Personally I don't mind the space tradeoff, especially in exchange for
the IO tradeoff.  A sync is always a VERY fast operation.

I'll also note that the stable branch (which is always free of obvious
issues caused by devs not running repoman) is only available via git.
There is no reason that couldn't be replicated via rsync, but right
now we only have one set of mirrors.

I'm still syncing from github after enabling signature checking.
There is a patch that will make that more secure but in the meantime
my scripts keep an eye on exit status when I sync.  IMO signature
checking is more important than where you sync from - as long as gpg
says I'm good it really doesn't matter who has the ability to play
with the data enroute.  But, it certainly doesn't hurt to sync from
infra (I do have concerns for whether infra could handle everybody
doing it though - github is MS's problem to worry about).
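
[Editorial illustration, not part of the original message.] The scripts
mentioned above are not shown in the thread. A minimal sketch of the idea in
Python follows, assuming (as the message implies) that a failed sync or
failed verification surfaces as a non-zero exit status. The function name
sync_ok is hypothetical.

```python
import subprocess


def sync_ok(command=("emerge", "--sync")):
    """Run the sync command and report whether it exited with status 0.

    The default command is an assumption based on the thread; pass a
    different command to wrap another sync tool.
    """
    completed = subprocess.run(list(command))
    return completed.returncode == 0
```

A wrapper script could call sync_ok() and skip the subsequent world update
whenever it returns False, so that a tree which failed to sync cleanly is
never built from.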

--
Rich
