[gentoo-user] Portage, git and shallow cloning

2018-07-05 Thread Davyd McColl
I'm not sure if there's a better place to put this, so please feel free to
tell me I should report it elsewhere (:

After the recent GitHub fun, I changed from using the GitHub git source to
git://anongit.gentoo.org/repo/sync/gentoo.git, as suggested by some on this
mailing list. I completely nuked /usr/portage/* and set off with an `emerge
--sync`, which looked like it was going to take ages and clone about a gig.
Reading https://wiki.gentoo.org/wiki/Project:Portage/Sync, I figured there
was nothing much I could do about it, since the page speaks of the
`sync-depth` config option and states that 1 is "shallow clone, only
current state (*default if option is absent*)" (emphasis mine).
After multiple failures to clone (other side hangs up after a few minutes
-- I only have a 4 Mbps line, maxing out at around 450 KB/s), I thought I'd
try explicitly setting `sync-depth` in my repo config and found:

1) `sync-depth` has been deprecated (should now use `clone-depth`)
2) with the option missing, portage was fetching the entire history --
after adding the option (and nuking /usr/portage/* again), a new clone
happened in short order, bringing down only around 65 MB (according to git)
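
For reference, a minimal sketch of a repos.conf entry that uses the newer
option. The [gentoo] section name, the location and auto-sync = yes are
assumptions about a typical setup, not something confirmed in this thread.
The script writes to a scratch file so that nothing under /etc/portage is
touched.

```shell
# Sketch only: write an example repos.conf entry to a scratch file for review.
# The scratch path is an assumption; copy the result into place by hand.
conf_file="${TMPDIR:-/tmp}/gentoo.conf.example"

cat > "$conf_file" <<'EOF'
[gentoo]
location = /usr/portage
sync-type = git
sync-uri = git://anongit.gentoo.org/repo/sync/gentoo.git
auto-sync = yes
# clone-depth replaces the deprecated sync-depth; 1 asks for the newest commit only.
clone-depth = 1
EOF

cat "$conf_file"
```

Once reviewed, the entry would normally live in a file under
/etc/portage/repos.conf/.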

So I'd like to ask how to assist in rectifying the above:
1) the docs need to be updated to refer to `clone-depth`
2) I believe that the original intent of defaulting to a shallow clone was
a good idea -- perhaps that can be investigated. If the intent has changed
for some reason, the docs should be updated to reflect the change.

Thanks
-d



Re: [gentoo-user] Portage, git and shallow cloning

2018-07-14 Thread Peter Humphrey
On Friday, 6 July 2018 06:34:01 BST Davyd McColl wrote:

> 1) `sync-depth` has been deprecated (should now use `clone-depth`)

But to what value should clone-depth be set? And why is the recent news item 
referring to instructions to use sync-depth?

-- 
Regards,
Peter.






Re: [gentoo-user] Portage, git and shallow cloning

2018-07-14 Thread Rich Freeman
On Sat, Jul 14, 2018 at 4:30 AM Peter Humphrey wrote:
>
> On Friday, 6 July 2018 06:34:01 BST Davyd McColl wrote:
>
> > 1) `sync-depth` has been deprecated (should now use `clone-depth`)
>
> But to what value should clone-depth be set?

That comes down to personal taste.  Do you want any history available
to browse?  More depth means more history.  If all you want is the
current tree without history then you want a depth of 1, and of course
you'll need to set up a cron job or something to go cleaning up past
history (you never NEED more than the last commit).  If you browse the
online git repo you can see about how many commits there are in a day
and estimate how many you want based on how many days you want.
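
The estimate described above is simple arithmetic. A sketch, where
depth_for_days is a hypothetical helper and 250 commits per day is a made-up
figure for illustration, not a measurement of the Gentoo repository:

```shell
# depth_for_days COMMITS_PER_DAY DAYS
# Hypothetical helper: prints a clone depth that covers DAYS of history.
depth_for_days() {
    echo $(( $1 * $2 ))
}

# Assumed figure for illustration only; browse the online repo for a real one.
commits_per_day=250
depth="$(depth_for_days "$commits_per_day" 7)"
echo "clone-depth = $depth"
```

With those assumed numbers, a week of history would need a depth of 1750.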

Also, this value only matters for the first sync.  After that portage
currently doesn't try to discard past commits, and it will always
fetch all commits between your current state and the new head.
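
The way a shallow clone grows again on later syncs can be reproduced with
plain git. This is a throwaway demo in a temporary directory; it uses git
directly rather than emerge --sync, and the repository names and the g
wrapper are made up for the example.

```shell
set -eu
# Throwaway demo in a temp directory; nothing here touches /usr/portage.
work="$(mktemp -d)"
# Supply an identity so commits work on machines without a git config.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

add_commits() {  # add_commits FIRST LAST: one upstream commit per number
    i="$1"
    while [ "$i" -le "$2" ]; do
        echo "$i" > "$work/upstream/file"
        g -C "$work/upstream" add file
        g -C "$work/upstream" commit -q -m "commit $i"
        i=$((i + 1))
    done
}

g init -q "$work/upstream"
add_commits 1 3

# file:// is needed because git ignores --depth for plain local paths.
g clone -q --depth 1 "file://$work/upstream" "$work/tree"
after_clone="$(git -C "$work/tree" rev-list --count HEAD)"

# Upstream moves on by two commits, then the shallow clone syncs again.
add_commits 4 5
g -C "$work/tree" pull -q --ff-only
after_pull="$(git -C "$work/tree" rev-list --count HEAD)"

echo "commits after clone: $after_clone, after next sync: $after_pull"
```

The clone starts with one commit and holds three after the next pull: the
depth applied to the initial clone only.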

If you want you could set up a script to manually purge history, and
then do an initial sync with a depth of 1.  Then anytime you sync you could
review the history since the last time you synced, and then run the
purge command to discard all history up to the current commit.  In
doing this you'll always see all the history since the last time you
reviewed it.

-- 
Rich



Re: [gentoo-user] Portage, git and shallow cloning

2018-07-14 Thread Peter Humphrey
On Saturday, 14 July 2018 11:40:03 BST Rich Freeman wrote:
> On Sat, Jul 14, 2018 at 4:30 AM Peter Humphrey wrote:
> > On Friday, 6 July 2018 06:34:01 BST Davyd McColl wrote:
> > > 1) `sync-depth` has been deprecated (should now use `clone-depth`)
> > 
> > But to what value should clone-depth be set?
> 
> That comes down to personal taste.  Do you want any history available
> to browse?  More depth means more history.  If all you want is the
> current tree without history then you want a depth of 1...

That's all I need for the portage tree, unless removing everything at lower 
depths will remove the change records.

> ...and of course you'll need to set up a cron job or something to go
> cleaning up past history (you never NEED more than the last commit).  If you
> browse the online git repo you can see about how many commits there are in a
> day and estimate how many you want based on how many days you want.
> 
> Also, this value only matters for the first sync.  After that portage
> currently doesn't try to discard past commits, and it will always
> fetch all commits between your current state and the new head.
> 
> If you want you could set up a script to manually purge history, and
> then do an initial sync with a depth of 1.  Then anytime you sync you could
> review the history since the last time you synced, and then run the
> purge command to discard all history up to the current commit.  In
> doing this you'll always see all the history since the last time you
> reviewed it.

Is there something in git to do that purging? If not, perhaps a simple monthly 
script to delete /usr/portage/* - but not packages or distfiles, which are on 
separate partitions here - would do the trick.

-- 
Regards,
Peter.






Re: [gentoo-user] Portage, git and shallow cloning

2018-07-14 Thread Rich Freeman
On Sat, Jul 14, 2018 at 8:00 AM Peter Humphrey wrote:
>
> That's all I need for the portage tree, unless removing everything at lower
> depths will remove the change records.

If you clone with a depth of one you'll see the current state of the
tree, and a commit message from the CI bot, and that is it.  You'll
have zero change history for anything.

If you clone with a depth of 10 you'll see one or two CI bot messages,
and then the last 8 or so actual changes to the tree.  You'll also
have access to what the tree looked like when each of those changes
was made.
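
What a given depth leaves visible can be checked on a toy repository. A
sketch with a depth of 3 against five upstream commits; the repository layout
and commit messages are invented for the demo.

```shell
set -eu
work="$(mktemp -d)"
# Supply an identity so commits work on machines without a git config.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Upstream with five commits; each one rewrites "file" with its own number.
g init -q "$work/upstream"
for i in 1 2 3 4 5; do
    echo "state $i" > "$work/upstream/file"
    g -C "$work/upstream" add file
    g -C "$work/upstream" commit -q -m "change $i"
done

# file:// is needed because git ignores --depth for plain local paths.
g clone -q --depth 3 "file://$work/upstream" "$work/tree"

# Only the three newest commit subjects are visible, newest first.
subjects="$(git -C "$work/tree" log --format=%s | tr '\n' ' ')"
# The tree as of the oldest retained commit can still be inspected.
oldest_state="$(git -C "$work/tree" show HEAD~2:file)"

echo "visible: $subjects"
echo "file at HEAD~2: $oldest_state"
```

Anything older than the third commit back (HEAD~3 here) is simply absent from
the clone.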

Note that git stores history as compressed, delta-encoded objects, so
the cost of increasing your depth isn't very high.  A depth of 1 costs
you about 670M, and a depth of 236000 costs you 1.5G.  I'd expect the
cost to be roughly linear between these.

>
> Is there something in git to do that purging? If not, perhaps a simple monthly
> script to delete /usr/portage/* - but not packages or distfiles, which are on
> separate partitions here - would do the trick.

That delete would certainly work, though it would cost you a full sync
(which would go back to your depth setting).  I'd suggest moving
distfiles outside of the repo if you're going to do that (really, it
shouldn't be inside anyway), just to make it easier.

git has no facilities to do this automatically, probably because it
isn't something Linus does and git is very much his thing.  However, I
found that this works for me:

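# Run from the repository root; ">!" is zsh's forced overwrite (">|" in bash).
git rev-parse HEAD >! .git/shallow    # mark HEAD as the shallow boundary
git reflog expire --expire=all --all  # drop reflog references to old commits
git gc --prune=now                    # delete the now-unreachable objects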

(This is a combination of:
https://stackoverflow.com/a/34829535  (which doesn't work)
and
https://stackoverflow.com/a/46004595   (which is incomplete))

It runs in about 14s for me in a tmpfs.

Another option would be to do a local shallow clone and swap the repositories.
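
A rough sketch of that clone-and-swap approach. reshallow_by_swap is a
hypothetical helper, not an existing tool, and it assumes the repository has
a remote named origin and is given as an absolute path.

```shell
# reshallow_by_swap REPO
# Hypothetical helper: replaces REPO (absolute path) with a depth-1 clone of
# itself. Untracked files inside REPO (e.g. distfiles) are NOT carried over.
reshallow_by_swap() {
    repo="$1"
    # Remember the real upstream so the new clone keeps syncing from it.
    upstream_url="$(git -C "$repo" config --get remote.origin.url)" || return 1
    # file:// is needed because git ignores --depth for plain local paths.
    git clone -q --depth 1 "file://$repo" "$repo.fresh" || return 1
    git -C "$repo.fresh" remote set-url origin "$upstream_url" || return 1
    # Swap the directories, then discard the copy that held the full history.
    mv "$repo" "$repo.old" && mv "$repo.fresh" "$repo" && rm -rf "$repo.old"
}
```

It would be called as, for example, reshallow_by_swap /usr/portage. HEAD
keeps the same hash, so the next pull is still a fast-forward.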

You'll find tons of guides online for throwing out history that
involve rebasing.  You do NOT want to do this here.  These will change
the hash of the HEAD, which means that the next git pull won't be a
fast-forward, and it will be a mess in general.  You just want to
discard local history, not rewrite the repository to say that there
never was any history.

Also note that the first line in this little script depends somewhat
on git internals and may or may not work in the distant future.

In any case, I suggest trying it.  If it somehow eats your repo for
breakfast just delete it and the next sync will re-fetch.





--
Rich



Re: [gentoo-user] Portage, git and shallow cloning

2018-07-14 Thread Rich Freeman
On Sat, Jul 14, 2018 at 11:06 AM Rich Freeman wrote:
>
> git rev-parse HEAD >! .git/shallow
> git reflog expire --expire=all --all
> git gc --prune=now
>

Before anybody bangs their head against the wall too much: I did end
up having syncing issues with this.  I suspect the fix is a one-liner,
but finding it has so far defied a fair bit of experimentation.

In general the git authors aren't really big on supporting this sort
of thing, so it is just a big hack.  Doing a local shallow clone to
discard history might be an option.  Just deleting the repo and
re-syncing is another option.

But, if somebody comes up with a good fix I'm all ears.
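
One candidate, offered as an untested-in-production sketch rather than a
confirmed fix: let git move the shallow boundary itself with
git fetch --depth=1 instead of writing .git/shallow by hand. purge_history is
a hypothetical helper; it assumes a remote named origin, a checked-out branch
that tracks it, and no local tags or extra branches holding on to old
commits.

```shell
# purge_history REPO
# Hypothetical helper: truncates REPO's local history to the newest upstream
# commit without rewriting any commit, so later pulls remain fast-forwards.
purge_history() {
    repo="$1"
    # Re-fetch at depth 1; git itself rewrites .git/shallow for the new tip.
    git -C "$repo" fetch -q --depth=1 origin || return 1
    # Move the checked-out branch to the freshly fetched upstream commit.
    # This discards local modifications to tracked files.
    git -C "$repo" reset -q --hard '@{upstream}' || return 1
    # Drop reflog entries, then delete objects nothing refers to any more.
    git -C "$repo" reflog expire --expire=all --all || return 1
    git -C "$repo" gc -q --prune=now
}
```

Because this doubles as the sync step, it could be run in place of a normal
pull, for example purge_history /usr/portage.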

-- 
Rich