[gentoo-user] Portage, git and shallow cloning
I'm not sure if there's a better place to put this, so please feel free to tell me I should report it elsewhere (:

After the recent GitHub fun, I switched from the GitHub git source to git://anongit.gentoo.org/repo/sync/gentoo.git, as suggested by some on this mailing list. I completely nuked /usr/portage/* and set off with an `emerge --sync`, which looked like it was going to take ages and clone about a gig.

Reading https://wiki.gentoo.org/wiki/Project:Portage/Sync, I figured there was nothing much I could do about it, since the page describes the `sync-depth` config option and states that 1 is "shallow clone, only current state (*default if option is absent*)" (emphasis mine).

After multiple failed clones (the other side hangs up after a few minutes; I only have a 4 Mbps line, maxing out at around 450 KB/s), I tried explicitly setting `sync-depth` in my repo config and found:

1) `sync-depth` has been deprecated (you should now use `clone-depth`)
2) with the option missing, portage was fetching the entire history. After adding the option (and nuking /usr/portage/* again), a new clone finished in short order, bringing down only around 65 MB (according to git)

So I'd like to ask how I can help rectify the above:

1) the docs need to be updated to refer to `clone-depth`
2) I believe the original intent of defaulting to a shallow clone was a good idea; perhaps that can be investigated. If the intent has changed for some reason, the docs should be updated to reflect the change.

Thanks
-d
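For reference, a repos.conf entry for the setup described above might look like the sketch below. It is only a sketch: it assumes the tree lives at /usr/portage and that the entry sits in a file such as /etc/portage/repos.conf/gentoo.conf (the file name is an assumption). Setting `clone-depth = 1` explicitly asks for a shallow initial clone instead of relying on whatever the default happens to be.

```ini
# Assumed file: /etc/portage/repos.conf/gentoo.conf (name is illustrative)
[gentoo]
location = /usr/portage
sync-type = git
sync-uri = git://anongit.gentoo.org/repo/sync/gentoo.git
# Request a shallow first clone explicitly rather than relying on the default.
clone-depth = 1
```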
Re: [gentoo-user] Portage, git and shallow cloning
On Friday, 6 July 2018 06:34:01 BST Davyd McColl wrote:
> 1) `sync-depth` has been deprecated (should now use `clone-depth`)

But to what value should clone-depth be set? And why is the recent news item referring to instructions to use sync-depth?

--
Regards,
Peter.
Re: [gentoo-user] Portage, git and shallow cloning
On Sat, Jul 14, 2018 at 4:30 AM Peter Humphrey wrote:
>
> On Friday, 6 July 2018 06:34:01 BST Davyd McColl wrote:
> > 1) `sync-depth` has been deprecated (should now use `clone-depth`)
>
> But to what value should clone-depth be set?

That comes down to personal taste. Do you want any history available to browse? More depth means more history. If all you want is the current tree without history, then you want a depth of 1, and of course you'll need to set up a cron job or something similar to clean up past history (you never NEED more than the last commit). If you browse the online git repo you can see roughly how many commits land in a day, and estimate the depth you want from how many days of history you want.

Also, this value only matters for the first sync. After that, portage currently doesn't try to discard past commits, and it will always fetch all commits between your current state and the new head.

If you want, you could set up a script to manually purge history, and then do an initial sync with a depth of 1. Then any time you sync, you could review the history since the last time you synced, and then run the purge command to discard all history up to the current commit. That way you'll always see all the history since the last time you reviewed it.

--
Rich
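The point that the depth only matters for the first sync can be checked in a throwaway sandbox. This is a minimal sketch assuming POSIX sh and git; the repositories are made up, and `git clone --depth 1` stands in for what `clone-depth = 1` asks Portage to do (an assumption about Portage's internals, not something shown in this thread).

```shell
#!/bin/sh
# Sandbox sketch: a depth-1 clone starts with a single commit, but an
# ordinary pull afterwards brings in every new upstream commit, so local
# history grows again. Everything lives in a temporary directory.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical upstream with five commits touching one file.
git init -q upstream
for i in 1 2 3 4 5; do
    echo "$i" > upstream/file
    git -C upstream add file
    git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
done

# Shallow clone. file:// matters: git ignores --depth for plain local paths.
git clone -q --depth 1 "file://$work/upstream" shallow
echo "after clone: $(git -C shallow rev-list --count HEAD) commit(s)"

# Three more upstream commits, then a normal fast-forward pull.
for i in 6 7 8; do
    echo "$i" > upstream/file
    git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -am "commit $i"
done
git -C shallow pull -q --ff-only
echo "after pull:  $(git -C shallow rev-list --count HEAD) commit(s)"
```

It should report 1 commit after the clone and 4 after the pull: the three new commits plus the original shallow tip. Nothing in a plain pull trims the older ones again, which is why a separate purge step comes up in the thread.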
Re: [gentoo-user] Portage, git and shallow cloning
On Saturday, 14 July 2018 11:40:03 BST Rich Freeman wrote:
> On Sat, Jul 14, 2018 at 4:30 AM Peter Humphrey wrote:
> > On Friday, 6 July 2018 06:34:01 BST Davyd McColl wrote:
> > > 1) `sync-depth` has been deprecated (should now use `clone-depth`)
> >
> > But to what value should clone-depth be set?
>
> That comes down to personal taste. Do you want any history to be able
> to browse it? More depth means more history. If all you want is the
> current tree without history then you want a depth of 1...

That's all I need for the portage tree, unless removing everything at lower depths will remove the change records.

> ...and of course you'll need to set up a cron job or something to go
> cleaning up past history (you never NEED more than the last commit). If you
> browse the online git repo you can see about how many commits there are in a
> day and estimate how many you want based on how many days you want.
>
> Also, this value only matters for the first sync. After that portage
> currently doesn't try to discard past commits, and it will always
> fetch all commits between your current state and the new head.
>
> If you want you could set up a script to manually purge history, and
> then do an initial sync with 1 depth. Then anytime you sync you could
> review the history since the last time you synced, and then run the
> purge command to discard all history up to the current commit. In
> doing this you'll always see all the history since the last time you
> reviewed it.

Is there something in git to do that purging? If not, perhaps a simple monthly script to delete /usr/portage/* - but not packages or distfiles, which are on separate partitions here - would do the trick.

--
Regards,
Peter.
Re: [gentoo-user] Portage, git and shallow cloning
On Sat, Jul 14, 2018 at 8:00 AM Peter Humphrey wrote:
>
> That's all I need for the portage tree, unless removing everything at lower
> depths will remove the change records.

If you clone with a depth of one you'll see the current state of the tree, a commit message from the CI bot, and that is it. You'll have zero change history for anything.

If you clone with a depth of 10 you'll see one or two CI bot messages, and then the last 8 or so actual changes to the tree. You'll also have access to what the tree looked like when each of those changes was made.

Note that git stores unchanged content only once and uses delta compression, so the cost of increasing your depth isn't very high. A depth of 1 costs you about 670M, and a depth of 236000 costs you 1.5G. I'd expect the cost to be roughly linear between these.

>
> Is there something in git to do that purging? If not, perhaps a simple monthly
> script to delete /usr/portage/* - but not packages or distfiles, which are on
> separate partitions here - would do the trick.

That delete would certainly work, though it would cost you a full sync (which would go back to your depth setting). I'd suggest moving distfiles outside of the repo if you're going to do that (really, it shouldn't be inside anyway), just to make it easier.

git has no facilities to do this automatically, probably because it isn't something Linus does and git is very much his thing. However, I found that this works for me:

    git rev-parse HEAD >! .git/shallow
    git reflog expire --expire=all --all
    git gc --prune=now

(This is a combination of https://stackoverflow.com/a/34829535 (which doesn't work) and https://stackoverflow.com/a/46004595 (which is incomplete).) Note that `>!` is zsh syntax for overwriting a file even when noclobber is set; in bash or sh, use `>|` instead.

It runs in about 14s for me in a tmpfs. Another option would be to do a local shallow clone and swap the repositories.

You'll find tons of guides online for throwing out history that involve rebasing. You do NOT want to do this here.
These will change the hash of HEAD, which means the next git pull won't be a fast-forward, and it will be a mess in general. You just want to discard local history, not rewrite the repository to say that there never was any history.

Also note that the first line in this little script depends somewhat on git internals and may or may not work in the distant future. In any case, I suggest trying it. If it somehow eats your repo for breakfast, just delete it and the next sync will re-fetch.

--
Rich
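To see what the three-command trim does before pointing it at a real tree, here is a sandbox sketch on a made-up repository. It assumes POSIX sh and git, and uses `>|` rather than zsh's `>!` so it runs under sh or bash. The follow-up message in this thread reports sync trouble after using this trick, so treat it as an experiment rather than a recipe.

```shell
#!/bin/sh
# Sandbox run of the three-command history trim, on a made-up repository,
# so its effect can be inspected before touching a real Portage tree.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical upstream with five commits touching one file.
git init -q upstream
for i in 1 2 3 4 5; do
    echo "$i" > upstream/file
    git -C upstream add file
    git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
done

# A full (non-shallow) clone, standing in for a tree synced without a depth limit.
git clone -q "file://$work/upstream" tree
cd tree
echo "before trim: $(git rev-list --count HEAD) commits"

# Mark HEAD as the shallow boundary (>| is the POSIX spelling of zsh's >!),
# drop reflog entries that still reference older commits, then repack and
# prune everything no longer reachable.
git rev-parse HEAD >| .git/shallow
git reflog expire --expire=all --all
git gc --quiet --prune=now

echo "after trim:  $(git rev-list --count HEAD) commit(s)"
```

Comparing `du -sh .git` before and after the trim is a quick way to see how much space it reclaims. The sandbox only shows the local effect; it does not exercise a later sync, which is where the trouble was reported.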
Re: [gentoo-user] Portage, git and shallow cloning
On Sat, Jul 14, 2018 at 11:06 AM Rich Freeman wrote:
>
> git rev-parse HEAD >! .git/shallow
> git reflog expire --expire=all --all
> git gc --prune=now
>

Before anybody bangs their head against the wall too much: I did end up having syncing issues with this. I suspect the fix is a one-liner, but a fair bit of messing around hasn't turned up which one. In general the git authors aren't really big on supporting this sort of thing, so it is just a big hack.

Doing a local sync to discard history might be an option. Just deleting the repo and re-syncing is another option. But if somebody comes up with a good fix, I'm all ears.

--
Rich
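The "local shallow clone and swap" alternative mentioned in this thread can be sketched as a small shell function. `shallow_swap` is a hypothetical helper name, not a Portage or git command, and the sketch assumes POSIX sh and a tree whose `origin` remote points at the real upstream. Because commits are copied rather than rewritten, HEAD keeps its hash and later pulls can still fast-forward. Run it as the user that owns the tree (newer git versions refuse to operate on a repository owned by someone else), and note that it deliberately keeps the old copy instead of deleting it, since distfiles or packages living inside the tree (or a filesystem mounted below it) would otherwise be lost.

```shell
#!/bin/sh
# shallow_swap is a hypothetical helper, not a Portage or git command.
# It makes a depth-1 local clone of an existing tree, points that clone back
# at the tree's real upstream, and swaps the two directories.
shallow_swap() {
    tree=$(cd "$1" && pwd) || return 1
    new="$tree.shallow.$$"
    old="$tree.old.$$"

    # file:// is required: git ignores --depth for plain local paths.
    git clone -q --depth 1 "file://$tree" "$new" || return 1

    # Right after cloning, the new copy's origin is the old local tree;
    # restore the real upstream URL so the next sync fetches from there.
    url=$(git -C "$tree" remote get-url origin) || return 1
    git -C "$new" remote set-url origin "$url" || return 1

    # Swap, but keep the old copy rather than deleting it: untracked files
    # (distfiles, packages) or a filesystem mounted below it are still there.
    mv "$tree" "$old" || return 1
    mv "$new" "$tree" || return 1
    echo "old tree kept at $old; remove it once you have checked it"
}
```

Usage would be something like `shallow_swap /usr/portage`, followed by a normal sync to confirm the new tree fetches from upstream before removing the `.old` directory by hand.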