On 11/06/2016 01:59 AM, Michał Górny wrote:
> On Sat, 5 Nov 2016 15:56:20 -0700
> Zac Medico <zmed...@gentoo.org> wrote:
>
>> On 11/05/2016 03:22 PM, Michał Górny wrote:
>>> On Sat, 5 Nov 2016 15:11:10 -0700
>>> Zac Medico <zmed...@gentoo.org> wrote:
>>>
>>>> On 11/05/2016 02:50 PM, Michał Górny wrote:
>>>>> On Sat, 5 Nov 2016 13:43:15 -0700
>>>>> Zac Medico <zmed...@gentoo.org> wrote:
>>>>>
>>>>>> This is necessary in order to avoid "There are too many unreachable
>>>>>> loose objects" warnings from automatic git gc calls.
>>>>>>
>>>>>> X-Gentoo-Bug: 599008
>>>>>> X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=599008
>>>>>> ---
>>>>>>  pym/portage/sync/modules/git/git.py | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/pym/portage/sync/modules/git/git.py b/pym/portage/sync/modules/git/git.py
>>>>>> index f288733..c90cf88 100644
>>>>>> --- a/pym/portage/sync/modules/git/git.py
>>>>>> +++ b/pym/portage/sync/modules/git/git.py
>>>>>> @@ -101,6 +101,12 @@ class GitSync(NewBase):
>>>>>>  writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
>>>>>>  return (e.returncode, False)
>>>>>>
>>>>>> +# For shallow fetch, unreachable objects must be pruned
>>>>>> +# manually, since otherwise automatic git gc calls will
>>>>>> +# eventually warn about them (see bug 599008).
>>>>>> +subprocess.call(['git', 'prune'],
>>>>>> +    cwd=portage._unicode_encode(self.repo.location))
>>>>>> +
>>>>>>  git_cmd_opts += " --depth %d" % self.repo.sync_depth
>>>>>>  git_cmd = "%s fetch %s%s" % (self.bin_command,
>>>>>>      remote_branch.partition('/')[0], git_cmd_opts)
>>>>>
>>>>> Does it have a performance impact?
>>>>
>>>> Yes, it takes about 20 seconds on my laptop. I suppose we could make
>>>> this an optional thing, so that those people can do it manually if they
>>>> want.
>>>
>>> So we have gone from at most a few seconds for a normal 'git pull'
>>> to around a minute for a shallow pull?
>>
>> Well, we've got at least 3 resources to consider:
>>
>> 1) network bandwidth
>> 2) disk usage
>> 3) sync time
>>
>> For me, sync time doesn't really matter that much, but I suppose it
>> might for some people.
>
> For a common user, network bandwidth is not a problem with git (except
> maybe for the huge initial clone). Especially when syncing frequently,
> the gain from subsequent --depth=1 fetches is negligible. When syncing
> rarely, you probably prefer snapshots anyway.
>
> I doubt this could benefit even dial-up users; that is, I doubt that
> more time would be saved on fetching than lost on all the operations
> needed to keep things working. The additional data probably won't
> affect users on metered data plans much either.
>
> Especially since Gentoo is all about fetching distfiles that are huge
> compared to the git updates for the repository.
>
> As for disk usage, again, the difference should be negligible.
> The major difference is made by the initial fetch. Of course, regularly
> pruning the repository will reduce its size. But then, pruning it with
> non-shallow fetches would probably achieve a similar effect thanks to
> delta compression.
>
> That leaves the sync time, which is becoming worse than with rsync.

Maybe this will be a reasonable default:

* add a separate clone-depth setting which defaults to 1
* set the default sync-depth setting to 0 (unlimited)

If the user enables shallow fetch by setting sync-depth to something
other than 0, then I think we should call whatever commands are
necessary to keep the repository healthy (including `git prune` if
necessary).

--
Thanks,
Zac
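The proposed defaults can be sketched roughly as follows. This is a hypothetical illustration, not portage's actual code. `RepoConfig`, `clone_command` and `sync_commands` are invented names, and `clone_depth`/`sync_depth` stand in for the clone-depth and sync-depth settings discussed above. The sketch prunes only when a shallow fetch is configured, and it prunes before fetching, mirroring the order in the patch:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepoConfig:
    """Hypothetical stand-in for a repos.conf entry (not portage's class)."""
    location: str
    clone_depth: int = 1  # proposed default: shallow initial clone
    sync_depth: int = 0   # proposed default: 0 means unlimited (full) fetch


def clone_command(repo: RepoConfig, uri: str) -> List[str]:
    """Return the argv for the initial clone (hypothetical helper)."""
    cmd = ["git", "clone"]
    if repo.clone_depth > 0:
        cmd += ["--depth", str(repo.clone_depth)]
    cmd += [uri, repo.location]
    return cmd


def sync_commands(repo: RepoConfig, remote: str) -> List[List[str]]:
    """Return the argv lists for a sync of an existing clone, in order."""
    commands: List[List[str]] = []
    fetch = ["git", "fetch"]
    if repo.sync_depth > 0:
        # Shallow fetches leave unreachable objects behind; prune them so
        # that automatic `git gc` runs do not warn about them (bug 599008).
        commands.append(["git", "prune"])
        fetch += ["--depth", str(repo.sync_depth)]
    fetch.append(remote)
    commands.append(fetch)
    return commands
```

A caller would run each argv with something like `subprocess.run(cmd, cwd=repo.location, check=True)`. With the proposed defaults, a regular sync is a plain `git fetch origin` and never pays the roughly 20-second `git prune` cost. Only users who opt into shallow fetches (sync-depth above 0) get the extra pruning step.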