On Sat, 5 Nov 2016 15:56:20 -0700 Zac Medico <zmed...@gentoo.org> wrote:
> On 11/05/2016 03:22 PM, Michał Górny wrote:
> > On Sat, 5 Nov 2016 15:11:10 -0700
> > Zac Medico <zmed...@gentoo.org> wrote:
> >
> >> On 11/05/2016 02:50 PM, Michał Górny wrote:
> >>> On Sat, 5 Nov 2016 13:43:15 -0700
> >>> Zac Medico <zmed...@gentoo.org> wrote:
> >>>
> >>>> This is necessary in order to avoid "There are too many unreachable
> >>>> loose objects" warnings from automatic git gc calls.
> >>>>
> >>>> X-Gentoo-Bug: 599008
> >>>> X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=599008
> >>>> ---
> >>>> pym/portage/sync/modules/git/git.py | 6 ++++++
> >>>> 1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/pym/portage/sync/modules/git/git.py
> >>>> b/pym/portage/sync/modules/git/git.py
> >>>> index f288733..c90cf88 100644
> >>>> --- a/pym/portage/sync/modules/git/git.py
> >>>> +++ b/pym/portage/sync/modules/git/git.py
> >>>> @@ -101,6 +101,12 @@ class GitSync(NewBase):
> >>>> writemsg_level(msg + "\n",
> >>>> level=logging.ERROR, noiselevel=-1)
> >>>> return (e.returncode, False)
> >>>>
> >>>> + # For shallow fetch, unreachable objects must
> >>>> be pruned
> >>>> + # manually, since otherwise automatic git gc
> >>>> calls will
> >>>> + # eventually warn about them (see bug 599008).
> >>>> + subprocess.call(['git', 'prune'],
> >>>> +
> >>>> cwd=portage._unicode_encode(self.repo.location))
> >>>> +
> >>>> git_cmd_opts += " --depth %d" %
> >>>> self.repo.sync_depth
> >>>> git_cmd = "%s fetch %s%s" % (self.bin_command,
> >>>> remote_branch.partition('/')[0],
> >>>> git_cmd_opts)
> >>>
> >>> Does it have a performance impact?
> >>
> >> Yes, it takes about 20 seconds on my laptop. I suppose we could make
> >> this an optional thing, so that those people can do it manually if they
> >> want.
> >
> > So we have improvement from at most few seconds for normal 'git pull'
> > to around a minute for shallow pull?
>
> Well we've got a least 3 resources to consider:
>
> 1) network bandwidth
> 2) disk usage
> 3) sync time
>
> For me, sync time doesn't really matter that much, but I suppose it
> might for some people.

For a common user, network bandwidth is not a problem with git (except
maybe for the huge initial clone). Especially when syncing frequently,
the gain from a subsequent --depth=1 is negligible. When syncing rarely,
you probably prefer snapshots anyway.

I doubt this could be of benefit even to dial-up users; that is, I doubt
that more time would be saved on fetching than lost on all the ops
needed to make things continue to work. The additional data probably
won't affect users on data plans much either, especially since Gentoo is
all about fetching distfiles that are huge compared to the git updates
for the repository.

As for the disk usage, again, the difference should be negligible. The
major difference is made on the initial fetch. Of course, regularly
pruning the repository will reduce its size. But then, pruning it with
non-shallow fetches would probably achieve a similar effect thanks to
delta compression.

That leaves the sync time. Which is becoming worse than rsync.

-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>