On Sat, 5 Nov 2016 15:56:20 -0700
Zac Medico <zmed...@gentoo.org> wrote:

> On 11/05/2016 03:22 PM, Michał Górny wrote:
> > On Sat, 5 Nov 2016 15:11:10 -0700
> > Zac Medico <zmed...@gentoo.org> wrote:
> >   
> >> On 11/05/2016 02:50 PM, Michał Górny wrote:  
> >>> On Sat,  5 Nov 2016 13:43:15 -0700
> >>> Zac Medico <zmed...@gentoo.org> wrote:
> >>>     
> >>>> This is necessary in order to avoid "There are too many unreachable
> >>>> loose objects" warnings from automatic git gc calls.
> >>>>
> >>>> X-Gentoo-Bug: 599008
> >>>> X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=599008
> >>>> ---
> >>>>  pym/portage/sync/modules/git/git.py | 6 ++++++
> >>>>  1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/pym/portage/sync/modules/git/git.py b/pym/portage/sync/modules/git/git.py
> >>>> index f288733..c90cf88 100644
> >>>> --- a/pym/portage/sync/modules/git/git.py
> >>>> +++ b/pym/portage/sync/modules/git/git.py
> >>>> @@ -101,6 +101,12 @@ class GitSync(NewBase):
> >>>>                                  writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
> >>>>                                  return (e.returncode, False)
> >>>>  
> >>>> +                        # For shallow fetch, unreachable objects must be pruned
> >>>> +                        # manually, since otherwise automatic git gc calls will
> >>>> +                        # eventually warn about them (see bug 599008).
> >>>> +                        subprocess.call(['git', 'prune'],
> >>>> +                                cwd=portage._unicode_encode(self.repo.location))
> >>>> +
> >>>>                          git_cmd_opts += " --depth %d" % self.repo.sync_depth
> >>>>                          git_cmd = "%s fetch %s%s" % (self.bin_command,
> >>>>                                  remote_branch.partition('/')[0], git_cmd_opts)
> >>>
> >>> Does it have a performance impact?    
> >>
> >> Yes, it takes about 20 seconds on my laptop. I suppose we could make
> >> this an optional thing, so that those people can do it manually if they
> >> want.  
> > 
> > So we have an improvement from at most a few seconds for a normal
> > 'git pull' to around a minute for a shallow pull?
> 
> Well, we've got at least 3 resources to consider:
> 
> 1) network bandwidth
> 2) disk usage
> 3) sync time
> 
> For me, sync time doesn't really matter that much, but I suppose it
> might for some people.

For a common user, network bandwidth is not a problem with git (except
maybe for the huge initial clone). Especially when syncing frequently,
the gain from subsequent --depth=1 fetches is negligible. When syncing
rarely, you probably prefer snapshots anyway.

I doubt this would benefit even dial-up users; that is, I doubt that
more time would be saved on fetching than is lost on all the operations
needed to keep things working. The additional data probably won't
affect users on metered data plans much either.

This is especially true since Gentoo is all about fetching distfiles,
which are huge compared to the git updates for the repository.

As for disk usage, again, the difference should be negligible.
The major difference is made on the initial fetch. Of course, regularly
pruning the repository will reduce its size. But then, pruning it with
non-shallow fetches would probably achieve a similar effect thanks to
delta compression.
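
If someone wants to put numbers on the disk-usage claim, comparing
`git count-objects -v` before and after a prune (or between a shallow
and a full clone) should be enough. A minimal sketch, assuming the
usual "key: value" output of that command; the helper names below are
made up for illustration and are not part of Portage:

```python
import subprocess


def parse_count_objects(output):
    """Parse `git count-objects -v` output ("key: value" lines) into a dict of ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def repo_disk_usage_kib(stats):
    """Loose plus packed object size; git reports both fields in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0)


def count_objects(repo_path):
    """Run `git count-objects -v` in repo_path (hypothetical helper, needs git)."""
    out = subprocess.check_output(
        ["git", "count-objects", "-v"], cwd=repo_path, universal_newlines=True)
    return parse_count_objects(out)
```

Calling count_objects() on the repository before and after `git prune`
and diffing repo_disk_usage_kib() would show how much space the prune
actually recovers.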

That leaves the sync time, which is becoming worse than rsync.
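
For anyone who wants to measure this rather than guess, here is a rough
timing sketch. It is not what Portage does internally; the remote name
"origin", the depth of 1, and the helper names are all assumptions for
illustration (time.monotonic needs Python 3.3 or later):

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    rc = subprocess.call(argv, cwd=cwd)
    return rc, time.monotonic() - start


def time_steps(steps, cwd=None):
    """Time each command in order, stopping after the first failure.

    Returns a list of (argv, exit_code, elapsed_seconds) tuples.
    """
    results = []
    for argv in steps:
        rc, elapsed = time_command(argv, cwd=cwd)
        results.append((argv, rc, elapsed))
        if rc != 0:
            sys.stderr.write(
                "%s failed with exit code %d\n" % (" ".join(argv), rc))
            break
    return results


# The two sync strategies discussed in this thread. The remote name
# "origin" and the depth of 1 are assumptions for illustration.
SHALLOW_SYNC = [
    ["git", "prune"],
    ["git", "fetch", "--depth", "1", "origin"],
]
FULL_SYNC = [
    ["git", "fetch", "origin"],
]
```

Running time_steps(SHALLOW_SYNC, cwd=repo) and time_steps(FULL_SYNC,
cwd=repo) against two copies of the same repository and summing the
elapsed times would give a like-for-like comparison.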

-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>
