On 11/06/2016 01:59 AM, Michał Górny wrote:
> On Sat, 5 Nov 2016 15:56:20 -0700
> Zac Medico <zmed...@gentoo.org> wrote:
> 
>> On 11/05/2016 03:22 PM, Michał Górny wrote:
>>> On Sat, 5 Nov 2016 15:11:10 -0700
>>> Zac Medico <zmed...@gentoo.org> wrote:
>>>   
>>>> On 11/05/2016 02:50 PM, Michał Górny wrote:  
>>>>> On Sat,  5 Nov 2016 13:43:15 -0700
>>>>> Zac Medico <zmed...@gentoo.org> wrote:
>>>>>     
>>>>>> This is necessary in order to avoid "There are too many unreachable
>>>>>> loose objects" warnings from automatic git gc calls.
>>>>>>
>>>>>> X-Gentoo-Bug: 599008
>>>>>> X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=599008
>>>>>> ---
>>>>>>  pym/portage/sync/modules/git/git.py | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/pym/portage/sync/modules/git/git.py b/pym/portage/sync/modules/git/git.py
>>>>>> index f288733..c90cf88 100644
>>>>>> --- a/pym/portage/sync/modules/git/git.py
>>>>>> +++ b/pym/portage/sync/modules/git/git.py
>>>>>> @@ -101,6 +101,12 @@ class GitSync(NewBase):
>>>>>>                                  writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
>>>>>>                                  return (e.returncode, False)
>>>>>>  
>>>>>> +                        # For shallow fetch, unreachable objects must be pruned
>>>>>> +                        # manually, since otherwise automatic git gc calls will
>>>>>> +                        # eventually warn about them (see bug 599008).
>>>>>> +                        subprocess.call(['git', 'prune'],
>>>>>> +                                cwd=portage._unicode_encode(self.repo.location))
>>>>>> +
>>>>>>                          git_cmd_opts += " --depth %d" % self.repo.sync_depth
>>>>>>                          git_cmd = "%s fetch %s%s" % (self.bin_command,
>>>>>>                                  remote_branch.partition('/')[0], git_cmd_opts)
>>>>>
>>>>> Does it have a performance impact?    
>>>>
>>>> Yes, it takes about 20 seconds on my laptop. I suppose we could make
>>>> this optional, so that people can run it manually if they want.
>>>
>>> So we have an improvement from at most a few seconds for a normal
>>> 'git pull' to around a minute for a shallow pull?
>>
>> Well, we've got at least 3 resources to consider:
>>
>> 1) network bandwidth
>> 2) disk usage
>> 3) sync time
>>
>> For me, sync time doesn't really matter that much, but I suppose it
>> might for some people.
> 
> For a common user, network bandwidth is not a problem with git (except
> maybe for the huge initial clone). Especially when syncing frequently,
> the gain from subsequent --depth=1 is negligible. When syncing rarely,
> you probably prefer snapshots anyway.
> 
> I doubt this could benefit even dial-up users, i.e. that more time
> would be saved on fetching than lost on all the operations needed to
> keep things working. The additional data probably won't affect users
> on metered data plans much either.
> 
> Especially since Gentoo is all about fetching distfiles, which are
> huge compared to the git updates for the repository.
> 
> As for the disk usage, again, the difference should be negligible.
> The major difference comes from the initial fetch. Of course, regularly
> pruning the repository will reduce its size. But then, pruning it with
> non-shallow fetches would probably achieve a similar effect thanks to
> delta compression.
> 
> That leaves the sync time, which is becoming worse than rsync.

Maybe this will be a reasonable default:

* add a separate clone-depth setting which defaults to 1
* set the default sync-depth setting to 0 (unlimited)

If the user enables shallow fetch by setting sync-depth to something
other than 0, then I think we should run whatever commands are
necessary to keep the repository healthy (including `git prune` if
necessary).
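For illustration, a minimal Python sketch of how those two defaults
could fit together. The `clone_depth`/`sync_depth` parameters and the
helper functions below are hypothetical, not portage's actual API. As
in the patch above, `git prune` runs before each shallow fetch, and
only when sync-depth is nonzero:

```python
import subprocess

# Hypothetical defaults following the proposal; these are not existing
# portage attributes.
DEFAULT_CLONE_DEPTH = 1  # shallow initial clone
DEFAULT_SYNC_DEPTH = 0   # 0 means unlimited, i.e. a normal (non-shallow) fetch


def clone_command(uri, location, clone_depth=DEFAULT_CLONE_DEPTH):
    """Return the argv for the initial clone."""
    cmd = ["git", "clone"]
    if clone_depth > 0:
        cmd += ["--depth", str(clone_depth)]
    return cmd + [uri, location]


def sync_commands(remote, sync_depth=DEFAULT_SYNC_DEPTH):
    """Return the list of argvs to run, in order, for an update."""
    if sync_depth <= 0:
        # Full fetch: per the proposal, no explicit prune is issued.
        return [["git", "fetch", remote]]
    return [
        # Shallow fetch leaves unreachable objects behind; prune them
        # explicitly so automatic git gc does not warn (bug 599008).
        ["git", "prune"],
        ["git", "fetch", "--depth", str(sync_depth), remote],
    ]


def sync(location, remote, sync_depth=DEFAULT_SYNC_DEPTH):
    """Run the update commands in `location`; stop at the first failure."""
    for cmd in sync_commands(remote, sync_depth):
        returncode = subprocess.call(cmd, cwd=location)
        if returncode != 0:
            return returncode
    return 0
```

With these defaults, sync_commands("origin") is just a plain
`git fetch origin`, so only users who opt into a nonzero sync-depth pay
the extra `git prune` time discussed earlier in the thread.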
-- 
Thanks,
Zac
