Am 22.09.19 um 21:05 schrieb SZEDER Gábor:
> On Sat, Sep 21, 2019 at 02:37:18PM +0200, René Scharfe wrote:
>> Am 21.09.19 um 11:57 schrieb SZEDER Gábor:
>>> On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
>>>> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
>>>>>> If the (re)introduced leak doesn't impact performance and memory
>>>>>> usage too much then duplicating tip_name again in name_rev() or
>>>>>> rather your new create_or_update_name() would likely make the
>>>>>> lifetimes of those string buffers easier to manage.
>>>>>
>>>>> Yeah, the easiest would be when each 'struct rev_name' in the commit
>>>>> slab would have its own 'tip_name' string, but that would result in
>>>>> a lot of duplicated strings and increased memory usage.
>>>>
>>>> I didn't measure how much more memory would be used, though.
>>>
>>> So, I tried the patch below to give each 'struct rev_name' instance
>>> its own copy of 'tip_name', and the memory usage of 'git name-rev
>>> --all' usually increased.
>>>
>>> The increase depends on how many merges and how many refs there are
>>> compared to the number of commits: the fewer merges and refs, the
>>> higher the more the memory usage increased:
>>>
>>> linux: +4.8%
>>> gcc: +7.2%
>>> gecko-dev: +9.2%
>>> webkit: +12.4%
>>> llvm-project: +19.0%
>>
>> Is that the overall memory usage or just for struct rev_name instances
>> and tip_name strings?
>
> It's overall memory usage, the avarage of five runs of:
>
> /usr/bin/time --format='%M' ~/src/git/git name-rev --all
>
>> And how much is that in absolute terms?
>
> git: 29801 -> 28514
> linux: 317018 -> 332218
> gcc: 106462 -> 114140
> gecko: 315448 -> 344486
> webkit: 55847 -> 62780
> llvm: 112867 -> 134384
I only have the first two handy, and I get numbers like this with
master:
git, lots of branches with long names: 3075476
git, local clone, single branch: 1349016
linux, single branch: 1520468
O_o
>> (Perhaps
>> it's worth it to get the memory ownership question off the table at
>> least during the transformation to iterative processing.)
>
> I looked into it only because I got curious, but other than that I
> will definitely play the "beyond the scope of this patch series" card
> :)
Fair enough.
>> I wonder why regular commits even need a struct name_rev. Shouldn't
>> only tips and roots need ones? And perhaps merges and occasional
>> regular "checkpoint" commits, to avoid too many duplicate traversals.
>
> The 'struct rev_name' holds all info that's necessary to determine the
> commit's name. It seems to be much simpler to just attach one to each
> commit and then retrieve it from the commit slab when printing the
> name of the commit than to come up with an algorithm where only a
> sleect set of commits get a 'struct rev_name', including how to access
> those when naming a commit that doesn't have one.
Sure, the lookup of individual commits is much easier once all commits
have name tags attached. Preparing that sounds expensive, though.
It's a trade-off favoring looking up lots of names per program run.
>>> --- >8 ---
>>>
>>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>>> index 6969af76c4..62ab78242b 100644
>>> --- a/builtin/name-rev.c
>>> +++ b/builtin/name-rev.c
>>> @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct
>>> commit *commit,
>>> set_commit_rev_name(commit, name);
>>> goto copy_data;
>>> } else if (is_better_name(name, taggerdate, distance, from_tag)) {
>>> + free((char*) name->tip_name);
>>> copy_data:
>>> name->tip_name = tip_name;
>>
>> I would have expected a xstrdup() call here.
>
> But then we'd needed to release the results of all those xstrfmt()
> calls at the callsites of create_or_update_name(), so instead of those
> strdup() calls that you deem unnecessary we would need additional
> free() calls.
Correct. That would be simpler and shouldn't affect peak memory
usage.
René