Re: [PATCH v7 17/31] merge-recursive: add a new hashmap for storing directory renames

Elijah Newren Sat, 03 Feb 2018 13:34:58 -0800

On Fri, Feb 2, 2018 at 4:26 PM, Stefan Beller <[email protected]> wrote:
> On Tue, Jan 30, 2018 at 3:25 PM, Elijah Newren <[email protected]> wrote:


>> +static void dir_rename_init(struct hashmap *map)
>> +{
>> +       hashmap_init(map, (hashmap_cmp_fn) dir_rename_cmp, NULL, 0);
>
> See 55c965f3a2 (Merge branch 'sb/hashmap-cleanup', 2017-08-11) or rather
> 201c14e375 (attr.c: drop hashmap_cmp_fn cast, 2017-06-30) for an attempt to
> keep out the casting in the init, but cast the void->type inside the function.

Fixed (locally).

>> +struct dir_rename_entry {
>> +       struct hashmap_entry ent; /* must be the first member! */
>> +       char *dir;
>> +       unsigned non_unique_new_dir:1;
>> +       struct strbuf new_dir;
>> +       struct string_list possible_new_dirs;
>> +};
>
> Could you add comments what these are and if they have any constraints?
> e.g. is 'dir' the full path from the root of the repo and does it end
> with a '/' ?

Good idea, will do.  To help you out with the rest of your review:

yes, dir is full path from the root of the repo, but no it does not
end with a '/'.  The same two constraints hold for how I store
directories in other variables as well.

If someone renames multiple files in a single directory to several
different other directories, then possible_new_dirs is used
(temporarily) to track what those other directories are and how many
files were moved from `dir` to that directory.  The 'winner' (the
target directory with the most renames from `dir` to it) will be
recorded in new_dir.  If there is no 'winner' (a tie for target
directory with the most renames), then non_unique_new_dir is set.
Once we can set either non_unique_new_dir or possible_new_dirs, then
possible_new_dirs is unnecessary and can be freed.

> Having a stringlist of potentially new dirs sounds like the algorithm is
> at least n^2, but how do I know? I'll read on.

Yes, I suppose it's technically n^2, but n is expected to be O(1).
While one can trivially construct a case making n arbitrarily large,
statistically for real world repositories, I expected the mode of n to
be 1 and the mean to be less than 2.  My original idea was to use a
hash for possible_new_dirs, but since hashes are so painful in C and n
should be very small anyway, I didn't bother.  If anyone can find an
example of a real world open source repository (linux, webkit, git,
etc.) with a merge where n is greater than about 10, I'll be
surprised.

Does that address your concern, or does it sound like I'm kicking the
can down the road?  If it's the latter, we can switch it out.

> This patch only adds static functions, so some compilers may even refuse
> to compile after this patch (-Werror -Wunused). I am not sure how strict we
> are there, but as git-bisect still hasn't learned about going only
> into the first
> parent to have bisect working on a "per series" level rather than a "per 
> commit"
> level, it is possible that someone will end up on this commit in the future 
> and
> it won't compile well. :/
>
> Not sure what to recommend, maybe squash this with the patch that makes
> use of these functions?

I'm unsure as well; I can combine it with 19/31 but I thought that
keeping them separate made it slightly easier to review both.

Re: [PATCH v7 17/31] merge-recursive: add a new hashmap for storing directory renames

Reply via email to