> /*
> * For
> * "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
> * the "e/foo.c" part is the same, we just want to know that
> * "a/b/c/d" was renamed to "a/b/some/thing/else"
> * so, for this example, this function returns "a/b/c/d" in
> * *old_dir and "a/b/some/thing/else" in *new_dir.
> *
> * Also, if the basename of the file changed, we don't care. We
> * want to know which portion of the directory, if any, changed.
> */
>
> Is that better?
yes, that helps me understanding pretty much.
>
>>> + *
>>> + * Also, if the basename of the file changed, we don't care. We
>>> + * want to know which portion of the directory, if any, changed.
>>> + */
>>> + end_of_old = strrchr(old_path, '/');
>>> + end_of_new = strrchr(new_path, '/');
>>> +
>>> + if (end_of_old == NULL || end_of_new == NULL)
>>> + return;
>>
>> return early as the files are in the top level, and apparently we do
>> not support top level renaming?
>>
>> What about a commit like 81b50f3ce4 (Move 'builtin-*' into
>> a 'builtin/' subdirectory, 2010-02-22) ?
>>
>> Well that specific commit left many files outside the new builtin dir,
>> but conceptually we could see a directory rename of
>>
>> /* => /src/*
>
> We had some internal usecases for want to support a "rename" of the
> toplevel directory into a subdirectory of itself. However, attempting
> to support that opened much too big a can of worms for me. We'd open
> up some big surprises somewhere.
ok, I was just trying to understand the code.
(As said before, I was not asking for having this implemented.)
> In particular, note that not supporting a "rename" of the toplevel
> directory is a special case of not supporting a "rename" of any
> directory to a subdirectory below itself, which in turn is a very
> special case of excluding partial directory renames. I addressed this
> in the cover letter of my original submission, as follows:
>
> """
> Further, there's a basic question about when directory rename detection
> should be applied at all. I have a simple rule:
>
> 3) If a given directory still exists on both sides of a merge, we do
> not consider it to have been renamed.
>
> Rule 3 may sound obvious at first, but it will probably arise as a
> question for some users -- what if someone "mostly" moved a directory but
> still left some files around, or, equivalently (from the perspective of the
> three-way merge that merge-recursive performs), fully renamed a directory
> in one commmit and then recreated that directory in a later commit adding
> some new files and then tried to merge? See the big comment in section 4
> of the new t6043 for further discussion of this rule.
> """
>
> Patch 04/31 is the one that adds that big comment with further discussion.
>
> Maybe there's a way to support toplevel renames, but I think it'd make
> this series longer and more complicated...and may cause more edge
> cases that confuse users.
Thanks for reminding!
>
>>> + while (*--end_of_new == *--end_of_old &&
>>> + end_of_old != old_path &&
>>> + end_of_new != new_path)
>>> + ; /* Do nothing; all in the while loop */
>>
>> We have to compare manually as we'd want to find
>> the first non-equal and there doesn't seem to be a good
>> library function for that.
>>
>> Assuming many repos are UTF8 (including in their paths),
>> how does this work with display characters longer than one char?
>> It should be fine as we cut at the slash?
>
> Oh, UTF-8. Ugh.
> Can UTF-8 characters, other than '/', have a byte whose value matches
> (unsigned char)('/')? If so, then I'll need to figure out how to do
> utf-8 character parsing. Anyone have pointers?
Oh right we are always cutting at '/', which hex 2F, so we cannot split
a codepoint in half accidentally (finding the slash inside a codepoint).
And renaming a directory from one utf8 codepoint to another, which
have the same prefix is not an issue at this layer, too.
(Think of abc -> abd just all of abc/d in one code point, but there
but even for multi code points/ASCII we repeat the prefix when
printing the rename)
>> So the first loop is about counting the number of files in each directory
>> that are renamed and the later while loop is about mapping them?
>
> Close; would adding the following comment at the top of the function help?
>
> /*
> * Typically, we think of a directory rename as all files from a
> * certain directory being moved to a target directory. However,
> * what if someone first moved two files from the original
> * directory in one commit, and then renamed the directory
> * somewhere else in a later commit? At merge time, we just know
> * that files from the original directory went to two different
> * places, and that the bulk of them ended up in the same place.
> * We want each directory rename to follow the bulk of the files
> * from that directory. This function exists to find where the
> * bulk of the files went.
> *
> * The first loop below simply iterates through the list of
> * renames, adding up all the rename source->target locations with
> * a count.
> *
> * The second loop compares the count for each renamed directory
> * and declares the "winning" target location.
> */
>
> Is there any part that remains unclear with that comment? (Also, is
> that comment too long?)
Sounds good to me. Thanks!
>
>>> + /* Strings were xstrndup'ed before inserting into
>>> string-list,
>>> + * so ask string_list to remove the entries for us.
>>> + */
>>
>> comment style.
>
> Thanks.
>
>>> + entry->possible_new_dirs.strdup_strings = 1;
>>
>> Why do we need to set strdup_strings here (so late in the
>> game, we are about to clear it?) Could we set it earlier?
>>
>> Or rather have the string list duplicate the strings instead of
>> get_renamed_dir_portion ?
>
> We didn't strdup the original string (a file path) as-is, we
> strndup'ed a subset of the original string (just the relevant portion
> of the directory). Since we already had to allocate a special string
> for that, making the string list duplicate the strings would have
> caused an extra unnecessary allocation and required us to free the
> memory allocated by get_renamed_dir_portion manually. So, we do need
> this here.
>
> Does this comment make it clearer?:
>
> /*
> * The relevant directory sub-portion of the original full
> * filepaths were xstrndup'ed before inserting into
> * possible_new_dirs, and instead of manually iterating the
> * list and free'ing each, just lie and tell
> * possible_new_dirs that it did the strdup'ing so that it
> * will free them for us.
> */
> entry->possible_new_dirs.strdup_strings = 1;
> string_list_clear(&entry->possible_new_dirs, 1);
Phrased this way, it makes sense.
Thanks,
Stefan