Hi Elijah,

sorry about the blast from the past, but I just stumbled over something
I could not even find any discussion about:

On Thu, 19 Apr 2018, Elijah Newren wrote:

> This populates a set of directory renames for us.  The set of directory
> renames is not yet used, but will be in subsequent commits.
>
> Note that the use of a string_list for possible_new_dirs in the new
> dir_rename_entry struct implies an O(n^2) algorithm; however, in practice
> I expect the number of distinct directories that files were renamed into
> from a single original directory to be O(1).  My guess is that n has a
> mode of 1 and a mean of less than 2, so, for now, string_list seems good
> enough for possible_new_dirs.
>
> Reviewed-by: Stefan Beller <sbel...@google.com>
> Signed-off-by: Elijah Newren <new...@gmail.com>
> Signed-off-by: Junio C Hamano <gits...@pobox.com>
> ---
>  merge-recursive.c | 224 +++++++++++++++++++++++++++++++++++++++++++++-
>  merge-recursive.h |  18 ++++
>  2 files changed, 239 insertions(+), 3 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 30894c1cc7..22c5e8e5c9 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> [...]
> @@ -1357,6 +1395,169 @@ static struct diff_queue_struct *get_diffpairs(struct 
> merge_options *o,
>       return ret;
>  }
>
> +static void get_renamed_dir_portion(const char *old_path, const char 
> *new_path,
> +                                 char **old_dir, char **new_dir)
> +{
> +     char *end_of_old, *end_of_new;
> +     int old_len, new_len;
> +
> +     *old_dir = NULL;
> +     *new_dir = NULL;
> +
> +     /*
> +      * For
> +      *    "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
> +      * the "e/foo.c" part is the same, we just want to know that
> +      *    "a/b/c/d" was renamed to "a/b/some/thing/else"
> +      * so, for this example, this function returns "a/b/c/d" in
> +      * *old_dir and "a/b/some/thing/else" in *new_dir.
> +      *
> +      * Also, if the basename of the file changed, we don't care.  We
> +      * want to know which portion of the directory, if any, changed.
> +      */
> +     end_of_old = strrchr(old_path, '/');
> +     end_of_new = strrchr(new_path, '/');
> +
> +     if (end_of_old == NULL || end_of_new == NULL)
> +             return;
> +     while (*--end_of_new == *--end_of_old &&
> +            end_of_old != old_path &&
> +            end_of_new != new_path)
> +             ; /* Do nothing; all in the while loop */
> +     /*
> +      * We've found the first non-matching character in the directory
> +      * paths.  That means the current directory we were comparing
> +      * represents the rename.  Move end_of_old and end_of_new back
> +      * to the full directory name.
> +      */
> +     if (*end_of_old == '/')
> +             end_of_old++;
> +     if (*end_of_old != '/')
> +             end_of_new++;

Is this intentional? Even after thinking about it for fifteen minutes, I
think it was probable meant to test for `*end_of_new == '/'` instead of
`*end_of_old != '/'`. And...

> +     end_of_old = strchr(end_of_old, '/');
> +     end_of_new = strchr(end_of_new, '/');

... while I satisfied myself that these calls cannot return `NULL` at
this point, it took quite a few minutes of reasoning.

So I think we might want to rewrite these past 6 lines, to make
everything quite a bit more obvious, like this:

        if (end_of_old != old_path)
                while (*(++end_of_old) != '/')
                        ; /* keep looking */
        if (end_of_new != new_path)
                while (*(++end_of_new) != '/')
                        ; /* keep looking */

There is _still_ one thing that makes this harder than trivial to reason
about: the case where one of `*end_of_old` and `*end_of_new` is a slash.
At this point, we assume that `*end_of_old != *end_of_new` (more about
that assumption in the next paragraph), therefore only one of them can
be a slash, and we want to advance beyond it. But even if the pointer
does not point at a slash, we want to look for one, so we want to
advance beyond it.

I also think that we need an extra guard: we do not handle the case
`a/b/c` -> `a/b/d` well. As stated a few lines above, "if the basename
of the file changed, we don't care". So we start looking at the last
slash, then go backwards, and since everything matches, end up with
`end_of_old == old_path` and `end_of_new == new_path`. The current code
will advance `end_of_new` (which I think is wrong) and then looks for
the next slash in both `end_of_new` and `end_of_old` (which is also
wrong).

Is my reading correct?

Ciao,
Dscho

> +
> +     /*
> +      * It may have been the case that old_path and new_path were the same
> +      * directory all along.  Don't claim a rename if they're the same.
> +      */
> +     old_len = end_of_old - old_path;
> +     new_len = end_of_new - new_path;
> +
> +     if (old_len != new_len || strncmp(old_path, new_path, old_len)) {
> +             *old_dir = xstrndup(old_path, old_len);
> +             *new_dir = xstrndup(new_path, new_len);
> +     }
> +}
> [...]

Reply via email to