Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix

2013-05-29 Thread Jiang Xin
2013/5/26 Jiang Xin :
> 2013/5/22 Michael Haggerty :
>
>> The old and new versions both seem to be (differently) inconsistent
>> about when the output has a trailing slash.  What is the rule?
>
> The reason for introducing patch 02/15 is that we don't want to reinvent
> the wheel. Patch 06/15 (git-clean: refactor git-clean into two phases)
> needs to save relative_path of each git-clean candidate file/directory in
> del_list, but the public method in path.c (i.e. relative_path) is not
> powerful, and static method in quote.c (i.e. path_relative) can note be
> used directly. One way is to enhanced relative_path in path.c, like this
> patch.
>
> Since we combine the two methods (relative_path in path.c and
> path_relative in quote.c), the new relative_path must be compatible
> with the original two methods.
>
> relative_path in path.c
> ===
>
> relative_path is called only once in setup.c:
>
> if (getenv(GIT_WORK_TREE_ENVIRONMENT))
> setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1);
>
> set_git_dir(relative_path(git_dir, work_tree));
> initialized = 1;
>
> and set_git_dir only set the environment GIT_DIR_ENVIRONMENT
> like this:
>
> int set_git_dir(const char *path)
> {
> if (setenv(GIT_DIR_ENVIRONMENT, path, 1))
> return error("Could not set GIT_DIR to '%s'", path);
> setup_git_env();
> return 0;
> }
>
> So the only restriction for relative_path is that the return value can
> not be blank. If the abs and base arguments for relative_path are
> the same, the return value should be "." ("./" is also OK), then
> set the envionment GIT_DIR_ENVIRONMENT to "." (or "./").
>
> path_relative in quote.c
> 
>
> We can not simply move "path_relative" in quote.c to "relative_path"
> in path.c directly. It is because:
>
> * The arguments for "relative_path" are from user input. So must
>validate (remove duplicate slash) before use. But "path_relative"
>does not check duplicate slash in arguments.
>
> * "path_relative" will return blank string, if abs and base are the same.
>
> Also I noticed that "quote_path_relative" of quote.c (which calls
> path_relative) will transform the blank string from path_relative to
> "./" (not ".")
>
> char *quote_path_relative(const char *in, int len,
> ...
> const char *rel = path_relative(in, len, &sb, prefix, -1);
> ...
> if (!out->len)
> strbuf_addstr(out, "./");
>
> That's why the "path_relative" in path.c refactor the output of "." into "./".
>
Hi, Junio

I have updated comment and commit log in this patch. You can substitute
commit bd4d1 (path.c: refactor relative_path(), not only strip prefix)
in jx/clean-interactive branch with this one.

-- 8< --
Subject: [PATCH] path.c: refactor relative_path(), not only strip prefix

Original design of relative_path() is simple, just strip the prefix
(*base) from the absolute path (*abs). In most cases, we need a real
relative path, such as: ../foo, ../../bar. That's why there is another
reimplementation (path_relative()) in quote.c.

Borrowed some codes from path_relative() in quote.c to refactor
relative_path() in path.c, so that it could return real relative path,
and user can reuse this function without reimplement his/her own.
The function path_relative() in quote.c will be substituted, and I
would use the new relative_path() function when implement the
interactive git-clean later.

Different results for relative_path() before and after this refactor:

abs path  base path  relative (original)  relative (refactor)
  =  ===  ===
/a/b/c/   /a/b   c/   c/
/a/b//c/  //a///b/   c/   c/
/a/b  /a/b   ../
/a/b/ /a/b   ../
/a/a/b/  /a   ../
/ /a/b/  /../../
/a/c  /a/b/  /a/c ../c
/a/b  (empty)/a/b /a/b
/a/b  (null) /a/b /a/b
(empty)   /a/b   (empty)  ./
(null)(empty)(null)   ./
(null)/a/b   (segfault)   ./

You may notice that return value "." has been changed to "./".
It is because:

 * Function quote_path_relative() in quote.c will show the relative
   path as "./" if abs(in) and base(prefix) are the same.

 * Function relative_path() is called only once (in setup.c), and
   it will be OK for the return value as "./" instead of ".".

Signed-off-by: Jiang Xin 
Signed-off-by: Junio C Hamano 
---
 cache.h   |   2 +-
 path.c| 112 +++---
 setup.c   |   5 ++-
 t/t0060-path-utils.sh |  27 ++--
 test-path-utils.c |   4 +-
 5 f

Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix

2013-05-25 Thread Jiang Xin
2013/5/22 Michael Haggerty :
> Sorry for coming late to the party.

I am on a business travel, and respond late also. ;-)

>
> On 05/22/2013 03:40 AM, Jiang Xin wrote:
>> Different results for relative_path() before and after this refactor:
>>
>> abs path  base path  relative (original)  relative (refactor)
>>   =  ===  ===
>> /a/b/c/   /a/b   c/   c/
>> /a/b//c/  //a///b/   c/   c/
>> /a/b  /a/b   ../
>> /a/b/ /a/b   ../
>> /a/a/b/  /a   ../
>> / /a/b/  /../../
>> /a/c  /a/b/  /a/c ../c
>> /a/b  (empty)/a/b /a/b
>> /a/b  (null) /a/b /a/b
>> (empty)   /a/b   (empty)  ./
>> (null)(empty)(null)   ./
>> (null)/a/b   (segfault)   ./
>
> The old and new versions both seem to be (differently) inconsistent
> about when the output has a trailing slash.  What is the rule?

The reason for introducing patch 02/15 is that we don't want to reinvent
the wheel. Patch 06/15 (git-clean: refactor git-clean into two phases)
needs to save relative_path of each git-clean candidate file/directory in
del_list, but the public method in path.c (i.e. relative_path) is not
powerful, and static method in quote.c (i.e. path_relative) can note be
used directly. One way is to enhanced relative_path in path.c, like this
patch.

Since we combine the two methods (relative_path in path.c and
path_relative in quote.c), the new relative_path must be compatible
with the original two methods.

relative_path in path.c
===

relative_path is called in one place:

if (getenv(GIT_WORK_TREE_ENVIRONMENT))
setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1);

set_git_dir(relative_path(git_dir, work_tree));
initialized = 1;

and set_git_dir only set the environment GIT_DIR_ENVIRONMENT
like this:

int set_git_dir(const char *path)
{
if (setenv(GIT_DIR_ENVIRONMENT, path, 1))
return error("Could not set GIT_DIR to '%s'", path);
setup_git_env();
return 0;
}

So the only restraint for relative_path is that the return value can
not be blank. If the abs and base arguments for relative_path are
the same, the return value should be "." ("./" is also OK), then
set the envionment GIT_DIR_ENVIRONMENT to "." (or "./").

path_relative in quote.c


We can not simply move "path_relative" in quote.c to "relative_path"
in path.c directly. It is because:

* The arguments for "relative_path" are from user input. So must
   validate (remove duplicate slash) before use. But "path_relative"
   does not check duplicate slash in arguments.

* "path_relative" will return blank string, if abs and base are the same.

Also I noticed that "quote_path_relative" of quote.c (which calls
path_relative) will transform the blank string from path_relative to
"./" (not ".")

char *quote_path_relative(const char *in, int len,
...
const char *rel = path_relative(in, len, &sb, prefix, -1);
...
if (!out->len)
strbuf_addstr(out, "./");

That's why the "path_relative" in path.c refactor the output of "." into "./".

>
>> diff --git a/path.c b/path.c
>> index 04ff..0174d 100644
>> --- a/path.c
>> +++ b/path.c
>> @@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path)
>>   return 0;
>>  }
>>
>> -const char *relative_path(const char *abs, const char *base)
>> +/*
>> + * Give path as relative to prefix.
>> + *
>> + * The strbuf may or may not be used, so do not assume it contains the
>> + * returned path.
>> + */
>> +const char *relative_path(const char *abs, const char *base,
>> +   struct strbuf *sb)
>
> Thanks for adding documentation.  But I think it could be improved:
>
> * The comment refers to "path" and "prefix" but the function parameters
> are "abs" and "base".  I suggest making them agree.

Yes, it will be nice to update the comments.

> * Who owns the memory pointed to by the return value?
>
> * The comment says that "the strbuf may or may not be used".  So why is
> it part of the interface?  (I suppose it is because the strbuf might be
> given ownership of the returned memory if it has to be allocated.)
> Would it be more straightforward to *always* return the result in the
> strbuf?
>
> * Please document when the return value contains a trailing slash and
> also that superfluous internal slashes are (apparently) normalized away.
>
> * Leading double-slashes have a special meaning on some operating
> systems.  The old and new versions of this function both seem to ignore
> differences between initial slashes.  Perhaps somebody who

Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix

2013-05-22 Thread Junio C Hamano
Michael Haggerty  writes:

>> Different results for relative_path() before and after this refactor:
>> 
>> abs path  base path  relative (original)  relative (refactor)
>>   =  ===  ===
>> /a/b/c/   /a/b   c/   c/
>> /a/b//c/  //a///b/   c/   c/
>> /a/b  /a/b   ../
>> /a/b/ /a/b   ../
>> /a/a/b/  /a   ../
>> / /a/b/  /../../
>> /a/c  /a/b/  /a/c ../c
>> /a/b  (empty)/a/b /a/b
>> /a/b  (null) /a/b /a/b
>> (empty)   /a/b   (empty)  ./
>> (null)(empty)(null)   ./
>> (null)/a/b   (segfault)   ./
>
> The old and new versions both seem to be (differently) inconsistent
> about when the output has a trailing slash.  What is the rule?

That is a good point.  At least adding / at the end of "." seems
unneeded, given that the output in some cases have no trailing
slash, forcing a caller who wanted to get a directory to append a
trailing path components to it to check if it needs to add one
before doing so.  Always adding a slash / to the output may sound
consistent, but it is not quite; e.g. "/a/c based on /a/b is ../c"
case may be referring to a non directory /a/c and ensuring a
trailing slash to produce ../c/ is actively wrong.

"The caller knows" rule might work (I am thinking aloud, without
looking at existing callers to see what would break, only to see if
a consistent and simple-to-explain rule is possible).

When the caller asks to turn /a/b relative to /a/b (or /a/b/,
/a//b/./), then we do not know (or we do not want to know) if the
caller means it to be a directory with the intention of appending
something after it, so just return ".".  When the caller asks to
turn /a/b/ relative to the same base, return "./" with the trailing
slash.  Remember if the "abs path" side had a trailing slash,
normalize both input (turning "/./" into "/" and "foo/bar/../" into
foo/", squashing multiple slashes into one, etc.) and then strip the
trailing slash from them, do the usual "comparison and replacement
of leading path components with series of ../" and then append a
slash if the original had one, or something?

>> index 04ff..0174d 100644
>> --- a/path.c
>> +++ b/path.c
>> @@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path)
>>  return 0;
>>  }
>>  
>> -const char *relative_path(const char *abs, const char *base)
>> +/*
>> + * Give path as relative to prefix.
>> + *
>> + * The strbuf may or may not be used, so do not assume it contains the
>> + * returned path.
>> + */
>> +const char *relative_path(const char *abs, const char *base,
>> +  struct strbuf *sb)
>
> Thanks for adding documentation.  But I think it could be improved:
>
> * The comment refers to "path" and "prefix" but the function parameters
> are "abs" and "base".  I suggest making them agree.
>
> * Who owns the memory pointed to by the return value?
>
> * The comment says that "the strbuf may or may not be used".  So why is
> it part of the interface?  (I suppose it is because the strbuf might be
> given ownership of the returned memory if it has to be allocated.)
> Would it be more straightforward to *always* return the result in the
> strbuf?

This comes from the original in quote.c, I think.  The caller
supplies a strbuf as a scratchpad area and releasing it is the
caller's responsibility.  If the function does not need any
scratchpad area (i.e. the answer is a substring of the input), the
result may point into the abs.

So the calling convention is:

struct strbuf scratch = STRBUF_INIT;
const char *result = relative(abs, base, &scratch);
use(result);
strbuf_release(&scratch);

and the lifetime rule is "consume it before either abs or scratch
is changed".

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix

2013-05-22 Thread Michael Haggerty
Sorry for coming late to the party.

On 05/22/2013 03:40 AM, Jiang Xin wrote:
> Original design of relative_path() is simple, just strip the prefix
> (*base) from the absolute path (*abs). In most cases, we need a real
> relative path, such as: ../foo, ../../bar. That's why there is another
> reimplementation (path_relative()) in quote.c.
> 
> Refactor relative_path() in path.c to return real relative path, so
> that user can reuse this function without reimplement his/her own.
> I will use this method for interactive git-clean later. Some of the
> implementations are borrowed from path_relative() in quote.c.
> 
> Different results for relative_path() before and after this refactor:
> 
> abs path  base path  relative (original)  relative (refactor)
>   =  ===  ===
> /a/b/c/   /a/b   c/   c/
> /a/b//c/  //a///b/   c/   c/
> /a/b  /a/b   ../
> /a/b/ /a/b   ../
> /a/a/b/  /a   ../
> / /a/b/  /../../
> /a/c  /a/b/  /a/c ../c
> /a/b  (empty)/a/b /a/b
> /a/b  (null) /a/b /a/b
> (empty)   /a/b   (empty)  ./
> (null)(empty)(null)   ./
> (null)/a/b   (segfault)   ./

The old and new versions both seem to be (differently) inconsistent
about when the output has a trailing slash.  What is the rule?

> diff --git a/path.c b/path.c
> index 04ff..0174d 100644
> --- a/path.c
> +++ b/path.c
> @@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path)
>   return 0;
>  }
>  
> -const char *relative_path(const char *abs, const char *base)
> +/*
> + * Give path as relative to prefix.
> + *
> + * The strbuf may or may not be used, so do not assume it contains the
> + * returned path.
> + */
> +const char *relative_path(const char *abs, const char *base,
> +   struct strbuf *sb)

Thanks for adding documentation.  But I think it could be improved:

* The comment refers to "path" and "prefix" but the function parameters
are "abs" and "base".  I suggest making them agree.

* Who owns the memory pointed to by the return value?

* The comment says that "the strbuf may or may not be used".  So why is
it part of the interface?  (I suppose it is because the strbuf might be
given ownership of the returned memory if it has to be allocated.)
Would it be more straightforward to *always* return the result in the
strbuf?

* Please document when the return value contains a trailing slash and
also that superfluous internal slashes are (apparently) normalized away.

* Leading double-slashes have a special meaning on some operating
systems.  The old and new versions of this function both seem to ignore
differences between initial slashes.  Perhaps somebody who knows the
rules better could say whether this is an issue but I guess the problem
would rarely be encountered in practice.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix

2013-05-21 Thread Jiang Xin
Original design of relative_path() is simple, just strip the prefix
(*base) from the absolute path (*abs). In most cases, we need a real
relative path, such as: ../foo, ../../bar. That's why there is another
reimplementation (path_relative()) in quote.c.

Refactor relative_path() in path.c to return real relative path, so
that user can reuse this function without reimplement his/her own.
I will use this method for interactive git-clean later. Some of the
implementations are borrowed from path_relative() in quote.c.

Different results for relative_path() before and after this refactor:

abs path  base path  relative (original)  relative (refactor)
  =  ===  ===
/a/b/c/   /a/b   c/   c/
/a/b//c/  //a///b/   c/   c/
/a/b  /a/b   ../
/a/b/ /a/b   ../
/a/a/b/  /a   ../
/ /a/b/  /../../
/a/c  /a/b/  /a/c ../c
/a/b  (empty)/a/b /a/b
/a/b  (null) /a/b /a/b
(empty)   /a/b   (empty)  ./
(null)(empty)(null)   ./
(null)/a/b   (segfault)   ./

Signed-off-by: Jiang Xin 
Signed-off-by: Junio C Hamano 
---
 cache.h   |   2 +-
 path.c| 112 +++---
 setup.c   |   5 ++-
 t/t0060-path-utils.sh |  27 ++--
 test-path-utils.c |   4 +-
 5 files changed, 107 insertions(+), 43 deletions(-)

diff --git a/cache.h b/cache.h
index 94ca1..4016c 100644
--- a/cache.h
+++ b/cache.h
@@ -737,7 +737,7 @@ int is_directory(const char *);
 const char *real_path(const char *path);
 const char *real_path_if_valid(const char *path);
 const char *absolute_path(const char *path);
-const char *relative_path(const char *abs, const char *base);
+const char *relative_path(const char *abs, const char *base, struct strbuf 
*sb);
 int normalize_path_copy(char *dst, const char *src);
 int longest_ancestor_length(const char *path, struct string_list *prefixes);
 char *strip_path_suffix(const char *path, const char *suffix);
diff --git a/path.c b/path.c
index 04ff..0174d 100644
--- a/path.c
+++ b/path.c
@@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path)
return 0;
 }
 
-const char *relative_path(const char *abs, const char *base)
+/*
+ * Give path as relative to prefix.
+ *
+ * The strbuf may or may not be used, so do not assume it contains the
+ * returned path.
+ */
+const char *relative_path(const char *abs, const char *base,
+ struct strbuf *sb)
 {
-   static char buf[PATH_MAX + 1];
-   int i = 0, j = 0;
-
-   if (!base || !base[0])
+   int abs_off, base_off, i, j;
+   int abs_len, base_len;
+
+   abs_len = abs ? strlen(abs) : 0;
+   base_len = base ? strlen(base) : 0;
+   abs_off = 0;
+   base_off = 0;
+   i = 0;
+   j = 0;
+
+   if (!abs_len)
+   return "./";
+   else if (!base_len)
return abs;
-   while (base[i]) {
+
+   while (i < base_len && j < abs_len && base[i] == abs[j]) {
if (is_dir_sep(base[i])) {
-   if (!is_dir_sep(abs[j]))
-   return abs;
while (is_dir_sep(base[i]))
i++;
while (is_dir_sep(abs[j]))
j++;
-   continue;
-   } else if (abs[j] != base[i]) {
+   base_off = i;
+   abs_off = j;
+   } else {
+   i++;
+   j++;
+   }
+   }
+
+   if (
+   /* base seems like prefix of abs */
+   i >= base_len &&
+   /*
+* but "/foo" is not a prefix of "/foobar"
+* (i.e. base not end with '/')
+*/
+   base_off < base_len) {
+   if (j >= abs_len) {
+   /* abs="/a/b", base="/a/b" */
+   abs_off = abs_len;
+   } else if (is_dir_sep(abs[j])) {
+   /* abs="/a/b/c", base="/a/b" */
+   while (is_dir_sep(abs[j]))
+   j++;
+   abs_off = j;
+   } else {
+   /* abs="/a/bbb/c", base="/a/b" */
+   i = base_off;
+   }
+   } else if (
+  /* abs is short than base (prefix of base) */
+  j >= abs_len &&
+  /* abs not end with '/' */
+  abs_off < abs_len) {
+   if (is_dir_sep(base[i])) {
+   /* abs="/a/b", base="/a/b/c/" */
+   while (is_dir_sep(base[i]))
+