Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix
2013/5/26 Jiang Xin : > 2013/5/22 Michael Haggerty : > >> The old and new versions both seem to be (differently) inconsistent >> about when the output has a trailing slash. What is the rule? > > The reason for introducing patch 02/15 is that we don't want to reinvent > the wheel. Patch 06/15 (git-clean: refactor git-clean into two phases) > needs to save relative_path of each git-clean candidate file/directory in > del_list, but the public method in path.c (i.e. relative_path) is not > powerful, and static method in quote.c (i.e. path_relative) can note be > used directly. One way is to enhanced relative_path in path.c, like this > patch. > > Since we combine the two methods (relative_path in path.c and > path_relative in quote.c), the new relative_path must be compatible > with the original two methods. > > relative_path in path.c > === > > relative_path is called only once in setup.c: > > if (getenv(GIT_WORK_TREE_ENVIRONMENT)) > setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1); > > set_git_dir(relative_path(git_dir, work_tree)); > initialized = 1; > > and set_git_dir only set the environment GIT_DIR_ENVIRONMENT > like this: > > int set_git_dir(const char *path) > { > if (setenv(GIT_DIR_ENVIRONMENT, path, 1)) > return error("Could not set GIT_DIR to '%s'", path); > setup_git_env(); > return 0; > } > > So the only restriction for relative_path is that the return value can > not be blank. If the abs and base arguments for relative_path are > the same, the return value should be "." ("./" is also OK), then > set the envionment GIT_DIR_ENVIRONMENT to "." (or "./"). > > path_relative in quote.c > > > We can not simply move "path_relative" in quote.c to "relative_path" > in path.c directly. It is because: > > * The arguments for "relative_path" are from user input. So must >validate (remove duplicate slash) before use. But "path_relative" >does not check duplicate slash in arguments. > > * "path_relative" will return blank string, if abs and base are the same. > > Also I noticed that "quote_path_relative" of quote.c (which calls > path_relative) will transform the blank string from path_relative to > "./" (not ".") > > char *quote_path_relative(const char *in, int len, > ... > const char *rel = path_relative(in, len, &sb, prefix, -1); > ... > if (!out->len) > strbuf_addstr(out, "./"); > > That's why the "path_relative" in path.c refactor the output of "." into "./". > Hi, Junio I have updated comment and commit log in this patch. You can substitute commit bd4d1 (path.c: refactor relative_path(), not only strip prefix) in jx/clean-interactive branch with this one. -- 8< -- Subject: [PATCH] path.c: refactor relative_path(), not only strip prefix Original design of relative_path() is simple, just strip the prefix (*base) from the absolute path (*abs). In most cases, we need a real relative path, such as: ../foo, ../../bar. That's why there is another reimplementation (path_relative()) in quote.c. Borrowed some codes from path_relative() in quote.c to refactor relative_path() in path.c, so that it could return real relative path, and user can reuse this function without reimplement his/her own. The function path_relative() in quote.c will be substituted, and I would use the new relative_path() function when implement the interactive git-clean later. Different results for relative_path() before and after this refactor: abs path base path relative (original) relative (refactor) = === === /a/b/c/ /a/b c/ c/ /a/b//c/ //a///b/ c/ c/ /a/b /a/b ../ /a/b/ /a/b ../ /a/a/b/ /a ../ / /a/b/ /../../ /a/c /a/b/ /a/c ../c /a/b (empty)/a/b /a/b /a/b (null) /a/b /a/b (empty) /a/b (empty) ./ (null)(empty)(null) ./ (null)/a/b (segfault) ./ You may notice that return value "." has been changed to "./". It is because: * Function quote_path_relative() in quote.c will show the relative path as "./" if abs(in) and base(prefix) are the same. * Function relative_path() is called only once (in setup.c), and it will be OK for the return value as "./" instead of ".". Signed-off-by: Jiang Xin Signed-off-by: Junio C Hamano --- cache.h | 2 +- path.c| 112 +++--- setup.c | 5 ++- t/t0060-path-utils.sh | 27 ++-- test-path-utils.c | 4 +- 5 f
Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix
2013/5/22 Michael Haggerty : > Sorry for coming late to the party. I am on a business travel, and respond late also. ;-) > > On 05/22/2013 03:40 AM, Jiang Xin wrote: >> Different results for relative_path() before and after this refactor: >> >> abs path base path relative (original) relative (refactor) >> = === === >> /a/b/c/ /a/b c/ c/ >> /a/b//c/ //a///b/ c/ c/ >> /a/b /a/b ../ >> /a/b/ /a/b ../ >> /a/a/b/ /a ../ >> / /a/b/ /../../ >> /a/c /a/b/ /a/c ../c >> /a/b (empty)/a/b /a/b >> /a/b (null) /a/b /a/b >> (empty) /a/b (empty) ./ >> (null)(empty)(null) ./ >> (null)/a/b (segfault) ./ > > The old and new versions both seem to be (differently) inconsistent > about when the output has a trailing slash. What is the rule? The reason for introducing patch 02/15 is that we don't want to reinvent the wheel. Patch 06/15 (git-clean: refactor git-clean into two phases) needs to save relative_path of each git-clean candidate file/directory in del_list, but the public method in path.c (i.e. relative_path) is not powerful, and static method in quote.c (i.e. path_relative) can note be used directly. One way is to enhanced relative_path in path.c, like this patch. Since we combine the two methods (relative_path in path.c and path_relative in quote.c), the new relative_path must be compatible with the original two methods. relative_path in path.c === relative_path is called in one place: if (getenv(GIT_WORK_TREE_ENVIRONMENT)) setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1); set_git_dir(relative_path(git_dir, work_tree)); initialized = 1; and set_git_dir only set the environment GIT_DIR_ENVIRONMENT like this: int set_git_dir(const char *path) { if (setenv(GIT_DIR_ENVIRONMENT, path, 1)) return error("Could not set GIT_DIR to '%s'", path); setup_git_env(); return 0; } So the only restraint for relative_path is that the return value can not be blank. If the abs and base arguments for relative_path are the same, the return value should be "." ("./" is also OK), then set the envionment GIT_DIR_ENVIRONMENT to "." (or "./"). path_relative in quote.c We can not simply move "path_relative" in quote.c to "relative_path" in path.c directly. It is because: * The arguments for "relative_path" are from user input. So must validate (remove duplicate slash) before use. But "path_relative" does not check duplicate slash in arguments. * "path_relative" will return blank string, if abs and base are the same. Also I noticed that "quote_path_relative" of quote.c (which calls path_relative) will transform the blank string from path_relative to "./" (not ".") char *quote_path_relative(const char *in, int len, ... const char *rel = path_relative(in, len, &sb, prefix, -1); ... if (!out->len) strbuf_addstr(out, "./"); That's why the "path_relative" in path.c refactor the output of "." into "./". > >> diff --git a/path.c b/path.c >> index 04ff..0174d 100644 >> --- a/path.c >> +++ b/path.c >> @@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path) >> return 0; >> } >> >> -const char *relative_path(const char *abs, const char *base) >> +/* >> + * Give path as relative to prefix. >> + * >> + * The strbuf may or may not be used, so do not assume it contains the >> + * returned path. >> + */ >> +const char *relative_path(const char *abs, const char *base, >> + struct strbuf *sb) > > Thanks for adding documentation. But I think it could be improved: > > * The comment refers to "path" and "prefix" but the function parameters > are "abs" and "base". I suggest making them agree. Yes, it will be nice to update the comments. > * Who owns the memory pointed to by the return value? > > * The comment says that "the strbuf may or may not be used". So why is > it part of the interface? (I suppose it is because the strbuf might be > given ownership of the returned memory if it has to be allocated.) > Would it be more straightforward to *always* return the result in the > strbuf? > > * Please document when the return value contains a trailing slash and > also that superfluous internal slashes are (apparently) normalized away. > > * Leading double-slashes have a special meaning on some operating > systems. The old and new versions of this function both seem to ignore > differences between initial slashes. Perhaps somebody who
Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix
Michael Haggerty writes: >> Different results for relative_path() before and after this refactor: >> >> abs path base path relative (original) relative (refactor) >> = === === >> /a/b/c/ /a/b c/ c/ >> /a/b//c/ //a///b/ c/ c/ >> /a/b /a/b ../ >> /a/b/ /a/b ../ >> /a/a/b/ /a ../ >> / /a/b/ /../../ >> /a/c /a/b/ /a/c ../c >> /a/b (empty)/a/b /a/b >> /a/b (null) /a/b /a/b >> (empty) /a/b (empty) ./ >> (null)(empty)(null) ./ >> (null)/a/b (segfault) ./ > > The old and new versions both seem to be (differently) inconsistent > about when the output has a trailing slash. What is the rule? That is a good point. At least adding / at the end of "." seems unneeded, given that the output in some cases have no trailing slash, forcing a caller who wanted to get a directory to append a trailing path components to it to check if it needs to add one before doing so. Always adding a slash / to the output may sound consistent, but it is not quite; e.g. "/a/c based on /a/b is ../c" case may be referring to a non directory /a/c and ensuring a trailing slash to produce ../c/ is actively wrong. "The caller knows" rule might work (I am thinking aloud, without looking at existing callers to see what would break, only to see if a consistent and simple-to-explain rule is possible). When the caller asks to turn /a/b relative to /a/b (or /a/b/, /a//b/./), then we do not know (or we do not want to know) if the caller means it to be a directory with the intention of appending something after it, so just return ".". When the caller asks to turn /a/b/ relative to the same base, return "./" with the trailing slash. Remember if the "abs path" side had a trailing slash, normalize both input (turning "/./" into "/" and "foo/bar/../" into foo/", squashing multiple slashes into one, etc.) and then strip the trailing slash from them, do the usual "comparison and replacement of leading path components with series of ../" and then append a slash if the original had one, or something? >> index 04ff..0174d 100644 >> --- a/path.c >> +++ b/path.c >> @@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path) >> return 0; >> } >> >> -const char *relative_path(const char *abs, const char *base) >> +/* >> + * Give path as relative to prefix. >> + * >> + * The strbuf may or may not be used, so do not assume it contains the >> + * returned path. >> + */ >> +const char *relative_path(const char *abs, const char *base, >> + struct strbuf *sb) > > Thanks for adding documentation. But I think it could be improved: > > * The comment refers to "path" and "prefix" but the function parameters > are "abs" and "base". I suggest making them agree. > > * Who owns the memory pointed to by the return value? > > * The comment says that "the strbuf may or may not be used". So why is > it part of the interface? (I suppose it is because the strbuf might be > given ownership of the returned memory if it has to be allocated.) > Would it be more straightforward to *always* return the result in the > strbuf? This comes from the original in quote.c, I think. The caller supplies a strbuf as a scratchpad area and releasing it is the caller's responsibility. If the function does not need any scratchpad area (i.e. the answer is a substring of the input), the result may point into the abs. So the calling convention is: struct strbuf scratch = STRBUF_INIT; const char *result = relative(abs, base, &scratch); use(result); strbuf_release(&scratch); and the lifetime rule is "consume it before either abs or scratch is changed". -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix
Sorry for coming late to the party. On 05/22/2013 03:40 AM, Jiang Xin wrote: > Original design of relative_path() is simple, just strip the prefix > (*base) from the absolute path (*abs). In most cases, we need a real > relative path, such as: ../foo, ../../bar. That's why there is another > reimplementation (path_relative()) in quote.c. > > Refactor relative_path() in path.c to return real relative path, so > that user can reuse this function without reimplement his/her own. > I will use this method for interactive git-clean later. Some of the > implementations are borrowed from path_relative() in quote.c. > > Different results for relative_path() before and after this refactor: > > abs path base path relative (original) relative (refactor) > = === === > /a/b/c/ /a/b c/ c/ > /a/b//c/ //a///b/ c/ c/ > /a/b /a/b ../ > /a/b/ /a/b ../ > /a/a/b/ /a ../ > / /a/b/ /../../ > /a/c /a/b/ /a/c ../c > /a/b (empty)/a/b /a/b > /a/b (null) /a/b /a/b > (empty) /a/b (empty) ./ > (null)(empty)(null) ./ > (null)/a/b (segfault) ./ The old and new versions both seem to be (differently) inconsistent about when the output has a trailing slash. What is the rule? > diff --git a/path.c b/path.c > index 04ff..0174d 100644 > --- a/path.c > +++ b/path.c > @@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path) > return 0; > } > > -const char *relative_path(const char *abs, const char *base) > +/* > + * Give path as relative to prefix. > + * > + * The strbuf may or may not be used, so do not assume it contains the > + * returned path. > + */ > +const char *relative_path(const char *abs, const char *base, > + struct strbuf *sb) Thanks for adding documentation. But I think it could be improved: * The comment refers to "path" and "prefix" but the function parameters are "abs" and "base". I suggest making them agree. * Who owns the memory pointed to by the return value? * The comment says that "the strbuf may or may not be used". So why is it part of the interface? (I suppose it is because the strbuf might be given ownership of the returned memory if it has to be allocated.) Would it be more straightforward to *always* return the result in the strbuf? * Please document when the return value contains a trailing slash and also that superfluous internal slashes are (apparently) normalized away. * Leading double-slashes have a special meaning on some operating systems. The old and new versions of this function both seem to ignore differences between initial slashes. Perhaps somebody who knows the rules better could say whether this is an issue but I guess the problem would rarely be encountered in practice. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v13 02/15] path.c: refactor relative_path(), not only strip prefix
Original design of relative_path() is simple, just strip the prefix (*base) from the absolute path (*abs). In most cases, we need a real relative path, such as: ../foo, ../../bar. That's why there is another reimplementation (path_relative()) in quote.c. Refactor relative_path() in path.c to return real relative path, so that user can reuse this function without reimplement his/her own. I will use this method for interactive git-clean later. Some of the implementations are borrowed from path_relative() in quote.c. Different results for relative_path() before and after this refactor: abs path base path relative (original) relative (refactor) = === === /a/b/c/ /a/b c/ c/ /a/b//c/ //a///b/ c/ c/ /a/b /a/b ../ /a/b/ /a/b ../ /a/a/b/ /a ../ / /a/b/ /../../ /a/c /a/b/ /a/c ../c /a/b (empty)/a/b /a/b /a/b (null) /a/b /a/b (empty) /a/b (empty) ./ (null)(empty)(null) ./ (null)/a/b (segfault) ./ Signed-off-by: Jiang Xin Signed-off-by: Junio C Hamano --- cache.h | 2 +- path.c| 112 +++--- setup.c | 5 ++- t/t0060-path-utils.sh | 27 ++-- test-path-utils.c | 4 +- 5 files changed, 107 insertions(+), 43 deletions(-) diff --git a/cache.h b/cache.h index 94ca1..4016c 100644 --- a/cache.h +++ b/cache.h @@ -737,7 +737,7 @@ int is_directory(const char *); const char *real_path(const char *path); const char *real_path_if_valid(const char *path); const char *absolute_path(const char *path); -const char *relative_path(const char *abs, const char *base); +const char *relative_path(const char *abs, const char *base, struct strbuf *sb); int normalize_path_copy(char *dst, const char *src); int longest_ancestor_length(const char *path, struct string_list *prefixes); char *strip_path_suffix(const char *path, const char *suffix); diff --git a/path.c b/path.c index 04ff..0174d 100644 --- a/path.c +++ b/path.c @@ -441,42 +441,104 @@ int adjust_shared_perm(const char *path) return 0; } -const char *relative_path(const char *abs, const char *base) +/* + * Give path as relative to prefix. + * + * The strbuf may or may not be used, so do not assume it contains the + * returned path. + */ +const char *relative_path(const char *abs, const char *base, + struct strbuf *sb) { - static char buf[PATH_MAX + 1]; - int i = 0, j = 0; - - if (!base || !base[0]) + int abs_off, base_off, i, j; + int abs_len, base_len; + + abs_len = abs ? strlen(abs) : 0; + base_len = base ? strlen(base) : 0; + abs_off = 0; + base_off = 0; + i = 0; + j = 0; + + if (!abs_len) + return "./"; + else if (!base_len) return abs; - while (base[i]) { + + while (i < base_len && j < abs_len && base[i] == abs[j]) { if (is_dir_sep(base[i])) { - if (!is_dir_sep(abs[j])) - return abs; while (is_dir_sep(base[i])) i++; while (is_dir_sep(abs[j])) j++; - continue; - } else if (abs[j] != base[i]) { + base_off = i; + abs_off = j; + } else { + i++; + j++; + } + } + + if ( + /* base seems like prefix of abs */ + i >= base_len && + /* +* but "/foo" is not a prefix of "/foobar" +* (i.e. base not end with '/') +*/ + base_off < base_len) { + if (j >= abs_len) { + /* abs="/a/b", base="/a/b" */ + abs_off = abs_len; + } else if (is_dir_sep(abs[j])) { + /* abs="/a/b/c", base="/a/b" */ + while (is_dir_sep(abs[j])) + j++; + abs_off = j; + } else { + /* abs="/a/bbb/c", base="/a/b" */ + i = base_off; + } + } else if ( + /* abs is short than base (prefix of base) */ + j >= abs_len && + /* abs not end with '/' */ + abs_off < abs_len) { + if (is_dir_sep(base[i])) { + /* abs="/a/b", base="/a/b/c/" */ + while (is_dir_sep(base[i])) +