Re: [PATCH v5 4/5] setup: Add 'abspath_part_inside_repo' function

2014-02-04 Thread Junio C Hamano
Martin Erik Werner martinerikwer...@gmail.com writes:

 Will you add that test or should I place it in the series with you as
 author?

Either is fine.  Thanks.

 On Mon, Feb 03, 2014 at 01:00:48PM -0800, Junio C Hamano wrote:
 Martin Erik Werner martinerikwer...@gmail.com writes:
 
  The path being exactly equal to the work tree is handled separately,
  since then there is no directory separator between the work tree and
  in-repo part.
 
 What is an in-repo part?  Whatever it is, I am not sure if I
 follow that logic.  After the while (*path) loop checks each level,
 you check the whole path---wouldn't that code handle that special
 case already?

 Given /dir1/repo/dir2/file I've used 'in-repo' to refer to
 dir2/file, sometimes interchangably with part inside the work tree
 which might be better terminology, and should replace it?

Yes, inside the working tree is much clearer than inside repo,
because the word repository often is used to mean only the stuff
inside $GIT_DIR, i.e. what controls the working tree files.

 I was trying to convey that if path is simply /dir/repo, then the while
 loop method of replacing a '/' and checking from the beginning won't
 work for the last level, since it has no terminating '/' to replace, so
 hence it's a special case, mentioning the part inside the work tree
 is arguably confusing in that case, since there isn't really one, maybe
 it should be left out completely, since the check each level
 explanation covers it already?

I dunno about the explanation, but it still looks strange to have
the special case to deal with /dir/repo before you enter the while
loop, and then also have code immediately after the loop that seems
to handle the same case.  Isn't the latter one redundant?

 Because it is probably the normal case not to have any symbolic link
 in the leading part (hence not having to dereference them), I think
 checking is work_tree[] a prefix of path[] early is justified from
 performance point of view, though.
 
   /*
  + * No checking if the path is in fact an absolute path is done, and it 
  must
  + * already be normalized.
 
 This is not quite parsable to me.
 Hmm, what about
   The input parameter must contain an absolute path, and it must
   already be normalized.

OK.

  +  const char* work_tree = get_git_work_tree();
  +  if (!work_tree)
  +  return -1;
  +  wtlen = strlen(work_tree);
  +  len = strlen(path);
 
 I expect that this is called from a callsite that _knows_ how long
 path[] is.  Shouldn't len be a parameter to this function instead?

 Currently, it actually doesn't, since 'normalize_path_copy_len' is
 called on it prior, which can mess with the string length.

OK, strlen() here is perfectly fine, then.

 Hmph  How do our callers treat an empty path?  In other words,
 should these three be equivalent?
 
  cd /a  git ls-files /a
  cd /a  git ls-files 
  cd /a  git ls-files .

 Here I have only gone by the assumption that previous code did the right
 thing,...

Good to know.  And the answer to the above question I think is yes,
these should be equivalent.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] setup: Add 'abspath_part_inside_repo' function

2014-02-04 Thread Martin Erik Werner
On Tue, 2014-02-04 at 10:09 -0800, Junio C Hamano wrote:
 Martin Erik Werner martinerikwer...@gmail.com writes:
(...)
  I was trying to convey that if path is simply /dir/repo, then the while
  loop method of replacing a '/' and checking from the beginning won't
  work for the last level, since it has no terminating '/' to replace, so
  hence it's a special case, mentioning the part inside the work tree
  is arguably confusing in that case, since there isn't really one, maybe
  it should be left out completely, since the check each level
  explanation covers it already?
 
 I dunno about the explanation, but it still looks strange to have
 the special case to deal with /dir/repo before you enter the while
 loop, and then also have code immediately after the loop that seems
 to handle the same case.  Isn't the latter one redundant?

The check before the loop doesn't use 'real_path', the one after does:
/dir/repo vs /dir/repolink

-- 
Martin Erik Werner martinerikwer...@gmail.com

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] setup: Add 'abspath_part_inside_repo' function

2014-02-03 Thread Junio C Hamano
Martin Erik Werner martinerikwer...@gmail.com writes:

 The path being exactly equal to the work tree is handled separately,
 since then there is no directory separator between the work tree and
 in-repo part.

What is an in-repo part?  Whatever it is, I am not sure if I
follow that logic.  After the while (*path) loop checks each level,
you check the whole path---wouldn't that code handle that special
case already?

Because it is probably the normal case not to have any symbolic link
in the leading part (hence not having to dereference them), I think
checking is work_tree[] a prefix of path[] early is justified from
performance point of view, though.

  /*
 + * No checking if the path is in fact an absolute path is done, and it must
 + * already be normalized.

This is not quite parsable to me.

 + * Find the part of an absolute path that lies inside the work tree by
 + * dereferencing symlinks outside the work tree, for example:
 + * /dir1/repo/dir2/file   (work tree is /dir1/repo)  - dir2/file
 + * /dir/file  (work tree is /)   - dir/file
 + * /dir/symlink1/symlink2 (symlink1 points to work tree) - symlink2
 + * /dir/repolink/file (repolink points to /dir/repo) - file
 + * /dir/repo  (exactly equal to work tree)   - (empty string)
 + */
 +static inline int abspath_part_inside_repo(char *path)

It looks a bit too big to be marked inline; better leave it to the
compiler to notice that there is only a single callsite and decide
to (or not to) inline it.

 +{
 + size_t len;
 + size_t wtlen;
 + char *path0;
 + int off;
 +
 + const char* work_tree = get_git_work_tree();
 + if (!work_tree)
 + return -1;
 + wtlen = strlen(work_tree);
 + len = strlen(path);

I expect that this is called from a callsite that _knows_ how long
path[] is.  Shouldn't len be a parameter to this function instead?

 + off = 0;
 +
 + /* check if work tree is already the prefix */
 + if (strncmp(path, work_tree, wtlen) == 0) {

I wonder if we want to be explicit and compare wtlen and len before
even attempting strncmp().  If work_tree is longer than path, it
cannot be a prefix of path, right?

In other words, also with a style-fix to prefer !XXcmp() over
XXcmp() == 0:

if (wtlen = len  !strncmp(path, work_tree, wtlen)) {

perhaps?  That would make it clear why referring to path[wtlen] on
the next line is permissible (it is obvious that it won't access
past the end of path[]):

 + if (path[wtlen] == '/') {
 + memmove(path, path + wtlen + 1, len - wtlen);
 + return 0;
 + } else if (path[wtlen - 1] == '/' || path[wtlen] == '\0') {
 + /* work tree is the root, or the whole path */
 + memmove(path, path + wtlen, len - wtlen + 1);

If work_tree[] == /, path[] == /a, then len == 2, wtlen == 1,
path[wtlen - 1] == '/' and this shifts path[] to the left by one,
leaving path[] = a, which is what we want.  OK.

If work_tree[] == path[] == /a, then len == wtlen == 2,
path[wtlen] == '\0', and this becomes equivalent to path[0] = '\0'.
Hmph  How do our callers treat an empty path?  In other words,
should these three be equivalent?

cd /a  git ls-files /a
cd /a  git ls-files 
cd /a  git ls-files .

 + return 0;
 + }
 + /* work tree might match beginning of a symlink to work tree */
 + off = wtlen;
 + }
 + path0 = path;
 + path += offset_1st_component(path) + off;
 +
 + /* check each level */
 + while (*path) {
 + path++;
 + if (*path == '/') {
 + *path = '\0';
 + if (strcmp(real_path(path0), work_tree) == 0) {
 + memmove(path0, path + 1, len - (path - path0));
 + return 0;
 + }
 + *path = '/';
 + }
 + }
 +
 + /* check whole path */
 + if (strcmp(real_path(path0), work_tree) == 0) {
 + *path0 = '\0';
 + return 0;
 + }
 +
 + return -1;
 +}
 +
 +/*
   * Normalize path, prepending the prefix for relative paths. If
   * remaining_prefix is not NULL, return the actual prefix still
   * remains in the path. For example, prefix = sub1/sub2/ and path is
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] setup: Add 'abspath_part_inside_repo' function

2014-02-03 Thread Martin Erik Werner
On Mon, Feb 03, 2014 at 11:52:33AM -0800, Junio C Hamano wrote:
 Junio C Hamano gits...@pobox.com writes:
 
  Can we have that git foo $path to the testsuite as well?  That is
  the breakage we do not want to repeat in the future by regressing.
 
 Something like this, perhaps?
 
  t/t3004-ls-files-basic.sh | 17 +
  1 file changed, 17 insertions(+)
 
 diff --git a/t/t3004-ls-files-basic.sh b/t/t3004-ls-files-basic.sh
 index 8d9bc3c..d93089d 100755
 --- a/t/t3004-ls-files-basic.sh
 +++ b/t/t3004-ls-files-basic.sh
 @@ -36,4 +36,21 @@ test_expect_success 'ls-files -h in corrupt repository' '
   test_i18ngrep [Uu]sage: git ls-files  broken/usage
  '
  
 +test_expect_success SYMLINKS 'ls-files with symlinks' '
 + mkdir subs 
 + ln -s nosuch link 
 + ln -s ../nosuch subs/link 
 + git add link subs/link 
 + git ls-files -s link subs/link expect 
 + git ls-files -s $(pwd)/link $(pwd)/subs/link actual 
 + test_cmp expect actual 
 +
 + (
 + cd subs 
 + git ls-files -s link ../expect 
 + git ls-files -s $(pwd)/link ../actual
 + ) 
 + test_cmp expect actual
 +'
 +
  test_done

Your test looks preferrable to the simple git-add one that I proposed,
since it covers some more ground than the simple:

test_expect_success SYMLINKS 'add with abs path to link...' '
ln -s target link 
git add link
'

Will you add that test or should I place it in the series with you as
author?

On Mon, Feb 03, 2014 at 01:00:48PM -0800, Junio C Hamano wrote:
 Martin Erik Werner martinerikwer...@gmail.com writes:
 
  The path being exactly equal to the work tree is handled separately,
  since then there is no directory separator between the work tree and
  in-repo part.
 
 What is an in-repo part?  Whatever it is, I am not sure if I
 follow that logic.  After the while (*path) loop checks each level,
 you check the whole path---wouldn't that code handle that special
 case already?

Given /dir1/repo/dir2/file I've used 'in-repo' to refer to
dir2/file, sometimes interchangably with part inside the work tree
which might be better terminology, and should replace it?

I was trying to convey that if path is simply /dir/repo, then the while
loop method of replacing a '/' and checking from the beginning won't
work for the last level, since it has no terminating '/' to replace, so
hence it's a special case, mentioning the part inside the work tree
is arguably confusing in that case, since there isn't really one, maybe
it should be left out completely, since the check each level
explanation covers it already?

 
 Because it is probably the normal case not to have any symbolic link
 in the leading part (hence not having to dereference them), I think
 checking is work_tree[] a prefix of path[] early is justified from
 performance point of view, though.
 
   /*
  + * No checking if the path is in fact an absolute path is done, and it must
  + * already be normalized.
 
 This is not quite parsable to me.
Hmm, what about
The input parameter must contain an absolute path, and it must
already be normalized.
?

  + * Find the part of an absolute path that lies inside the work tree by
  + * dereferencing symlinks outside the work tree, for example:
  + * /dir1/repo/dir2/file   (work tree is /dir1/repo)  - dir2/file
  + * /dir/file  (work tree is /)   - dir/file
  + * /dir/symlink1/symlink2 (symlink1 points to work tree) - symlink2
  + * /dir/repolink/file (repolink points to /dir/repo) - file
  + * /dir/repo  (exactly equal to work tree)   - (empty string)
  + */
  +static inline int abspath_part_inside_repo(char *path)
 
 It looks a bit too big to be marked inline; better leave it to the
 compiler to notice that there is only a single callsite and decide
 to (or not to) inline it.

Ok, will do.

  +{
  +   size_t len;
  +   size_t wtlen;
  +   char *path0;
  +   int off;
  +
  +   const char* work_tree = get_git_work_tree();
  +   if (!work_tree)
  +   return -1;
  +   wtlen = strlen(work_tree);
  +   len = strlen(path);
 
 I expect that this is called from a callsite that _knows_ how long
 path[] is.  Shouldn't len be a parameter to this function instead?

Currently, it actually doesn't, since 'normalize_path_copy_len' is
called on it prior, which can mess with the string length. Do you think
the 'strlen' call should still be moved up a step into
'prefix_path_gently'?

  +   off = 0;
  +
  +   /* check if work tree is already the prefix */
  +   if (strncmp(path, work_tree, wtlen) == 0) {
 
 I wonder if we want to be explicit and compare wtlen and len before
 even attempting strncmp().  If work_tree is longer than path, it
 cannot be a prefix of path, right?
 
 In other words, also with a style-fix to prefer !XXcmp() over
 XXcmp() == 0:
 
   if (wtlen = len  !strncmp(path, work_tree, wtlen)) {
 
 perhaps?  That would make it clear why referring to path[wtlen] on
 the next line is 

[PATCH v5 4/5] setup: Add 'abspath_part_inside_repo' function

2014-02-02 Thread Martin Erik Werner
In order to extract the part of an absolute path which lies inside the
repo, it is not possible to directly use real_path, since that would
dereference symlinks both outside and inside the work tree.

Add an 'abspath_part_inside_repo' function which first checks if the
work tree is already the prefix, then incrementally checks each path
level by temporarily NUL-terminating at each '/' and comparing against
the work tree path. If a match is found, it overwrites the input path
with the remainder past the work tree (which will be the in-repo part).

The path being exactly equal to the work tree is handled separately,
since then there is no directory separator between the work tree and
in-repo part.

This function is currently only intended for use in
'prefix_path_gently'.

Signed-off-by: Martin Erik Werner martinerikwer...@gmail.com
---
 setup.c | 64 
 1 file changed, 64 insertions(+)

diff --git a/setup.c b/setup.c
index 5432a31..a2e60ab 100644
--- a/setup.c
+++ b/setup.c
@@ -6,6 +6,70 @@ static int inside_git_dir = -1;
 static int inside_work_tree = -1;
 
 /*
+ * No checking if the path is in fact an absolute path is done, and it must
+ * already be normalized.
+ *
+ * Find the part of an absolute path that lies inside the work tree by
+ * dereferencing symlinks outside the work tree, for example:
+ * /dir1/repo/dir2/file   (work tree is /dir1/repo)  - dir2/file
+ * /dir/file  (work tree is /)   - dir/file
+ * /dir/symlink1/symlink2 (symlink1 points to work tree) - symlink2
+ * /dir/repolink/file (repolink points to /dir/repo) - file
+ * /dir/repo  (exactly equal to work tree)   - (empty string)
+ */
+static inline int abspath_part_inside_repo(char *path)
+{
+   size_t len;
+   size_t wtlen;
+   char *path0;
+   int off;
+
+   const char* work_tree = get_git_work_tree();
+   if (!work_tree)
+   return -1;
+   wtlen = strlen(work_tree);
+   len = strlen(path);
+   off = 0;
+
+   /* check if work tree is already the prefix */
+   if (strncmp(path, work_tree, wtlen) == 0) {
+   if (path[wtlen] == '/') {
+   memmove(path, path + wtlen + 1, len - wtlen);
+   return 0;
+   } else if (path[wtlen - 1] == '/' || path[wtlen] == '\0') {
+   /* work tree is the root, or the whole path */
+   memmove(path, path + wtlen, len - wtlen + 1);
+   return 0;
+   }
+   /* work tree might match beginning of a symlink to work tree */
+   off = wtlen;
+   }
+   path0 = path;
+   path += offset_1st_component(path) + off;
+
+   /* check each level */
+   while (*path) {
+   path++;
+   if (*path == '/') {
+   *path = '\0';
+   if (strcmp(real_path(path0), work_tree) == 0) {
+   memmove(path0, path + 1, len - (path - path0));
+   return 0;
+   }
+   *path = '/';
+   }
+   }
+
+   /* check whole path */
+   if (strcmp(real_path(path0), work_tree) == 0) {
+   *path0 = '\0';
+   return 0;
+   }
+
+   return -1;
+}
+
+/*
  * Normalize path, prepending the prefix for relative paths. If
  * remaining_prefix is not NULL, return the actual prefix still
  * remains in the path. For example, prefix = sub1/sub2/ and path is
-- 
1.8.5.2

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html