[PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-10 Thread Nguyễn Thái Ngọc Duy
.gitattributes and .gitignore share the same pattern syntax but has
separate matching implementation. Over the years, ignore's
implementation accumulates more optimizations while attr's stays the
same.

This patch adds those optimizations to attr. Basically it tries to
avoid fnmatch as much as possible in favor of strncmp.

A few notes from this work is put in the documentation:

* "!pattern" syntax is not supported in .gitattributes as it's not
  clear what it means (e.g. "!path attr" is about unsetting attr, or
  undefining it..)

* patterns applying to directories

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 How about this? Diff from the previous version:

   diff --git a/Documentation/gitattributes.txt 
b/Documentation/gitattributes.txt
   index cc2ff1d..9a0ed19 100644
   --- a/Documentation/gitattributes.txt
   +++ b/Documentation/gitattributes.txt
   @@ -23,7 +23,7 @@ Each line in `gitattributes` file is of form:
That is, a pattern followed by an attributes list,
separated by whitespaces.  When the pattern matches the
path in question, the attributes listed on the line are given to
   -the path. Only files can be attached attributes to.
   +the path.

Each attribute can be in one of these states for a given path:

   @@ -58,6 +58,13 @@ attribute.  The rules how the pattern matches paths are 
the
same as in `.gitignore` files; see linkgit:gitignore[5].
Unlike `.gitignore`, negative patterns are not supported.

   +Note that if a .gitignore rule matches a directory, the directory
   +is ignored, which may be seen as assigning "ignore" attribute the
   +directory and all files and directories inside. However, if a
   +.gitattributes rule matches a directory, it manipulates
   +attributes on that directory only, not files and directories
   +inside.
   +
When deciding what attributes are assigned to a path, git
consults `$GIT_DIR/info/attributes` file (which has the highest
precedence), `.gitattributes` file in the same directory as the
   diff --git a/attr.c b/attr.c
   index 7e85f82..4faf1ff 100644
   --- a/attr.c
   +++ b/attr.c
   @@ -250,7 +250,6 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
else {
char *p = (char *)&(res->state[num_attr]);
memcpy(p, name, namelen);
   -p[namelen] = 0;
res->u.pat.pattern = p;
parse_exclude_pattern(&res->u.pat.pattern,
  &res->u.pat.patternlen,
   @@ -690,16 +689,18 @@ static int path_matches(const char *pathname, int 
pathlen,
 * contain the trailing slash
 */

   -if (pathlen < baselen ||
   +if (pathlen < baselen + 1 ||
(baselen && pathname[baselen] != '/') ||
   -strncmp(pathname, base, baselen))
   +strncmp_icase(pathname, base, baselen))
return 0;

namelen = baselen ? pathlen - baselen - 1 : pathlen;
name = pathname + pathlen - namelen;

   -/* if the non-wildcard part is longer than the remaining
   -   pathname, surely it cannot match */
   +/*
   + * if the non-wildcard part is longer than the remaining
   + * pathname, surely it cannot match
   + */
if (!namelen || prefix > namelen)
return 0;
 

 Documentation/gitattributes.txt |  8 +
 attr.c  | 72 +
 dir.c   |  8 ++---
 dir.h   |  1 +
 t/t0003-attributes.sh   | 14 
 5 files changed, 86 insertions(+), 17 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 80120ea..9a0ed19 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -56,6 +56,14 @@ When more than one pattern matches the path, a later line
 overrides an earlier line.  This overriding is done per
 attribute.  The rules how the pattern matches paths are the
 same as in `.gitignore` files; see linkgit:gitignore[5].
+Unlike `.gitignore`, negative patterns are not supported.
+
+Note that if a .gitignore rule matches a directory, the directory
+is ignored, which may be seen as assigning "ignore" attribute the
+directory and all files and directories inside. However, if a
+.gitattributes rule matches a directory, it manipulates
+attributes on that directory only, not files and directories
+inside.
 
 When deciding what attributes are assigned to a path, git
 consults `$GIT_DIR/info/attributes` file (which has the highest
diff --git a/attr.c b/attr.c
index e7caee4..4faf1ff 100644
--- a/attr.c
+++ b/attr.c
@@ -115,6 +115,13 @@ struct attr_state {
const char *setto;
 };
 
+struct pattern {
+   const char *pattern;
+   int patternlen;
+   int nowildcardlen;
+   int flags;  /* EXC_FLAG_* */
+};
+
 /*
  * One rule, as from a .gitattributes file.
  *
@@ -131,7 +138,7 @@ struct attr_

Re: [PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-10 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  writes:

> .gitattributes and .gitignore share the same pattern syntax but has
> separate matching implementation. Over the years, ignore's
> implementation accumulates more optimizations while attr's stays the
> same.
>
> This patch adds those optimizations to attr. Basically it tries to
> avoid fnmatch as much as possible in favor of strncmp.
>
> A few notes from this work is put in the documentation:
>
> * "!pattern" syntax is not supported in .gitattributes as it's not

s/not supported/forbidden/;

A reader can take "not supported" as "silently ignored", which is
not the case.  An explicit "forbidden" does not have such a
misinterpretation.

It would save time from both of us if you can check what is queued
on 'pu'.  I do not think I touched the code for off-by-one bugs
there, though.

>   clear what it means (e.g. "!path attr" is about unsetting attr, or
>   undefining it..)
>
> * patterns applying to directories
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  How about this? Diff from the previous version:
>
>diff --git a/Documentation/gitattributes.txt 
> b/Documentation/gitattributes.txt
>index cc2ff1d..9a0ed19 100644
>--- a/Documentation/gitattributes.txt
>+++ b/Documentation/gitattributes.txt
>@@ -23,7 +23,7 @@ Each line in `gitattributes` file is of form:
> That is, a pattern followed by an attributes list,
> separated by whitespaces.  When the pattern matches the
> path in question, the attributes listed on the line are given to
>-the path. Only files can be attached attributes to.
>+the path.
> 
> Each attribute can be in one of these states for a given path:
> 
>@@ -58,6 +58,13 @@ attribute.  The rules how the pattern matches paths are 
> the
> same as in `.gitignore` files; see linkgit:gitignore[5].
> Unlike `.gitignore`, negative patterns are not supported.
> 
>+Note that if a .gitignore rule matches a directory, the directory
>+is ignored, which may be seen as assigning "ignore" attribute the
>+directory and all files and directories inside. However, if a
>+.gitattributes rule matches a directory, it manipulates
>+attributes on that directory only, not files and directories
>+inside.

Why do you even need to mention .gitignore in gitattributes manual
where it is irrelevant from the reader's point of view?

Besides, the interpretation the "may be seen as" suggests is
actively wrong.  It is assigning "ignore-this-and-below" attribute
to the directory, and there is no inconsistency between the two.

Again, I'd suggest dropping this addition.

>diff --git a/attr.c b/attr.c
>index 7e85f82..4faf1ff 100644
>--- a/attr.c
>+++ b/attr.c
>@@ -250,7 +250,6 @@ static struct match_attr *parse_attr_line(const char 
> *line, const char *src,
>   else {
>   char *p = (char *)&(res->state[num_attr]);
>   memcpy(p, name, namelen);
>-  p[namelen] = 0;
>   res->u.pat.pattern = p;
>   parse_exclude_pattern(&res->u.pat.pattern,
> &res->u.pat.patternlen,
>@@ -690,16 +689,18 @@ static int path_matches(const char *pathname, int 
> pathlen,
>* contain the trailing slash
>*/
> 
>-  if (pathlen < baselen ||
>+  if (pathlen < baselen + 1 ||
>   (baselen && pathname[baselen] != '/') ||
>-  strncmp(pathname, base, baselen))
>+  strncmp_icase(pathname, base, baselen))
>   return 0;
> 
>   namelen = baselen ? pathlen - baselen - 1 : pathlen;
>   name = pathname + pathlen - namelen;
> 
>-  /* if the non-wildcard part is longer than the remaining
>- pathname, surely it cannot match */
>+  /*
>+   * if the non-wildcard part is longer than the remaining
>+   * pathname, surely it cannot match
>+   */
>   if (!namelen || prefix > namelen)
>   return 0;

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-10 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

>@@ -690,16 +689,18 @@ static int path_matches(const char *pathname, int 
> pathlen,
>* contain the trailing slash
>*/
> 
>-  if (pathlen < baselen ||
>+  if (pathlen < baselen + 1 ||
>   (baselen && pathname[baselen] != '/') ||
>-  strncmp(pathname, base, baselen))
>+  strncmp_icase(pathname, base, baselen))

Shouldn't the last comparison be

strncmp_icase(pathname, base, baselen + 1)

instead, if you are trying to match this part from dir.c where
baselen does count the trailing slash?

if (pathlen < x->baselen ||
(x->baselen && pathname[x->baselen-1] != '/') ||
strncmp_icase(pathname, x->base, x->baselen))
continue;

In other words, relative to what was queued to 'pu', something like
this instead

-- >8 --
Subject: [PATCH] fixup: matching path_matches() in attr.c to that of dir.c

In this function, baselen does not count the trailing slash that
should come after the directory name held in "basename" variable, so
whenever the corresponding code in dir.c:excluded_from_list() refers
to x->baselen, we would need to use "baselen+1" consistently.

Also remove unnecessary NUL-termination of a buffer obtained from
calloc().

Signed-off-by: Junio C Hamano 
---
 attr.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/attr.c b/attr.c
index 7e85f82..a9c04a8 100644
--- a/attr.c
+++ b/attr.c
@@ -250,7 +250,6 @@ static struct match_attr *parse_attr_line(const char *line, 
const char *src,
else {
char *p = (char *)&(res->state[num_attr]);
memcpy(p, name, namelen);
-   p[namelen] = 0;
res->u.pat.pattern = p;
parse_exclude_pattern(&res->u.pat.pattern,
  &res->u.pat.patternlen,
@@ -687,19 +686,21 @@ static int path_matches(const char *pathname, int pathlen,
 
/*
 * note: unlike excluded_from_list, baselen here does not
-* contain the trailing slash
+* count the trailing slash.
 */
 
-   if (pathlen < baselen ||
+   if (pathlen < baselen + 1 ||
(baselen && pathname[baselen] != '/') ||
-   strncmp(pathname, base, baselen))
+   strncmp_icase(pathname, base, baselen + 1))
return 0;
 
namelen = baselen ? pathlen - baselen - 1 : pathlen;
name = pathname + pathlen - namelen;
 
-   /* if the non-wildcard part is longer than the remaining
-  pathname, surely it cannot match */
+   /*
+* if the non-wildcard part is longer than the remaining
+* pathname, surely it cannot match.
+*/
if (!namelen || prefix > namelen)
return 0;
 
-- 
1.8.0.rc1.76.g5a375e6

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-10 Thread Junio C Hamano
Junio C Hamano  writes:

> Nguyễn Thái Ngọc Duy   writes:
>
>>@@ -690,16 +689,18 @@ static int path_matches(const char *pathname, int 
>> pathlen,
>>   * contain the trailing slash
>>   */
>> 
>>- if (pathlen < baselen ||
>>+ if (pathlen < baselen + 1 ||
>>  (baselen && pathname[baselen] != '/') ||
>>- strncmp(pathname, base, baselen))
>>+ strncmp_icase(pathname, base, baselen))
>
> Shouldn't the last comparison be
>
>   strncmp_icase(pathname, base, baselen + 1)
>
> instead, if you are trying to match this part from dir.c where
> baselen does count the trailing slash?
>
>   if (pathlen < x->baselen ||
>   (x->baselen && pathname[x->baselen-1] != '/') ||
>   strncmp_icase(pathname, x->base, x->baselen))
>   continue;
>
> In other words, relative to what was queued to 'pu', something like
> this instead

And,... it doesn't work and breaks t0003.sh.  Sigh...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-10 Thread Nguyen Thai Ngoc Duy
On Thu, Oct 11, 2012 at 4:41 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>>@@ -690,16 +689,18 @@ static int path_matches(const char *pathname, int 
>> pathlen,
>>* contain the trailing slash
>>*/
>>
>>-  if (pathlen < baselen ||
>>+  if (pathlen < baselen + 1 ||
>>   (baselen && pathname[baselen] != '/') ||
>>-  strncmp(pathname, base, baselen))
>>+  strncmp_icase(pathname, base, baselen))
>
> Shouldn't the last comparison be
>
> strncmp_icase(pathname, base, baselen + 1)
>
> instead,

"base" does not contain the trailing slash, so it can only match up to
base[baselen-1], then fail at base[baselen], which is '\0'. The "no
trailing slash" business in this function is tricky :(

> if you are trying to match this part from dir.c where
> baselen does count the trailing slash?
>
> if (pathlen < x->baselen ||
> (x->baselen && pathname[x->baselen-1] != '/') ||
> strncmp_icase(pathname, x->base, x->baselen))
> continue;

strncmp_icase() here just needs to compare x->baselen-1 chars (i.e. no
trailing slash) as the trailing slash is explicitly checked just above
strncmp_icase. But it does not hurt to compare an extra character so I
leave it unchanged. But obviously it causes confusion when we try to
match this function and the one in attr.c
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-12 Thread Nguyen Thai Ngoc Duy
On Thu, Oct 11, 2012 at 3:03 AM, Junio C Hamano  wrote:
> It would save time from both of us if you can check what is queued
> on 'pu'.  I do not think I touched the code for off-by-one bugs
> there, though.

'pu' looks good.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-12 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
> index 51f3045..4a1402f 100755
> --- a/t/t0003-attributes.sh
> +++ b/t/t0003-attributes.sh
> @@ -242,4 +242,18 @@ test_expect_success 'bare repository: test 
> info/attributes' '
>   attr_check subdir/a/i unspecified
>  '
>  
> +test_expect_success 'leave bare' '
> + cd ..
> +'
> +
> +test_expect_success 'negative patterns' '
> + echo "!f test=bar" >.gitattributes &&
> + test_must_fail git check-attr test -- f
> +'
> +
> +test_expect_success 'patterns starting with exclamation' '
> + echo "\!f test=foo" >.gitattributes &&
> + attr_check "!f" foo
> +'
> +
>  test_done

This is not entirely your fault, but please don't do that "cd ..".

The original test had "cd bare", made an assumption that step will
never fail (which is mostly correct), and ran everything afterward
in that subdirectory.

Adding "Do a 'cd ..' to come back" is a horrible way to build on
it.  Imagine what happens when another person also did the same
thing, and both changes need to be merged.  You will end up going up
two levels, which is not what you want.

I think the right fix is to make each of the test that wants to run
in "bare" chdir in its own subshell, or not append these new tests
that do not run in the "bare" to the end of this file, but before
the execution goes down to "bare".

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] attr: more matching optimizations from .gitignore

2012-10-12 Thread Nguyen Thai Ngoc Duy
On Fri, Oct 12, 2012 at 12:09:57PM -0700, Junio C Hamano wrote:
> This is not entirely your fault, but please don't do that "cd ..".
> 
> The original test had "cd bare", made an assumption that step will
> never fail (which is mostly correct), and ran everything afterward
> in that subdirectory.
> 
> Adding "Do a 'cd ..' to come back" is a horrible way to build on
> it.  Imagine what happens when another person also did the same
> thing, and both changes need to be merged.  You will end up going up
> two levels, which is not what you want.
> 
> I think the right fix is to make each of the test that wants to run
> in "bare" chdir in its own subshell, or not append these new tests
> that do not run in the "bare" to the end of this file, but before
> the execution goes down to "bare".
> 

The reason I put these tests at the end was because I destroy
.gitattributes and it might affect the following tests. But obviously
the bare tests run in its own repository (and I have not committed
anything till the repo is cloned) so my .gitattributes changes can't
affect them.

Please squash this in

-- 8< --
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 4a1402f..f6c21ea 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -206,6 +206,16 @@ test_expect_success 'root subdir attribute test' '
attr_check subdir/a/i unspecified
 '
 
+test_expect_success 'negative patterns' '
+   echo "!f test=bar" >.gitattributes &&
+   test_must_fail git check-attr test -- f
+'
+
+test_expect_success 'patterns starting with exclamation' '
+   echo "\!f test=foo" >.gitattributes &&
+   attr_check "!f" foo
+'
+
 test_expect_success 'setup bare' '
git clone --bare . bare.git &&
cd bare.git
@@ -242,18 +252,4 @@ test_expect_success 'bare repository: test 
info/attributes' '
attr_check subdir/a/i unspecified
 '
 
-test_expect_success 'leave bare' '
-   cd ..
-'
-
-test_expect_success 'negative patterns' '
-   echo "!f test=bar" >.gitattributes &&
-   test_must_fail git check-attr test -- f
-'
-
-test_expect_success 'patterns starting with exclamation' '
-   echo "\!f test=foo" >.gitattributes &&
-   attr_check "!f" foo
-'
-
 test_done
-- 8< --
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html