Re: [PATCH nd/attr-match-optim-more 2/2] attr: more matching optimizations from .gitignore

2012-10-09 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Sixt  writes:
>
>> Am 10/9/2012 7:08, schrieb Junio C Hamano:
>>> Imagine if we allowed only one attribute per line, instead of
>>> multiple attributes on one line.
>>> 
>>>  - If you want to unset the attribute, you would write "path -attr".
>>> 
>>>  - If you want to reset the attribute to unspecified, you would
>>>write "path !attr".
>>> 
>>> Both are used in conjunction with some other (typically more
>>> generic) pattern that sets, sets to a value, and/or unsets the
>>> attribute, to countermand its effect.
>>> 
>>> If you were to allow "!path attr", what does it mean?  It obviously
>>> is not about setting the attr to true or to a string value, but is
>>> it countermanding an earlier set and telling us to unset the attr,
>>> or make the attr unspecified?
>>
>> If I have at the toplevel:
>>
>>   *.txt  whitespace=tabwidth=4
>>
>> and in a subdirectory
>>
>>   *.txt  whitespace=tabwidth=8
>>   !README.txt
>>
>> it could be interpreted as "do not apply *.txt to REAME.txt in this
>> subdirectory". That is, it does not countermand some _particular_
>> attribute setting, but says "use the attributes collected elsewhere".
>
> It makes it unclear what "elsewhere" means, though (besides, it does
> not match the way the matching logic works at all).

Ignoring the current implementation, I find the suggested semantics
somewhat intriguing.  It is something we may want to look into in
the future.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nd/attr-match-optim-more 2/2] attr: more matching optimizations from .gitignore

2012-10-08 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 10/9/2012 7:08, schrieb Junio C Hamano:
>> Imagine if we allowed only one attribute per line, instead of
>> multiple attributes on one line.
>> 
>>  - If you want to unset the attribute, you would write "path -attr".
>> 
>>  - If you want to reset the attribute to unspecified, you would
>>write "path !attr".
>> 
>> Both are used in conjunction with some other (typically more
>> generic) pattern that sets, sets to a value, and/or unsets the
>> attribute, to countermand its effect.
>> 
>> If you were to allow "!path attr", what does it mean?  It obviously
>> is not about setting the attr to true or to a string value, but is
>> it countermanding an earlier set and telling us to unset the attr,
>> or make the attr unspecified?
>
> If I have at the toplevel:
>
>   *.txt  whitespace=tabwidth=4
>
> and in a subdirectory
>
>   *.txt  whitespace=tabwidth=8
>   !README.txt
>
> it could be interpreted as "do not apply *.txt to REAME.txt in this
> subdirectory". That is, it does not countermand some _particular_
> attribute setting, but says "use the attributes collected elsewhere".

It makes it unclear what "elsewhere" means, though (besides, it does
not match the way the matching logic works at all).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nd/attr-match-optim-more 2/2] attr: more matching optimizations from .gitignore

2012-10-08 Thread Johannes Sixt
Am 10/9/2012 7:08, schrieb Junio C Hamano:
> Imagine if we allowed only one attribute per line, instead of
> multiple attributes on one line.
> 
>  - If you want to unset the attribute, you would write "path -attr".
> 
>  - If you want to reset the attribute to unspecified, you would
>write "path !attr".
> 
> Both are used in conjunction with some other (typically more
> generic) pattern that sets, sets to a value, and/or unsets the
> attribute, to countermand its effect.
> 
> If you were to allow "!path attr", what does it mean?  It obviously
> is not about setting the attr to true or to a string value, but is
> it countermanding an earlier set and telling us to unset the attr,
> or make the attr unspecified?

If I have at the toplevel:

  *.txt  whitespace=tabwidth=4

and in a subdirectory

  *.txt  whitespace=tabwidth=8
  !README.txt

it could be interpreted as "do not apply *.txt to REAME.txt in this
subdirectory". That is, it does not countermand some _particular_
attribute setting, but says "use the attributes collected elsewhere".

-- Hannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nd/attr-match-optim-more 2/2] attr: more matching optimizations from .gitignore

2012-10-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> .gitattributes and .gitignore share the same pattern syntax but
> has separate matching implementation. Over the years, ignore's
> implementation accumulates more optimizations while attr's stays
> the same.
>
> This patch adds those optimizations to attr. Basically it tries to
> avoid fnmatch as much as possible in favor of strncmp.
>
> A few notes from this work is put in the documentation:
>
> * "!pattern" syntax is not supported in .gitattributes. Negative
>   patterns may work well for a single attribute like .gitignore. It's
>   confusing in .gitattributes are many attributes can be
>   set/unset/undefined at using the same pattern.

I think the above misses the point.

Imagine if we allowed only one attribute per line, instead of
multiple attributes on one line.

 - If you want to unset the attribute, you would write "path -attr".

 - If you want to reset the attribute to unspecified, you would
   write "path !attr".

Both are used in conjunction with some other (typically more
generic) pattern that sets, sets to a value, and/or unsets the
attribute, to countermand its effect.

If you were to allow "!path attr", what does it mean?  It obviously
is not about setting the attr to true or to a string value, but is
it countermanding an earlier set and telling us to unset the attr,
or make the attr unspecified?

That is the ambiguity issue "!pattern" syntax would introduce if it
were to be allowed in the attributes.  I think "multiple attributes
on the same line" is a red herring.

> * we support attaching attributes to directories at the syntax
>   level, but we do not really attach attributes on directory or use
>   them.

I would say "... but we do not currently use attributes on
directories."

> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 80120ea..cc2ff1d 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -23,7 +23,7 @@ Each line in `gitattributes` file is of form:
>  That is, a pattern followed by an attributes list,
>  separated by whitespaces.  When the pattern matches the
>  path in question, the attributes listed on the line are given to
> -the path.
> +the path. Only files can be attached attributes to.

Symbolic links?

I would strongly suggest dropping "Only ... can be...".  You can
specify attributes to anything and check with "check-attr".  It is
just that core part does not have anything that pays attention to
attributes given to directories in the current codebase.

> @@ -56,6 +56,7 @@ When more than one pattern matches the path, a later line
>  overrides an earlier line.  This overriding is done per
>  attribute.  The rules how the pattern matches paths are the
>  same as in `.gitignore` files; see linkgit:gitignore[5].
> +Unlike `.gitignore`, negative patterns are forbidden.

OK (I am debating myself if it helps the readers if we said why it
is forbidden to write such).

> diff --git a/attr.c b/attr.c
> index e7caee4..7e85f82 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -115,6 +115,13 @@ struct attr_state {
>   const char *setto;
>  };
>  
> +struct pattern {
> + const char *pattern;
> + int patternlen;
> + int nowildcardlen;
> + int flags;  /* EXC_FLAG_* */
> +};
> +
>  /*
>   * One rule, as from a .gitattributes file.
>   *
> @@ -131,7 +138,7 @@ struct attr_state {
>   */
>  struct match_attr {
>   union {
> - char *pattern;
> + struct pattern pat;
>   struct git_attr *attr;
>   } u;
>   char is_macro;
> @@ -241,9 +248,17 @@ static struct match_attr *parse_attr_line(const char 
> *line, const char *src,
>   if (is_macro)
>   res->u.attr = git_attr_internal(name, namelen);
>   else {
> - res->u.pattern = (char *)&(res->state[num_attr]);
> - memcpy(res->u.pattern, name, namelen);
> - res->u.pattern[namelen] = 0;
> + char *p = (char *)&(res->state[num_attr]);
> + memcpy(p, name, namelen);
> + p[namelen] = 0;

This is inherited from the original code, but *res is calloc(3)ed
so the above memcpy() automatically NUL terminates this string.

> + res->u.pat.pattern = p;
> + parse_exclude_pattern(&res->u.pat.pattern,
> +   &res->u.pat.patternlen,
> +   &res->u.pat.flags,
> +   &res->u.pat.nowildcardlen);
> + if (res->u.pat.flags & EXC_FLAG_NEGATIVE)
> + die(_("Negative patterns are forbidden in git 
> attributes\n"
> +   "Use '\\!' for literal leading exclamation."));
>   }
>   res->is_macro = is_macro;
>   res->num_attr = num_attr;
> @@ -640,25 +655,55 @@ static void prepare_attr_stack(const char *path)
>  
>  static int path_matches(const char *pathname, int pathlen,
>   const char *basename,
> -   

[PATCH nd/attr-match-optim-more 2/2] attr: more matching optimizations from .gitignore

2012-10-08 Thread Nguyễn Thái Ngọc Duy
.gitattributes and .gitignore share the same pattern syntax but has
separate matching implementation. Over the years, ignore's
implementation accumulates more optimizations while attr's stays the
same.

This patch adds those optimizations to attr. Basically it tries to
avoid fnmatch as much as possible in favor of strncmp.

A few notes from this work is put in the documentation:

* "!pattern" syntax is not supported in .gitattributes. Negative
  patterns may work well for a single attribute like .gitignore. It's
  confusing in .gitattributes are many attributes can be
  set/unset/undefined at using the same pattern.

* we support attaching attributes to directories at the syntax
  level, but we do not really attach attributes on directory or use
  them.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 No more die() for path/ syntax. Rephrase "!syntax" messages from not
 supported to forbidden. A small test to verify that we actually
 forbid it.

 Documentation/gitattributes.txt |  3 +-
 attr.c  | 67 ++---
 dir.c   |  8 ++---
 dir.h   |  1 +
 t/t0003-attributes.sh   | 14 +
 5 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 80120ea..cc2ff1d 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -23,7 +23,7 @@ Each line in `gitattributes` file is of form:
 That is, a pattern followed by an attributes list,
 separated by whitespaces.  When the pattern matches the
 path in question, the attributes listed on the line are given to
-the path.
+the path. Only files can be attached attributes to.
 
 Each attribute can be in one of these states for a given path:
 
@@ -56,6 +56,7 @@ When more than one pattern matches the path, a later line
 overrides an earlier line.  This overriding is done per
 attribute.  The rules how the pattern matches paths are the
 same as in `.gitignore` files; see linkgit:gitignore[5].
+Unlike `.gitignore`, negative patterns are forbidden.
 
 When deciding what attributes are assigned to a path, git
 consults `$GIT_DIR/info/attributes` file (which has the highest
diff --git a/attr.c b/attr.c
index e7caee4..7e85f82 100644
--- a/attr.c
+++ b/attr.c
@@ -115,6 +115,13 @@ struct attr_state {
const char *setto;
 };
 
+struct pattern {
+   const char *pattern;
+   int patternlen;
+   int nowildcardlen;
+   int flags;  /* EXC_FLAG_* */
+};
+
 /*
  * One rule, as from a .gitattributes file.
  *
@@ -131,7 +138,7 @@ struct attr_state {
  */
 struct match_attr {
union {
-   char *pattern;
+   struct pattern pat;
struct git_attr *attr;
} u;
char is_macro;
@@ -241,9 +248,17 @@ static struct match_attr *parse_attr_line(const char 
*line, const char *src,
if (is_macro)
res->u.attr = git_attr_internal(name, namelen);
else {
-   res->u.pattern = (char *)&(res->state[num_attr]);
-   memcpy(res->u.pattern, name, namelen);
-   res->u.pattern[namelen] = 0;
+   char *p = (char *)&(res->state[num_attr]);
+   memcpy(p, name, namelen);
+   p[namelen] = 0;
+   res->u.pat.pattern = p;
+   parse_exclude_pattern(&res->u.pat.pattern,
+ &res->u.pat.patternlen,
+ &res->u.pat.flags,
+ &res->u.pat.nowildcardlen);
+   if (res->u.pat.flags & EXC_FLAG_NEGATIVE)
+   die(_("Negative patterns are forbidden in git 
attributes\n"
+ "Use '\\!' for literal leading exclamation."));
}
res->is_macro = is_macro;
res->num_attr = num_attr;
@@ -640,25 +655,55 @@ static void prepare_attr_stack(const char *path)
 
 static int path_matches(const char *pathname, int pathlen,
const char *basename,
-   const char *pattern,
+   const struct pattern *pat,
const char *base, int baselen)
 {
-   if (!strchr(pattern, '/')) {
+   const char *pattern = pat->pattern;
+   int prefix = pat->nowildcardlen;
+   const char *name;
+   int namelen;
+
+   if (pat->flags & EXC_FLAG_NODIR) {
+   if (prefix == pat->patternlen &&
+   !strcmp_icase(pattern, basename))
+   return 1;
+
+   if (pat->flags & EXC_FLAG_ENDSWITH &&
+   pat->patternlen - 1 <= pathlen &&
+   !strcmp_icase(pattern + 1, pathname +
+ pathlen - pat->patternlen + 1))
+   return 1;
+
return (fnmatch_icase(pattern, basename, 0) == 0);
}
/*
 * match with FNM_PATHNAME; t