On Thu, Oct 11, 2018 at 05:19:06AM -0500, dana wrote:
> Hello,
> 
> I'm a contributor to ripgrep, which is a grep-like tool that supports using
> gitignore files to control which files are searched in a repo (or any other
> directory tree). ripgrep's support for the patterns in these files is based on
> git's official documentation, as seen here:
> 
>   https://git-scm.com/docs/gitignore
> 
> One of the most common reports on the ripgrep bug tracker is that it does not
> allow patterns like the following real-world examples, where a ** is used along
> with other text within the same path component:
> 
>   **/**$$*.java
>   **.orig
>   **local.properties
>   !**.sha1
> 
> The reason it doesn't allow them is that the gitignore documentation explicitly
> states that they're invalid:
>
> ...

I've checked the code and run some tests. There is a twist here: "**"
is only special when matched in "pathname" mode, that is, when the
pattern contains at least one slash. Of your patterns above, only the
first one qualifies.

When '**' is special, if it's not one of '**/', '/**/' or '/**', it
_is_ considered invalid (i.e. a bad pattern) and the pattern will not
match anything.

The confusion arises because '**' is not special in the remaining
three patterns: there it's treated as a regular '*' and still matches
stuff.
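
To make the three cases concrete, here is a rough sketch of the
classification described above. It is not the code in wildmatch.c; the
names star_kind and classify_double_star are made up for illustration,
and it assumes the simplified rule "pathname mode == the pattern
contains a slash", ignoring backslash escapes, bracket expressions and
any other detail of git's real pattern parsing.

```c
#include <string.h>

/* Hypothetical names, for illustration only. */
enum star_kind {
	STAR_PLAIN,     /* no special "**": none present, or no slash in pattern */
	STAR_SPECIAL,   /* every "**" is one of the forms with special meaning */
	STAR_MALFORMED  /* pathname mode, and some "**" touches other text */
};

static enum star_kind classify_double_star(const char *pattern)
{
	enum star_kind kind = STAR_PLAIN;
	const char *p;

	/* Assumption: without a slash, "**" is treated like a regular "*". */
	if (!strchr(pattern, '/'))
		return STAR_PLAIN;

	for (p = pattern; *p; p++) {
		const char *run = p;
		int left_ok, right_ok;

		if (*p != '*')
			continue;
		while (p[1] == '*')
			p++;            /* p now points at the last '*' of the run */
		if (p == run)
			continue;       /* a single '*' is always fine */

		/* A special "**" must fill its whole path component. */
		left_ok = (run == pattern || run[-1] == '/');
		right_ok = (p[1] == '\0' || p[1] == '/');
		if (!left_ok || !right_ok)
			return STAR_MALFORMED;
		kind = STAR_SPECIAL;
	}
	return kind;
}
```

With this sketch, "**/**$$*.java" comes out as STAR_MALFORMED (the
second '**' is followed by '$'), "a/**/b" as STAR_SPECIAL, and the
three slash-less patterns as STAR_PLAIN.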

So, I think we have two options. The documentation could be clarified
with something like this:

-- 8< --
diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index d107daaffd..500cd43939 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -100,7 +100,8 @@ PATTERN FORMAT
    a shell glob pattern and checks for a match against the
    pathname relative to the location of the `.gitignore` file
    (relative to the toplevel of the work tree if not from a
-   `.gitignore` file).
+   `.gitignore` file). Note that the "two consecutive asterisks" rule
+   below does not apply.
 
  - Otherwise, Git treats the pattern as a shell glob: "`*`" matches
    anything except "`/`", "`?`" matches any one character except "`/`"
@@ -129,7 +130,8 @@ full pathname may have special meaning:
    matches zero or more directories. For example, "`a/**/b`"
    matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
 
- - Other consecutive asterisks are considered invalid.
+ - Other consecutive asterisks are considered invalid and the pattern
+   is ignored.
 
 NOTES
 -----
-- 8< --

Or we could make the behavior consistent. If '**' is invalid, just
treat it as two separate regular '*'. Then all four of your patterns
will behave the same way. The change for that is quite simple:

-- 8< --
diff --git a/wildmatch.c b/wildmatch.c
index d074c1be10..64087bf02c 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -104,8 +104,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
                                            dowild(p + 1, text, flags) == WM_MATCH)
                                                return WM_MATCH;
                                        match_slash = 1;
-                               } else
-                                       return WM_ABORT_MALFORMED;
+                               } else {
+                                       /* without WM_PATHNAME, '*' == '**' */
+                                       match_slash = flags & WM_PATHNAME ? 0 : 1;
+                               }
                        } else
                                /* without WM_PATHNAME, '*' == '**' */
                                match_slash = flags & WM_PATHNAME ? 0 : 1;
-- 8< --
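
For illustration, the behavior the second option aims for can be
sketched with a toy matcher. This is not dowild(): SKETCH_PATHNAME,
match_here and glob_match are hypothetical names, only literals, '?',
'*' and '**' are handled (no brackets, escapes or case folding), and
the code is written for clarity rather than speed.

```c
#include <string.h>

#define SKETCH_PATHNAME 1  /* hypothetical stand-in for WM_PATHNAME */

/* Match pattern p against text t; start is the beginning of the pattern. */
static int match_here(const char *start, const char *p, const char *t,
		      int flags)
{
	for (; *p; p++, t++) {
		const char *run;
		int match_slash;

		if (*p == '?') {
			if (!*t || ((flags & SKETCH_PATHNAME) && *t == '/'))
				return 0;
			continue;
		}
		if (*p != '*') {
			if (*p != *t)
				return 0;
			continue;
		}

		run = p;
		while (p[1] == '*')
			p++;            /* p points at the last '*' of the run */

		if (!(flags & SKETCH_PATHNAME)) {
			match_slash = 1;  /* no pathname mode: '*' == '**' */
		} else if (p > run &&
			   (run == start || run[-1] == '/') &&
			   (p[1] == '\0' || p[1] == '/')) {
			/* well-formed "**": "**" followed by "/" may match nothing */
			if (p[1] == '/' && match_here(start, p + 2, t, flags))
				return 1;
			match_slash = 1;
		} else {
			/* single '*', or (as proposed) a malformed "**" */
			match_slash = 0;
		}

		p++;  /* first pattern character after the asterisks */
		for (;; t++) {
			if (match_here(start, p, t, flags))
				return 1;
			if (!*t || (!match_slash && *t == '/'))
				return 0;
		}
	}
	return *t == '\0';
}

static int glob_match(const char *pattern, const char *text, int flags)
{
	return match_here(pattern, pattern, text, flags);
}
```

Under these assumptions, glob_match("**/**$$*.java",
"src/Foo$$Bar.java", SKETCH_PATHNAME) matches, because the malformed
second '**' simply behaves like '*', while glob_match("a/**b",
"a/x/b", SKETCH_PATHNAME) does not, because a regular '*' cannot cross
a slash in pathname mode.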

Which way should we go? I'm leaning towards the second one...
--
Duy
