[
https://issues.apache.org/jira/browse/RAT-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894999#comment-17894999
]
Claude Warren commented on RAT-265:
-----------------------------------
[~raphinesse]
The latest unreleased version of RAT (0.17-SNAPSHOT) makes significant advances
in filtering files. Specific to your case, the include/exclude capability that
was previously only available in Maven is now part of core and can be specified
on the command line.
Please download it and run it with the "--help" option for a complete list of
the ways to include/exclude files.
The major changes are:
* The inclusion/exclusion that was defined for Maven has been moved to core so
that it is now available across all UIs.
* The ability to include/exclude standard sets of files and read standard
exclusion files (e.g. the standard GIT exclusion will ignore "**/.git/**"
and "**/.gitignore" and will also process all ".gitignore" files in the
search tree).
* The ability to use standard Plexus based include/exclude patterns (e.g. **,
*, ?) as well as Plexus supported regular expressions (e.g. "%regex[<regex
pattern>]").
* Extensive help update to explain what files are and are not included in the
various include/exclude options.
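To illustrate how the wildcard patterns in the list above are usually read
("**" spans directories, "*" and "?" stay within one path segment), here is a
minimal sketch that translates such a pattern into a java.util.regex pattern.
It is an approximation of Ant/Plexus style matching written for this message,
not the code used by RAT or Plexus; the class and method names are hypothetical
and the "%regex[...]" form is not handled.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: approximates Ant/Plexus style wildcard matching. */
final class AntStyleGlob {

    /** Translates a wildcard pattern into an equivalent regex (approximation). */
    static Pattern toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            char c = glob.charAt(i);
            if (glob.startsWith("**/", i)) {
                sb.append("(?:.*/)?");      // zero or more leading directories
                i += 3;
            } else if (glob.startsWith("**", i)) {
                sb.append(".*");            // anything, including '/'
                i += 2;
            } else if (c == '*') {
                sb.append("[^/]*");         // anything within one path segment
                i++;
            } else if (c == '?') {
                sb.append("[^/]");          // exactly one non-separator character
                i++;
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
                i++;
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Returns true if the whole path matches the wildcard pattern. */
    static boolean matches(String glob, String path) {
        return toRegex(glob).matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("**/.gitignore", "a/b/.gitignore")); // true
        System.out.println(matches("**/.git/**", "a/.git/config"));     // true
        System.out.println(matches("*.txt", "bad.txt"));                // true
        System.out.println(matches("*.txt", "dir/bad.txt"));            // false
    }
}
```

Whether a bare "*.txt" should also match files in subdirectories depends on the
tool; the "--help" output is the authority on what RAT itself does.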
I would like to close this issue and would appreciate any feedback from you.
> CLI: Certain wildcard file filters do not work anymore
> ------------------------------------------------------
>
> Key: RAT-265
> URL: https://issues.apache.org/jira/browse/RAT-265
> Project: Apache Rat
> Issue Type: Sub-task
> Components: Client - cli
> Affects Versions: 0.13, 0.14
> Reporter: Raphael von der Grün
> Assignee: Claude Warren
> Priority: Major
> Fix For: 0.17
>
>
> Run the following command in the root of the `rat` repo:
> {noformat}
> java -jar apache-rat-0.14-20191120.132901-66.jar -e "*.txt" -d
> apache-rat-core/src/test/resources/violations{noformat}
> This will give the following output on `stderr`:
> {noformat}
> Will skip given exclusion '*.txt' due to
> java.util.regex.PatternSyntaxException: Dangling meta character '*' near
> index 0
> *.txt
> ^
> {noformat}
> Furthermore, `bad.txt` will NOT be excluded from the license check.
> The error that causes this is thrown in [line 132 of
> `org.apache.rat.Report.java`|#L132]]. The reason is simple: any glob pattern
> that starts with `*` or `?` is not a valid regex. When Line 132 throws, the
> next two lines will also be skipped, so the pattern will not be added at all.
> Unfortunately, a solution to this problem is not so simple. In `v0.12` the
> `-e` option always added wildcard filters while `-E` always added regex
> filters. The documentation still states the same in the latest `v0.14`
> snapshot. Beginning with `v0.13` the code tries to add any exclude rule as
> three different filters. I believe this approach is inherently flawed.
> Firstly, the `new NameFileFilter(exclusion)` is redundant if we also add `new
> WildcardFileFilter(exclusion)`. The files matched by the `NameFileFilter` are
> a subset of those matched by the `WildcardFileFilter` since any magic
> character (i.e. `?` or `*`) in `exclusion` also matches itself when used in a
> `WildcardFileFilter`.
> So let's assume we only register the `WildcardFileFilter` and the
> `RegexFileFilter`. Even if we properly add patterns as wildcard filters that
> are not a valid RegEx, there are still patterns where we cannot decide what
> the user's intention was. Consider the pattern `bi.ini`. Should it be
> interpreted as a wildcard pattern and match only itself or should it be
> interpreted as a regex and also match `bikini` for example?
> My recommendation for a quick patch solution would be to go back to the
> exclusion behavior of `v0.12`.
> Beyond that, the nicest solution IMHO would be support for ignore files with
> the same semantics as `.gitignore` (via `-E`) and support for giving extended
> shell globs via `-e`.
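
The two problems described in the quoted report can be reproduced with the JDK
alone. The sketch below shows that a glob starting with `*` or `?` fails to
compile as a regex, and that `bi.ini` matches different names depending on
whether it is read as a regex or as a wildcard. It uses java.nio glob matching
as a stand-in for the commons-io `WildcardFileFilter` mentioned in the report,
which is an assumption made to keep the example dependency free; the class and
method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical demo class; not part of RAT. */
final class ExclusionAmbiguity {

    /** Returns true if the string compiles as a java.util.regex pattern. */
    static boolean isValidRegex(String s) {
        try {
            Pattern.compile(s);
            return true;
        } catch (PatternSyntaxException e) {
            return false; // e.g. "Dangling meta character '*' near index 0"
        }
    }

    /** Interprets the exclusion as a regex that must match the whole name. */
    static boolean regexMatches(String exclusion, String name) {
        return Pattern.matches(exclusion, name);
    }

    /** Interprets the exclusion as a glob (JDK glob syntax, used as a stand-in). */
    static boolean globMatches(String exclusion, String name) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + exclusion);
        return matcher.matches(Paths.get(name));
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("*.txt"));              // false
        System.out.println(globMatches("*.txt", "bad.txt"));    // true
        System.out.println(regexMatches("bi.ini", "bikini"));   // true
        System.out.println(globMatches("bi.ini", "bikini"));    // false
    }
}
```

Both readings accept `bi.ini` itself, so only the user's choice of option can
say which one was intended.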
--
This message was sent by Atlassian Jira
(v8.20.10#820010)