This v3 has a new patch (3/10) that I believe fixes the regression on
MinGW Johannes noted in
https://public-inbox.org/git/[email protected]/
As noted in the updated commit message in 10/10 I believe just
skipping this test & documenting this in a commit message is the least
amount of suck for now. It's really an existing issue with us doing
nothing sensible when the log/grep haystack encoding doesn't match the
needle encoding supplied via the command line.
We swept that under the carpet with the kwset backend, but PCRE v2
exposes it.
Ævar Arnfjörð Bjarmason (10):
log tests: test regex backends in "--encode=<enc>" tests
grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>"
t4210: skip more command-line encoding tests on MinGW
grep: inline the return value of a function call used only once
grep tests: move "grep binary" alongside the rest
grep tests: move binary pattern tests into their own file
grep: make the behavior for NUL-byte in patterns sane
grep: drop support for \0 in --fixed-strings <pattern>
grep: remove the kwset optimization
grep: use PCRE v2 for optimized fixed-string search
Documentation/git-grep.txt | 17 +++
grep.c | 115 +++++++---------
grep.h | 3 +-
revision.c | 3 +
t/t4210-log-i18n.sh | 41 +++++-
...a1.sh => t7008-filter-branch-null-sha1.sh} | 0
...08-grep-binary.sh => t7815-grep-binary.sh} | 101 --------------
t/t7816-grep-binary-pattern.sh | 127 ++++++++++++++++++
8 files changed, 234 insertions(+), 173 deletions(-)
rename t/{t7009-filter-branch-null-sha1.sh =>
t7008-filter-branch-null-sha1.sh} (100%)
rename t/{t7008-grep-binary.sh => t7815-grep-binary.sh} (55%)
create mode 100755 t/t7816-grep-binary-pattern.sh
Range-diff:
1: cfc01f49d3 = 1: cfc01f49d3 log tests: test regex backends in
"--encode=<enc>" tests
2: 4b59eb32f0 = 2: 4b59eb32f0 grep: don't use PCRE2?_UTF8 with "log
--encoding=<non-utf8>"
-: ---------- > 3: 676c76afe4 t4210: skip more command-line encoding tests
on MinGW
3: cc4d3b50d5 = 4: da9b491f70 grep: inline the return value of a function
call used only once
4: d9b29bdd89 = 5: c42d3268fa grep tests: move "grep binary" alongside the
rest
5: f85614f435 = 6: 36b9c1c541 grep tests: move binary pattern tests into
their own file
6: 90afca8707 = 7: 3c54e782e6 grep: make the behavior for NUL-byte in
patterns sane
7: 526b925fdc = 8: 8e5f418189 grep: drop support for \0 in --fixed-strings
<pattern>
8: 14269bb295 = 9: d1cb8319d5 grep: remove the kwset optimization
9: c0fd75d102 ! 10: 4de0c82314 grep: use PCRE v2 for optimized fixed-string
search
@@ -15,6 +15,15 @@
makes the behavior harder to understand and document, and makes tests
for the different backends more painful.
+ This does change the behavior under non-C locales when "log"'s
+ "--encoding" option is used and the heystack/needle in the
+ content/command-line doesn't have a matching encoding. See the recent
+ change in "t4210: skip more command-line encoding tests on MinGW" in
+ this series. I think that's OK. We did nothing sensible before
+ then (just compared raw bytes that had no hope of matching). At least
+ now the user will get some idea why their grep/log never matches in
+ that edge case.
+
I could also support the PCRE v1 backend here, but that would make the
code more complex. I'd rather aim for simplicity here and in future
changes to the diffcore. We're not going to have someone who
--
2.22.0.455.g172b71a6c5