Package: ripgrep
Version: 0.10.0-2
Severity: important
Tags: upstream

Hi,

with several GB via STDIN, both rg and rg -F exited immediately without any output, while fgrep found many hits until it issued the warning "Binary file (standard input) matches".

Consider the following example (based on the attached file):

→ cat -v aeh.txt
a
M-CM-$
^@
a
→ cat aeh.txt | fgrep a
Binary file (standard input) matches
→ cat aeh.txt | fgrep -a a
a
a
→ cat aeh.txt | rg a
a
→ cat aeh.txt | rg -a a
a
a

In the "rg a" example, rg neither crashed nor issued a warning. fgrep, in comparison, issued a warning.

While the above example might be close to what fgrep does, just without the warning, the following example is even worse:

→ cat aeh.txt | fgrep ä
Binary file (standard input) matches
→ echo $?
0
→ cat aeh.txt | fgrep ö
→ echo $?
1
→ cat aeh.txt | rg ä
→ echo $?
1
→ cat aeh.txt | rg ö
→ echo $?
1
→ cat aeh.txt | rg -a ä
ä
→ echo $?
0

So fgrep properly indicates via the exit code whether there was a hit, even though it didn't output anything besides the warning about binary junk. rg, however, claims (via exit code) that there is no hit in STDIN, even though the hit is located before the NUL byte and "rg -a" says otherwise (via output and exit code).

"cat aeh.txt | strace rg ä" shows that it exits rather quickly after having read the NUL byte:

read(0, "a\n\303\244\n\0\na\n", 8192) = 9
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
munmap(0x7fbfac3d0000, 8192) = 0
exit_group(1) = ?
+++ exited with 1 +++

Constraints to trigger the issue: the data must contain a NUL byte, and neither "-a" nor "--text" may be set.

On larger files (gigabytes) it is obvious that rg exits prematurely if the NUL byte is close to the beginning, simply because of how quickly the command exits. We actually discovered the issue that way: rg exited way too quickly and without any output at all, especially in comparison to fgrep.

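In case the attachment is not at hand: the test file can presumably be recreated from the nine bytes shown in the strace read() call above. This is only a sketch, assuming a printf that understands octal escapes (as the POSIX one does) and that the attached aeh.txt contains exactly those bytes:

```shell
# Recreate the 9-byte test file from the bytes shown in the strace read() call.
# \303\244 is the UTF-8 encoding of "ä"; \0 is the NUL byte that triggers the issue.
printf 'a\n\303\244\n\0\na\n' > aeh.txt

# Print the size; it should match the "= 9" returned by read().
wc -c < aeh.txt
```

With that file in place, the "cat aeh.txt | rg ä" and "cat aeh.txt | rg -a ä" commands from the transcript can be rerun and their exit codes compared.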
Impact: rg does not indicate that there were hits and exits prematurely without further notice, so it can yield wrong results (exit code as well as output) without any indication that there is an issue.

Workaround: always use option -a or --text when the contents might contain binary junk.

P.S.: Yes, fgrep/grep/egrep also have their issues there, like the warning being on STDOUT instead of STDERR, but they are still much clearer in indicating the issue than rg.

P.P.S.: I also tried to see whether the options -F and --no-encoding make a difference in this case, but they don't.

P.P.P.S.: This might be related to https://github.com/BurntSushi/ripgrep/issues/1207

a ä