On Tue, 18 Oct 2022 05:45:02 -0500 Rob Landley <r...@landley.net> wrote:
> $ echo -e 'one\0two' | busybox grep -l ^t
> (standard input)

/* BB_AUDIT GNU defects - always acts as -a.  */

$ man grep | grep -A5 "^\s*-z,"
       -z, --null-data
              Treat input and output data as sequences of lines, each
              terminated by a zero byte (the ASCII NUL character) instead
              of a newline.  Like the -Z or --null option, this option can
              be used with commands like sort -z to process arbitrary file
              names.

$ echo -e "one\0two" | ./busybox grep -l ^t
$ echo -e "one\0two" | ./busybox grep -la ^t
$ echo -e "one\0two" | ./busybox grep -laz ^t
(standard input)
$ grep --version | head -n1
grep (GNU grep) 3.8
$ echo -e "one\0two" | grep -l ^t
(standard input)
$ echo -e "one\0two" | grep -la ^t
$ echo -e "one\0two" | grep -laz ^t
(standard input)

So... why does grep -l match while busybox grep -l does not?
It seems that GNU/the-fabulous grep defaults to --binary-files=binary:

$ echo -e "one\0two" | grep -l --binary-files=text ^t
$ echo -e "one\0two" | grep -l --binary-files=binary ^t
(standard input)

which is what we see above, i think.

From the GNU/the-very-best grep:

       --binary-files=TYPE
              If a file's data or metadata indicate that the file contains
              binary data, assume that the file is of type TYPE.  Non-text
              bytes indicate binary data; these are either output bytes
              that are improperly encoded for the current locale, or null
              input bytes when the -z option is not given.

              By default, TYPE is binary, and grep suppresses output after
              null input binary data is discovered, and suppresses output
              lines that contain improperly encoded data.  When some output
              is suppressed, grep follows any output with a message to
              standard error saying that a binary file matches.

              If TYPE is without-match, when grep discovers null input
              binary data it assumes that the rest of the file does not
              match; this is equivalent to the -I option.

              If TYPE is text, grep processes a binary file as if it were
              text; this is equivalent to the -a option.
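
To compare the three TYPE values side by side, something like the loop below should do. It is only a sketch: probe is a made-up helper name, printf stands in for echo -e (which not every shell supports), and the result for each TYPE depends on which grep comes first in PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether grep -l finds a line starting
# with "t" in NUL-containing input under a given --binary-files TYPE.
probe() {
    # stderr is discarded so that a grep lacking --binary-files
    # simply counts as "no match" instead of printing a usage error.
    if printf 'one\0two\n' | grep -l --binary-files="$1" '^t' >/dev/null 2>&1; then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

# Try each TYPE named in the man page excerpt above.
for type in binary text without-match; do
    probe "$type"
done
```

With TYPE set to text the whole input is a single line starting with "one", so ^t has nothing to anchor on; the other two cases are where implementations and versions can differ.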
              When type is binary, grep may treat non-text bytes as line
              terminators even without the -z option.  This means choosing
              binary versus text can affect whether a pattern matches a
              file.  For example, when type is binary the pattern q$ might
              match q immediately followed by a null byte, even though this
              is not matched when type is text.  Conversely, when type is
              binary the pattern . (period) might not match a null byte.

              Warning: The -a option might output binary garbage, which can
              have nasty side effects if the output is a terminal and if
              the terminal driver interprets some of it as commands.  On
              the other hand, when reading files whose text encodings are
              unknown, it can be helpful to use -a or to set LC_ALL='C' in
              the environment, in order to find more matches even if the
              matches are unsafe for direct display.

thanks,

>
> I note that the gnu/dammit grep in my devuan system (from 2018) also gets this
> wrong without -a, but gets it right with -a?
>
> $ echo -e 'one\0two' | grep -l ^t
> (standard input)
> $ echo -e 'one\0two' | grep -al ^t
> $
>
> Which is just extremely gnu. The gnu/dammit sed gets it right:
>
> $ echo -e 'one\0two' | sed 's/^t/x/' | hd
> 00000000  6f 6e 65 00 74 77 6f 0a                           |one.two.|
> 00000008
> $ echo -e 'one\0two' | sed 's/t/x/' | hd
> 00000000  6f 6e 65 00 78 77 6f 0a                           |one.xwo.|
> 00000008
>
> Rob
> _______________________________________________
> busybox mailing list
> busybox@busybox.net
> http://lists.busybox.net/mailman/listinfo/busybox