On 12/01/2014 03:06 PM, Pádraig Brady wrote:
> BTW the argument that it's not a text file is a bit beside the point
> as POSIX also says text files can't contain NUL chars, but we process
> this just fine:
>
>   $ printf 'a\000b' | cut -c3
>   b

The fact that GNU offers an extension where we gracefully handle NUL
bytes is a bonus of GNU, and does not change the fact that POSIX already
says we are in unspecified territory and can do whatever we deem most
useful. I suspect that in multibyte locales with non-character encoding
errors, it becomes harder to pin down which behavior makes the most
sense - but again, that is another aspect that makes a file binary
rather than text, and it therefore falls under unspecified behavior.

> Also comparing other tools like uniq we have:
>
>   solaris> printf '1' | uniq
>   solaris> (nothing output!)
>
>   freebsd> printf '1' | uniq
>   1freebsd>
>
>   coreutl> printf '1' | uniq
>   1
>   coreutl>

What about:

  printf '1\n1' | uniq

GNU treats the two lines as identical (and thus supplies the missing \n
on the second line); but I don't have ready access to test the other two
as I type this.

> If we were just implementing now, I'd not output the extra '\n',
> but changing at this stage needs to be carefully considered,
> and with all the textutils, not just cut(1).

I tend to lean the opposite way - producing text output, even on
non-text input, is more likely to be useful when piping files to other
utilities that don't handle non-text files as gracefully as the
coreutils. But I definitely agree that it is not something we change
lightly.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org