On 01/12/14 23:06, Eric Blake wrote:
> On 12/01/2014 03:06 PM, Pádraig Brady wrote:
>
>> BTW the argument that it's not a text file is a bit beside the point
>> as POSIX also says text files can't contain NUL chars, but we process
>> this just fine:
>>
>> $ printf 'a\000b' | cut -c3
>> b
>
> The fact that GNU offers an extension where we gracefully handle NUL
> bytes is a bonus of GNU, and does not change the fact that POSIX already
> says we are in unspecified territory and can do whatever we deem most
> useful. I suspect that in multibyte locales with non-character encoding
> errors, the behavior becomes harder to pinpoint on what makes the most
> sense - but again, that is another aspect that makes a file binary
> rather than text and therefore falls under unspecified behavior.
>
>
>> Also comparing other tools like uniq we have:
>>
>> solaris> printf '1' | uniq
>> solaris> (nothing output!)
>>
>> freebsd> printf '1' | uniq
>> 1freebsd>
>>
>> coreutl> printf '1' | uniq
>> 1
>> coreutl>
>
> What about:
> printf '1\n1' | uniq
Both Solaris and FreeBSD behave like GNU with that input.

> GNU treats the two lines as identical (and thus supplied a missing \n on
> the second line); but I don't have ready access to test the other two as
> I type this.
>
>> If we were just implementing now, I'd not output the extra '\n',
>> but changing at this stage needs to be carefully considered,
>> and with all the textutils, not just cut(1).
>
> I tend to go the opposite - producing text output, even on non-text
> input, is more likely to be useful when piping files to other utilities
> that don't handle non-text files as gracefully as the coreutils. But I
> definitely agree that it is not something we change lightly.
>

cheers,
Pádraig.