Re: Question about uniq's treatment of spaces-only lines

Pádraig Brady Sat, 30 Jul 2022 05:25:56 -0700

On 29/07/2022 19:10, Sudarshan S Chawathe wrote:

In brief, uniq seems to treat lines containing only spaces differently
when given the -f 1 option (compared to when given -f 0 or no -f
option).  My question is: Is this behavior intentional (or is it a bug
in the implementation or docs)?  I find it difficult to reconcile with
my understanding of the docs.


In more detail, consider a file in.txt with the following contents
(which can be reconstructed based on the descriptive line if mangled by
mailers):

Next 5 lines have, resp., 1, 2, 1, 2, and 1, blanks:

Last line


Given this input, the output of 'uniq -u -f 1 in.txt' is different from
that of 'uniq -u in.txt' and 'uniq -u -f 0 in.txt'.  (With -f 1, the
blanks-only lines are all removed, but not so with the others.)
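
If the sample was mangled in transit, the file can be rebuilt from its descriptive line. The following is only a sketch: the temporary file stands in for in.txt, and LC_ALL=C is an assumption made here so that comparison is byte-wise; neither appears in the original report.

```shell
# Assumption: force the C locale so lines are compared byte by byte.
export LC_ALL=C

# Hypothetical scratch file standing in for in.txt.
infile=$(mktemp)

# Header line, five blank-only lines (1, 2, 1, 2 and 1 spaces), final line.
printf '%s\n' 'Next 5 lines have, resp., 1, 2, 1, 2, and 1, blanks:' \
    ' ' '  ' ' ' '  ' ' ' 'Last line' > "$infile"

# Count the lines each invocation prints.
n_plain=$(uniq -u "$infile" | wc -l)
n_f0=$(uniq -u -f 0 "$infile" | wc -l)
n_f1=$(uniq -u -f 1 "$infile" | wc -l)

echo "no -f: $n_plain lines, -f 0: $n_f0 lines, -f 1: $n_f1 lines"
rm -f "$infile"
```

With the behavior described above, the first two counts should be 7 (nothing removed) and the last should be 2 (all five blank-only lines removed).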

I tested the above originally on the uniq from coreutils 8.32 but later
also on 9.1.42 (built from the git sources I just pulled a short while
ago) and both versions exhibit the same behavior.

Regards,


More succinctly:

  $ printf '%s\n' first blah ' ' '  ' 'l ast' | uniq -f1
  first
  l ast

I.e. after skipping one field, every line except 'l ast' is left with an
empty string to compare, so those lines all compare as equal.
This is operating as per the POSIX standard which states:

"Ignore the first fields fields on each input line when doing comparisons,
where fields is a positive decimal integer. A field is the maximal string
matched by the basic regular expression:

[[:blank:]]*[^[:blank:]]*

If the fields option-argument specifies more fields than appear on an
input line, a null string shall be used for comparison."
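
The effect of that field definition can be made visible with sed. This is only an illustrative sketch that applies the quoted regular expression once per line, not how uniq itself is implemented; the square brackets are added purely to mark where each remaining string starts and ends.

```shell
# Strip one field (as defined by the POSIX regex) from each line and show
# what is left for comparison, wrapped in [ ] so empty strings are visible.
keys=$(printf '%s\n' first blah ' ' '  ' 'l ast' |
    sed -e 's/^[[:blank:]]*[^[:blank:]]*//' -e 's/.*/[&]/')
printf '%s\n' "$keys"
# Prints:
# []
# []
# []
# []
# [ ast]
```

The first four lines are reduced to the null string, so uniq sees them as one run of equal lines; only 'l ast' keeps a non-empty remainder.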

thanks,
Pádraig
