On 29/07/2022 19:10, Sudarshan S Chawathe wrote:
In brief, uniq seems to treat lines containing only spaces differently
when given the -f 1 option (compared to when given -f 0 or no -f
option). My question is: Is this behavior intentional (or is it a bug
in the implementation or docs)? I find it difficult to reconcile with
my understanding of the docs.
In more detail, consider a file in.txt with the following contents
(which can be reconstructed based on the descriptive line if mangled by
mailers):
Next 5 lines have, resp., 1, 2, 1, 2, and 1, blanks:
Last line
Given this input, the output of 'uniq -u -f 1 in.txt' is different from
that of 'uniq -u in.txt' and 'uniq -u -f 0 in.txt'. (With -f 1, the
blanks-only lines are all removed, but not so with the others.)
I tested the above originally on the uniq from coreutils 8.32 but later
also on 9.1.42 (built from the git sources I just pulled a short while
ago) and both versions exhibit the same behavior.
Regards,
More succinctly:
$ printf '%s\n' first blah ' ' ' ' 'l ast' | uniq -f1
first
l ast
I.e., after skipping one field, every line except 'l ast' is left with
an empty string to compare, so those lines are all treated as equal.
This behavior follows the POSIX standard, which states:
"Ignore the first fields fields on each input line when doing comparisons,
where fields is a positive decimal integer. A field is the maximal string
matched by the basic regular expression:
[[:blank:]]*[^[:blank:]]*
If the fields option-argument specifies more fields than appear on an input
line, a null string shall be used for comparison."
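
To see the comparison keys directly, here is a rough sketch that strips
the first field with sed, using the same regular expression as the POSIX
text quoted above. The skip_fields helper is a name made up for this
illustration (it is not part of coreutils), and it assumes a sed that
accepts -E, as GNU and BSD sed do. The brackets are only there to make
the empty keys visible.

```shell
# Hypothetical helper, not part of coreutils: remove the first N fields
# from each line on stdin, where a field is [[:blank:]]*[^[:blank:]]*
skip_fields() {
    sed -E "s/^([[:blank:]]*[^[:blank:]]*){$1}//"
}

# Show what is left to compare after skipping one field.
printf '%s\n' first blah ' ' '  ' 'l ast' | skip_fields 1 | sed 's/.*/[&]/'
# []
# []
# []
# []
# [ ast]
```

The first four keys are empty and therefore equal, so uniq -f1 keeps only
the first line of that run, followed by 'l ast'.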
thanks,
Pádraig