Bug#775197: marked as done (uniq: -u -d -D options non-orthogonal and confusing when combined)

Debian Bug Tracking System Tue, 13 Jan 2015 06:34:21 -0800

Your message dated Mon, 12 Jan 2015 14:29:42 +0000
with message-id <[email protected]>
and subject line Re: Bug#775197: uniq: -u -d -D options non-orthogonal and 
confusing when combined
has caused the Debian Bug report #775197,
regarding uniq: -u -d -D options non-orthogonal and confusing when combined
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
775197: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=775197
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---

Package: coreutils
Version: 8.13-3.5
Severity: normal

I was attempting to use uniq to categorise my data based on the first so
many characters, and I discovered that:

a) it is currently impossible to use uniq to output all lines, with lines
  grouped by initial prefix ( -w N ) and separated by an empty line
  (--all-repeated=separate), because there is no way to specify outputting
  all lines
b) the combination behaviour of -u, -d and -D is odd and suboptimal.

Here is an example with a small dataset:

:; cat > uniq-test
AAA
AAB
ABA
ABC
ACA
ADA
ADD
ADE
:; uniq -w 2 -u uniq-test
ACA
:; uniq -w 2 -d uniq-test
AAA
ABA
ADA
:; uniq -w 2 -D uniq-test
AAA
AAB
ABA
ABC
ADA
ADD
ADE
:; uniq -w 2 -ud uniq-test
:; uniq -w 2 -du uniq-test
:; uniq -w 2 --all-repeated=separate uniq-test
AAA
AAB

ABA
ABC

ADA
ADD
ADE
:; uniq -w 2 -u --all-repeated=separate uniq-test
AAA

ABA

ADA
ADD
:; uniq -w 2 -c -D uniq-test
uniq: printing all duplicated lines and repeat counts is meaningless
Try `uniq --help' for more information.
:;

 So in summary:

 -ud or -du produces no output, but doesn't produce an error (whereas -c -D does)

 -u -D produces unexpected output (all the repeated lines except the last one 
   for each set).

 There is no way to output all lines, with separations.

 There are a number of ways to address this issue, but I think the best one
would be to correct the behavior and documentation such that:

 -u outputs any lines which are unique
 -d outputs the first of any lines which are duplicated
 -D outputs all lines which are duplicated
 --all-repeated=METHOD separates any groups of lines in the specified manner

 Such that the output I wanted would be produced with:

:; uniq -w 2 -u --all-repeated=separate uniq-test
AAA
AAB

ABA
ABC

ACA

ADA
ADD
ADE
:;

 Thanks,

J.

-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-4-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages coreutils depends on:
ii  dpkg          1.16.15
ii  install-info  4.13a.dfsg.1-10
ii  libacl1       2.2.51-8
ii  libattr1      1:2.4.46-8
ii  libc6         2.13-38+deb7u4
ii  libselinux1   2.1.9-5

coreutils recommends no packages.

coreutils suggests no packages.

-- no debconf information

--- End Message ---

--- Begin Message ---

On 12/01/15 14:13, Jonathan David Amery wrote:
> Package: coreutils
> Version: 8.13-3.5
> Severity: normal
> 
> I was attempting to use uniq to categorise my data based on the first so
> many characters and I discover that:
> 
> a) it is currently impossible to use uniq to output all lines; with lines 
>   grouped by initial prefix ( -w N ) and separated by an empty line 
>   (--all-repeated=separate) because there is no way to specify outputting
>   all lines
> b) the combination behaviour of -u -d and -D is odd and suboptimal.
> 
> Here is an example with a small example dataset:
> 
> :; cat > uniq-test
> AAA
> AAB
> ABA
> ABC
> ACA
> ADA
> ADD
> ADE
> :; uniq -w 2 -u uniq-test
> ACA
> :; uniq -w 2 -d uniq-test
> AAA
> ABA
> ADA
> :; uniq -w 2 -D uniq-test
> AAA
> AAB
> ABA
> ABC
> ADA
> ADD
> ADE
> :; uniq -w 2 -ud uniq-test
> :; uniq -w 2 -du uniq-test
> :; uniq -w 2 --all-repeated=separate uniq-test
> AAA
> AAB
> 
> ABA
> ABC
> 
> ADA
> ADD
> ADE
> :; uniq -w 2 -u --all-repeated=separate uniq-test
> AAA
> 
> ABA
> 
> ADA
> ADD
> :; uniq -w 2 -c -D uniq-test
> uniq: printing all duplicated lines and repeat counts is meaningless
> Try `uniq --help' for more information.
> :;
> 
>  So in summary:
> 
>  -ud or -du produces no output; but doesn't produce an error (where -c -D does)
> 
>  -u -D produces unexpected output (all the repeated lines except the last one 
>    for each set).
> 
>  There is no way to output all lines, with separations.
> 
>  There are a number of ways to address this issue; but I think the best one
> would be to correct behavior and documentation such that:
> 
>  -u outputs any lines which are unique
>  -d outputs the first of any lines which are duplicated
>  -D outputs all lines which are duplicated
>  --all-repeated=METHOD separates any groups of lines in the specified behavior
> 
>  Such that the output I wanted would be produced with:
> 
> :; uniq -w 2 -u --all-repeated=separate uniq-test
> AAA
> AAB
> 
> ABA
> ABC
> 
> ACA
> 
> ADA
> ADD
> ADE
> :;
> 
>  Thanks,
> 
> J.

Upstream has gone with uniq --group for this, available since coreutils
version 8.22.
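
As a minimal sketch of how the output requested in the report could be
produced, assuming GNU coreutils 8.22 or later is installed (the scratch
file from mktemp and the variable names are illustrative stand-ins for the
reporter's uniq-test file):

```shell
# Recreate the sample data from the report in a scratch file
# (the temporary file is an assumption, standing in for uniq-test).
datafile=$(mktemp)
printf '%s\n' AAA AAB ABA ABC ACA ADA ADD ADE > "$datafile"

# Compare only the first 2 characters of each line and print every line,
# with an empty line between groups. --group needs GNU coreutils >= 8.22.
grouped=$(uniq -w 2 --group "$datafile")
printf '%s\n' "$grouped"

rm -f "$datafile"
```

With the sample data this should print all eight lines in four groups (the
AA, AB, AC and AD prefixes), including the lone ACA line, separated by
empty lines.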

--- End Message ---
