bug#22236: Not exactly a bug...

2015-12-25 Thread Todd Shandelman
Hi,

I love your products.

I am using the 'uniq' command line utility on Cygwin, where I do most of my
development work.

$ uniq --version
uniq (GNU coreutils) 8.24
Packaged by Cygwin (8.24-3)
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Richard M. Stallman and David MacKenzie.
$




I feel confused about the usage options, particularly those for restricting
comparison to a limited number of initial or non-initial characters or
fields.

Observe:

$ uniq --h|egrep 'char|field'
  -f, --skip-fields=N   avoid comparing the first N fields
  -s, --skip-chars=N    avoid comparing the first N characters
  -w, --check-chars=N   compare no more than N characters in lines


...
...
...


So it looks like, for *chars*, 'uniq' has options to compare only the
first N chars, or *all but* the first N chars.

Whereas for *fields*, 'uniq' has only the option to skip the first N
fields, but has no corresponding option to compare *only* the first N
fields.

Why this lack of symmetry? And what do I do when I need that missing
functionality, to compare *only* an initial subset of fields in each line?

Or am I missing something?

Thanks!

Todd Shandelman
Houston, Texas


bug#22236: Not exactly a bug...

2015-12-25 Thread Assaf Gordon
tag 22236 notabug
close 22236
thanks

Hello Todd,

> On Dec 25, 2015, at 13:37, Todd Shandelman  wrote:

[...]

> So it looks like that for chars, 'uniq' has options to compare only the first 
> N chars, or *all but* the first N chars.

> 
> Whereas for fields, 'uniq' has only the option to skip the first N fields, 
> but has no corresponding option to compare *only* the first N fields.
> 
> Why this lack of symmetry?

This lack of symmetry originates from the POSIX standard:
  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html
which codified the features that existed at that time.

GNU Coreutils' uniq program has added a few more features, and there is a 
working plan to add the ability to compare specific fields (
http://lists.gnu.org/archive/html/coreutils/2013-02/msg00082.html , 
http://lists.gnu.org/archive/html/coreutils/2013-09/msg00047.html ), but this 
has not yet been integrated into the main program; perhaps it will be in future versions.


> And what do I do when I need that missing functionality, to compare only an 
> initial subset of fields in each line?

To print lines that are unique in specific fields, you can use 'sort -u' with
one -k option per field.

For example, given the following sample input file:

$ cat input.txt
1   A   10  x   100
5   B   14  z   104
2   A   11  x   101
3   B   12  y   102
4   B   13  z   103

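Print only the first line for each unique combination of values in
columns 2 and 4 (-s makes the sort stable, so the line kept for each
combination is the one that appears first in the input):

$ sort -k2,2 -k4,4 -s -u input.txt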
1   A   10  x   100
3   B   12  y   102
5   B   14  z   104
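
Note that 'sort' reorders the lines. If the original input order matters,
one alternative (a different tool, awk, rather than an option of uniq or
sort) is to remember which combinations of field values have already been
seen and print only the first line for each. The following is a minimal
sketch; the helper name 'first_by_fields' and its quoted field-list argument
are hypothetical, and it assumes whitespace-separated fields as in input.txt.
Because awk splits fields on runs of blanks, differing amounts of spacing
between fields do not affect the comparison:

```shell
#!/bin/sh
# first_by_fields "F1 F2 ..." [FILE...]
# Hypothetical helper (not part of coreutils): print the first line for each
# distinct combination of the listed whitespace-separated fields, keeping
# the lines in their original input order (unlike sort -u).
first_by_fields() {
    fields=$1
    shift
    awk -v f="$fields" '
        BEGIN { n = split(f, idx, " ") }    # field numbers to compare
        {
            key = ""
            for (i = 1; i <= n; i++)        # join the selected fields
                key = key SUBSEP $(idx[i])
            if (!(key in seen)) {           # first line with this key
                seen[key] = 1
                print
            }
        }
    ' "$@"
}
```

For input.txt, 'first_by_fields "2 4" input.txt' should print the same three
lines as the sort command above, but in input order, so "5   B   14  z   104"
comes before "3   B   12  y   102".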

The 'sort' approach can be extended to include as many fields as you need.
If the fields are consecutive, you can specify them as a single key range
(here, fields 1 through 3):

$ cat input2.txt
A   x   1   97
B   x   1   96
A   x   1   99
A   x   1   98

$ sort -k1,3 -u input2.txt 
A   x   1   97
B   x   1   96
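
Since a key range that starts at field 1 is exactly "compare only the first
N fields", the missing option can be emulated with a small shell function.
This is only a sketch: 'uniq_first_fields' is a hypothetical name, not part
of coreutils, and like the commands above its output is sorted rather than
in input order. It also assumes the fields are separated by the same amount
of whitespace on every line, because with the default field separator a key
spanning several fields includes the blanks between them:

```shell
#!/bin/sh
# uniq_first_fields N [FILE...]
# Hypothetical helper (not part of coreutils): print one line for each
# distinct combination of the first N blank-separated fields.
# Assumes consistent spacing between fields; output is sorted.
uniq_first_fields() {
    n=$1
    shift
    # -k1,"$n": the sort key runs from field 1 through field N
    # -u: output only the first line of each run of equal keys
    # -s: stable sort, so that first line is the earliest one in the input
    sort -s -u -k1,"$n" "$@"
}
```

With input2.txt, 'uniq_first_fields 3 input2.txt' should reproduce the
output of 'sort -k1,3 -u input2.txt' shown above.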

regards,
 - assaf