bug#14996: man uniq -d mention more

2013-07-31 jidanni
man uniq:
   -d, --repeated
  only print duplicate lines
ADD ", one for each group", so that it reads:
   -d, --repeated
  only print duplicate lines, one for each group

This is worth doing even though the Info page already says something about it.
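
For illustration, here is a made-up input showing what "one for each group" means. uniq only compares adjacent lines, so a group is a run of identical neighbouring lines, and -d prints a single copy of each run of two or more lines.

```shell
#!/bin/sh
# Made-up sample: two repeated groups ("apple" x2, "cherry" x3)
# and one line that is not repeated ("banana").
printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n' | uniq -d
# Output: one line per repeated group, non-repeated lines dropped:
# apple
# cherry
```

GNU uniq's -D (--all-repeated) is the variant that prints every line of each repeated group rather than one per group.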





bug#14988: sort enhancement request

2013-07-31 Eric Blake
tag 14988 notabug
thanks

[re-adding the list; and please don't top-post on technical lists]

On 07/31/2013 07:19 AM, Danny Nicholas wrote:
> Thank you Eric.  We have two sorts on our system.  Our /usr/bin/sort does not 
> support the -s option,

Makes sense - the '-s' option is a GNU extension, and your /usr/bin/sort
is probably not GNU sort.  If you want stable sorting using only POSIX
features, then you have to supply enough sort keys so that no two lines
ever compare equal (since POSIX has no way to disable the full-line sort
of last resort).  And depending on your input, this may indeed require
a pre-filter run that adds line numbers (by the way, sed's '=' command
can do this much more efficiently than a Python script), then the sort
itself, then a post-filter run that removes the line numbers again.
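
A sketch of that number/sort/strip pipeline using only POSIX options. The helper name stable_prefix_sort and the choice to sort on the first N bytes of each line are illustrative assumptions. The sketch also assumes the input lines contain no tab characters, because a tab is used to separate the added line number from the original text.

```shell
#!/bin/sh
# stable_prefix_sort N: hypothetical helper that stably sorts stdin on the
# first N bytes of each line without relying on GNU sort's -s.
# Assumes input lines contain no tab characters.
tab=$(printf '\t')

stable_prefix_sort() {
    n=$1
    # 1. sed '=' writes each line's number on its own line before the line.
    # 2. N joins the number line with the text line; s/// puts a tab between.
    # 3. Sort on the first n bytes of the original text (field 2), then
    #    numerically on the line number, so no two lines ever compare equal.
    # 4. cut -f 2- removes the line number again.
    sed = |
        sed "N;s/\n/$tab/" |
        LC_ALL=C sort -t "$tab" -k "2.1,2.$n" -k 1,1n |
        cut -f 2-
}

# Lines 1 and 3 share the 1-byte key "b", so they keep their input order.
printf 'b 2\na x\nb 1\n' | stable_prefix_sort 1
# Output:
# a x
# b 2
# b 1
```

Because the line number is always the final tie-breaker, the full-line comparison of last resort never gets a chance to reorder lines with equal keys.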

> but our /usr/local/bin/sort does.

Indeed - life is simpler if you can write your script to ensure that it
always sets PATH to use the full power of the GNU tools.
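
One way to do that near the top of a script, assuming (as on the poster's system) that GNU sort is installed in /usr/local/bin. Rather than parsing version strings, this sketch simply probes whether the sort that PATH now finds accepts -s.

```shell
#!/bin/sh
# Assumption: the GNU tools are installed under /usr/local/bin.
PATH=/usr/local/bin:$PATH
export PATH

# Probe the sort found via PATH: a sort without the extension rejects -s.
if printf 'x\n' | sort -s >/dev/null 2>&1; then
    echo "sort supports -s"
else
    echo "warning: sort does not support -s" >&2
fi
```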

>  Unfortunately, that did not resolve the issue. Here is a portion of the file 
> I'm trying to sort

Thank you - THIS makes much more sense for understanding your problem.

> 010_01_731_1_20081610_
> 010_01_731_2_20081610_ 4102 LANGUAGE 
> EN
> 010_01_731_3_20081610_ YES
> 010_01_731_3_20081610_ 010
> 010_01_731_3_20081610_ 
> 06/12/2013
> 010_01_731_3_20081610_
> 277.59
> 010_01_731_3_20081610_ 
> 
> 010_01_731_3_20081610_ PAGE1
> 010_01_731_4_20081610_ 
> REGULAR
> 010_01_731_5_20081610_ PRINTER
> 010_01_731_6_20081610_ S
> 010_01_731_7_20081610_ PRINTER
> 010_01_731_8_20081610_ R3P
> 
> What I am executing is /usr/local/bin/sort -k 1,36 -s file -o file2

So, with "-k1,36" you asked sort to treat as its sort key the portion of
the line ranging from the first field to the 36th field.  I only see 2
fields in most of the lines (a few have more, but none of them with 36
fields), so you are basically sorting by the entire line.  You didn't
provide any other keys, and since your first key is already botched into
covering the ENTIRE line, no two distinct lines ever compared equal, so
there was nothing for -s to make a difference on.  Again, sort --debug
makes this clear (using a subset of just two lines of your input):

>> $ printf '010_01_731_3_20081610_ 
>> \n010_01_731_3_20081610_
>>  PAGE1\n' \
>>| LC_ALL=C sort --debug -k1,36 -s
>> sort: using simple byte comparison
>> 010_01_731_3_20081610_ PAGE1
>> ___
>> 010_01_731_3_20081610_ 
>> 
>> 
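
The field-count argument can also be checked without --debug. In this sketch the three sample lines are made up: awk reports how many blank-separated fields each line has, and sorting with -k1,36 gives the same order as a plain whole-line sort, because with fewer than 36 fields the key simply runs to the end of the line.

```shell
#!/bin/sh
# Made-up sample with 2 or 3 blank-separated fields per line.
sample='b z
a y x
b a'

# Number of fields on each line: 2, 3, 2.
printf '%s\n' "$sample" | awk '{ print NF }'

# -k1,36 asks for fields 1 through 36; with fewer fields than that the key
# covers the whole line, so the result matches a whole-line sort.
printf '%s\n' "$sample" | LC_ALL=C sort -k1,36
printf '%s\n' "$sample" | LC_ALL=C sort
```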

But it appears that what you WANTED was to sort on just the first 36
bytes, with a stable sort of the results.  If so, then ASK for that, by
using the correct -k option:

>> $ printf '010_01_731_3_20081610_ 
>> \n010_01_731_3_20081610_
>>  PAGE1\n' \
>>| LC_ALL=C sort --debug -k1,1.36 -s
>> sort: using simple byte comparison
>> 010_01_731_3_20081610_ 
>> 
>> 
>> 010_01_731_3_20081610_ PAGE1
>> 

Note how I asked for a sort key -k1,1.36, which says to start at the
beginning of the first field and end 36 bytes into the first field (hmm,
it looks like you actually want 38 bytes - but I'll leave that for you
to decide).  Also note that -s now makes a difference: when the contents
of that first sort key are identical, the last-resort full-line
comparison reorders the lines if -s is not used:

>> $ printf '010_01_731_3_20081610_ 
>> \n010_01_731_3_20081610_
>>  PAGE1\n' \
>>| LC_ALL=C sort --debug -k1,1.36
>> sort: using simple byte comparison
>> 010_01_731_3_20081610_ PAGE1
>> 
>> ___
>> 010_01_731_3_20081610_ 
>> 
>> 
>> 
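
The same effect with a smaller, made-up input, using the -s option of GNU sort: two lines share the key "b", and only the -s run keeps them in their input order.

```shell
#!/bin/sh
# Made-up input: "b 2" and "b 1" have the same first field.
input='b 2
a 1
b 1'

# Without -s, the tie on field 1 is broken by the last-resort
# full-line comparison, so "b 1" moves ahead of "b 2".
printf '%s\n' "$input" | LC_ALL=C sort -k1,1
# a 1
# b 1
# b 2

# With -s, the last-resort comparison is disabled and tied lines keep
# their input order.
printf '%s\n' "$input" | LC_ALL=C sort -s -k1,1
# a 1
# b 2
# b 1
```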

As this is a case of you not passing the correct command line arguments,
rather than a bug in sort, I am marking this bug as closed.  However,
feel free to continue to comment on the topic (preferably on-list) if
you have more questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


