On 07/01/2011 07:22 PM, Peng Yu wrote:
> Hi,
>
> The following explanation for coreutils manual is not very clear.
>
> "Also note that the ‘n’ modifier was applied to the field-end
> specifier for the first key. It would have been equivalent to
> specify ‘-k 2n,2’ or ‘-k 2n,2n’. All modifiers except ‘b’ apply to
> the associated field, regardless of whether the modifier character
> is attached to the field-start and/or the field-end part of the key
> specifier."

Maybe it also helps to read the POSIX wording for this same feature:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

The keydef argument is a restricted sort key field definition. The
format of this definition is:

    field_start[type][,field_end[type]]

where field_start and field_end define a key field restricted to a
portion of the line (see the EXTENDED DESCRIPTION section), and type is
a modifier from the list of characters 'b', 'd', 'f', 'i', 'n', 'r'.
The 'b' modifier shall behave like the -b option, but shall apply only
to the field_start or field_end to which it is attached. The other
modifiers shall behave like the corresponding options, but shall apply
only to the key field to which they are attached; they shall have this
effect if specified with field_start, field_end, or both. If any
modifier is attached to a field_start or to a field_end, no option
shall apply to either.

> According to the manual and the following output, '-k 1,2n' is the
> same as '-k 1n,2' and '-k 1n,2n'. But isn't this syntax a little
> confusing? Shouldn't '-k 1n,2n' be the same as '-k1,1n -k2,2n'?

No. '-k 1n,2' says to treat the combination of fields 1 and 2 as a
single numeric string, and is generally not what you want. Meanwhile,
'-k 1n,1 -k 2n,2' says to treat both field 1 and field 2 as numeric
strings, where field 2 is used to break ties when field 1 compares
equal.

> Also I don't understand what "associated field" refers to?

The "associated field" is the key that the modifier is attached to,
such as the -k1,1 portion. Most letters can be written on the start,
the end, or both positions of the -k1,1 key, at which point that entire
key takes on that option letter. But b is special, in that ignoring
blanks of just the start or just the end makes sense, so it only
applies to the half of the associated -k1,1 key where the b appears.

Perhaps you might gain further understanding of this by using the
--debug option.
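
Before getting to --debug, here is a small sketch of the placement rule
quoted above. The three-line input is made up for illustration (it is
not from your report), and the variable names are arbitrary. It only
uses the POSIX -k syntax, so it should behave the same with any
conforming sort:

```shell
# Hypothetical sample data: field 2 holds the numbers to sort on.
input='a 10
b 9
c 100'

# 'n' on the start, on the end, or on both: the same numeric key each time.
on_start=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2n,2)
on_end=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2n)
on_both=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2n,2n)

# No modifier: field 2 is compared as a byte string instead.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

printf 'numeric key (any placement of n):\n%s\n\n' "$on_start"
printf 'string key (no modifier):\n%s\n' "$plain"
```

The three numeric variants should all print 9, 10, 100 in that order,
while the unmodified key gives 10, 100, 9, because " 100" sorts before
" 9" byte by byte.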

>> cat input1.txt
> 1 10
> 1 9
>> sort --key=1,2n input1.txt
> 1 10
> 1 9

$ printf '1 10\n1 9\n' | LC_ALL=C sort --debug -k1,2n
sort: using simple byte comparison
sort: key 1 is numeric and spans multiple fields
1 10
_
____
1 9
_
___

Here, -k1,2n means to sort the single key comprised of fields 1 and 2
as a number (but the number necessarily ends at the end of field 1),
with a fall-back sort to the lexicographical sort of the entire line.
'9' > '1' lexicographically, even though "10" > "9" numerically.

>> sort --key=1n,2n input1.txt
> 1 10
> 1 9
>> sort --key=1,1n --key=2,2n input1.txt
> 1 9
> 1 10

That's better - you have now separated the two numeric keys, as
evidenced by --debug not warning you about spanning multiple fields:

$ printf '1 10\n1 9\n' | LC_ALL=C ../coreutils/src/sort --debug -k1,1n -k2,2n
../coreutils/src/sort: using simple byte comparison
1 9
_
  _
___
1 10
_
  __
____

-- 
Eric Blake   [email protected]    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
