David Hall (coding) wrote:
> The sort command appears to ignore (as in not include) locale 
> punctuation when splitting data into fields. This has a side effect of 
> causing a numeric sort to fail to correctly sort comma-separated files.

Unfortunately that is by design of the locale collation sequence.
There is nothing that sort can do about it.

> The following is a test demonstrating this:
> 
> $ export LANG=C
> $ (echo 10#1;echo 1#10) | sort -n -t '#'
> 1#10
> 10#1
> $ (echo 10,1;echo 1,10) | sort -n -t ','
> 1,10
> 10,1
> $ export LANG=en_NZ.UTF-8
> $ (echo 10#1;echo 1#10) | sort -n -t '#'
> 1#10
> 10#1

All are as expected.  But why specify -t without -k?

> $ (echo 10,1;echo 1,10) | sort -n -t ','
> 10,1
> 1,10

If you want sort to sort on individual fields try the -k option.

  (echo 10,1;echo 1,10) | sort -t , -k 1,1n -k 2,2n
  1,10
  10,1
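
As a minimal sketch of the same idea for use in a script, the locale
can be pinned for the one sort invocation so the result does not
depend on the caller's LANG.  The function name sort_csv_numeric is
made up for this illustration and is not part of coreutils.

```shell
# sort_csv_numeric is a hypothetical helper name, not a coreutils command.
# It reads comma-separated lines on stdin and sorts them numerically
# on field 1, then numerically on field 2.
sort_csv_numeric() {
  # LC_ALL=C applies to this single sort invocation only, so the
  # rest of the script keeps whatever locale the user has set.
  LC_ALL=C sort -t , -k 1,1n -k 2,2n
}

printf '10,1\n1,10\n2,5\n' | sort_csv_numeric
```

Limiting each key with "1,1" and "2,2" keeps a key from running on to
the end of the line, which is what happens when -t is given without -k.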

> Now that I'm aware that the locale is an issue, I can update my scripts 
> to compensate, but this is a problem that is likely to bite other people 
> who use the sort command for comma-separated files.

Yes.  I hate the choices made for the locale data.  This is an FAQ.  I
personally set the following in my environment.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C
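
A small sketch of what LC_COLLATE=C does to plain text sorting
follows.  The function name byte_order_sort is made up for this
illustration, and the sketch assumes that a set LC_ALL would otherwise
override LC_COLLATE, which is why it is unset first.

```shell
# byte_order_sort is a hypothetical helper name used only for illustration.
# The body runs in a subshell so the caller's environment is untouched.
byte_order_sort() (
  unset LC_ALL          # LC_ALL, if set, takes precedence over LC_COLLATE
  LC_COLLATE=C sort     # compare lines by byte value
)

printf 'b\nA\na\nB\n' | byte_order_sort
```

With C collation the uppercase ASCII letters sort ahead of the
lowercase ones, so the lines come out as A, B, a, b rather than with
case folded together.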

> NOTE: This bug may be similar to others, #367891, #409221, #353909, and 
> #223068. Please mark as duplicate if you do not think there is any 
> additional issues identified by this bug.

You seem to be aware that sort compares the entire line unless -k is
given, and that sort uses the locale-defined collating sequence, which
folds case and ignores punctuation, because both points are covered in
the bugs that you referenced.  You evidently thought this case was
different in some way, but I fail to see the difference myself.

Bob

