Did you try running sort with LC_ALL=C ?
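
In case it helps, here is a minimal sketch of what I mean, assuming a POSIX sort is on the PATH. The helper name sort_c_locale and the sample lines are made up for illustration. With LC_ALL=C in its environment, sort should compare raw byte values, which for ASCII input is the same order that Python's list.sort() produces:

```python
import os
import subprocess


def sort_c_locale(text):
    """Sort the lines of `text` with the system `sort`, forcing the C locale.

    Hypothetical helper for illustration; assumes `sort` is on the PATH.
    """
    # LC_ALL takes precedence over LANG and the individual LC_* variables.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort"],
        input=text,
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout


# A few lines in the style of sort-test.txt (illustrative sample only).
sample = "a\nB\nz-a\nz/E\nz-B\nzy/A\nzy-a\n"

if __name__ == "__main__":
    print(sort_c_locale(sample), end="")
```

From the shell, the equivalent would be: LC_ALL=C sort < sort-test.txt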

On Sun, 14 Sep 2014, Omer Zak wrote:

Date: Sun, 14 Sep 2014 16:59:32 +0300
From: Omer Zak <w...@zak.co.il>
To: linux-il <linux-il@cs.huji.ac.il>
Subject: How to sort a file by pure ASCII order?

I encountered a counterintuitive behavior of 'sort' in modern Linux
releases.

I checked the sorting behavior of sort as installed in Debian Jessie
and Ubuntu 14.04 LTS.
It turns out that the default behavior of sort (with locale=en_US.UTF-8)
is not to sort by ASCII order, but to treat letters and digits as more
significant for the sort order than punctuation marks.

Attached please find a sort-test.txt file and the output of
sort < sort-test.txt (as the file actual.txt).

To show what the output would look like using pure ASCII sort, I sorted
sort-test.txt using python-sort.py (attached)
and got the result reproduced in correct.txt (attached).

The question, then, is which options would get GNU sort to sort like
python-sort.py?

Can anyone shed light on the matter?

--- Omer




--
 9590 8E58 D30D 1660 C349  673D B205 4FC4 B8F5 B7F9  ~. .~  Tk Open Systems
=}-------- Jonathan Ben-Avraham ("yba") ----------ooO--U--Ooo------------{=
mailto:y...@tkos.co.il tel:+972.52.486.3386 http://tkos.co.il skype:benavrhm

Attachment: sort-test.txt
a
b
c
d
C
B
z-a
z-c
z-B
z/d
z/f
z/E
zy-a
zy/A
zy/b
zy-B

Attachment: actual.txt
a
b
B
c
C
d
z-a
z-B
z-c
z/d
z/E
z/f
zy-a
zy/A
zy/b
zy-B

Attachment: python-sort.py
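import sys

# Read every line from stdin, sort by code point (plain ASCII order for
# ASCII input), and write the lines back out unchanged.
data = sys.stdin.readlines()
data.sort()
for line in data:
    sys.stdout.write(line)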

Attachment: correct.txt
B
C
a
b
c
d
z-B
z-a
z-c
z/E
z/d
z/f
zy-B
zy-a
zy/A
zy/b
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il