On 1 Dec. 03, at 16:46, Jarkko Hietaniemi wrote:

Thank you both for your replies. What about sorting words in one particular
language, is Perl's sort() good enough? I'm wondering, since language isn't
one of sort()'s arguments.

First we need to define "good enough"... again, if you are sorting "simple" English or Hawaiian, you are probably fine. But as soon as your "words" contain real-life complications like

- Latin-1 letters with accents, like à or ...
- letters beyond Latin-1, in extended Latin or non-Latin scripts
- people's names
- acronyms and the like
- do all the characters matter or just the letters
- sorting mixed letters and digits
- Roman numerals


you are on your own. For the first item, the locale pragma can help,
as long as your data is 8-bit and in a single locale. As soon as the data
becomes Unicode, Perl will, as far as I know, ignore the locale when sorting.
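A minimal sketch of the locale approach for 8-bit data, assuming the system provides a Latin-1 French locale named `fr_FR.ISO8859-1` (locale names differ between systems, so that name is an assumption). The words are kept as Latin-1 byte strings rather than Unicode strings, which is the case where `use locale` affects `sort`.

```perl
use strict;
use warnings;
use locale;                          # let sort/cmp honour LC_COLLATE
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical locale name; check `locale -a` for what your system offers.
my $name = 'fr_FR.ISO8859-1';
defined setlocale(LC_COLLATE, $name)
    or warn "locale $name not available; falling back to code point order\n";

# 8-bit Latin-1 byte strings (\xF4 = o circumflex, \xE9 = e acute), not Unicode.
my @words  = ("c\xF4te", "c\xF4t\xE9", "cote", "cot\xE9");
my @sorted = sort @words;            # uses the locale's collation when it is set

binmode STDOUT;                      # print the raw Latin-1 bytes unchanged
print join(" ", @sorted), "\n";
```

If the locale is missing, the result silently degrades to plain code point order, which is why the sketch warns instead of dying.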


If you need more complex sorting, look into CPAN: search search.cpan.org
for "sort". Sort::ArbBiLex, for example, might be useful.
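Besides Sort::ArbBiLex, another module worth a look (not mentioned in the reply above) is Unicode::Collate, which implements the Unicode Collation Algorithm and has shipped with Perl since 5.8. A sketch, assuming its default collation table: the `backwards => 2` option compares accents (level 2) from the end of the word, the traditional French rule, and can change the order of words that differ only in their accents.

```perl
use strict;
use warnings;
use utf8;                            # the word list below is written in UTF-8
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

my @words = qw(côte côté cote coté);

# Default UCA ordering: accents are compared left to right.
my $uca     = Unicode::Collate->new();
my @forward = $uca->sort(@words);

# French-style ordering: accents are compared from the end of the word.
my $fr       = Unicode::Collate->new(backwards => 2);
my @backward = $fr->sort(@words);

print "forward:   @forward\n";      # cote coté côte côté
print "backwards: @backward\n";     # cote côte coté côté
```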

OK, this is in line with how I understood this paragraph in perluniintro:


The short answer is that by default, Perl compares strings ("lt",
"le", "cmp", "ge", "gt") based only on the code points of the
characters. In the above case, the answer is "after", since 0x00C1 >
0x00C0.
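A quick check of the behaviour that paragraph describes: without `use locale` or a collation module, `cmp` compares code points, so U+00C1 (Á) comes after U+00C0 (À).

```perl
use strict;
use warnings;

# Plain cmp compares character code points, not dictionary order.
my $result = "\x{00C1}" cmp "\x{00C0}";          # Á vs À
print $result > 0 ? "after" : "before", "\n";    # after
```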


So is it just by chance that these French words are accurately sorted?

% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
cote coté côte côté
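One way to see where this order comes from is to print the code point of each character: `o` is U+006F and `ô` is U+00F4, `e` is U+0065 and `é` is U+00E9, so a plain code point comparison puts each unaccented letter before its accented form. The sketch below only shows the code point order that the default `sort` uses; it does not by itself say whether that order matches French dictionary order.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my @words  = qw(côte côté cote coté);
my @sorted = sort @words;            # default sort: code point comparison

# Show each sorted word next to the code points of its characters.
for my $w (@sorted) {
    printf "%-5s %s\n", $w, join ' ', map { sprintf 'U+%04X', ord } split //, $w;
}
```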


Thanks,
--
Eric Cholet


