On 28/05/2010 9:24 AM, (Ted Harding) wrote:
An experiment:

  sort(c("AACD","A CD"))
  #  [1] "AACD" "A CD"

  sort(c("ABCD","A CD"))
  #  [1] "ABCD" "A CD"

  sort(c("ACCD","A CD"))
  #  [1] "ACCD" "A CD"

  sort(c("ADCD","A CD"))
  #  [1] "A CD" "ADCD"

  sort(c("AECD","A CD"))
  #  [1] "A CD" "AECD"
  ## (with results for "AFCD", ... "AZCD" similar to the last two).

  LC_COLLATE=en_GB.UTF-8

(R version 2.11.0 (2010-04-22) on Linux).

So this behaves, in en_GB.UTF-8, as though " " (SPACE) is between
"C" and "D".

This is nuts!!!

Curable if I set (e.g.) LC_COLLATE="C" on startup. But what else
might break if I do so?
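
One way to limit the side effects is to change only the collation category, and only for the duration of the sort, rather than changing the locale at startup. The following is a minimal sketch, not a tested recommendation; the helper name sort_c_locale is hypothetical, and it assumes that byte-order ("C") collation is what you want for these strings.

```r
# Hypothetical helper: sort x using "C" collation, then restore the
# previous LC_COLLATE setting so nothing else in the session changes.
sort_c_locale <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

# In the C locale, SPACE (0x20) sorts before every letter.
sort_c_locale(c("AACD", "A CD"))
sort_c_locale(c("ADCD", "A CD"))
```

Other categories (LC_CTYPE, LC_MESSAGES and so on) are left alone, so character encoding and messages should behave as before.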

You have to realize that to a large extent this is not under our control. Your system will have linked to some library (outside of R) to do string collation, and the problem lies in that library. You should determine which system library is handling your collations.

I'd like to tell you how to do that, but I don't know for your build. You can find out if you're using the recommended ICU library by running example(icuSetCollate); that gives a number of warnings like

In icuSetCollate(locale = "da_DK", case_first = "default") :
 ICU is not supported on this build

on Windows. If you don't see those warnings, your build is using ICU, and you want to talk to the ICU people. If you do see them, then you'll need to look deeper to find out what you're actually using.
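
As a rough sketch of that check without running the whole example: this assumes a more recent R than 2.11.0, in which capabilities() reports an "ICU" entry and extSoftVersion() exists. Neither is confirmed for the build discussed here, and the variable names are only for illustration.

```r
# Collation locale currently in effect for this R session.
collate_locale <- Sys.getlocale("LC_COLLATE")
print(collate_locale)

# TRUE if this build of R was compiled with ICU support
# (assumes an R version whose capabilities() has an "ICU" entry).
has_icu <- capabilities("ICU")
print(has_icu)

# Version of ICU in use, or an empty string if there is none
# (assumes an R version that provides extSoftVersion()).
icu_version <- extSoftVersion()["ICU"]
print(icu_version)
```

If has_icu is FALSE, collation is being done by the C library's own routines, and that library is where to look next.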

Duncan Murdoch
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-10                                       Time: 14:24:08
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

