The sort order is documented to depend on your locale.  I get

>  sort(x)
[1] " A" " B" " C" "A"  "B"  "C"

in the C locale.  The help page does say so:

     The sort order for character vectors will depend on the collating
     sequence of the locale in use: see 'Comparison'.

The default collation sequences for the standard locales in Linux distros
are quite unintuitive (and do not compare character by character either).
If you want ASCII order, ask for it by setting LC_COLLATE=C.
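
A minimal sketch of asking for it from inside a running session, rather than
through the environment before R starts.  It assumes that temporarily changing
the collation of the session is acceptable; the names `old` and `sorted_c` are
only illustrative.

```r
x <- c(LETTERS[1:3], paste(" ", LETTERS[1:3], sep = ""))

# Remember the current collation setting so it can be restored afterwards.
old <- Sys.getlocale("LC_COLLATE")

# Switch collation to the C locale: strings are then compared by character
# code, so a leading space (ASCII 32) sorts before any letter.
Sys.setlocale("LC_COLLATE", "C")
sorted_c <- sort(x)
print(sorted_c)

# Put the previous collation back.
Sys.setlocale("LC_COLLATE", old)
```

Restoring the old setting keeps the change from affecting later comparisons
in the same session.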


On Thu, 19 Aug 2004 [EMAIL PROTECTED] wrote:

> The following is not what I expected in sorting characters (single letters 
> and the same letters with preceding spaces).
> Can someone enlighten me as to why the following might be a correct result 
> for sorting?
> 
> ; x <- c(LETTERS[1:3], paste(" ", LETTERS[1:3], sep=""))
> ; x
> [1] "A"  "B"  "C"  " A" " B" " C"
> ; sort(x)
> [1] "A"  " A" "B"  " B" "C"  " C"
> ; sort(x, method="shell")
> [1] "A"  " A" "B"  " B" "C"  " C"
> ; sort(x, method="quick")
> [1] "A"  " A" "B"  " B" "C"  " C"
> 
> I would expect the result to be " A" " B" " C" "A"  "B"  "C" instead, 
> going by ASCII codes (and a quick check with S-Plus 6.2 shows that this is 
> what S-Plus thinks the sorted sequence is).

S-Plus explicitly says it uses ASCII.  I believe that is a deficiency they
plan to correct.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
