I did look at the Comparison help but totally overlooked this part:

     Comparison of strings in character vectors is lexicographic within
     the strings using the collating sequence of the locale in use: see
     ‘locales’.  The collating sequence of locales such as ‘en_US’ is
     normally different from ‘C’ (which should use ASCII) and can be
     surprising.  

I've recently changed to a different Linux distribution and was trying
to work out why I was getting a different order of the factor levels
even though it was the same code.  I thought I'd inadvertently changed
something somehow.
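
For the archive, a minimal sketch of how the level order can be kept
independent of the locale. It assumes the levels are known in advance; the
vector x and the names f_default and f_fixed are made up for illustration
and are not from my actual code.

```r
x <- c("b", "A", "a", "B")

# Default: the levels are the sorted unique values, so their order
# follows the collating sequence of the locale in use.
f_default <- factor(x)

# Explicit levels: the order is fixed by the code, not by the locale.
f_fixed <- factor(x, levels = c("A", "B", "a", "b"))

print(levels(f_default))
print(levels(f_fixed))
```

With the levels spelled out, the same code should give the same level order
on both distributions.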

Much clearer now -- even if still confusing.  

Thanks for the pointer.
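
In case it helps anyone else, here is a rough sketch of two ways to get the
ASCII-style ordering I was expecting. The first switches LC_COLLATE to "C"
for the duration of the sort. The second assumes R 3.3.0 or later, where
sort() accepts method = "radix", which is documented to sort character
vectors as in the C locale. The variable names are only illustrative.

```r
x <- c(LETTERS[1:5], letters[1:5])

# Option 1: temporarily switch the collating sequence to the C locale.
old_collate <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
by_c_locale <- sort(x)
Sys.setlocale("LC_COLLATE", old_collate)  # restore the previous setting

# Option 2 (assumes R >= 3.3.0): radix sorting ignores LC_COLLATE
# and orders character vectors as in the C locale.
by_radix <- sort(x, method = "radix")

print(by_c_locale)
print(by_radix)
```

Both should put all the uppercase letters before the lowercase ones,
whatever the session locale is.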


On Fri, 18-Mar-2016 at 10:19AM +0100, peter dalgaard wrote:

|> Ooops, that was answering the question you actually asked. The one you
|> meant to ask is answered by this part:
|> 
|> The sort order for character vectors will depend on the collating
|> sequence of the locale in use: see Comparison.
|> 
|> ...and collating sequences are a weird and woolly subject, where you
|> cannot even be sure that locales of the same name on two different
|> platforms sort strings in the same order.
|> 
|> -pd
|> 
|> 
|> 
|> On 18 Mar 2016, at 10:13 , peter dalgaard <pda...@gmail.com> wrote:
|> 
|> > 
|> > On 18 Mar 2016, at 10:02 , Patrick Connolly <p_conno...@slingshot.co.nz> wrote:
|> > 
|> >> I don't follow why this happens:
|> >> 
|> >>> sort(c(LETTERS[1:5], letters[1:5]))
|> >> [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E"
|> >> 
|> >> The help for sort() says:
|> >> 
|> >> method: character string specifying the algorithm used.  Not
|> >>         available for partial sorting.  Can be abbreviated.
|> >> 
|> >> But what are the methods available?  The help mentions xtfrm but that
|> >> doesn't illuminate, I'd have thought that at least by default it would
|> >> have something to do with ASCII codes.  But that's not the case since
|> >> all the uppercase ones would be before the lowercase ones.
|> >> 
|> >> I know something different is happening but I don't know what it is
|> >> (do you, Mr Jones?).  Apologies to Bob Dylan.
|> >> 
|> > 
|> > 
|> > Um, read _all_ of the help file?
|> > 
|> > sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE,
|> >         method = c("shell", "quick"), index.return = FALSE)
|> > 
|> > [snip]
|> > 
|> > Method "shell" uses Shellsort (an O(n^{4/3}) variant from Sedgewick
|> > (1986)). If x has names a stable modification is used, so ties are not
|> > reordered. (This only matters if names are present.)
|> > 
|> > Method "quick" uses Singleton (1969)'s implementation of Hoare's
|> > Quicksort method and is only available when x is numeric (double or
|> > integer) and partial is NULL. (For other types of x Shellsort is used,
|> > silently.) It is normally somewhat faster than Shellsort (perhaps 50%
|> > faster on vectors of length a million and twice as fast at a billion)
|> > but has poor performance in the rare worst case. (Peto's modification
|> > using a pseudo-random midpoint is used to make the worst case rarer.)
|> > This is not a stable sort, and ties may be reordered.
|> > 
|> > Factors with less than 100,000 levels are sorted by radix sorting when
|> > method is not supplied: see sort.list.
|> > 
|> > -pd
|> > 
|> > 
|> > -- 
|> > Peter Dalgaard, Professor,
|> > Center for Statistics, Copenhagen Business School
|> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
|> > Phone: (+45)38153501
|> > Office: A 4.23
|> > Email: pd....@cbs.dk  Priv: pda...@gmail.com

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_                 Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)                              ..... Eleanor Roosevelt
          
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
