I did look at the Comparison help but totally overlooked this part; Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see ‘locales’. The collating sequence of locales such as ‘en_US’ is normally different from ‘C’ (which should use ASCII) and can be surprising.
I've recently changed to a different Linux distribution and was trying to work out why I was getting a different order of the factor levels even though it was the same code. I thought I'd inadvertantly changed something somehow. Much clearer now -- even if still confusing. Thanks for the pointer. On Fri, 18-Mar-2016 at 10:19AM +0100, peter dalgaard wrote: |> Ooops, that was answering the question you actually asked. The one you meant to ask is answered by this part: |> |> The sort order for character vectors will depend on the collating sequence of the locale in use: see Comparison. |> |> ...and collating sequences is a weird and woolly subject, where you cannot even be sure that locales of the same name on two different platforms sort strings in the same order. |> |> -pd |> |> |> |> On 18 Mar 2016, at 10:13 , peter dalgaard <pda...@gmail.com> wrote: |> |> > |> > On 18 Mar 2016, at 10:02 , Patrick Connolly <p_conno...@slingshot.co.nz> wrote: |> > |> >> I don't follow why this happens: |> >> |> >>> sort(c(LETTERS[1:5], letters[1:5])) |> >> [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" |> >> |> >> The help for sort() says: |> >> |> >> method: character string specifying the algorithm used. Not |> >> available for partial sorting. Can be abbreviated. |> >> |> >> But what are the methods available? The help mentions xtfrm but that |> >> doesn't illuminate, I'd have thought that at least by default it would |> >> have something to do with ASCII codes. But that's not the case since |> >> all the uppercase ones would be before the lowercase ones. |> >> |> >> I know something different is happening but I don't know what it is |> >> (do you, Mr Jones?). Apologies to Bob Dylan. |> >> |> > |> > |> > Um, read _all_ of the help file? |> > |> > sort.int(x, partial = NULL, na.last = NA, decreasing = FALSE, |> > method = c("shell", "quick"), index.return = FALSE) |> > |> > [snip] |> > |> > Method "shell" uses Shellsort (an O(n^{4/3}) variant from Sedgewick (1986)). If x has names a stable modification is used, so ties are not reordered. (This only matters if names are present.) |> > |> > Method "quick" uses Singleton (1969)'s implementation of Hoare's Quicksort method and is only available when x is numeric (double or integer) and partial is NULL. (For other types of x Shellsort is used, silently.) It is normally somewhat faster than Shellsort (perhaps 50% faster on vectors of length a million and twice as fast at a billion) but has poor performance in the rare worst case. (Peto's modification using a pseudo-random midpoint is used to make the worst case rarer.) This is not a stable sort, and ties may be reordered. |> > |> > Factors with less than 100,000 levels are sorted by radix sorting when method is not supplied: see sort.list. |> > |> > -pd |> > |> > |> > -- |> > Peter Dalgaard, Professor, |> > Center for Statistics, Copenhagen Business School |> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark |> > Phone: (+45)38153501 |> > Office: A 4.23 |> > Email: pd....@cbs.dk Priv: pda...@gmail.com |> |> -- |> Peter Dalgaard, Professor, |> Center for Statistics, Copenhagen Business School |> Solbjerg Plads 3, 2000 Frederiksberg, Denmark |> Phone: (+45)38153501 |> Office: A 4.23 |> Email: pd....@cbs.dk Priv: pda...@gmail.com |> -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___ Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) ..... Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.