carslaw wrote:
Dear R users,

I'm a bit perplexed by the effect sort has here, as it is different on
Windows vs. Linux. It makes my factor levels and subsequent plots different on the two systems.

You are using different collation orders.  On Linux, your sessionInfo shows

en_GB.utf8

while Windows shows

English_United Kingdom.1252


so you should be prepared for differences. That said, it certainly looks as though the string comparison is wrong on Linux. Using Ted Harding's examples, I get these results:

> "AB CD" > "ABCD"
[1] FALSE
> "AB CD" > "ABCD "
[1] FALSE

on Windows in the English_Canada.1252 locale and on Linux in the C locale. However, when I use the default locale on our system, en_US.UTF-8, I get

> "AB CD" > "ABCD"
[1] TRUE
> "AB CD" > "ABCD "
[1] FALSE

as Ted did, and that certainly looks wrong.
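
For anyone who wants to reproduce the comparison, here is a minimal sketch that forces the C collation order for the session and repeats the two tests. The names old_collate, c1 and c2 are only illustrative; Sys.getlocale() and Sys.setlocale() are the base R functions for querying and changing the locale. In the C locale both comparisons should come out FALSE, as in the first set of results above.

```r
# Remember the current collation locale so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch to the C locale, which compares strings byte by byte.
invisible(Sys.setlocale("LC_COLLATE", "C"))

c1 <- "AB CD" > "ABCD"    # the space (0x20) sorts before "C" (0x43)
c2 <- "AB CD" > "ABCD "

print(c1)
print(c2)

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Setting LC_COLLATE to C before calling sort() should give the same order on both platforms, at the cost of a plain byte-order sort (in ASCII, all upper-case letters come before lower-case ones).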

Duncan Murdoch
Given:

types <- c("PC-D-Euro-0", "PC-D-Euro-1", "PC-D-Euro-2", "PC-D-Euro-3",
           "PC-D-Euro-4", "PC-D-Euro-5", "PC-D-Euro-6",
           "LCV-D-Euro-0", "LCV-D-Euro-1", "LCV-D-Euro-2", "LCV-D-Euro-3",
           "LCV-D-Euro-4", "LCV-D-Euro-5", "LCV-D-Euro-6",
           "HGV-D-Euro-0", "HGV-D-Euro-I", "HGV-D-Euro-II", "HGV-D-Euro-III",
           "HGV-D-Euro-IV EGR", "HGV-D-Euro-IV SCR", "HGV-D-Euro-IV SCRb",
           "HGV-D-Euro-V EGR", "HGV-D-Euro-V SCR", "HGV-D-Euro-V SCRb",
           "HGV-D-Euro-VI", "HGV-D-Euro-VIb")

On Linux, sort gives:

sort(types)
 [1] "HGV-D-Euro-0"       "HGV-D-Euro-I"       "HGV-D-Euro-II"
 [4] "HGV-D-Euro-III"     "HGV-D-Euro-IV EGR"  "HGV-D-Euro-IV SCR"
 [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR"   "HGV-D-Euro-VI"
[10] "HGV-D-Euro-VIb"     "HGV-D-Euro-V SCR"   "HGV-D-Euro-V SCRb"
[13] "LCV-D-Euro-0"       "LCV-D-Euro-1"       "LCV-D-Euro-2"
[16] "LCV-D-Euro-3"       "LCV-D-Euro-4"       "LCV-D-Euro-5"
[19] "LCV-D-Euro-6"       "PC-D-Euro-0"        "PC-D-Euro-1"
[22] "PC-D-Euro-2"        "PC-D-Euro-3"        "PC-D-Euro-4"
[25] "PC-D-Euro-5"        "PC-D-Euro-6"


And on Windows:

sort(types)
 [1] "HGV-D-Euro-0"       "HGV-D-Euro-I"       "HGV-D-Euro-II"
 [4] "HGV-D-Euro-III"     "HGV-D-Euro-IV EGR"  "HGV-D-Euro-IV SCR"
 [7] "HGV-D-Euro-IV SCRb" "HGV-D-Euro-V EGR"   "HGV-D-Euro-V SCR"
[10] "HGV-D-Euro-V SCRb"  "HGV-D-Euro-VI"      "HGV-D-Euro-VIb"
[13] "LCV-D-Euro-0"       "LCV-D-Euro-1"       "LCV-D-Euro-2"
[16] "LCV-D-Euro-3"       "LCV-D-Euro-4"       "LCV-D-Euro-5"
[19] "LCV-D-Euro-6"       "PC-D-Euro-0"        "PC-D-Euro-1"
[22] "PC-D-Euro-2"        "PC-D-Euro-3"        "PC-D-Euro-4"
[25] "PC-D-Euro-5"        "PC-D-Euro-6"

Session info for both systems is below. The order I actually want is the
Windows one, but looking at it, the Linux order is perhaps more intuitive.
However, the problem is that the order is inconsistent between the two
systems. Any suggestions?
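
One way to sidestep the locale altogether, if the goal is a fixed plotting order, is to give factor() the levels explicitly instead of letting it sort them. This is only a sketch: it uses a shortened copy of types, and the names wanted and x are made up for the illustration.

```r
# Desired order, written out once (a shortened, illustrative copy of 'types').
wanted <- c("HGV-D-Euro-V EGR", "HGV-D-Euro-V SCR", "HGV-D-Euro-V SCRb",
            "HGV-D-Euro-VI", "HGV-D-Euro-VIb")

# Some data in arbitrary order, with one repeated value.
x <- c("HGV-D-Euro-VIb", "HGV-D-Euro-V SCR", "HGV-D-Euro-VI",
       "HGV-D-Euro-V EGR", "HGV-D-Euro-V SCRb", "HGV-D-Euro-VI")

# Supplying 'levels' means factor() does not sort the unique values itself,
# so the level order no longer depends on LC_COLLATE.
f <- factor(x, levels = wanted)

print(levels(f))
```

Since types is already written in the wanted order, factor(x, levels = types) should behave the same way on both systems.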

sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_GB.utf8          LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8           LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8       LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8          LC_NAME=en_GB.utf8
 [9] LC_ADDRESS=en_GB.utf8        LC_TELEPHONE=en_GB.utf8
[11] LC_MEASUREMENT=en_GB.utf8    LC_IDENTIFICATION=en_GB.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rkward_0.5.3

loaded via a namespace (and not attached):
[1] tools_2.11.0

sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Dr David Carslaw
King's College London
Environmental Research Group
Franklin Wilkins Building
150 Stamford Street
London
SE1 9NH

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
