Hello,

Inline.
On 04-09-2012 12:24, Meyners, Michael wrote:
> All,
>
> I realize from the archive that the sort argument in merge has been the subject
> of discussion before, though I couldn't find an explanation for this
> behavior. I have reduced a real example to the following (more or less)
> minimal code (and I have no doubt that there are smart people around who
> could achieve the same with smarter code :-)). I'm running R 2.15.1 64bit
> under MS Windows 7, full session info below.
>       
> I have a list of two data frames:
>
> test <- list(structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L,
> 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
> 4L, 5L, 6L), .Label = c("Y1", "Y2", "G", "F", "L", "K"), class = "factor"),
>      cong = c(-1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1,
>      1, 1, 1, 1, 11, 11, 11, 11, 11, 11), x = c(5.85714285714286,
>      5.9, 7.3, 5.85714285714286, 7.27272727272727, 4.375, 3.875,
>      2.5, 4.8, 3.625, 6.25, 4.71428571428571, 3.53571428571429,
>      4.63888888888889, 4.42424242424242, 4.78260869565217, 4.875,
>      3.80434782608696, 5.73170731707317, 5.41935483870968, 5.78125,
>      6.30188679245283, 6.87755102040816, 5.56603773584906)),
>      .Names = c("product", "cong", "x"), row.names = c(NA, -24L),
>      class = "data.frame"),
>      structure(list(product = structure(c(1L, 2L, 3L, 4L, 5L,
>      6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Y1", "Y2", "G",
>      "F", "L", "K"), class = "factor"), cong = c(-1, -1, -1, -1,
>      -1, -1, 0, 0, 0, 0, 0, 0), x = c(3.04347826086957, 4.18181818181818,
>      3.75, 4.31578947368421, 4.5, 3.73913043478261, 4.8876404494382,
>      5.20792079207921, 5.68, 5.70526315789474, 6.38636363636364,
>      4.96703296703297)), .Names = c("product", "cong", "x"), row.names = c(NA,
>      -12L), class = "data.frame"))
>
>
> The dataframes are pretty much the same but for the values in the x-column 
> and the fact that the second one has only half as many observations, missing 
> the second half of the expand.grid if you like. Now if I run
>
> lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F",
>     "L", "K"), cong=c(-1,0,1,11)), all=TRUE, sort=TRUE))
> # sort=TRUE is the default, so it could be omitted
>
> sorts the first data frame according to the labels of the factor "product"

No, it doesn't. It sorts on the columns, i.e., on their values (for a factor,
the integer codes), not on the labels.
The help page clearly states that the argument 'sort' is "logical.
Should the results be sorted on the by columns?"

And "Y1" is coded as 1, "Y2" as 2, and so on, so the output is correct.
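A minimal illustration of the point about codes, using a made-up toy factor 'f' that has the same level order as 'product' in 'test' (it is not part of the original data):

```r
# Toy factor with the same level order as test[[1]]$product
lev <- c("Y1", "Y2", "G", "F", "L", "K")
f <- factor(c("G", "Y1", "F"), levels = lev)

levels(f)      # the labels, in level order
as.integer(f)  # the underlying codes: 3 1 4 (G -> 3, Y1 -> 1, F -> 4)
order(f)       # 2 1 3: ordering follows the codes, so Y1 < G < F
```

So "sorted by the factor" means sorted in level order, which here is not alphabetical.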

Try the following.

test2 <- test
test2[[1]]$product <- as.character(test[[1]]$product)
test2[[2]]$product <- as.character(test[[2]]$product)

# To make it more readable.
grd <- expand.grid(product=c("Y1", "Y2", "G", "F", "L", "K"),
                   cong=c(-1,0,1,11))

lapply(test2, function(x) merge(x, grd, all=TRUE, sort=TRUE))

And now 'product' is sorted from "F" to "Y2", even though grd$product is still a
factor with the same coding as in 'test'.
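The same kind of toy vector (again made up, not the 'test' data) shows why the conversion matters: order() on a character vector compares the strings themselves, using the collation of the current locale, so the level order no longer plays any role.

```r
lev <- c("Y1", "Y2", "G", "F", "L", "K")
f <- factor(c("G", "Y1", "F"), levels = lev)

# As a factor: level order (codes)
as.character(f[order(f)])   # "Y1" "G" "F"

# As character: alphabetical order of the labels
ch <- as.character(f)
ch[order(ch)]               # "F" "G" "Y1"
```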

Hope this helps,

Rui Barradas
> , while for the second one the order of the first data frame passed to
> merge (x) is kept (this difference is what I could not find
> documented). Now I run the same code with sort=FALSE instead:
>
> lapply(test, function(x) merge(x, expand.grid(product=c("Y1", "Y2", "G", "F",
>     "L", "K"), cong=c(-1,0,1,11)), all=TRUE, sort=FALSE))
>
> The results are at least consistent and fulfill my needs (this, incidentally,
> is in line with the documentation). Note that I get exactly the same behavior
> if I apply merge to test[[1]] and test[[2]] separately, so it is not an
> issue with lapply. (I realize that my data frames are ordered by the levels of
> product, but using test[[2]] <- test[[2]][sample(12),] and applying the same
> code as above reveals that indeed no sorting is done; the order of the first
> data frame is kept.)
>
> I have a working solution for myself, so I'm not after any advice on how to 
> achieve the sorting -- I'd just like to better understand what's going on 
> here and/or what I might have missed in the documentation or in the list 
> archives.
>
> Thanks in advance,
> Michael
>
>
>
> Session info:
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    
> LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                    
> LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.1
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

