For 1:1 merging of two or more datasets whose non-ID variables are distinct, would the following code, which avoids forming the larger merged data frame, be useful or faster? [A generalization of with() would make this even better. I've often wondered about the utility of a "merged environment".]

> set.seed(1)
> a <- data.frame(id=c(1:3, 5, 7), x1=runif(5))
> b <- data.frame(id=c(1:3, 4, 6), x2=runif(5))
> a
  id        x1
1  1 0.2655087
2  2 0.3721239
3  3 0.5728534
4  5 0.9082078
5  7 0.2016819
> b
  id         x2
1  1 0.89838968
2  2 0.94467527
3  3 0.66079779
4  4 0.62911404
5  6 0.06178627
>
> ida <- a$id;  idb <- b$id
> ids <- sort(unique(c(ida, idb)))
> i <- match(ids, ida)
> j <- match(ids, idb)
> a[i,]$x1
[1] 0.2655087 0.3721239 0.5728534        NA 0.9082078        NA 0.2016819
> b[j,]$x2
[1] 0.89838968 0.94467527 0.66079779 0.62911404         NA 0.06178627         NA
>
> with(a[i,],
+      with(b[j,],
+           cbind(x1,x2)))
            x1         x2
[1,] 0.2655087 0.89838968
[2,] 0.3721239 0.94467527
[3,] 0.5728534 0.66079779
[4,]        NA 0.62911404
[5,] 0.9082078         NA
[6,]        NA 0.06178627
[7,] 0.2016819         NA
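
A rough sketch of what a generalized with() over a "merged environment" might look like. The name with_merged is hypothetical, not an existing R function. The sketch assumes each data frame has a single ID column with the same name, that IDs are unique within each data frame (the 1:1 case), and that non-ID variable names are distinct across data frames. Whether it is actually faster than merge() is not tested here.

```r
# Hypothetical helper (not part of R): evaluate `expr` with the non-ID
# columns of several data frames aligned on a shared ID, without building
# the merged data frame.
with_merged <- function(dfs, expr, by = "id") {
  caller <- parent.frame()
  # Sorted union of all IDs, as in the session above
  ids <- sort(unique(unlist(lapply(dfs, `[[`, by))))
  env <- new.env(parent = caller)
  assign(by, ids, envir = env)
  for (d in dfs) {
    k <- match(ids, d[[by]])            # row of d for each ID, NA if absent
    for (v in setdiff(names(d), by))    # assumes distinct non-ID names
      assign(v, d[[v]][k], envir = env)
  }
  eval(substitute(expr), env)
}

set.seed(1)
a <- data.frame(id = c(1:3, 5, 7), x1 = runif(5))
b <- data.frame(id = c(1:3, 4, 6), x2 = runif(5))

res <- with_merged(list(a, b), cbind(x1, x2))

# Cross-check against the full outer merge
m <- merge(a, b, by = "id", all = TRUE)
stopifnot(isTRUE(all.equal(res, as.matrix(m[, c("x1", "x2")]),
                           check.attributes = FALSE)))
```

Only the aligned vectors are stored in the temporary environment, so the nested with() calls are no longer needed and more than two data frames can be passed in the list.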

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
