Re: [R] merging data.frames of different length

Don MacQueen Thu, 18 Jun 2009 08:45:16 -0700

The word "merge" in the context of R suggests the use of the merge() function, but I don't think that's the right tool for what you want. The merge() function is for relational database type merges, which for your data would be a many-to-many merge. Not good.

In terms of the R language, you're looking for something using the cbind() function, not the merge() function (I think).

There are a couple of details that need to be clarified, and my solution below makes some assumptions.
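To illustrate the many-to-many problem, here is a minimal sketch using only the rows for the value 1 from the example (the column names V1 and V2 are an assumption, chosen to match what as.data.frame(matrix(...)) would produce). Every x1 row is paired with every x2 row that shares the key, so the result has 3 x 2 rows rather than the 2 wanted:

```r
# Rows for the value 1 only: three in x1, two in x2
x1 <- data.frame(V1 = c(1, 1, 1), V2 = c(4, 3, 6))
x2 <- data.frame(V1 = c(1, 1), V2 = c(-3, -7))

# merge() joins on V1 and pairs every matching row with every other
m <- merge(x1, x2, by = "V1")
nrow(m)  # 6, one row for each (x1 row, x2 row) combination
```

The non-key columns come back as V2.x and V2.y because both inputs have a column named V2.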


1) Could a value in the first column appear in only one of the two data frames?

2) Is it always x1 that has more values? (In your example, x1 had the number 1 appear three times in the first column, and x2 had it appear only twice.) Does x2 sometimes have more rows? (I think your description implies that, but it's good to be explicit.)

I added extra rows to your example data frames to test my assumptions about the answers.



After trying to be clever, I decided the easiest way is brute force.

Hopefully, this is what you want:

x1 <- as.data.frame( matrix(
c(
1,      4,
1,      3,
1,      6,
2,      9,
2,      2,
2,      5,
3,      6,
3,      7,
3,      4,
  4,0,
  4,1) , byrow=TRUE,ncol=2))

x2 <- as.data.frame( matrix(
c(
1,      -3,
1,      -7,
2,      -3,
2,      -2,
2,      -8,
3,      -1,
3,      -2,
3,      -1,
  4,0,
  4,1,
  4,2,
  4,3) , byrow=TRUE,ncol=2))

###
ivals <- sort(unique(c(x1$V1,x2$V1)))

newx <- NULL  # start empty, so the first usable value need not be min(ivals)
for (i in ivals) {
  tmpx1 <- x1[x1$V1 == i, ]
  tmpx2 <- x2[x2$V1 == i, ]
  # keep only as many rows as both data frames have for this value
  n.to.use <- min(nrow(tmpx1), nrow(tmpx2))
  if (n.to.use >= 1) {
    rtmp <- seq_len(n.to.use)
    tmpnew <- cbind(tmpx1[rtmp, ], V3 = tmpx2[rtmp, 'V2'])
    newx <- rbind(newx, tmpnew)  # rbind() ignores the initial NULL
  }
}

The loop could be written with fewer lines of code, but I found it easier to read and understand this way. If x1 and x2 have a very large number of rows, the above should probably be revised for better memory usage.
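One possible loop-free revision, offered as a sketch rather than a tested replacement: number the rows within each value of V1, then do an ordinary one-to-one merge() on the pair (V1, position). The helper column idx and the result name res are made up for this illustration, and the data frames are rebuilt here with data.frame() so the block runs on its own.

```r
# Same example data as above, built directly as data frames
x1 <- data.frame(V1 = rep(1:4, times = c(3, 3, 3, 2)),
                 V2 = c(4, 3, 6, 9, 2, 5, 6, 7, 4, 0, 1))
x2 <- data.frame(V1 = rep(1:4, times = c(2, 3, 3, 4)),
                 V2 = c(-3, -7, -3, -2, -8, -1, -2, -1, 0, 1, 2, 3))

# idx (hypothetical helper column): position of each row within its V1 group
x1$idx <- ave(seq_along(x1$V1), x1$V1, FUN = seq_along)
x2$idx <- ave(seq_along(x2$V1), x2$V1, FUN = seq_along)

# With (V1, idx) as the key the merge is one-to-one; positions that
# exist in only one data frame are dropped by the default inner join
res <- merge(x1, x2, by = c("V1", "idx"), sort = FALSE)

# Order explicitly instead of relying on merge()'s own sorting
res <- res[order(res$V1, res$idx), c("V1", "V2.x", "V2.y")]
names(res) <- c("V1", "V2", "V3")
rownames(res) <- NULL
```

This pairs rows in their original within-group order, which is the same assumption the loop makes, and it avoids growing a data frame with rbind() on every iteration.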


-Don

At 2:33 AM +0200 6/18/09, Martin Batholdy wrote:

hi,


I have two data.frames each with two columns;

x1

1       4
1       3
1       6
2       9
2       2
2       5
3       6
3       7
3       4


x2

1       -3
1       -7
2       -3
2       -2
2       -8
3       -1
3       -2
3       -1

now I want to merge these data.frames into one data.frame.
The problem is that sometimes there is a different number of elements per category (like above, x1 has 3 values for the value 1 in the first column, but x2 has only 2 values for the value 1 in the first column).
Is there an easy way to merge these two data.frames by deleting the rows that only one data.frame "has"? In the example, the resulting data.frame would be the data.frames x1 and x2 except for row 3 of data.frame x1.
thanks for any suggestions!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
