The word "merge" in the context of R suggests the merge() function, but I don't think that's the right tool for what you want. merge() does relational-database-style joins, and because the values in your first column repeat in both data frames, it would give you a many-to-many merge: every row of x1 paired with every row of x2 that has the same first-column value. Not good.
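
To make the many-to-many problem concrete, here is a minimal sketch using hypothetical toy data frames a and b (not your x1 and x2), where the value 1 appears three times in one and twice in the other:

```r
# Hypothetical toy data: key 1 appears 3 times in a and 2 times in b
a <- data.frame(V1 = c(1, 1, 1), V2 = c(4, 3, 6))
b <- data.frame(V1 = c(1, 1), V2 = c(-3, -7))

# merge() pairs every row of a with every row of b that shares V1
m <- merge(a, b, by = "V1")
nrow(m)   # 6 rows (3 x 2), not the 2 rows you are after
```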

In terms of the R language, you're looking for something using the cbind() function, not the merge() function (I think).

There are a couple of details that need to be clarified, and my solution below made some assumptions.

1) Could a value in the first column appear in only one of the two data frames?

2) Is it always x1 that has more values? In your example, x1 had the number 1 appear three times in the first column, and x2 had it appear only twice. Does x2 sometimes have more rows? (I think your description implies that, but it's good to be explicit.)

I added extra rows to your example data frames to test my assumptions about the answers.


After trying to be clever, I decided the easiest way is brute force.

Hopefully, this is what you want:

x1 <- as.data.frame( matrix(
c(
1,      4,
1,      3,
1,      6,
2,      9,
2,      2,
2,      5,
3,      6,
3,      7,
3,      4,
  4,0,
  4,1) , byrow=TRUE,ncol=2))

x2 <- as.data.frame( matrix(
c(
1,      -3,
1,      -7,
2,      -3,
2,      -2,
2,      -8,
3,      -1,
3,      -2,
3,      -1,
  4,0,
  4,1,
  4,2,
  4,3) , byrow=TRUE,ncol=2))

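### Pair up rows within each value of V1, keeping only as many rows
### as both data frames have for that value.
ivals <- sort(unique(c(x1$V1, x2$V1)))

newx <- NULL   # start empty, so no special case is needed for the first value
for (i in ivals) {
  tmpx1 <- x1[x1$V1 == i, ]
  tmpx2 <- x2[x2$V1 == i, ]
  n.to.use <- min(nrow(tmpx1), nrow(tmpx2))
  if (n.to.use < 1) next   # value appears in only one data frame: skip it
  rtmp <- seq_len(n.to.use)
  tmpnew <- cbind(tmpx1[rtmp, ], V3 = tmpx2[rtmp, 'V2'])
  newx <- rbind(newx, tmpnew)
}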


The loop could be written more compactly, but I found it easier to read and understand this way. If x1 and x2 have a very large number of rows, it should probably be revised for better memory usage, since growing newx with rbind() copies it on every pass through the loop.
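
For what it's worth, here is one possible loop-free sketch of the same idea. It is not the code I tested above, just an alternative under the same assumptions: number the rows within each value of V1, then let merge() do an ordinary one-to-one join on the pair (V1, row number). The helper column idx and the small data frames are made up for illustration.

```r
# Hypothetical small data frames with the same column names (V1, V2)
x1 <- data.frame(V1 = c(1, 1, 1, 2, 2), V2 = c(4, 3, 6, 9, 2))
x2 <- data.frame(V1 = c(1, 1, 2, 2, 2), V2 = c(-3, -7, -3, -2, -8))

# Number the rows within each value of V1 (1, 2, 3, ... per group)
x1$idx <- ave(seq_along(x1$V1), x1$V1, FUN = seq_along)
x2$idx <- ave(seq_along(x2$V1), x2$V1, FUN = seq_along)

# Inner join on (V1, idx): a row survives only if both data frames
# have a k-th row for that value of V1
res <- merge(x1, x2, by = c("V1", "idx"))
res <- res[order(res$V1, res$idx), ]

# Rename to match the loop's output and drop the helper column
names(res)[names(res) == "V2.x"] <- "V2"
names(res)[names(res) == "V2.y"] <- "V3"
res$idx <- NULL
```

Because the join key is now unique within each data frame, the many-to-many problem goes away, and values of V1 that appear in only one data frame are dropped automatically.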

-Don

At 2:33 AM +0200 6/18/09, Martin Batholdy wrote:
hi,


I have two data.frames each with two columns;

x1

1       4
1       3
1       6
2       9
2       2
2       5
3       6
3       7
3       4


x2

1       -3
1       -7
2       -3
2       -2
2       -8
3       -1
3       -2
3       -1

Now I want to merge these data.frames into one data.frame.

The problem is that sometimes there is a different number of elements per category (as above: x1 has 3 rows with the value 1 in the first column, but x2 has only 2).

Is there an easy way to merge these two data.frames by deleting the rows that only one data.frame "has"? In the example, the resulting data.frame would combine x1 and x2, except for row 3 of x1.

thanks for any suggestions!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
