The word "merge" in the context of R suggests the merge() function,
but I don't think that's the right tool for what you want.
merge() does relational-database-style joins, and because the values
in your first column repeat in both data frames, joining on that
column would be a many-to-many merge: every x1 row for a value gets
paired with every x2 row for that value. Not good.
In R terms, you're looking for something built on cbind(), not
merge() (I think).
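To illustrate the many-to-many point, here is a minimal sketch using
only the rows for value 1 from your example. The names a, b and m are
made up for this illustration.

```r
# Rows for value 1 only, taken from the example data
a <- data.frame(V1 = c(1, 1, 1), V2 = c(4, 3, 6))
b <- data.frame(V1 = c(1, 1),    V2 = c(-3, -7))

# Joining on V1 pairs every row of a with every row of b
m <- merge(a, b, by = "V1")
nrow(m)   # 3 rows x 2 rows = 6 rows, not the 2 you are after
```

The two V2 columns come back as V2.x and V2.y, which are merge()'s
default suffixes for clashing column names.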
There are a couple of details that need to be clarified, and my
solution below makes some assumptions.
1) Could a value in the first column appear in only one of the two data frames?
2) Is it always x1 that has more values? (In your example, x1 had the
number 1 appear three times in the first column, and x2 had it appear
only twice.) Does x2 sometimes have more rows? (I think your
description implies that, but it's good to be explicit.)
I added extra rows to your example data frames to test my assumptions
about the answers.
After trying to be clever, I decided the easiest way is brute force.
Hopefully, this is what you want:
x1 <- as.data.frame( matrix(
   c(
     1, 4,
     1, 3,
     1, 6,
     2, 9,
     2, 2,
     2, 5,
     3, 6,
     3, 7,
     3, 4,
     4, 0,
     4, 1) , byrow=TRUE, ncol=2))
x2 <- as.data.frame( matrix(
   c(
     1, -3,
     1, -7,
     2, -3,
     2, -2,
     2, -8,
     3, -1,
     3, -2,
     3, -1,
     4, 0,
     4, 1,
     4, 2,
     4, 3) , byrow=TRUE, ncol=2))
###
ivals <- sort(unique(c(x1$V1, x2$V1)))
newx <- NULL   ## rbind() ignores NULL, so this is a safe starting value
for (i in ivals) {
  tmpx1 <- x1[x1$V1 == i , ]
  tmpx2 <- x2[x2$V1 == i , ]
  ## keep only as many rows as both data frames have for this value
  n.to.use <- min( nrow(tmpx1), nrow(tmpx2))
  if (n.to.use >= 1 ) {
    rtmp <- seq(n.to.use)
    tmpnew <- cbind( tmpx1[rtmp, ], V3=tmpx2[rtmp,'V2'])
    ## append; this also works when the smallest value is in only one data frame
    newx <- rbind( newx, tmpnew)
  }
}
The loop could be written with fewer lines of code, but I found it
easier to read and understand this way.
If x1 and x2 have a very large number of rows, the above should
probably be revised for better memory usage, since calling rbind()
inside a loop copies the growing data frame on every pass.
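One possible revision, as a sketch only: collect the per-value pieces
in a list and call rbind() once at the end. The names pieces and newx2
are made up here, and the data are the same test data frames as above,
rebuilt with data.frame() so the block runs on its own. I have not
timed this against the loop.

```r
x1 <- data.frame(V1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
                 V2 = c(4, 3, 6, 9, 2, 5, 6, 7, 4, 0, 1))
x2 <- data.frame(V1 = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4),
                 V2 = c(-3, -7, -3, -2, -8, -1, -2, -1, 0, 1, 2, 3))

ivals <- sort(unique(c(x1$V1, x2$V1)))

# One list element per value of V1; NULL when a value has no usable rows
pieces <- lapply(ivals, function(i) {
  tmpx1 <- x1[x1$V1 == i, ]
  tmpx2 <- x2[x2$V1 == i, ]
  n.to.use <- min(nrow(tmpx1), nrow(tmpx2))
  if (n.to.use < 1) return(NULL)
  rtmp <- seq_len(n.to.use)
  cbind(tmpx1[rtmp, ], V3 = tmpx2[rtmp, "V2"])
})

# Single rbind() over all pieces; NULL elements are ignored
newx2 <- do.call(rbind, pieces)
```

With the test data this keeps 2 rows for value 1, 3 each for values 2
and 3, and 2 for value 4.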
-Don
At 2:33 AM +0200 6/18/09, Martin Batholdy wrote:
hi,
I have two data.frames each with two columns;
x1
1 4
1 3
1 6
2 9
2 2
2 5
3 6
3 7
3 4
x2
1 -3
1 -7
2 -3
2 -2
2 -8
3 -1
3 -2
3 -1
now I want to merge these data.frames into one data.frame.
The problem is that sometimes there is a different number of
elements per category
(like above, x1 has 3 values for the value 1 in the first column, but x2
has only 2 values for the value 1 in the first column).
Is there an easy way to merge these two data.frames by deleting the
rows that only one data.frame "has"?
In the example, the resulting data.frame would be the data.frames x1
and x2 except for row 3 of data.frame x1.
thanks for any suggestions!
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062