yes thanks, that's correct!

here a slight variation inspired by your solution: a cartesian product restricted to non duplicated records to get the logical vector i to be used in the next natural join

i<-!duplicated(merge(df1$id,df1$item, by=NULL))
merge(df1[i,],df2)

thanks

Il 08/05/2014 18:43, arun ha scritto:
Hi,
May be:
indx <- !duplicated(as.character(interaction(df1[,-3])))
merge(df1[indx,],df2)
A.K.




On Thursday, May 8, 2014 12:34 PM, Massimo Bressan<mbres...@arpa.veneto.it>  
wrote:
yes, thank you for all your replies, they worked out correctly indeed...

...but because of my fault, by then working on my real data I fully
realised that I should have mentioned something that is changing (quite
a lot, in fact) the terms of the problem...

please would you consider the following (consistent) variation ?

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2),
rep("C",2)), v=rnorm(6))
df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

and again I need to group the first dataframe "df1" both by "id" and by
the first record of "v", and then merge with the second dataframe "df2"
(again by "id")

now, how to do that?
(that's why probably I was pointing in my first post to the use of sqldf)

thanks

ps: I'm in doubt wheter I must open another thread or keep going with
this one (really sorry for the eventual violation of the R-help netiquette)


Il 08/05/2014 17:14, arun ha scritto:
Hi,
May be this helps:
    merge(unique(df1),df2)
A.K.





On Thursday, May 8, 2014 5:46 AM, Massimo Bressan<mbres...@arpa.veneto.it>  
wrote:
given this "bare bone" example:

df1 <- data.frame(id=rep(1:3,each=2), item=c(rep("A",2), rep("B",2),
rep("C",2)))
df2 <- data.frame(id=c(1,2,3), who=c("tizio","caio","sempronio"))

I need to group the first dataframe "df1" by "id" and then merge with
the second dataframe "df2" (again by "id")
so far I've manged to accomplish the task by something like the following...

# start

require(sqldf)
tmp<-sqldf("select * from df1 group by id")
merge(tmp, df2)

#end

now I'm wonderng if there is a more efficient and/or elegant way to
perform it (also because in fact I'm dealing with much more "heavy"
dataframes);

may be possible through a single sql statement?  or by using a different
package functions (e.g. dplyr)?
my attempts towards these alternative approaches miserably failed ...

thanks

______________________________________________
R-help@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to