[R] merge dataframes with conditions formulated as logical expressions
I have a data.frame df containing two variables: GRP: Factor VAL: num I have a data.frame dp containing: GRP: Factor MIN.VAL: num MAX.VAL: num VAL2: num with several rows per GRP where dp[i-1, MAX.VAL] dp[i, MIN.VAL] within the same GRP. I want to create df[i, VAL2] - dpp[z, VAL2] withi along df and dpp - subset( dp, GRP = df[i, GRP] ) so that it is true for each i: df[i, VAL] dpp[z, MIN.VAL] and df[i, VAL] = dpp[z, MAX.VAL] Is there an easy/efficient way to do that? Example: df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) ) dp - data.frame( GRP=c( A, A, B, B ), MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) The result should be: df$VAL2 - c( 1.1, 2.2, 4.4 ) Thanks - Wolfram __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] merge dataframes with conditions formulated as logical expressions
You have discontinuity between your MIN.VAL and MAX.VAL for a given group. If this is true in practise, then you may want to check and report when VAL is in the discontinuous region. Here is my solution that ignores that (and only uses MIN.VAL and completely disrespecting MAX.VAL). Not very elegant but should do the trick. df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) ) dp - data.frame( GRP=c( A, A, B, B ), MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) x - split(df, df$GRP) y - split(dp, dp$GRP) out - NULL for(g in names(x)){ xx - x[[g]] yy - y[[g]] w - cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F) tmp - cbind(xx, yy[w, VAL2]) colnames(tmp) - c(GRP, VAL, VAL2) out - rbind(out, tmp) } out Regards, Adai On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote: I have a data.frame df containing two variables: GRP: Factor VAL: num I have a data.frame dp containing: GRP: Factor MIN.VAL: num MAX.VAL: num VAL2: num with several rows per GRP where dp[i-1, MAX.VAL] dp[i, MIN.VAL] within the same GRP. I want to create df[i, VAL2] - dpp[z, VAL2] withi along df and dpp - subset( dp, GRP = df[i, GRP] ) so that it is true for each i: df[i, VAL] dpp[z, MIN.VAL] and df[i, VAL] = dpp[z, MAX.VAL] Is there an easy/efficient way to do that? Example: df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) ) dp - data.frame( GRP=c( A, A, B, B ), MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) The result should be: df$VAL2 - c( 1.1, 2.2, 4.4 ) Thanks - Wolfram __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] merge dataframes with conditions formulated as logical expressions
--- Reply to: --- Date:14.06.06 16:17 (+) From:Adaikalavan Ramasamy [EMAIL PROTECTED] Subject: Re: [R] merge dataframes with conditions formulated as logical expressions You have discontinuity between your MIN.VAL and MAX.VAL for a given group. If this is true in practise, then you may want to check and report when VAL is in the discontinuous region. Your solution without concerning discontinuity is better because it is more general. Here is my solution that ignores that (and only uses MIN.VAL and completely disrespecting MAX.VAL). Not very elegant but should do the trick. df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) ) dp - data.frame( GRP=c( A, A, B, B ), MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) x - split(df, df$GRP) y - split(dp, dp$GRP) out - NULL for(g in names(x)){ xx - x[[g]] yy - y[[g]] w - cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F) tmp - cbind(xx, yy[w, VAL2]) colnames(tmp) - c(GRP, VAL, VAL2) out - rbind(out, tmp) } out Regards, Adai Thanks for this solution. I did not yet try to program a conventional solution because I thought there would be a nice shortcut in R to solve the problem comparably elegantly as in SQL: select df.*, dp.VAL2 from df, dp where df.GRP = dp.GRP and df.VAL dp.MIN_VAL and df.VAL = dp.MAX_VAL Wolfram On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote: I have a data.frame df containing two variables: GRP: Factor VAL: num I have a data.frame dp containing: GRP: Factor MIN.VAL: num MAX.VAL: num VAL2: num with several rows per GRP where dp[i-1, MAX.VAL] dp[i, MIN.VAL] within the same GRP. I want to create df[i, VAL2] - dpp[z, VAL2] withi along df and dpp - subset( dp, GRP = df[i, GRP] ) so that it is true for each i: df[i, VAL] dpp[z, MIN.VAL] and df[i, VAL] = dpp[z, MAX.VAL] Is there an easy/efficient way to do that? Example: df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) ) dp - data.frame( GRP=c( A, A, B, B ), MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), VAL2=c( 1.1, 2.2, 3.3, 4.4 ) ) The result should be: df$VAL2 - c( 1.1, 2.2, 4.4 ) Thanks - Wolfram __ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html