[R] merge dataframes with conditions formulated as logical expressions

2006-06-14 Thread Wolfram Fischer
I have a data.frame df containing two variables:
GRP: Factor
VAL: num

I have a data.frame dp containing:
GRP: Factor
MIN.VAL: num
MAX.VAL: num
VAL2: num
with several rows per GRP
where dp[i-1, MAX.VAL]  dp[i, MIN.VAL]
within the same GRP.

I want to create df[i, VAL2] - dpp[z, VAL2] 
withi along df 
and dpp - subset( dp, GRP = df[i, GRP] )
so that it is true for each i:
df[i, VAL]  dpp[z, MIN.VAL]
   and  df[i, VAL] = dpp[z, MAX.VAL]

Is there an easy/efficient way to do that?

Example:
df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) )
dp - data.frame( GRP=c( A, A, B, B ),
MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )

The result should be:
df$VAL2 - c( 1.1, 2.2, 4.4 )

Thanks - Wolfram

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] merge dataframes with conditions formulated as logical expressions

2006-06-14 Thread Adaikalavan Ramasamy
You have discontinuity between your MIN.VAL and MAX.VAL for a given
group. If this is true in practise, then you may want to check and
report when VAL is in the discontinuous region.

Here is my solution that ignores that (and only uses MIN.VAL and
completely disrespecting MAX.VAL). Not very elegant but should do the
trick.


 df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) )
 dp - data.frame( GRP=c( A, A, B, B ), MIN.VAL=c( 1, 50, 1,
70 ), MAX.VAL=c( 49, 999, 59, 999 ),  VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )

 x - split(df, df$GRP)
 y - split(dp, dp$GRP)

 out - NULL
 for(g in names(x)){

   xx - x[[g]]
   yy - y[[g]]

   w   - cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
   tmp - cbind(xx, yy[w, VAL2])
   colnames(tmp) - c(GRP, VAL, VAL2)
   out - rbind(out, tmp)
 } 
 out

Regards, Adai



On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:
 I have a data.frame df containing two variables:
 GRP: Factor
 VAL: num
 
 I have a data.frame dp containing:
 GRP: Factor
 MIN.VAL: num
 MAX.VAL: num
 VAL2: num
 with several rows per GRP
 where dp[i-1, MAX.VAL]  dp[i, MIN.VAL]
 within the same GRP.
 
 I want to create df[i, VAL2] - dpp[z, VAL2] 
 withi along df 
 and dpp - subset( dp, GRP = df[i, GRP] )
 so that it is true for each i:
 df[i, VAL]  dpp[z, MIN.VAL]
and  df[i, VAL] = dpp[z, MAX.VAL]
 
 Is there an easy/efficient way to do that?
 
 Example:
 df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) )
 dp - data.frame( GRP=c( A, A, B, B ),
 MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
 VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
 
 The result should be:
 df$VAL2 - c( 1.1, 2.2, 4.4 )
 
 Thanks - Wolfram
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] merge dataframes with conditions formulated as logical expressions

2006-06-14 Thread Wolfram Fischer
--- Reply to: ---
Date:14.06.06 16:17 (+)
From:Adaikalavan Ramasamy [EMAIL PROTECTED]
Subject: Re: [R] merge dataframes with conditions formulated as logical 
expressions

 You have discontinuity between your MIN.VAL and MAX.VAL for a given
 group. If this is true in practise, then you may want to check and
 report when VAL is in the discontinuous region.

Your solution without concerning discontinuity is better
because it is more general.

 Here is my solution that ignores that (and only uses MIN.VAL and
 completely disrespecting MAX.VAL). Not very elegant but should do
 the trick.
 
 
  df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) )
  dp - data.frame( GRP=c( A, A, B, B ), MIN.VAL=c( 1, 50, 1,
 70 ), MAX.VAL=c( 49, 999, 59, 999 ),  VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
 
  x - split(df, df$GRP)
  y - split(dp, dp$GRP)
 
  out - NULL
  for(g in names(x)){
 
xx - x[[g]]
yy - y[[g]]
 
w   - cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
tmp - cbind(xx, yy[w, VAL2])
colnames(tmp) - c(GRP, VAL, VAL2)
out - rbind(out, tmp)
  } 
  out
 
 Regards, Adai

Thanks for this solution.

I did not yet try to program a conventional solution
because I thought there would be a nice shortcut in R
to solve the problem comparably elegantly as in SQL:
select df.*, dp.VAL2
from df, dp
where df.GRP = dp.GRP
  and df.VAL  dp.MIN_VAL
  and df.VAL = dp.MAX_VAL

Wolfram

 On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:
  I have a data.frame df containing two variables:
  GRP: Factor
  VAL: num
  
  I have a data.frame dp containing:
  GRP: Factor
  MIN.VAL: num
  MAX.VAL: num
  VAL2: num
  with several rows per GRP
  where dp[i-1, MAX.VAL]  dp[i, MIN.VAL]
  within the same GRP.
  
  I want to create df[i, VAL2] - dpp[z, VAL2] 
  withi along df 
  and dpp - subset( dp, GRP = df[i, GRP] )
  so that it is true for each i:
  df[i, VAL]  dpp[z, MIN.VAL]
 and  df[i, VAL] = dpp[z, MAX.VAL]
  
  Is there an easy/efficient way to do that?
  
  Example:
  df - data.frame( GRP=c( A, A, B ), VAL=c( 10, 100, 200 ) )
  dp - data.frame( GRP=c( A, A, B, B ),
  MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
  VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
  
  The result should be:
  df$VAL2 - c( 1.1, 2.2, 4.4 )
  
  Thanks - Wolfram
  
  __

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html