Re: [R] Ranking within factor subgroups

2006-02-24 Thread Adaikalavan Ramasamy
Thank you! I did not know about the split and unsplit functions. It
looks like a very powerful and useful combination to master.

Regards, Adai



On Thu, 2006-02-23 at 07:28 +0100, Peter Dalgaard wrote:
> You might prefer to look at split/unsplit/split<-, i.e. the "z-scores
> by group" line:
>
>     z <- unsplit(lapply(split(x, g), scale), g)
>
> with scale suitably replaced. Presumably (meaning: I didn't quite
> read your code closely enough)
>
>     z <- unsplit(lapply(split(x, g), bucket, 10), g)
>
> could do it.
  
  
  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Ranking within factor subgroups

2006-02-23 Thread maneesh deshpande

Hi Peter,

That did the trick. Thank you very much.

Regards,

Maneesh



From: Peter Dalgaard [EMAIL PROTECTED]
To: maneesh deshpande [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], r-help@stat.math.ethz.ch
Subject: Re: [R] Ranking within factor subgroups
Date: 23 Feb 2006 07:28:13 +0100


Re: [R] Ranking within factor subgroups

2006-02-22 Thread maneesh deshpande
Hi Adai,

I think your solution only works if the rows of the data frame are ordered
by date, and if that ordering is the same as the one used to order the
levels of factor(df$date)?
It turns out (as I implied in my question) that my data is indeed organized
in this manner, so my current problem is solved.
In the general case, I suppose, one could always order the data frame by
date before proceeding?

Thanks,

Maneesh
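
The ordering concern raised above can be checked on a small made-up vector. The sketch below is only an illustration (v, g and centre are hypothetical names, not from the thread): unlist(tapply()) returns values grouped by the sorted factor levels, whereas ave() from base R writes each group's result back into the original row positions.

```r
# Toy data: the groups are deliberately NOT sorted (hypothetical values).
v <- c(10, 1, 20, 2, 30, 3)
g <- c("b", "a", "b", "a", "b", "a")

# Hypothetical per-group function: subtract the group mean.
centre <- function(z) z - mean(z)

# tapply() returns one element per level, in sorted level order ("a", then "b"),
# so unlist() yields all "a" rows first rather than the original row order.
by_level <- unname(unlist(tapply(v, g, centre)))

# ave() applies the function per group and keeps the original positions.
by_row <- ave(v, g, FUN = centre)

print(by_level)  #  -1   0   1 -10   0  10
print(by_row)    # -10  -1   0   0  10   1
```

If the rows had been sorted by g beforehand, the two results would coincide, which is why the tapply-based approach happens to work on date-ordered data.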


From: Adaikalavan Ramasamy [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: maneesh deshpande [EMAIL PROTECTED]
CC: r-help@stat.math.ethz.ch
Subject: Re: [R]  Ranking within factor subgroups
Date: Wed, 22 Feb 2006 03:44:45 +



Re: [R] Ranking within factor subgroups

2006-02-22 Thread Peter Dalgaard

You might prefer to look at split/unsplit/split<-, i.e. the "z-scores
by group" line:

    z <- unsplit(lapply(split(x, g), scale), g)

with scale suitably replaced. Presumably (meaning: I didn't quite
read your code closely enough)

    z <- unsplit(lapply(split(x, g), bucket, 10), g)

could do it.
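
A minimal sketch of how the split/unsplit suggestion might be applied to all three columns. The data frame here is made up (two dates, rows shuffled on purpose, continuous values so that the quantile breaks are distinct); only bucket() and the column names come from the thread.

```r
# bucket() as posted at the start of the thread: rank 1..nBuckets within `data`.
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1  # widen the first interval so the minimum is not NA
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Made-up data in shuffled row order (an assumption for illustration only).
set.seed(1)
x <- data.frame(Date = sample(rep(c(20041201, 20050101), each = 50)),
                A = rnorm(100), B = rnorm(100), C = rnorm(100))

# For each property: split by Date, bucket each piece, then unsplit() puts
# every result back into the row it came from.
for (col in c("A", "B", "C")) {
  x[[paste0("bucket", col)]] <-
    unsplit(lapply(split(x[[col]], x$Date), bucket, 10), x$Date)
}

# With 50 distinct values per date, each decile should hold 5 rows per date.
print(table(x$Date, x$bucketA))
```

Because unsplit() uses the same grouping vector as split(), the rows do not need to be sorted by date first.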
 
 
 

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



[R] Ranking within factor subgroups

2006-02-21 Thread maneesh deshpande

Hi,

I have a dataframe, x of the following form:

Date     Symbol   A   B   C
20041201 ABC  10  12 15
20041201 DEF   95   4
...
20050101 ABC 5  3   1
20050101 GHM   12 42


Here A, B, C are properties of a set of symbols recorded for a given date.
I want to decile the symbols for each date and property and
create another set of columns bucketA, bucketB, bucketC containing the
decile rank
for each symbol. The following non-vectorized code does what I want:

bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, len = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1  # need to do this to ensure there are no extra NAs
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

calcDeciles <- function(x, colNames) {
  nBuckets <- 10
  dates <- unique(x$Date)
  for (date in dates) {
    iVec <- x$Date == date          # rows belonging to this date
    xx <- x[iVec, ]
    for (colName in colNames) {
      data <- xx[, colName]
      bColName <- paste("bucket", colName, sep = "")
      x[iVec, bColName] <- bucket(data, nBuckets)
    }
  }
  x
}

x <- calcDeciles(x, c("A", "B", "C"))


I was wondering if it is possible to vectorize the above function to make it
more efficient.
I tried

    rlist <- tapply(x$A, x$Date, bucket)

but I am not sure how to assign the contents of rlist to their appropriate
slots in the original data frame.

Thanks,

Maneesh
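
One possible way to put the pieces of rlist back into the data frame, sketched on made-up data. It relies on the base R replacement function `split<-`; the data frame and the choice of 4 buckets are assumptions for illustration. Note that bucket() needs its nBuckets argument, which tapply() can pass through its `...` argument.

```r
# bucket() as defined in the post above.
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Made-up data: two dates, 20 rows each, in shuffled order.
set.seed(2)
x <- data.frame(Date = sample(rep(c(20041201, 20050101), each = 20)),
                A = rnorm(40))

# One vector of bucket ranks per date; the extra argument goes to bucket().
rlist <- tapply(x$A, x$Date, bucket, nBuckets = 4)

# `split<-` is the inverse of split(): it writes each list element into the
# positions that belong to the matching date.
bucketA <- integer(nrow(x))
split(bucketA, x$Date) <- rlist
x$bucketA <- bucketA

print(head(x))
```

Both tapply() and `split<-` order the groups by the levels of factor(x$Date), so the list elements and the target positions line up even when the rows are not sorted.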



Re: [R] Ranking within factor subgroups

2006-02-21 Thread Adaikalavan Ramasamy
It might help to give a simple reproducible example in the future. For
example

    df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
                            B=rpois(500, 50), C=rpois(500, 30) )

might generate something like

        date   A  B  C
    1      1  93 51 32
    2      1  95 51 30
    3      1 102 59 28
    4      1 105 52 32
    5      1 105 53 26
    6      1  99 59 37
    ...  ... ... .. ..
    495    5 100 57 19
    496    5  96 47 44
    497    5 111 56 35
    498    5 105 49 23
    499    5 105 61 30
    500    5  92 53 32

Here is my proposed solution. Can you double-check it against your existing
functions to see whether it is correct?

    decile.fn <- function(x, nbreaks = 10) {
      br <- quantile(x, seq(0, 1, len = nbreaks + 1), na.rm = TRUE)
      br[1] <- -Inf   # open the first interval so the minimum is not NA
      return(cut(x, br, labels = FALSE))
    }

    # For each column, compute deciles within each date and flatten the result
    out <- apply(df[, c("A", "B", "C")], 2,
                 function(v) unlist(tapply(v, df$date, decile.fn)))

    rownames(out) <- rownames(df)
    out <- cbind(df$date, out)

Regards, Adai
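
The double check asked for above could be sketched as follows. The two functions are copied from the thread; the test vector is made up and continuous (rnorm), on the assumption that tied quantile breaks, which can make cut() fail, do not occur.

```r
# Original poster's version: shift the lowest break down slightly.
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Proposed version: replace the lowest break with -Inf.
decile.fn <- function(x, nbreaks = 10) {
  br <- quantile(x, seq(0, 1, length.out = nbreaks + 1), na.rm = TRUE)
  br[1] <- -Inf
  cut(x, br, labels = FALSE)
}

set.seed(3)
v <- rnorm(200)  # made-up continuous data, so the breaks are distinct

# Both functions use the same interior breaks, so the ranks should agree.
same <- all(bucket(v, 10) == decile.fn(v))
print(same)
```

This only compares the two bucketing functions on a single vector; it says nothing about row order, which is a separate question when the results are combined across dates.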



On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote: