Re: [R] Ranking within factor subgroups
Thank you! I did not know about the split and unsplit functions. It looks like a very powerful and useful combination to master.

Regards,
Adai

On Thu, 2006-02-23 at 07:28 +0100, Peter Dalgaard wrote:
> Presumably (meaning: I didn't quite read your code closely enough)
>
>     z <- unsplit(lapply(split(x, g), bucket, 10), g)
>
> could do it.

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Ranking within factor subgroups
Hi Peter,

That did the trick. Thank you very much.

Regards,
Maneesh

> From: Peter Dalgaard [EMAIL PROTECTED]
> To: maneesh deshpande [EMAIL PROTECTED]
> CC: [EMAIL PROTECTED], r-help@stat.math.ethz.ch
> Subject: Re: [R] Ranking within factor subgroups
> Date: 23 Feb 2006 07:28:13 +0100
>
> You might prefer to look at split/unsplit/split<-, i.e. the "z-scores by group" line:
>
>     z <- unsplit(lapply(split(x, g), scale), g)
>
> with scale suitably replaced.
Re: [R] Ranking within factor subgroups
Hi Adai,

I think your solution only works if the rows of the data frame are ordered by date and the ordering function is the same one used to order the levels of factor(df$date)? It turns out (as I implied in my question) that my data is indeed organized in this manner, so my current problem is solved. In the general case, I suppose, one could always order the data frame by date before proceeding?

Thanks,
Maneesh

> From: Adaikalavan Ramasamy [EMAIL PROTECTED]
> Reply-To: [EMAIL PROTECTED]
> To: maneesh deshpande [EMAIL PROTECTED]
> CC: r-help@stat.math.ethz.ch
> Subject: Re: [R] Ranking within factor subgroups
> Date: Wed, 22 Feb 2006 03:44:45 +
>
>     out <- apply( df[ ,c("A", "B", "C")], 2, function(v)
>                   unlist( tapply( v, df$date, decile.fn ) ) )
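The ordering concern can be made concrete with a small sketch. The toy data frame and the stand-in function rank.fn below are hypothetical (they are not from the thread); rank() is used instead of decile.fn only so that six rows are enough to see the effect. tapply() groups by the sorted levels of the date factor, so unlist() returns the values level by level rather than in row order. Sorting first and mapping back, as suggested above, is one way around that.

```r
# Hypothetical toy data: rows alternate between two dates (not sorted by date)
df <- data.frame(date = c(2, 1, 2, 1, 2, 1),
                 A    = c(10, 1, 30, 3, 20, 2))

# Stand-in for decile.fn, chosen so that the result is easy to read
rank.fn <- function(v) rank(v)

# Values come back grouped by factor level (all of date 1, then all of date 2),
# which does not match the row order of df
by.level <- unlist(tapply(df$A, df$date, rank.fn), use.names = FALSE)

# Order by date first, compute, then put the results back in the original rows
ord    <- order(df$date)
sorted <- df[ord, ]
res    <- unlist(tapply(sorted$A, sorted$date, rank.fn), use.names = FALSE)

df$rankA      <- numeric(nrow(df))
df$rankA[ord] <- res
```

Here by.level and df$rankA differ, which is the mismatch described above; on data already sorted by date the two would coincide.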
Re: [R] Ranking within factor subgroups
maneesh deshpande [EMAIL PROTECTED] writes:

> In the general case, I suppose, one could always order the data
> frame by date before proceeding ?

You might prefer to look at split/unsplit/split<-, i.e. the "z-scores by group" line:

    z <- unsplit(lapply(split(x, g), scale), g)

with scale suitably replaced. Presumably (meaning: I didn't quite read your code closely enough)

    z <- unsplit(lapply(split(x, g), bucket, 10), g)

could do it.

-- 
   O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~         - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
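A minimal sketch of the split/unsplit suggestion applied to several columns at once. The toy data frame is hypothetical and deliberately not sorted by date; bucket() is the function from the original question, restated so the block runs on its own. The loop over columns via lapply() is an assumption about how one might extend the one-liner, not something stated in the thread.

```r
# bucket() as posted in the original question
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1   # widen the lowest break so the minimum is not dropped
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical data: two dates interleaved, 20 distinct values per date and column
x <- data.frame(
  Date = rep(c(20050101, 20041201), times = 20),
  A    = (1:40 * 7) %% 41,
  B    = (1:40 * 11) %% 41
)

g    <- factor(x$Date)
cols <- c("A", "B")

# split() breaks each column up by date, bucket() is applied per piece,
# and unsplit() puts the results back in the original row positions
x[paste0("bucket", cols)] <- lapply(x[cols], function(v) {
  unsplit(lapply(split(v, g), bucket, 10), g)
})
```

Because unsplit() uses the same grouping factor g to reassemble the pieces, the result lines up with the rows of x whether or not the data frame is sorted by date.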
[R] Ranking within factor subgroups
Hi,

I have a dataframe, x, of the following form:

    Date      Symbol  A   B   C
    20041201  ABC     10  12  15
    20041201  DEF     95 4
    ...
    20050101  ABC     5   3   1
    20050101  GHM     12 42

Here A, B, C are properties of a set of symbols recorded for a given date. I want to decile the symbols for each date and property and create another set of columns bucketA, bucketB, bucketC containing the decile rank for each symbol. The following non-vectorized code does what I want:

    bucket <- function(data,nBuckets) {
        q <- quantile(data,seq(0,1,len=nBuckets+1),na.rm=T)
        q[1] <- q[1] - 0.1  # need to do this to ensure there are no extra NAs
        cut(data,q,include.lowest=T,labels=F)
    }

    calcDeciles <- function(x,colNames) {
        nBuckets <- 10
        dates <- unique(x$Date)
        for ( date in dates) {
            iVec <- x$Date == date
            xx <- x[iVec,]
            for (colName in colNames) {
                data <- xx[,colName]
                bColName <- paste("bucket",colName,sep="")
                x[iVec,bColName] <- bucket(data,nBuckets)
            }
        }
        x
    }

    x <- calcDeciles(x,c("A","B","C"))

I was wondering if it is possible to vectorize the above function to make it more efficient. I tried

    rlist <- tapply(x$A,x$Date,bucket,10)

but I am not sure how to assign the contents of rlist to their appropriate slots in the original dataframe.

Thanks,
Maneesh
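One way to get per-group results back into their original slots, offered here as a swapped-in alternative that nobody in the thread proposed, is base R's ave(). It applies a function within each group and returns a vector in the original row order, so the result can be assigned directly as a new column. The toy data below is hypothetical; bucket() is the function from the question, restated so the block is self-contained.

```r
# bucket() as posted in the question
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1   # widen the lowest break so the minimum is not dropped
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical data: two dates interleaved, 10 distinct values per date
x <- data.frame(
  Date = rep(c(20050101, 20041201), times = 10),
  A    = (1:20 * 3) %% 23
)

# ave() splits x$A by x$Date, applies FUN to each piece, and returns the
# results aligned with the rows of x
x$bucketA <- ave(x$A, x$Date, FUN = function(v) bucket(v, 10))
```

With exactly ten distinct values per date, each value lands in its own decile, so bucketA here coincides with the within-date rank; with real data several symbols would share a bucket.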
Re: [R] Ranking within factor subgroups
It might help to give a simple reproducible example in the future. For example

    df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
                            B=rpois(500, 50), C=rpois(500, 30) )

might generate something like

        date   A  B  C
    1      1  93 51 32
    2      1  95 51 30
    3      1 102 59 28
    4      1 105 52 32
    5      1 105 53 26
    6      1  99 59 37
    ...  ... ... .. ..
    495    5 100 57 19
    496    5  96 47 44
    497    5 111 56 35
    498    5 105 49 23
    499    5 105 61 30
    500    5  92 53 32

Here is my proposed solution. Can you double check with your existing functions to see if they are correct.

    decile.fn <- function(x, nbreaks=10){
      br     <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=T )
      br[1]  <- -Inf
      return( cut(x, br, labels=F) )
    }

    out <- apply( df[ ,c("A", "B", "C")], 2, function(v)
                  unlist( tapply( v, df$date, decile.fn ) ) )

    rownames(out) <- rownames(df)
    out <- cbind(df$date, out)

Regards,
Adai

On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
> I was wondering if it is possible to vectorize the above function to
> make it more efficient.