[R] Simple programming question
Hi R-users,

I have a simple question for R heavy users. If I have a data frame like this

    dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4), var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
    dfr <- dfr[order(dfr$categ),]

and I want to score the values in the variable named var3 following this kind of logic:

1. the highest value of var3 within category (variable named categ) -> high
2. the second highest value -> mid
3. lowest value -> low

This would be the output of this reasoning:

    dfr$score <- factor(c("high","mid","low","low","high","mid","mid","low","high","mid","low","low","high","mid","low","low"))
    dfr

The question is how I do this programmatically in R (e.g. if I have 2000 rows in my dfr)?

I appreciate your help!

Cheers,
Lauri

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple programming question
According to your post, you are assuming that there are only 3 unique values of var3 within each category. But categories C and D have 4 unique values of var3.

    split(dfr, dfr$categ)
    ...
    $C
       id categ var3 score
    3   3     C    6  high
    7   7     C    5   mid
    11 11     C    3   low
    15 15     C    1   low
    ...

If you meant something different, then just change myfun() below.

    gmax <- function(x, rnk=1){
      ## generalized maximum, with rnk=1 being the biggest value (i.e. max)
      return( sort( unique(x), decreasing=TRUE )[rnk] )
    }

    myfun <- function(x){
      ifelse( x==gmax(x,1), "high", ifelse( x==gmax(x,2), "med", "low" ) )
    }

    out <- lapply( split(dfr$var3, dfr$categ), myfun )
    data.frame( dfr, my.score = unsplit(out, dfr$categ) )

Regards,
Adai
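The behaviour of gmax() can be checked on a single category. The following is a small sketch using the var3 values of category B from the example data; the vector name x is only for illustration.

```r
# Generalized maximum: rnk = 1 is the largest distinct value, rnk = 2 the next, ...
gmax <- function(x, rnk = 1) {
  sort(unique(x), decreasing = TRUE)[rnk]
}

# var3 values of category B in the example data
x <- c(7, 4, 4, 2)

gmax(x, 1)  # 7
gmax(x, 2)  # 4 (the tie is collapsed by unique())
gmax(x, 4)  # NA, because there are only three distinct values

# The nested ifelse() from myfun(), applied to this one category
labels_b <- ifelse(x == gmax(x, 1), "high",
                   ifelse(x == gmax(x, 2), "med", "low"))
print(labels_b)
```

Because gmax() works on distinct values, both 4s in this category receive the same label.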
Re: [R] Simple programming question
Try this. f assigns 3, 2 and 1 to the highest, the second highest and all lower values within a category. ave applies f to each category. Finally we convert the result to a factor.

    f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
    factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))
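To see what f computes, it may help to trace it step by step on one category. This is only an illustrative sketch using the var3 values of category C from the example data; the names x, pos and code are not part of the original post.

```r
# var3 values of category C in the example data
x <- c(6, 5, 3, 1)

# Position of each value in the decreasingly sorted vector: 1 = largest
pos <- match(x, sort(x, decreasing = TRUE))

# Cap the position at 3 so that everything below the second highest is
# treated alike, then flip it so that larger values get larger codes
code <- 4 - pmin(3, pos)
print(code)

# Codes 1, 2, 3 become the labels low, mid, high
score <- factor(code, levels = 1:3, labels = c("low", "mid", "high"))
print(score)
```

Passing levels = 1:3 explicitly is a choice made here so that the labels stay aligned even if one of the codes does not occur in the data.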
Re: [R] Simple programming question
Try this:

    dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4), var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
    dfr <- dfr[order(dfr$categ), ]

    dfr$score <- unlist(tapply(dfr$var3, dfr$categ, function (x) {
        sn <- sort(unique(x), decreasing = TRUE)
        labs <- c("high", "mid", rep("low", length(sn) - 2))
        labs[match(x, sn)]
    }))

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm
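The unlist(tapply(...)) approach returns the labels grouped by category, so it lines up with the rows only when the data frame is already sorted by categ, as it is in the example. A hedged sketch of a variant that does not depend on row order, using split() and unsplit(); the helper name score_one and the max(..., 0) guard for categories with fewer than two distinct values are additions for illustration, not part of the original reply.

```r
# Same example data, deliberately left unsorted
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8, 7, 6, 6, 5, 4, 5, 4, 3, 4, 3, 2, 3, 2, 1, 1))

# Label the values of one category (hypothetical helper name)
score_one <- function(x) {
  sn <- sort(unique(x), decreasing = TRUE)
  # max(..., 0) avoids a negative 'times' when there is only one distinct value
  labs <- c("high", "mid", rep("low", max(length(sn) - 2, 0)))
  labs[match(x, sn)]
}

# split() groups the values, unsplit() puts the labels back in row order
out <- lapply(split(dfr$var3, dfr$categ), score_one)
dfr$score <- unsplit(out, dfr$categ)
print(dfr)
```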
Re: [R] Simple programming question
There was a problem in the first line in the case where the highest number is not unique within a category. In this example it is not apparent, since that never occurs. At any rate, it should be:

    f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
    factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))

Also note that the factor labels were arranged so that low, mid and high correspond to levels 1, 2 and 3 respectively.
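The difference between the two versions of f only shows up when the highest value is tied. A small sketch with a made-up category (the vector x and the names f_first and f_fixed are hypothetical, chosen for illustration):

```r
# Hypothetical category in which the highest value occurs twice
x <- c(8, 8, 5, 3)

# First version: ranks against the sorted values, ties included
f_first <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))

# Corrected version: ranks against the distinct sorted values
f_fixed <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))

f_first(x)  # 3 3 1 1: the value 5 sits in position 3 and is scored as low
f_fixed(x)  # 3 3 2 1: the value 5 is the second highest distinct value
```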
Re: [R] Simple programming question
The solution already calculates the score as numeric and only afterwards converts it to a factor, so just omit the conversion:

    f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
    score <- ave(dfr$var3, dfr$categ, FUN = f)

As mentioned, this assigns 1 to low (everything other than the highest two numbers in a category), 2 to the second highest and 3 to the highest. If you want some other assignment, e.g. 3 is low, 1 is mid and 0 is high, then try:

    c(3, 1, 0)[score]
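The lookup vector is indexed by the score code, so its first element is the value for low, its second for mid and its third for high. For the recoding asked about in the follow-up question (high = 3, mid = 1, low = 0) that ordering would presumably be c(0, 1, 3). A minimal sketch on the example data; the names recode and pts are illustrative only.

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8, 7, 6, 6, 5, 4, 5, 4, 3, 4, 3, 2, 3, 2, 1, 1))
dfr <- dfr[order(dfr$categ), ]

f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
score <- ave(dfr$var3, dfr$categ, FUN = f)   # 1 = low, 2 = mid, 3 = high

# Lookup vector ordered by score code: position 1 = low, 2 = mid, 3 = high
recode <- c(low = 0, mid = 1, high = 3)
dfr$pts <- unname(recode[score])
print(dfr)
```

Any other recoding only needs a different lookup vector, as long as its elements are listed in the order low, mid, high.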
[R] R: Simple programming question
Try also this:

    dfr$score <- factor(dfr$var3 %in% sort(unique(dfr$var3), decreasing=TRUE)[1:2] * dfr$var3, labels=c("low", "mid", "high"))

Hope this helps,
Stefano
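Note that this expression ranks var3 over the whole column, so only the two largest values overall (8 and 7) are labelled high and mid. If the ranking is meant to be within each categ, as in the original question, one possible adaptation is to do the top-two lookup inside ave(). This is a sketch under that assumption; the helper name rank_top2 is hypothetical.

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8, 7, 6, 6, 5, 4, 5, 4, 3, 4, 3, 2, 3, 2, 1, 1))
dfr <- dfr[order(dfr$categ), ]

# Whole-column version: top two distinct values of the entire var3 column
top2 <- sort(unique(dfr$var3), decreasing = TRUE)[1:2]
global <- factor((dfr$var3 %in% top2) * dfr$var3,
                 labels = c("low", "mid", "high"))

# Per-category version: 1 = highest, 2 = second highest, 3 = everything else
rank_top2 <- function(x) {
  s <- sort(unique(x), decreasing = TRUE)
  match(x, s[1:2], nomatch = 3L)
}
rnk <- ave(dfr$var3, dfr$categ, FUN = rank_top2)
by_categ <- factor(rnk, levels = 3:1, labels = c("low", "mid", "high"))

print(data.frame(dfr, global, by_categ))
```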
Re: [R] Simple programming question
Thank you all for your answers. Actually, Gabor's first post was right in the sense that I wanted to assign low to all cases that are lower than the second highest. But what if I want to convert/recode high, mid and low to numeric values to make some calculations, e.g. 3, 1 and 0 respectively? How do I have to modify your solutions? I would also like to apply this solution to many kinds of recoding situations.

-Lauri
Re: [R] Simple programming question
?cut

This would recode to a factor with numeric labels for its levels. as.numeric(as.character(...)) would then convert the labels to numeric values that you can manipulate. This presumes that the variable you are coding is numeric and that you want to recode by binning the values into ordered bins.

Bert Gunter
Genentech Nonclinical Statistics
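A minimal sketch of the cut() route, assuming fixed break points. The breaks 0, 3, 6, 9 and the labels 0, 1, 3 are made up for illustration, and binning by fixed breaks is not the same as ranking within a category.

```r
x <- c(8, 7, 6, 6, 5, 4, 5, 4, 3, 4, 3, 2, 3, 2, 1, 1)

# Hypothetical bins: (0,3] -> 0, (3,6] -> 1, (6,9] -> 3
binned <- cut(x, breaks = c(0, 3, 6, 9), labels = c(0, 1, 3))

# The factor stores the labels as text; convert via character to get numbers
values <- as.numeric(as.character(binned))
print(values)
```

Calling as.numeric(binned) directly would return the internal level codes 1, 2, 3 rather than the labels, which is why the intermediate as.character() step is needed.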