Re: [R] Assigning cases to groupings based on the values of several variables
On 12-12-07 7:27 AM, Dimitri Liakhovitski wrote:

Dear R-ers, my task is simple: to assign cases to desired groupings based on the combined values of 2 variables. I can think of 3 methods of doing it. Method 1 seems to me pretty R-like, but it requires a lot of lines of code, which is onerous.

Since your groups are so regular, you can compute the groups directly. Convert each column to a factor (this might have happened automatically, depending on your data and options), then use as.integer to convert to a numeric value. So a simple solution would be

    mydata$mygroup.m4 <- with(mydata, 4*(2-as.integer(factor(sex))) + as.integer(factor(age)))

It would be a little simpler if you wanted the sex factor in alphabetical order; then you wouldn't need to subtract from 2.

If your real data wasn't so regular, another approach would be to set up a matrix, indexed by sex and age, that gives the desired group number. That is somewhat like your groupings solution; I'm not sure it would be preferable to what you did.

Duncan Murdoch

Method 2 is a loop, so not very good, as it loops through all rows of mydata. Method 3 is a loop but loops through fewer lines, so it seems to me more efficient. Can you please tell me: 1. Which of my methods is more efficient? 2. Is there maybe an even more efficient R-like way of doing it? Imagine that mydata is actually a very tall data frame. Thanks a lot!
Dimitri

### My Data:
mydata <- data.frame(sex=rep(c(rep("m",4),rep("f",4)),2), age=rep(c(1:4,1:4),2))
(mydata)

### My desired assignments (in column "mygroup")
groupings <- data.frame(sex=c(rep("m",4),rep("f",4)), age=c(1:4,1:4), mygroup=1:8)
(groupings)
# No, I don't need a solution where the last column of groupings is stacked twice and bound to mydata

# Method 1 of assigning to groups - requires a lot of lines of code:
mydata$mygroup.m1 <- NA
mydata[(mydata$sex %in% "m") & (mydata$age %in% 1), "mygroup.m1"] <- 1
mydata[(mydata$sex %in% "m") & (mydata$age %in% 2), "mygroup.m1"] <- 2
mydata[(mydata$sex %in% "m") & (mydata$age %in% 3), "mygroup.m1"] <- 3
mydata[(mydata$sex %in% "m") & (mydata$age %in% 4), "mygroup.m1"] <- 4
mydata[(mydata$sex %in% "f") & (mydata$age %in% 1), "mygroup.m1"] <- 5
mydata[(mydata$sex %in% "f") & (mydata$age %in% 2), "mygroup.m1"] <- 6
mydata[(mydata$sex %in% "f") & (mydata$age %in% 3), "mygroup.m1"] <- 7
mydata[(mydata$sex %in% "f") & (mydata$age %in% 4), "mygroup.m1"] <- 8
(mydata)

# Method 2 of assigning to groups - very loopy:
mydata$mygroup.m2 <- NA
for(i in 1:nrow(mydata)){  # i <- 1
  mysex <- mydata[i, "sex"]
  myage <- mydata[i, "age"]
  mydata[i, "mygroup.m2"] <- groupings[(groupings$sex %in% mysex) & (groupings$age %in% myage), "mygroup"]
}
(mydata)

# Method 3 of assigning to groups - also loopy, but less than Method 2:
mydata$mygroup.m3 <- NA
for(i in 1:nrow(groupings)){  # i <- 1
  mysex <- groupings[i, "sex"]
  myage <- groupings[i, "age"]
  mydata[(mydata$sex %in% mysex) & (mydata$age %in% myage), "mygroup.m3"] <- groupings[i, "mygroup"]
}
(mydata)

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
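The matrix approach mentioned in the reply above (a matrix indexed by sex and age that holds the desired group number) might look like the following sketch. The names `lookup` and `mygroup.m5` are illustrative and do not come from the thread, and the example data are rebuilt here so the block runs on its own.

```r
# Example data and desired assignments, as in the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))
groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4), mygroup = 1:8)

# One row per sex, one column per age; 'lookup' is a hypothetical name.
# The dimnames are character so the matrix can be indexed by name.
lookup <- matrix(NA_integer_, nrow = 2, ncol = 4,
                 dimnames = list(c("m", "f"), as.character(1:4)))

# Fill the matrix from the groupings table using a two-column character index
lookup[cbind(as.character(groupings$sex),
             as.character(groupings$age))] <- groupings$mygroup

# Look up every case at once: each row of the index is one (sex, age) pair
mydata$mygroup.m5 <- lookup[cbind(as.character(mydata$sex),
                                  as.character(mydata$age))]
```

Because the cells are filled from a table rather than computed, the same matrix can hold irregular assignments; any cell left unfilled stays NA.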
Re: [R] Assigning cases to groupings based on the values of several variables
My example data indeed looks regular, but in reality neither the data nor the assignments are regular. E.g., sometimes all females would land in one grouping while males of different ages would land in different groupings. So I am afraid the with() solution won't work.

Dimitri

On Fri, Dec 7, 2012 at 7:54 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

    mydata$mygroup.m4 <- with(mydata, 4*(2-as.integer(factor(sex))) + as.integer(factor(age)))

--
Dimitri Liakhovitski
gfk.com http://marketfusionanalytics.com/
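For irregular assignments of the kind described here, a lookup table still works as long as it lists each (sex, age) combination once. The sketch below is only an illustration under assumptions: the table `irregular` and the column `mygroup.irr` are made up for this example, with all females in one group and males split by age.

```r
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Hypothetical irregular assignments: males get groups 1-4 by age,
# all females share group 5
irregular <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4),
                        mygroup = c(1, 2, 3, 4, 5, 5, 5, 5))

# Combine the two variables into one key per row, then find each case's
# row in the lookup table with match()
key_data  <- paste(mydata$sex, mydata$age, sep = "_")
key_table <- paste(irregular$sex, irregular$age, sep = "_")
mydata$mygroup.irr <- irregular$mygroup[match(key_data, key_table)]
```

A case whose combination is missing from the table gets NA, because match() returns NA when it finds no match.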
Re: [R] Assigning cases to groupings based on the values of several variables
Wow, Arun, I think I really like this solution. It allows me to create irregular groupings and is very parsimonious. Thank you very much!

Dimitri

On Fri, Dec 7, 2012 at 8:09 AM, arun smartpink...@yahoo.com wrote:

Hi,
In your method 2 and method 3, you are using the groupings data. If that is the case, is it possible for you to use ?merge() or ?join() from library(plyr)?

    join(mydata, groupings, by=c("sex","age"), type="inner")
    #    sex age mygroup
    # 1    m   1       1
    # 2    m   2       2
    # 3    m   3       3
    # 4    m   4       4
    # 5    f   1       5
    # 6    f   2       6
    # 7    f   3       7
    # 8    f   4       8
    # 9    m   1       1
    # 10   m   2       2
    # 11   m   3       3
    # 12   m   4       4
    # 13   f   1       5
    # 14   f   2       6
    # 15   f   3       7
    # 16   f   4       8

A.K.
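On the efficiency question for a very tall data frame, one way to find out is to time the candidates on simulated data. The sketch below is an illustration only: the size `n`, the seed, and the names `tall`, `loop.group`, and `match.group` are assumptions, and the vectorized variant uses a match()-based lookup in base R instead of join(). Timings depend on the machine, so none are quoted here.

```r
set.seed(1)
n <- 1e5  # assumed number of cases; adjust to taste

groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4), mygroup = 1:8)
tall <- data.frame(sex = sample(c("m", "f"), n, replace = TRUE),
                   age = sample(1:4, n, replace = TRUE))

# Method 3 from the original post: loop over the rows of groupings
time_loop <- system.time({
  tall$loop.group <- NA
  for (i in seq_len(nrow(groupings))) {
    hit <- (tall$sex %in% groupings[i, "sex"]) &
           (tall$age %in% groupings[i, "age"])
    tall[hit, "loop.group"] <- groupings[i, "mygroup"]
  }
})

# Vectorized alternative: one match() call on a combined key
time_match <- system.time({
  tall$match.group <- groupings$mygroup[
    match(paste(tall$sex, tall$age), paste(groupings$sex, groupings$age))]
})

# Elapsed seconds for each approach
print(c(loop = time_loop[["elapsed"]], match = time_match[["elapsed"]]))
```

Both columns should agree row by row, which is worth checking before comparing the timings.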
Re: [R] Assigning cases to groupings based on the values of several variables
Hi,
In your method 2 and method 3, you are using the groupings data. If that is the case, is it possible for you to use ?merge() or ?join() from library(plyr)?

    join(mydata, groupings, by=c("sex","age"), type="inner")
    #    sex age mygroup
    # 1    m   1       1
    # 2    m   2       2
    # 3    m   3       3
    # 4    m   4       4
    # 5    f   1       5
    # 6    f   2       6
    # 7    f   3       7
    # 8    f   4       8
    # 9    m   1       1
    # 10   m   2       2
    # 11   m   3       3
    # 12   m   4       4
    # 13   f   1       5
    # 14   f   2       6
    # 15   f   3       7
    # 16   f   4       8

A.K.

----- Original Message -----
From: Dimitri Liakhovitski dimitri.liakhovit...@gmail.com
To: r-help r-help@r-project.org
Cc:
Sent: Friday, December 7, 2012 7:27 AM
Subject: [R] Assigning cases to groupings based on the values of several variables

Dear R-ers, my task is simple: to assign cases to desired groupings based on the combined values of 2 variables. I can think of 3 methods of doing it. Method 1 seems to me pretty R-like, but it requires a lot of lines of code, which is onerous. Method 2 is a loop, so not very good, as it loops through all rows of mydata. Method 3 is a loop but loops through fewer lines, so it seems to me more efficient. Can you please tell me: 1. Which of my methods is more efficient? 2. Is there maybe an even more efficient R-like way of doing it? Imagine that mydata is actually a very tall data frame. Thanks a lot!
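The same lookup can also be written with base R's merge(), the other function suggested in this message. One caveat is that merge() sorts its result on the key columns by default, whereas plyr's join() is documented to keep the row order of its first argument. The sketch below restores the original order with a helper column; the name `row.id` is hypothetical, and `all.x = TRUE` is a choice made here, not something taken from the thread.

```r
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))
groupings <- data.frame(sex = c(rep("m", 4), rep("f", 4)),
                        age = c(1:4, 1:4), mygroup = 1:8)

# Remember the original row order, since merge() may reorder the rows
mydata$row.id <- seq_len(nrow(mydata))

# all.x = TRUE keeps cases that have no matching grouping (mygroup is NA there)
merged <- merge(mydata, groupings, by = c("sex", "age"), all.x = TRUE)

# Put the rows back in their original order and drop the helper column
merged <- merged[order(merged$row.id), ]
merged$row.id <- NULL
rownames(merged) <- NULL
```

With the default `all.x = FALSE`, merge() behaves like `type = "inner"` in the join() call and drops any case whose (sex, age) pair is absent from groupings.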