Re: [R] lookup in R - possible to avoid loops?
Try this:

    merge(my.df, my.lookup)

On Mon, Nov 8, 2010 at 5:43 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:

Hello!
Hope there is a nifty way to speed up my code by avoiding loops.
My task is simple - analogous to the vlookup formula in Excel.
Here is how I programmed it:

    # My example data frame:
    set.seed(1245)
    my.df <- data.frame(names=rep(letters[1:3],3), value=round(rnorm(9,mean=20,sd=5),0))
    my.df <- my.df[order(my.df$names),]
    my.df$names <- as.character(my.df$names)
    (my.df)

    # My example lookup table:
    my.lookup <- data.frame(names=letters[1:3], category=c("AAA","BBB","CCC"))
    my.lookup$names <- as.character(my.lookup$names)
    my.lookup$category <- as.character(my.lookup$category)
    (my.lookup)

    # Just adding an extra column to my.df that contains the categories
    # of the names in the column "names":
    my.df2 <- my.df
    my.df2$category <- NA
    for(i in unique(my.df$names)){
      my.df2$category[my.df2$names %in% i] <- my.lookup$category[my.lookup$names %in% i]
    }
    (my.df2)

It does what I need, but it's way too slow - I need to run it for hundreds and hundreds of names in 100 of huge files (tens of thousands of rows in each). Any way to speed it up?

Thanks a lot!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
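For readers who want to see the merge() suggestion end to end, here is a minimal sketch on the example data from the question. Two things are assumptions of this sketch rather than part of the thread: `stringsAsFactors = FALSE` stands in for the poster's `as.character()` conversions, and the result names `merged` and `merged.all` are made up for illustration.

```r
# Example data from the question; stringsAsFactors = FALSE (an assumption of
# this sketch) keeps the columns as character without as.character() calls.
set.seed(1245)
my.df <- data.frame(names = rep(letters[1:3], 3),
                    value = round(rnorm(9, mean = 20, sd = 5), 0),
                    stringsAsFactors = FALSE)
my.lookup <- data.frame(names = letters[1:3],
                        category = c("AAA", "BBB", "CCC"),
                        stringsAsFactors = FALSE)

# With no 'by' argument, merge() joins on the columns the two data frames
# have in common, which here is just "names".
merged <- merge(my.df, my.lookup)

# By default merge() drops rows of my.df whose name is absent from the
# lookup table; all.x = TRUE keeps them and fills category with NA.
merged.all <- merge(my.df, my.lookup, by = "names", all.x = TRUE)

print(merged)
```

Note that merge() sorts the result on the key column by default, so the row order of `merged` is not guaranteed to match that of `my.df`.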
Re: [R] lookup in R - possible to avoid loops?
Dimitri -
   While merge is most likely the fastest way to solve your problem, I just want to point out that you can use a named vector as a lookup table. For your example:

    categories = my.lookup$category
    names(categories) = my.lookup$names

creates the lookup table, and

    my.df$category = categories[my.df$names]

creates the category column.

                                        - Phil
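The named-vector lookup described above can be run as a self-contained sketch on the example data from the question. `setNames()` and `unname()` are additions of this sketch (the thread assigns `names()` directly and leaves the names on the result), and `stringsAsFactors = FALSE` is assumed in place of the poster's `as.character()` calls.

```r
# Example data from the question, with character columns (assumption:
# stringsAsFactors = FALSE replaces the as.character() conversions).
set.seed(1245)
my.df <- data.frame(names = rep(letters[1:3], 3),
                    value = round(rnorm(9, mean = 20, sd = 5), 0),
                    stringsAsFactors = FALSE)
my.df <- my.df[order(my.df$names), ]
my.lookup <- data.frame(names = letters[1:3],
                        category = c("AAA", "BBB", "CCC"),
                        stringsAsFactors = FALSE)

# Named character vector: the values are the categories,
# the names are the lookup keys.
categories <- setNames(my.lookup$category, my.lookup$names)

# Indexing by a character vector returns one element per key, in the
# order of the keys; unname() just drops the key labels from the result.
my.df$category <- unname(categories[my.df$names])

print(my.df)
```

The keys have to be character for this to work as intended: a factor used as an index is treated as its integer codes, which is why the conversion to character in the question matters. Keys that are missing from the lookup vector come back as NA.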
Re: [R] lookup in R - possible to avoid loops?
Thanks a lot - extremely helpful!
While I'll definitely try to use merge in the future, in my situation I run into problems with memory (the files are too large). However, Phil's suggestion is perfect for me - it sped things up considerably!
Thank you, again!
Dimitri

On Mon, Nov 8, 2010 at 2:51 PM, Phil Spector spec...@stat.berkeley.edu wrote:

Dimitri -
   While merge is most likely the fastest way to solve your problem, I just want to point out that you can use a named vector as a lookup table. For your example:

    categories = my.lookup$category
    names(categories) = my.lookup$names

creates the lookup table, and

    my.df$category = categories[my.df$names]

creates the category column.

                                        - Phil

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
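A related option that nobody in the thread proposed, shown here only as a hedged alternative: `match()` returns, for each key, its row position in the lookup table, and that index vector can pull the category column directly without building a merged data frame. The small data set below, including the unknown key "z", is hypothetical and chosen to show what happens when a key has no match.

```r
# Hypothetical data for illustration (not from the thread); "z" has no
# entry in the lookup table.
my.df <- data.frame(names = c("a", "b", "c", "a", "z"),
                    value = c(10, 20, 30, 40, 50),
                    stringsAsFactors = FALSE)
my.lookup <- data.frame(names = letters[1:3],
                        category = c("AAA", "BBB", "CCC"),
                        stringsAsFactors = FALSE)

# Position of each my.df key in the lookup table; NA where there is no match.
idx <- match(my.df$names, my.lookup$names)

# Indexing with NA yields NA, so unmatched keys get an NA category.
my.df$category <- my.lookup$category[idx]

print(my.df)
```

This only adds one column to the existing data frame, in the same spirit as the named-vector approach; whether it helps with the memory problem described above would need to be checked on the real files.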