Re: [R] lookup in R - possible to avoid loops?

2010-11-08 Thread Henrique Dallazuanna
Try this:

 merge(my.df, my.lookup)
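
For readers who want to see what that call does, here is a minimal sketch. The small data frames below are hypothetical stand-ins for my.df and my.lookup from the question, and the by= and all.x= arguments are additions for illustration, not part of the one-liner above.

```r
# Hypothetical small inputs mirroring my.df / my.lookup from the question.
my.df <- data.frame(names = c("a", "b", "c", "a"),
                    value = c(10, 20, 30, 40),
                    stringsAsFactors = FALSE)
my.lookup <- data.frame(names = c("a", "b", "c"),
                        category = c("AAA", "BBB", "CCC"),
                        stringsAsFactors = FALSE)

# by = "names" makes the join key explicit; all.x = TRUE keeps rows of
# my.df that have no match in my.lookup (their category becomes NA).
merged <- merge(my.df, my.lookup, by = "names", all.x = TRUE)
print(merged)
```

Note that merge() sorts the result on the key column by default, so the row order of my.df is not necessarily preserved.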

On Mon, Nov 8, 2010 at 5:43 PM, Dimitri Liakhovitski 
dimitri.liakhovit...@gmail.com wrote:

 Hello!
 Hope there is a nifty way to speed up my code by avoiding loops.
 My task is simple - analogous to the vlookup formula in Excel. Here is
 how I programmed it:

 # My example data frame:
 set.seed(1245)

 my.df <- data.frame(names=rep(letters[1:3],3), value=round(rnorm(9,mean=20,sd=5),0))
 my.df <- my.df[order(my.df$names),]
 my.df$names <- as.character(my.df$names)
 (my.df)

 # My example lookup table:
 my.lookup <- data.frame(names=letters[1:3], category=c("AAA","BBB","CCC"))
 my.lookup$names <- as.character(my.lookup$names)
 my.lookup$category <- as.character(my.lookup$category)
 (my.lookup)

 # Just adding an extra column to my.df that contains the categories of
 # the names in the column names:
 my.df2 <- my.df
 my.df2$category <- NA
 for(i in unique(my.df$names)){
   my.df2$category[my.df2$names %in% i] <- my.lookup$category[my.lookup$names %in% i]
 }
 (my.df2)

 It does what I need, but it's way too slow - I need to run it for
 hundreds and hundreds of names in 100s of huge files (tens of
 thousands of rows in each).
 Any way to speed it up?


 Thanks a lot!

 --
 Dimitri Liakhovitski
 Ninah Consulting
 www.ninah.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O



Re: [R] lookup in R - possible to avoid loops?

2010-11-08 Thread Phil Spector

Dimitri -
   While merge is most likely the fastest way to solve
your problem, I just want to point out that you can use
a named vector as a lookup table.  For your example:

categories = my.lookup$category
names(categories) = my.lookup$names

creates the lookup table, and

my.df$category = categories[my.df$names]

creates the category column.
   - Phil
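
A self-contained sketch of the named-vector lookup described above, for anyone trying it out. The data below are hypothetical (including a name "z" that is deliberately missing from the lookup table), and the unname() and match() lines are additions for illustration rather than part of the original suggestion.

```r
# Hypothetical inputs; "z" has no entry in the lookup table.
my.df <- data.frame(names = c("a", "b", "c", "a", "z"),
                    value = c(10, 20, 30, 40, 50),
                    stringsAsFactors = FALSE)
my.lookup <- data.frame(names = c("a", "b", "c"),
                        category = c("AAA", "BBB", "CCC"),
                        stringsAsFactors = FALSE)

# Build the lookup table as a named character vector.
categories <- my.lookup$category
names(categories) <- my.lookup$names

# Index by name. as.character() guards against factor columns, which
# would otherwise index by integer code instead of by label.
# unname() drops the names carried over from the lookup vector.
my.df$category <- unname(categories[as.character(my.df$names)])

# The same result with match(), which returns the position of each
# name in the lookup table (NA when the name is absent).
via.match <- my.lookup$category[match(my.df$names, my.lookup$names)]

print(my.df)
```

Both forms give NA for names that are not in the lookup table, so unmatched rows are easy to spot with is.na().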





Re: [R] lookup in R - possible to avoid loops?

2010-11-08 Thread Dimitri Liakhovitski
Thanks a lot - extremely helpful!
While I'll definitely try to use merge in the future, in my situation
I run into problems with memory (files are too large).
However, Phil's suggestion is perfect for me - sped me up considerably!
Thank you, again!
Dimitri






-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
