Re: [R] grep(pattern = each element of a vector) ?
Hi,

res <- ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))
#Warning messages:
#1: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#2: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#3: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used

It may be better to modify the function so that grep receives a single pattern:

find.class <- function(x) df2[grep(unique(x), df2$Taxa), 'Class']
res1 <- ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))
#no warnings
#though it doesn't have any effect on the end result.
identical(res, res1)
#[1] TRUE

A.K.

----- Original Message -----
From: Allen, Joel allen.j...@epa.gov
To: Beaulieu, Jake beaulieu.j...@epa.gov; r-help@r-project.org
Cc: Farrar, David farrar.da...@epa.gov; Green, Hyatt green.hy...@epa.gov; McManus, Michael mcmanus.mich...@epa.gov; Wahman, David wahman.da...@epa.gov
Sent: Thursday, September 12, 2013 2:49 PM
Subject: Re: [R] grep(pattern = each element of a vector) ?

Jake,

You can use the plyr library or some form of apply. If you are on a 64-bit system you can multithread and it goes much faster. Something like this (for 32-bit):

require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))
#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa), 'Class']
ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))

Joel

From: Beaulieu, Jake
Sent: Thursday, September 12, 2013 12:06 PM
To: r-help@r-project.org
Cc: Wahman, David; Farrar, David; Allen, Joel; Green, Hyatt; McManus, Michael
Subject: grep(pattern = each element of a vector) ?

Hi,

I have a large data frame that contains species names. I have a second data frame that contains species names and some additional info, called 'Class', about each species. I would like to match the species names in the first data frame with the 'Class' information contained in the second. Since the species names are often formatted differently between the data sets, merge doesn't work well. grep does the trick, but the function needs to be called separately for each observation in the first data frame. I put grep into a loop, but this is too slow. Is there a way to run grep repeatedly without resorting to a loop? Possibly something in the apply family?

df1 <- data.frame(Taxa = c('blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))
index <- NULL
for (i in 1:length(df1$Taxa)) {
index[i] <- grep(df1$Taxa[i], df2$Taxa)
}
index

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] grep(pattern = each element of a vector) ?
Hi,

I have a large data frame that contains species names. I have a second data frame that contains species names and some additional info, called 'Class', about each species. I would like to match the species names in the first data frame with the 'Class' information contained in the second. Since the species names are often formatted differently between the data sets, merge doesn't work well. grep does the trick, but the function needs to be called separately for each observation in the first data frame. I put grep into a loop, but this is too slow. Is there a way to run grep repeatedly without resorting to a loop? Possibly something in the apply family?

df1 <- data.frame(Taxa = c('blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))
index <- NULL
for (i in 1:length(df1$Taxa)) {
index[i] <- grep(df1$Taxa[i], df2$Taxa)
}
index

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268 USA
513-569-7842 (desk)
513-487-2511 (fax)
beaulieu.j...@epa.gov
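As an illustration of the "apply family" idea raised in the question, here is a minimal sketch, not a tested solution for the real data. The helper name first_match is hypothetical, the data frames are built with stringsAsFactors = FALSE, the patterns are matched literally (fixed = TRUE), and only the first hit is kept; all of these are assumptions. Note that vapply still calls grep once per element, so any speed gain over an explicit loop is not guaranteed.

```r
# Toy data from the question, kept as character rather than factor (assumption)
df1 <- data.frame(Taxa = c("blue", "red", NA), stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c("blue", "red", NA),
                  Class = c("Z", "HI", "A"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: row index of the first entry of `table` that
# contains `pattern`; NA if the pattern is NA or nothing matches
first_match <- function(pattern, table) {
  if (is.na(pattern)) return(NA_integer_)
  hit <- grep(pattern, table, fixed = TRUE)
  if (length(hit) == 0L) NA_integer_ else hit[1L]
}

# One grep call per element of df1$Taxa, always returning one integer
index <- vapply(df1$Taxa, first_match, integer(1),
                table = df2$Taxa, USE.NAMES = FALSE)

# Use the indices to pull the matching Class values
df1$Class <- df2$Class[index]
```

Guarding against NA patterns and empty matches keeps every result at length one, which is what lets the values be stored in a plain vector.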
Re: [R] grep(pattern = each element of a vector) ?
Jake,

You can use the plyr library or some form of apply. If you are on a 64-bit system you can multithread and it goes much faster. Something like this (for 32-bit):

require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))
#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa), 'Class']
ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))

Joel
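The grouping idea behind the plyr call (run the lookup once per distinct taxon, then spread the result over all rows) can also be sketched in base R. This is only an illustration under stated assumptions: character columns rather than factors, literal matching with fixed = TRUE, first hit only, and NA taxa left as NA in the result. The names taxa and hits are hypothetical.

```r
# Toy data from the reply, kept as character rather than factor (assumption)
df1 <- data.frame(Taxa = c("blue", "red", NA, "blue", "red", NA,
                           "blue", "red", NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c("blue", "red", NA),
                  Class = c("Z", "HI", "A"),
                  stringsAsFactors = FALSE)

# Distinct, non-missing taxa: grep runs once per distinct value
taxa <- unique(df1$Taxa[!is.na(df1$Taxa)])

# Row of df2 matched by each distinct taxon (first hit, or NA if none)
hits <- vapply(taxa, function(p) {
  i <- grep(p, df2$Taxa, fixed = TRUE)
  if (length(i) > 0L) i[1L] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

# Map every row of df1 to its taxon's hit, then to the Class value
df1$Class <- df2$Class[hits[match(df1$Taxa, taxa)]]
```

When the first data frame repeats the same names many times, this calls grep far fewer times than a row-by-row loop would.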