Re: [R] grep(pattern = each element of a vector) ?
Hi,

res <- ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))
#Warning messages:
#1: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#2: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#3: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used

Maybe it is better to modify the function:

find.class <- function(x) df2[grep(unique(x), df2$Taxa), 'Class']
res1 <- ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))
#no warnings
#though it doesn't have any effect on the end result.
identical(res, res1)
#[1] TRUE

A.K.

----- Original Message -----
From: Allen, Joel <allen.j...@epa.gov>
To: Beaulieu, Jake <beaulieu.j...@epa.gov>; r-help@r-project.org <r-help@r-project.org>
Cc: Farrar, David <farrar.da...@epa.gov>; Green, Hyatt <green.hy...@epa.gov>; McManus, Michael <mcmanus.mich...@epa.gov>; Wahman, David <wahman.da...@epa.gov>
Sent: Thursday, September 12, 2013 2:49 PM
Subject: Re: [R] grep(pattern = each element of a vector) ?

Jake,

You can use the plyr library or some form of apply. If you are on a 64-bit system you can multithread and it goes much faster. Something like this (for 32-bit):

require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))
#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa), 'Class']
ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))

Joel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
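The warning A.K. reports can be reproduced without plyr. The sketch below is an illustration only, not code from the thread: it assumes that transform() hands find.class() the whole Taxa column of one group, so the pattern has one entry per row, and the variable names group, msg and idx are made up for the example.

```r
# Lookup table from the thread.
df2 <- data.frame(Taxa = c("blue", "red", NA),
                  Class = c("Z", "HI", "A"),
                  stringsAsFactors = FALSE)

# One group as transform() would see it (assumed): the same name once per row.
group <- c("blue", "blue", "blue")

# A pattern longer than one element makes grep() warn; capture the message
# instead of letting it print.
msg <- tryCatch(grep(group, df2$Taxa),
                warning = function(w) conditionMessage(w))

# unique() collapses the group to a single pattern, so nothing is warned about.
idx <- grep(unique(group), df2$Taxa)
df2[idx, "Class"]
```

Because every entry in a group is the same name, using only the first pattern element and using the unique value select the same row, which is why identical(res, res1) is TRUE in the message above.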
[R] grep(pattern = each element of a vector) ?
Hi,

I have a large dataframe that contains species names. I have a second dataframe that contains species names and some additional info, called 'Class', about each species. I would like to match the species names in the first data frame with the 'Class' information contained in the second. Since the species names are often formatted differently between the data sets, merge doesn't work well. grep does the trick, but the function needs to be called separately for each observation in the first data frame. I put grep into a loop, but this is too slow. Is there a way to run grep repeatedly without resorting to a loop? Possibly something in the apply family?

df1 <- data.frame(Taxa = c('blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))

index <- NULL
for (i in 1:length(df1$Taxa)) {
  index[i] <- grep(df1$Taxa[i], df2$Taxa)
}
index

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

==
Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268 USA
513-569-7842 (desk)
513-487-2511 (fax)
beaulieu.j...@epa.gov
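One base-R way to call grep once per element, as the question asks, is sketched below. It is an illustration rather than anything posted in the thread: the helper name first_match, the use of fixed = TRUE (literal substring matching) and the rule of keeping only the first hit are all assumptions. vapply still loops internally, so it tidies the code more than it speeds it up.

```r
df1 <- data.frame(Taxa = c("blue", "red", NA), stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c("blue", "red", NA),
                  Class = c("Z", "HI", "A"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: index of the first element of `table` that contains
# `pattern`, or NA when the pattern is missing or nothing matches.
first_match <- function(pattern, table) {
  if (is.na(pattern)) return(NA_integer_)
  hits <- grep(pattern, table, fixed = TRUE)
  if (length(hits) == 0L) NA_integer_ else hits[1L]
}

# One grep() call per name in df1; the result always has one index per row.
index <- vapply(df1$Taxa, first_match, integer(1),
                table = df2$Taxa, USE.NAMES = FALSE)
index             # 1 2 NA
df2$Class[index]  # "Z" "HI" NA
```

Returning exactly one integer per name keeps the result aligned with the rows of df1 even when a name is missing or unmatched.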
Re: [R] grep(pattern = each element of a vector) ?
Jake,

You can use the plyr library or some form of apply. If you are on a 64-bit system you can multithread and it goes much faster. Something like this (for 32-bit):

require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA, 'blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))
#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa), 'Class']
ddply(.data=df1, .variables='Taxa', .fun=transform, Class=find.class(Taxa))

Joel

From: Beaulieu, Jake
Sent: Thursday, September 12, 2013 12:06 PM
To: r-help@r-project.org
Cc: Wahman, David; Farrar, David; Allen, Joel; Green, Hyatt; McManus, Michael
Subject: grep(pattern = each element of a vector) ?

Hi,

I have a large dataframe that contains species names. I have a second dataframe that contains species names and some additional info, called 'Class', about each species. I would like to match the species names in the first data frame with the 'Class' information contained in the second. Since the species names are often formatted differently between the data sets, merge doesn't work well. grep does the trick, but the function needs to be called separately for each observation in the first data frame. I put grep into a loop, but this is too slow. Is there a way to run grep repeatedly without resorting to a loop? Possibly something in the apply family?

df1 <- data.frame(Taxa = c('blue', 'red', NA))
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'))

index <- NULL
for (i in 1:length(df1$Taxa)) {
  index[i] <- grep(df1$Taxa[i], df2$Taxa)
}
index
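Grouping by Taxa, as the ddply call does, means the lookup runs once per distinct name rather than once per row. A plyr-free sketch of the same idea follows. It is not Joel's code: the helper class_for and the decision to return NA for missing or unmatched names are assumptions made for the example.

```r
taxa <- c("blue", "red", NA, "blue", "red", NA, "blue", "red", NA)
df2 <- data.frame(Taxa = c("blue", "red", NA),
                  Class = c("Z", "HI", "A"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: Class of the first df2 row whose Taxa matches `pattern`.
class_for <- function(pattern) {
  if (is.na(pattern)) return(NA_character_)
  hits <- grep(pattern, df2$Taxa)
  if (length(hits) == 0L) NA_character_ else df2$Class[hits[1L]]
}

# Run grep() once per distinct name ...
u <- unique(taxa)
u_class <- vapply(u, class_for, character(1), USE.NAMES = FALSE)

# ... then spread the answers back over every row with match().
taxa_class <- u_class[match(taxa, u)]
taxa_class
```

With many repeated names, the number of grep calls drops from the number of rows to the number of distinct names, which is where most of the saving would come from.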
Re: [R] grep pattern
try this using strsplit:

> x <- round(runif(10)*10, digits=0)
> y <- as.Date(x, origin="1970-01-01")
> str(y)
Class 'Date'  num [1:10] 26551 37212 57285 90821 20168 ...
> y1 <- as.character(y)
> str(y1)
 chr [1:10] "2042-09-11" "2071-11-19" "2126-11-04" "2218-08-30" "2025-03-21" "2215-12-22" ...
> x <- strsplit(y1, '-')
> x[1:3]
[[1]]
[1] "2042" "09"   "11"

[[2]]
[1] "2071" "11"   "19"

[[3]]
[1] "2126" "11"   "04"

> x.1 <- sapply(x, '[', 3)
> str(x.1)
 chr [1:10] "11" "19" "04" "30" "21" "22" "24" "03" "31" "02"

On Tue, May 24, 2011 at 10:19 AM, Kang Min <ngokang...@gmail.com> wrote:
> I have another question - I'd like to extract dates from a vector of
> yyyy-mm-dd, so I just want the dd.
>
> x <- round(runif(10)*10, digits=0)
> y <- as.Date(x, origin="1970-01-01")
>
> I tried this based on the code that Jim provided, but it just printed
> the whole date. I think I just need to tweak it a little, but haven't
> been able to figure it out.
>
> y[grep("[[:digit:]]{2}$", y)]
>
> Thanks.
> Kang Min

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
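As an aside to the strsplit approach above, the day can also be read straight from the Date object with format(). The sketch below is an illustration under assumptions: the three dates are copied from the output shown in the reply, and the names day_split and day_fmt are made up here.

```r
# Three of the dates shown in the str() output above.
y <- as.Date(c("2042-09-11", "2071-11-19", "2126-11-04"))

# The strsplit route from the reply: split on "-" and keep the third piece.
day_split <- sapply(strsplit(as.character(y), "-"), `[`, 3)

# The same field via the "%d" (day of month) conversion.
day_fmt <- format(y, "%d")

identical(day_split, day_fmt)  # TRUE
as.integer(day_fmt)            # 11 19 4
```

Both routes return character strings with a leading zero, so as.integer() is needed if numeric days are wanted.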
Re: [R] grep pattern
I have another question - I'd like to extract dates from a vector of yyyy-mm-dd, so I just want the dd.

x <- round(runif(10)*10, digits=0)
y <- as.Date(x, origin="1970-01-01")

I tried this based on the code that Jim provided, but it just printed the whole date. I think I just need to tweak it a little, but haven't been able to figure it out.

y[grep("[[:digit:]]{2}$", y)]

Thanks.
Kang Min

On May 23, 7:22 am, jim holtman <jholt...@gmail.com> wrote:
> If you want to only match names of length 6, you will have to use this
> pattern:
>
> > x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ",
> +        "ZZAZ", "ZRITEZ")
> > # match exactly values of length 6
> > len6 <- "^Z[[:alpha:]]{4}Z$"
> > grep(len6, x)
> [1] 2 5 9
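The attempt above prints whole dates because grep() reports which elements contain a match, not the matched text, and every date ends in two digits. A short sketch of the difference follows. It is an illustration, not code from the thread: the sample dates and the names y_chr, idx, dd and dd_sub are assumptions.

```r
y <- as.Date(c("2042-09-11", "2071-11-19", "2126-11-04"))
y_chr <- as.character(y)

# grep() returns the positions of matching elements: here, all of them.
idx <- grep("[[:digit:]]{2}$", y_chr)  # 1 2 3

# regexpr() locates the match inside each string and regmatches() pulls it out.
dd <- regmatches(y_chr, regexpr("[[:digit:]]{2}$", y_chr))  # "11" "19" "04"

# Alternatively, delete everything up to and including the last "-".
dd_sub <- sub("^.*-", "", y_chr)
```

So y[idx] is simply y again, while dd and dd_sub hold only the day part.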
Re: [R] grep pattern
Thanks!

On May 21, 7:09 am, David Winsemius <dwinsem...@comcast.net> wrote:
> You may need to study the regex page a bit longer:
> the ^ is the beginning of a string,
> .+ will match an arbitrarily long string of anything, and
> $ indicates the end of a string.
>
> x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
> grep("^Z.+Z$", x)
> [1] 2 5
> grep("^Z.+Z$", x, value=TRUE)
> [1] "ZFHJKZ" "ZKFLPZ"
Re: [R] grep pattern
If you want to only match names of length 6, you will have to use this pattern:

> x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ",
+        "ZZAZ", "ZRITEZ")
> # match exactly values of length 6
> len6 <- "^Z[[:alpha:]]{4}Z$"
> grep(len6, x)
[1] 2 5 9

On Sun, May 22, 2011 at 5:10 PM, Kang Min <ngokang...@gmail.com> wrote:
> Thanks!

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
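To see what the stricter pattern changes, the two patterns from this thread can be run on the same vector. The sketch below reuses Jim's data; the variable names loose, strict and alt, and the nchar() variant, are additions for illustration only.

```r
x <- c("ZFHSJK", "ZFHJKZ", "ZIOPWE", "ZLKJSD", "ZKFLPZ",
       "ZAAZ", "ZAZ", "ZZAZ", "ZRITEZ")

# ".+" accepts any length, so the short strings also match.
loose <- grep("^Z.+Z$", x)               # 2 5 6 7 8 9

# Exactly four letters between the two Z's gives length 6 overall.
strict <- grep("^Z[[:alpha:]]{4}Z$", x)  # 2 5 9

# Alternative: keep the loose pattern and test the length separately.
alt <- which(grepl("^Z.+Z$", x) & nchar(x) == 6)

identical(strict, alt)  # TRUE
```

The two routes agree on this data, but they are not equivalent in general: the nchar() variant would also accept digits or punctuation between the Z's, while [[:alpha:]] restricts the middle to letters.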
[R] grep pattern
Hi all,

I'm trying to subset a pattern in a vector. Each argument has 6 letters, and I need those that start with Z and end with Z. e.g.

x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

I've looked up other discussions but still can't seem to find the answer.

Thanks.
Kangmin
Re: [R] grep pattern
On May 20, 2011, at 11:57 AM, Kang Min wrote:

> Hi all,
>
> I'm trying to subset a pattern in a vector. Each argument has 6
> letters, and I need those that start with Z and end with Z. e.g.
>
> x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
>
> I've looked up other discussions but still can't seem to find the
> answer.

You may need to study the regex page a bit longer:
the ^ is the beginning of a string,
.+ will match an arbitrarily long string of anything, and
$ indicates the end of a string.

x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
grep("^Z.+Z$", x)
[1] 2 5
grep("^Z.+Z$", x, value=TRUE)
[1] "ZFHJKZ" "ZKFLPZ"

> Thanks.
> Kangmin

David Winsemius, MD
West Hartford, CT
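The three pieces of the pattern described above can be checked one at a time with grepl(). This sketch is an illustration using the vector from the question; the names starts, ends, both and plain are made up, and the startsWith()/endsWith() comparison assumes R 3.3.0 or later, where those base functions are available.

```r
x <- c("ZFHSJK", "ZFHJKZ", "ZIOPWE", "ZLKJSD", "ZKFLPZ")

starts <- grepl("^Z", x)      # "^" anchors at the start: TRUE for all five
ends   <- grepl("Z$", x)      # "$" anchors at the end: only elements 2 and 5
both   <- grepl("^Z.+Z$", x)  # ".+" needs at least one character in between

# Without regular expressions (assumes R >= 3.3.0).
plain <- startsWith(x, "Z") & endsWith(x, "Z")

identical(both, plain)  # TRUE for this vector
```

The two tests would differ for the strings "Z" and "ZZ": the regular expression rejects them because ".+" requires a character between the two Z's, whereas startsWith() and endsWith() accept both.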