Re: [R] grep(pattern = each element of a vector) ?

2013-09-12 Thread arun
Hi,
res <- ddply(.data=df1,
   .variables='Taxa',
   .fun=transform,
   Class=find.class(Taxa))
#Warning messages:
#1: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#2: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used
#3: In grep(x, df2$Taxa) :
#  argument 'pattern' has length > 1 and only the first element will be used

Maybe it is better to modify the function:
find.class <- function(x) df2[grep(unique(x),df2$Taxa),'Class']
res1 <- ddply(.data=df1,
   .variables='Taxa',
   .fun=transform,
   Class=find.class(Taxa)) #no warnings

#though it doesn't have any effect in the end result.
identical(res,res1)
#[1] TRUE
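
A small base-R sketch of where the warnings probably come from, assuming the same df2 as in Joel's message: within each ddply group, Taxa holds one value repeated once per row, so grep() receives a pattern vector of length > 1 and uses only its first element. That would explain why collapsing the pattern with unique() leaves the result unchanged. The vector name pat is made up for illustration.

```r
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'),
                  stringsAsFactors = FALSE)

# roughly what find.class() sees for the 'blue' group: one value, three times
pat <- c('blue', 'blue', 'blue')

# grep() only uses pat[1]; suppressWarnings() hides the "length > 1" warning
multi  <- suppressWarnings(grep(pat, df2$Taxa))

# a length-1 pattern gives the same indices without any warning
single <- grep(unique(pat), df2$Taxa)

identical(multi, single)
```

Note that unique() only helps because every group contains a single distinct value; it is not a general way to grep for several different patterns at once.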


A.K.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] grep(pattern = each element of a vector) ?

2013-09-12 Thread Beaulieu, Jake
Hi,

I have a large data frame that contains species names.  I have a second
data frame that contains species names and some additional info, called 'Class',
about each species.  I would like to match the species names in the first data
frame with the 'Class' information contained in the second.  Since the species
names are often formatted differently between the data sets, merge doesn't work
well.  grep does the trick, but the function needs to be called separately for
each observation in the first data frame.  I put grep into a loop, but this is
too slow.  Is there a way to run grep repeatedly without resorting to a loop?
Possibly something in the apply family?

  df1 <- data.frame(Taxa = c('blue', 'red', NA))
  df2 <- data.frame(Taxa = c( 'blue', 'red', NA), Class = c('Z', 'HI', 'A'))

  index <- NULL
  for (i in 1:length(df1$Taxa)) {
    index[i] <- grep(df1$Taxa[i], df2$Taxa)
  }
  index
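
For what it is worth, here is a minimal sketch of the same lookup written with vapply() from the apply family. The helper first_match() is hypothetical (it is not from the thread), and stringsAsFactors = FALSE is an assumption added so the patterns are plain character strings. vapply() still calls grep() once per element, so whether it is faster than the explicit loop is not something this sketch establishes.

```r
df1 <- data.frame(Taxa = c('blue', 'red', NA), stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'),
                  stringsAsFactors = FALSE)

# first_match() is a hypothetical helper: index of the first row of df2
# whose Taxa matches the pattern p, or NA if p is NA or nothing matches
first_match <- function(p) {
  if (is.na(p)) return(NA_integer_)
  hit <- grep(p, df2$Taxa)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# one grep() call per element of df1$Taxa, always returning one integer
index <- vapply(df1$Taxa, first_match, integer(1), USE.NAMES = FALSE)
index

# use the indices to pull the matching Class values
df2$Class[index]
```

Returning exactly one index per element keeps the result the same length as df1$Taxa even when a pattern is missing or matches several rows.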

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

==
Jake J. Beaulieu, PhD
US Environmental Protection Agency
National Risk Management Research Lab
26 W. Martin Luther King Drive
Cincinnati, OH 45268
USA
513-569-7842  (desk)
513-487-2511 (fax)
beaulieu.j...@epa.gov




Re: [R] grep(pattern = each element of a vector) ?

2013-09-12 Thread Allen, Joel
Jake,
You can use the plyr library or some form of apply.  If you are on a 64-bit
system you can multithread and it goes much faster.

Something like this (for 32-bit):
require(plyr)
df1 <- data.frame(Taxa = c('blue', 'red', NA,'blue', 'red', NA,'blue', 'red',
NA))
df2 <- data.frame(Taxa = c( 'blue', 'red', NA), Class = c('Z', 'HI', 'A'))

#function to do the lookup
find.class <- function(x) df2[grep(x, df2$Taxa),'Class']

ddply(.data=df1,
      .variables='Taxa',
      .fun=transform,
      Class=find.class(Taxa))
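
The idea of splitting by Taxa can also be sketched in base R without plyr: grep once per distinct taxon, then map the answers back to every row with match(). This is only an illustration under stated assumptions, not the ddply code above; the names taxa and class_of are made up, stringsAsFactors = FALSE is added, and rows with a missing Taxa are simply given NA.

```r
df1 <- data.frame(Taxa = c('blue', 'red', NA, 'blue', 'red', NA,
                           'blue', 'red', NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Taxa = c('blue', 'red', NA), Class = c('Z', 'HI', 'A'),
                  stringsAsFactors = FALSE)

# distinct, non-missing taxa: grep() runs once per value, not once per row
taxa <- unique(na.omit(df1$Taxa))

# Class of the first matching row in df2, or NA when nothing matches
class_of <- vapply(taxa, function(p) {
  hit <- grep(p, df2$Taxa)
  if (length(hit) == 0) NA_character_ else df2$Class[hit[1]]
}, character(1), USE.NAMES = FALSE)

# match() gives the position of each row's taxon in taxa (NA for missing)
df1$Class <- class_of[match(df1$Taxa, taxa)]
df1
```

When the first data frame has many rows but few distinct names, this cuts the number of grep() calls accordingly.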

Joel



Re: [R] grep pattern

2011-05-25 Thread jim holtman
try this using strsplit:

> x <- round(runif(10)*100000, digits=0)
> y <- as.Date(x, origin="1970-01-01")
> str(y)
Class 'Date'  num [1:10] 26551 37212 57285 90821 20168 ...
> y1 <- as.character(y)
> str(y1)
 chr [1:10] "2042-09-11" "2071-11-19" "2126-11-04" "2218-08-30"
"2025-03-21" "2215-12-22" ...
> x <- strsplit(y1, '-')
> x[1:3]
[[1]]
[1] "2042" "09"   "11"

[[2]]
[1] "2071" "11"   "19"

[[3]]
[1] "2126" "11"   "04"

> x.1 <- sapply(x, '[', 3)
> str(x.1)
 chr [1:10] "11" "19" "04" "30" "21" "22" "24" "03" "31" "02"
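
As an alternative sketch (not part of the reply above), the day can also be taken straight from the Date object with format(), or by character position with substr(). The three day counts below are the first three values from the str(y) output above; the names dd and dd_substr are made up for illustration.

```r
y <- as.Date(c(26551, 37212, 57285), origin = "1970-01-01")

# day of month as a zero-padded string, straight from the Date object
dd <- format(y, "%d")
dd

# the same via fixed character positions in "yyyy-mm-dd"
dd_substr <- substr(as.character(y), 9, 10)
dd_substr
```

Both return character vectors; wrap the result in as.integer() if numbers are needed.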







-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] grep pattern

2011-05-24 Thread Kang Min
I have another question -

I'd like to extract dates from a vector of yyyy-mm-dd, so I just want
the dd.

x <- round(runif(10)*100000, digits=0)
y <- as.Date(x, origin="1970-01-01")

I tried this based on the code that Jim provided, but it just printed
the whole date. I think I just need to tweak it a little, but haven't
been able to figure it out.

y[grep("[[:digit:]]{2}$", y)]
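
A short sketch of why the whole date comes back, using three made-up example dates: grep() only reports which elements contain a match, so indexing with its result returns those elements in full. To get just the matched piece, one option is regexpr() together with regmatches(). The names y1, whole and dd are assumptions for illustration.

```r
y <- as.Date(c(26551, 37212, 57285), origin = "1970-01-01")
y1 <- as.character(y)

# grep() says WHICH elements match, so indexing returns whole dates
whole <- y1[grep("[[:digit:]]{2}$", y1)]
whole

# regexpr() locates the match and regmatches() extracts only that piece
dd <- regmatches(y1, regexpr("[[:digit:]]{2}$", y1))
dd
```

regmatches() drops elements that have no match at all, which does not happen here because every date ends in two digits.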

Thanks.
Kang Min



Re: [R] grep pattern

2011-05-22 Thread Kang Min
Thanks!



Re: [R] grep pattern

2011-05-22 Thread jim holtman
If you want to only match names of length 6, you will have to use this pattern:

> x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ",
+ "ZZAZ", "ZRITEZ")
> # match exactly values of length 6
> len6 <- "^Z[[:alpha:]]{4}Z$"
> grep(len6, x)
[1] 2 5 9
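
To make the difference between the two patterns concrete, here is a small comparison sketch on the same x. The variable names loose, exact and exact_alt are made up; the last line expresses the length restriction as a separate nchar() test instead of a {4} quantifier.

```r
x <- c("ZFHSJK", "ZFHJKZ", "ZIOPWE", "ZLKJSD", "ZKFLPZ", "ZAAZ", "ZAZ",
       "ZZAZ", "ZRITEZ")

# open-ended pattern: any string of 3+ characters that starts and ends with Z
loose <- grep("^Z.+Z$", x)

# length-restricted pattern: Z, exactly four letters, Z
exact <- grep("^Z[[:alpha:]]{4}Z$", x)

# the same restriction written as a pattern plus a length test
exact_alt <- which(grepl("^Z.*Z$", x) & nchar(x) == 6)

loose
exact
exact_alt
```

The {4} version additionally requires the middle characters to be letters, which the nchar() version does not check.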







-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



[R] grep pattern

2011-05-20 Thread Kang Min
Hi all,

I'm trying to subset a vector by a pattern. Each element has 6
letters, and I need those that start with Z and end with Z.

e.g.
 x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

I've looked up other discussions but still can't seem to find the
answer.

Thanks.
Kangmin



Re: [R] grep pattern

2011-05-20 Thread David Winsemius




You may need to study the regex page a bit longer.

^ is the beginning of a string,
.+ will match an arbitrarily long string of anything (at least one character),
and $ indicates the end of a string.

> x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
> grep("^Z.+Z$", x)
[1] 2 5
> grep("^Z.+Z$", x, value=TRUE)
[1] "ZFHJKZ" "ZKFLPZ"
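
Two related ways to do the same subsetting, offered as a sketch rather than as part of the answer above: grepl() returns a logical vector that can index x directly, and startsWith()/endsWith() (available in R 3.3.0 and later, so newer than this thread) avoid the regular expression altogether. The names keep, by_regex and by_affix are made up for illustration.

```r
x <- c("ZFHSJK", "ZFHJKZ", "ZIOPWE", "ZLKJSD", "ZKFLPZ")

# logical vector, one TRUE/FALSE per element of x
keep <- grepl("^Z.+Z$", x)
by_regex <- x[keep]
by_regex

# the same filter without a regular expression (needs R >= 3.3.0);
# nchar(x) >= 3 mirrors the "at least one character in between" of .+
by_affix <- x[startsWith(x, "Z") & endsWith(x, "Z") & nchar(x) >= 3]
by_affix
```

The logical form is convenient when the same condition also has to subset other columns of a data frame.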






David Winsemius, MD
West Hartford, CT
