Yup, that does it. Let grep figure out what's a word rather than doing it manually. Forgot about "\b"
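To see what the "\b" word-break marker changes, here is a minimal base R sketch. The test strings are made up for illustration and are not part of the thread's data.

```r
# Hypothetical test strings (not from the zz data frame in this thread)
x <- c("red sparks", "barred owl", "infrared lamp", "a red\nballoon")

# Plain substring match: "red" is also found inside "barred" and "infrared"
plain <- grepl("red", x)

# Word-boundary match: only "red" as a whole word counts; spaces, newlines
# and the start or end of the string all act as boundaries
whole <- grepl("\\bred\\b", x)

print(data.frame(x = x, plain = plain, whole = whole))
```

The regex engine decides where words start and end, so no manual splitting on whitespace is needed.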
Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Thu, Jul 9, 2015 at 10:30 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> Just add a word break marker before and after:
>
> zz$v5 <- grepl( paste0( "\\b(", paste0( alarm.words, collapse="|" ), ")\\b" ),
>                 do.call( paste, zz[ , 2:3 ] ) )
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On July 9, 2015 10:12:23 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>Jeff:
>>
>>Well, it would be much better (no loops!) except, I think, for one
>>issue: "red" would match "barred" and I don't think that this is what
>>is wanted: the matches should be on whole "words" not just string
>>patterns.
>>
>>So you would need to fix up the matching pattern to make this work,
>>but it may be a little tricky, as arbitrary whitespace characters,
>>e.g. " " or "\n" etc. could be in the strings to be matched separating
>>the words or ending the "sentence." I'm sure it can be done, but I'll
>>leave it to you or others to figure it out.
>>
>>Of course, if my diagnosis is wrong or silly, please point this out.
>>
>>Cheers,
>>Bert
>>
>>
>>Bert Gunter
>>
>>"Data is not information. Information is not knowledge. And knowledge
>>is certainly not wisdom."
>>   -- Clifford Stoll
>>
>>
>>On Thu, Jul 9, 2015 at 9:34 AM, Jeff Newmiller
>><jdnew...@dcn.davis.ca.us> wrote:
>>> I think grep is better suited to this:
>>>
>>> zz$v5 <- grepl( paste0( alarm.words, collapse="|" ),
>>>                 do.call( paste, zz[ , 2:3 ] ) )
>>>
>>> On July 9, 2015 8:51:10 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>Here's a way to do it that uses %in% (i.e. match() ) and uses only a
>>>>single, not a double, loop. It should be more efficient.
>>>>
>>>>> sapply(strsplit(do.call(paste,zz[,2:3]),"[[:space:]]+"),
>>>>+        function(x)any(x %in% alarm.words))
>>>>
>>>> [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
>>>>
>>>>The idea is to paste the strings in each row (do.call allows an
>>>>arbitrary number of columns) into a single string and then use
>>>>strsplit to break the string into individual "words" on whitespace.
>>>>Then the matching is vectorized with the any( %in% ... ) call.
>>>>
>>>>Cheers,
>>>>Bert
>>>>
>>>>
>>>>On Thu, Jul 9, 2015 at 6:05 AM, John Fox <j...@mcmaster.ca> wrote:
>>>>> Dear Chris,
>>>>>
>>>>> If I understand correctly what you want, how about the following?
>>>>>
>>>>> > rows <- apply(zz[, 2:3], 1, function(x) any(sapply(alarm.words, grepl, x=x)))
>>>>> > zz[rows, ]
>>>>>
>>>>>            v1                              v2                v3 v4
>>>>> 3  -1.022329                    green turtle    ronald weasley  2
>>>>> 6   0.336599              waffle the hamster        red sparks  1
>>>>> 9  -1.631874 yellow giraffe with a long neck gandalf the white  1
>>>>> 10  1.130622                      black bear  gandalf the grey  2
>>>>>
>>>>> I hope this helps,
>>>>>  John
>>>>>
>>>>> ------------------------------------------------
>>>>> John Fox, Professor
>>>>> McMaster University
>>>>> Hamilton, Ontario, Canada
>>>>> http://socserv.mcmaster.ca/jfox/
>>>>>
>>>>>
>>>>> On Wed, 08 Jul 2015 22:23:37 -0400
>>>>> "Christopher W. Ryan" <cr...@binghamton.edu> wrote:
>>>>>> Running R 3.1.1 on windows 7
>>>>>>
>>>>>> I want to identify as a case any record in a dataframe that contains any
>>>>>> of several keywords in any of several variables.
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> # create a dataframe with 4 variables and 10 records
>>>>>> v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
>>>>>>         "big black dog", "waffle the hamster", "benny likes food a lot",
>>>>>>         "hello world", "yellow giraffe with a long neck", "black bear")
>>>>>> v3 <- c("harry potter", "hermione grainger", "ronald weasley",
>>>>>>         "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
>>>>>>         "white dress robes", "gandalf the white", "gandalf the grey")
>>>>>> zz <- data.frame(v1=rnorm(10), v2=v2, v3=v3, v4=rpois(10, lambda=2),
>>>>>>                  stringsAsFactors=FALSE)
>>>>>> str(zz)
>>>>>> zz
>>>>>>
>>>>>> # here are the keywords
>>>>>> alarm.words <- c("red", "green", "turtle", "gandalf")
>>>>>>
>>>>>> # For each row/record, I want to test whether the string in v2 or the
>>>>>> # string in v3 contains any of the strings in alarm.words. And then if so,
>>>>>> # set zz$v5=TRUE for that record.
>>>>>>
>>>>>> # I'm thinking the str_detect function in the stringr package ought to
>>>>>> # be able to help, perhaps with some use of apply over the rows, but I
>>>>>> # obviously misunderstand something about how str_detect works
>>>>>>
>>>>>> library(stringr)
>>>>>>
>>>>>> str_detect(zz[,2:3], alarm.words)    # error: the target of the search
>>>>>>                                      # must be a vector, not multiple
>>>>>>                                      # columns
>>>>>>
>>>>>> str_detect(zz[1:4,2:3], alarm.words) # same error
>>>>>>
>>>>>> str_detect(zz[,2], alarm.words)      # error, length of alarm.words
>>>>>>                                      # is less than the number of
>>>>>>                                      # rows I am using for the
>>>>>>                                      # comparison
>>>>>>
>>>>>> str_detect(zz[1:4,2], alarm.words)   # works as hoped when
>>>>>> length(alarm.words)                  # confining nrows
>>>>>>                                      # to the length of alarm.words
>>>>>>
>>>>>> str_detect(zz, alarm.words)          # obviously not right
>>>>>>
>>>>>> # maybe I need apply() ?
>>>>>> my.f <- function(x){str_detect(x, alarm.words)}
>>>>>>
>>>>>> apply(zz[,2], 1, my.f)               # again, a mismatch in lengths
>>>>>>                                      # between alarm.words and that
>>>>>>                                      # in which I am searching for
>>>>>>                                      # matching strings
>>>>>>
>>>>>> apply(zz, 2, my.f)                   # now I'm getting somewhere
>>>>>> apply(zz[1:4,], 2, my.f)             # but still only works with 4
>>>>>>                                      # rows of the dataframe
>>>>>>
>>>>>>
>>>>>> # perhaps %in% could do the job?
>>>>>>
>>>>>> Appreciate any advice.
>>>>>>
>>>>>> --Chris Ryan
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
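For anyone who wants to run the three suggestions from this thread side by side, here is a self-contained sketch in base R. The data and alarm.words come from the original post. The set.seed() call and the names row.text, pattern, substring.hit, split.hit and word.hit are additions for illustration only; they do not appear in the thread.

```r
# Data from the original post. v1 and v4 are random there, so a seed is set
# here (an addition, not in the thread) purely to make the run repeatable.
set.seed(1)
v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
        "big black dog", "waffle the hamster", "benny likes food a lot",
        "hello world", "yellow giraffe with a long neck", "black bear")
v3 <- c("harry potter", "hermione grainger", "ronald weasley",
        "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
        "white dress robes", "gandalf the white", "gandalf the grey")
zz <- data.frame(v1 = rnorm(10), v2 = v2, v3 = v3, v4 = rpois(10, lambda = 2),
                 stringsAsFactors = FALSE)
alarm.words <- c("red", "green", "turtle", "gandalf")

# Paste v2 and v3 into one string per row, as in the replies
row.text <- do.call(paste, zz[, 2:3])

# 1. Substring match (first grepl suggestion): also hits "red" in "barred"
substring.hit <- grepl(paste0(alarm.words, collapse = "|"), row.text)

# 2. Split each row on whitespace, then test the words with %in%
split.hit <- sapply(strsplit(row.text, "[[:space:]]+"),
                    function(x) any(x %in% alarm.words))

# 3. Whole-word regex match: alternation wrapped in \b ... \b
pattern <- paste0("\\b(", paste0(alarm.words, collapse = "|"), ")\\b")
word.hit <- grepl(pattern, row.text)

# Flag the cases and show the flagged rows
zz$v5 <- word.hit
print(zz[zz$v5, c("v2", "v3", "v5")])

# A made-up string showing where approach 1 differs from approach 3
extra <- "the barred owl"
print(c(substring = grepl(paste0(alarm.words, collapse = "|"), extra),
        whole.word = grepl(pattern, extra)))
```

On this data all three approaches flag rows 3, 6, 9 and 10, which agrees with the output shown in the thread. They differ only on strings like the made-up "the barred owl", where the plain substring pattern matches and the word-boundary pattern does not.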