Jun: "My problem is the pattern has to be dynamically constructed on the input data of the function"
What does that mean? How can a pattern be "dynamically constructed" when you have not made clear (at least to me, perhaps also to yourself and/or others) *how* it is to be constructed?

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Sep 6, 2016 at 8:56 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Jeff:
>
> Not sure what you meant by this:
>
> "There is no other reason to put parentheses in the pattern... they
> are not grouping symbols."
>
> ... but in fact, from ?regexp
>
> "Repetition takes precedence over concatenation, which in turn takes
> precedence over alternation. A whole subexpression may be enclosed in
> parentheses to override these precedence rules."
>
> So parentheses *are* in fact "grouping symbols."
>
> Cheers,
> Bert
>
> On Tue, Sep 6, 2016 at 5:18 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
>> I am not near my computer today, but each pair of parentheses gets its own
>> result number, so you should put the parentheses around the whole pattern of
>> alternatives instead of having many parentheses.
>>
>> I recommend thinking in terms of what common information you expect to find
>> in these various strings, and place your parentheses to capture that
>> information. There is no other reason to put parentheses in the pattern...
>> they are not grouping symbols.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 6, 2016 5:01:04 PM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>Jun:
>>>
>>>1. Tell us your desired result from your test vector and maybe someone
>>>will help.
>>>
>>>2. As we played this game once already (you couldn't do it; I showed
>>>you how), this seems to be a function of your limitations with regular
>>>expressions. I'm probably not much better, but in any case, I don't
>>>intend to be your consultant. See if you can find someone locally to
>>>help you if you do not receive a satisfactory reply from the list.
>>>There are many people here who are pretty good at this sort of thing,
>>>but I don't know if they'll reply. Regexes are certainly complex. Perl
>>>people tend to be pretty good at them, I believe. There are numerous
>>>web sites and books on them if you need to acquire expertise for your
>>>work.
>>>
>>>Cheers,
>>>Bert
>>>
>>>On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen...@gmail.com> wrote:
>>>> Hi Bert,
>>>>
>>>> I still couldn't make the multiple patterns work. Here is an example. I
>>>> make the pattern as follows:
>>>>
>>>> final.pattern <-
>>>> "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"
>>>>
>>>> test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
>>>> '240.m.g.>50-70.kg.geo.mean')
>>>>
>>>> sub(final.pattern, '\\1', test.string)
>>>> sub(final.pattern, '\\2', test.string)
>>>> sub(final.pattern, '\\3', test.string)
>>>>
>>>> Only the third string has been correctly parsed, which matches the first
>>>> pattern. It seems the rest of the patterns are not called.
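Why only the third string parses: capture groups are numbered left to right across the whole pattern, so Jun's ten alternatives define thirty groups. A string matched by the ninth or tenth alternative reports its fields in groups 25 to 30, leaving `\\1` to `\\3` empty (and `sub()` replacements can only refer to `\\1` through `\\9` anyway). The sketch below follows Jeff's suggestion of one pair of parentheses per field, with that field's alternatives inside it, and builds the pattern from vectors of values, which is one possible reading of "dynamically constructed". The vectors `doses` and `weights` and the helper `alt_group()` are hypothetical names; the values are copied from Jun's pattern. Each value is wrapped in PCRE's `\Q...\E` so its dots are taken literally, which is why every call uses `perl = TRUE`.

```r
# Hypothetical value vectors (copied from Jun's pattern); in a real function
# they would presumably be derived from the input data, e.g. via unique().
doses   <- c("240.m.g", "3.mg.kg")
weights <- c("50.kg.or.less", ">50-70.kg", ">70-90.kg", ">90-110.kg", ">110.kg")

# Hypothetical helper: ONE capture group holding all alternatives for a field.
# \Q...\E (PCRE only) makes each value literal, so its dots match only dots.
alt_group <- function(values) {
  paste0("(", paste0("\\Q", values, "\\E", collapse = "|"), ")")
}

# Exactly three groups, whichever alternatives match:
# \1 = dose, \2 = weight band, \3 = the remaining statistic.
final.pattern <- paste0("^", alt_group(doses), "\\.",
                        alt_group(weights), "\\.(.*)$")

test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
                 '240.m.g.>50-70.kg.geo.mean')

dose      <- sub(final.pattern, "\\1", test.string, perl = TRUE)
weight    <- sub(final.pattern, "\\2", test.string, perl = TRUE)
statistic <- sub(final.pattern, "\\3", test.string, perl = TRUE)

dose       # c("240.m.g", "3.mg.kg", "240.m.g")
weight     # c(">110.kg", ">110.kg", ">50-70.kg")
statistic  # c("geo.mean", "P05", "geo.mean")
```

The group count no longer grows with the number of values, so adding a dose or weight band only lengthens the alternatives inside the existing groups.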
>>>>
>>>> Jun
>>>>
>>>> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>>
>>>>> Just noticed: My clumsy do.call() line in my previously posted code
>>>>> below should be replaced with:
>>>>> pat <- paste(pat,collapse = "|")
>>>>>
>>>>> > pat <- c(pat1,pat2)
>>>>> > paste(pat,collapse="|")
>>>>> [1] "a+\\.*a+|b+\\.*b+"
>>>>>
>>>>> ************ replace this **************************
>>>>> > pat <- do.call(paste,c(as.list(pat), sep="|"))
>>>>> ********************************************
>>>>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
>>>>> [1] "a.a" "bb" "b.bbb"
>>>>>
>>>>> -- Bert
>>>>>
>>>>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>> > Jun:
>>>>> >
>>>>> > You need to provide a clear specification via regular expressions of
>>>>> > the patterns you wish to match -- at least for me to decipher it.
>>>>> > Others may be smarter than I, though...
>>>>> >
>>>>> > Jeff: Thanks. I have now convinced myself that it can be done (a
>>>>> > "proof" of sorts): If pat1, pat2, ..., patm are m different patterns
>>>>> > (in a vector of patterns) to be matched in a vector of n strings,
>>>>> > where only one of the patterns will match in any string, then use
>>>>> > paste() (probably via do.call()) or otherwise to paste them together
>>>>> > separated by "|" to form the concatenated pattern, pat. Then
>>>>> >
>>>>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
>>>>> >
>>>>> > should extract the matching pattern in each (perhaps with a little
>>>>> > fiddling due to precedence rules); e.g.
>>>>> >
>>>>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
>>>>> >
>>>>> >> pat1 <- "a+\\.*a+"
>>>>> >> pat2 <-"b+\\.*b+"
>>>>> >> pat <- c(pat1,pat2)
>>>>> >
>>>>> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
>>>>> >> pat
>>>>> > [1] "a+\\.*a+|b+\\.*b+"
>>>>> >
>>>>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
>>>>> > [1] "a.a" "bb" "b.bbb"
>>>>> >
>>>>> > Cheers,
>>>>> > Bert
>>>>> >
>>>>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen...@gmail.com> wrote:
>>>>> >> Thanks for the reply, Bert.
>>>>> >>
>>>>> >> Your solution solves the example. I actually have a more general
>>>>> >> situation where I have these dot-concatenated strings built from multiple
>>>>> >> variables. The problem is that those variables may have values with dots
>>>>> >> in them, and the number of dots is not consistent across the values of a
>>>>> >> variable. So I am thinking of defining a vector of patterns for the vector
>>>>> >> of strings, and hopefully finding a way to apply each pattern in the
>>>>> >> pattern vector to the corresponding value of the string vector. The only
>>>>> >> way I can think of is a "for" loop, which can be slow. Also, this is
>>>>> >> happening inside a function I am writing. I just wonder if there is a
>>>>> >> more efficient way. Thanks a lot.
>>>>> >>
>>>>> >> Jun
>>>>> >>
>>>>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Well, he did provide an example, and...
>>>>> >>>
>>>>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>>> >>>
>>>>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>>>>> >>> [1] "WT.CUT" "tx"
>>>>> >>>
>>>>> >>> ## seems to do what was requested.
>>>>> >>>
>>>>> >>> Jeff would have to amplify on his initial statement, however: do you
>>>>> >>> mean that separate patterns can always be combined via "|"? Or
>>>>> >>> something deeper?
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>> Bert
>>>>> >>>
>>>>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
>>>>> >>> > Your opening assertion is false.
>>>>> >>> >
>>>>> >>> > Provide a reproducible example and someone will demonstrate.
>>>>> >>> >
>>>>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen...@gmail.com> wrote:
>>>>> >>> >>Dear list,
>>>>> >>> >>
>>>>> >>> >>I have a vector of strings that cannot be described by one pattern.
>>>>> >>> >>So let's say I construct a vector of patterns of the same length as
>>>>> >>> >>the vector of strings: can I do element-wise pattern recognition and
>>>>> >>> >>string substitution?
>>>>> >>> >>
>>>>> >>> >>For example,
>>>>> >>> >>
>>>>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>>>>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>>>>> >>> >>
>>>>> >>> >>patterns <- c(pattern1,pattern2)
>>>>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>>> >>> >>
>>>>> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from
>>>>> >>> >>the second string. If I do
>>>>> >>> >>
>>>>> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
>>>>> >>> >>
>>>>> >>> >>Looping over the patterns doesn't work the way I want. Appreciate any
>>>>> >>> >>comments.
>>>>> >>> >>Thanks.
>>>>> >>> >>
>>>>> >>> >>Jun
>>>>> >>> >>
>>>>> >>> >>______________________________________________
>>>>> >>> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> >>> >>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> >>> >>and provide commented, minimal, self-contained, reproducible code.
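For Jun's original question (one pattern per string), a minimal sketch: `sub()` is vectorized over its input strings but uses only the first element of `pattern`, so each pattern has to be paired with its string explicitly, here with `mapply()`. That is still a loop at the R level. If many strings share a few patterns, calling `sub()` once per *distinct* pattern keeps the number of R-level iterations equal to the number of patterns rather than strings; `sub_by_pattern()` below is a hypothetical helper illustrating that idea, not an existing function.

```r
pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
patterns <- c(pattern1, pattern2)
strings  <- c('TX.WT.CUT.mean', 'mg.tx.cv')

# Pair the i-th pattern with the i-th string (sub() alone would use patterns[1]).
extracted <- mapply(function(p, s) sub(p, "\\2", s),
                    patterns, strings, USE.NAMES = FALSE)
extracted  # c("WT.CUT", "tx")

# Hypothetical helper: one vectorized sub() call per distinct pattern,
# applied to all strings that share that pattern.
sub_by_pattern <- function(patterns, strings, replacement) {
  stopifnot(length(patterns) == length(strings))
  out <- character(length(strings))
  for (p in unique(patterns)) {
    idx <- patterns == p
    out[idx] <- sub(p, replacement, strings[idx])
  }
  out
}

sub_by_pattern(c(pattern1, pattern2, pattern1),
               c(strings, "A.B.C.d"), "\\2")
# c("WT.CUT", "tx", "B.C")
```

Whether the grouped version is noticeably faster depends on how many distinct patterns there are relative to strings; with one pattern per string it does the same work as `mapply()`.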