Jun:

"My problem is the pattern has to be dynamically constructed on the
input data of the function"

What does that mean? How can a pattern be "dynamically constructed"
when you have not made clear (at least to me, perhaps also to yourself
and/or others) *how* it is to be constructed?
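
If it only means that the alternatives come from the arguments of your function, then here is a minimal sketch of one way it might be done. The helper names esc() and make_pattern() are hypothetical, and I am assuming (you have not said so) that every string is dose, then weight band, then statistic, joined by dots. The point is that each set of alternatives sits inside ONE pair of parentheses, so \\1, \\2 and \\3 refer to the same field whichever alternative matches:

```r
# Hypothetical helper: backslash-escape common regex metacharacters so the
# values are matched literally (the dots in "240.m.g" must not be wildcards).
esc <- function(x) gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)

# Hypothetical helper: build one pattern with exactly three capture groups.
# Assumes each string is <dose>.<weight band>.<statistic>.
make_pattern <- function(doses, weights) {
  # One pair of parentheses around the whole set of alternatives
  alt <- function(x) paste0("(", paste(esc(x), collapse = "|"), ")")
  paste0("^", alt(doses), "\\.", alt(weights), "\\.(.*)$")
}

doses   <- c("240.m.g", "3.mg.kg")
weights <- c(">50-70.kg", ">70-90.kg", ">90-110.kg",
             "50.kg.or.less", ">110.kg")
final.pattern <- make_pattern(doses, weights)

test.string <- c("240.m.g.>110.kg.geo.mean", "3.mg.kg.>110.kg.P05",
                 "240.m.g.>50-70.kg.geo.mean")

sub(final.pattern, "\\1", test.string)  # "240.m.g" "3.mg.kg" "240.m.g"
sub(final.pattern, "\\2", test.string)  # ">110.kg" ">110.kg" ">50-70.kg"
sub(final.pattern, "\\3", test.string)  # "geo.mean" "P05" "geo.mean"
```

In your version every alternative carried its own three pairs of parentheses, so \\1 to \\3 belonged to the first alternative only and were empty whenever a later one matched. If one value can be a prefix of another, it is probably safest to list the longer one first.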

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Sep 6, 2016 at 8:56 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> Jeff:
>
> Not sure what you meant by this:
>
> "There is no other reason to put parentheses in the pattern... they
> are not grouping symbols."
>
> ... but in fact, from ?regexp
>
> "Repetition takes precedence over concatenation, which in turn takes
> precedence over alternation. A whole subexpression may be enclosed in
> parentheses to override these precedence rules. "
>
> So parentheses *are* in fact "grouping symbols."
>
> Cheers,
> Bert
>
>
> On Tue, Sep 6, 2016 at 5:18 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> 
> wrote:
>> I am not near my computer today, but each parenthesis gets its own result 
>> number, so you should put the parenthesis around the whole pattern of 
>> alternatives instead of having many parentheses.
>>
>> I recommend thinking in terms of what common information you expect to find 
>> in these various strings, and place your parentheses to capture that 
>> information. There is no other reason to put parentheses in the pattern... 
>> they are not grouping symbols.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On September 6, 2016 5:01:04 PM PDT, Bert Gunter <bgunter.4...@gmail.com> 
>> wrote:
>>>Jun:
>>>
>>>1. Tell us your desired result from your test vector and maybe someone
>>>will help.
>>>
>>>2. As we played this game once already (you couldn't do it; I showed
>>>you how), this seems to be a function of your limitations with regular
>>>expressions. I'm probably not much better, but in any case, I don't
>>>intend to be your consultant. See if you can find someone locally to
>>>help you if you do not receive a satisfactory reply from the list.
>>>There are many people here who are pretty good at this sort of thing,
>>>but I don't know if they'll reply. Regexes are certainly complex. Perl
>>>people tend to be pretty good at them, I believe. There are numerous
>>>web sites and books on them if you need to acquire expertise for your
>>>work.
>>>
>>>Cheers,
>>>Bert
>>>
>>>On Tue, Sep 6, 2016 at 3:59 PM, Jun Shen <jun.shen...@gmail.com> wrote:
>>>> Hi Bert,
>>>>
>>>> I still couldn't get the multiple patterns to work. Here is an example. I
>>>> make the pattern as follows
>>>>
>>>> final.pattern <-
>>>> "(240\\.m\\.g)\\.(>50-70\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>50-70\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>70-90\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>70-90\\.kg)\\.(.*)|(240\\.m\\.g)\\.(>90-110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>90-110\\.kg)\\.(.*)|(240\\.m\\.g)\\.(50\\.kg\\.or\\.less)\\.(.*)|(3\\.mg\\.kg)\\.(50\\.kg\\.or\\.less)\\.(.*)|(240\\.m\\.g)\\.(>110\\.kg)\\.(.*)|(3\\.mg\\.kg)\\.(>110\\.kg)\\.(.*)"
>>>>
>>>> test.string <- c('240.m.g.>110.kg.geo.mean', '3.mg.kg.>110.kg.P05',
>>>> '240.m.g.>50-70.kg.geo.mean')
>>>>
>>>> sub(final.pattern, '\\1', test.string)
>>>> sub(final.pattern, '\\2', test.string)
>>>> sub(final.pattern, '\\3', test.string)
>>>>
>>>> Only the third string has been correctly parsed, which matches the first
>>>> pattern. It seems the rest of the patterns are not called.
>>>>
>>>> Jun
>>>>
>>>>
>>>> On Mon, Sep 5, 2016 at 10:21 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>>
>>>>> Just noticed: My clumsy do.call() line in my previously posted code
>>>>> below should be replaced with:
>>>>> pat <- paste(pat,collapse = "|")
>>>>>
>>>>>
>>>>> > pat <- c(pat1,pat2)
>>>>> > paste(pat,collapse="|")
>>>>> [1] "a+\\.*a+|b+\\.*b+"
>>>>>
>>>>> ************ replace this **************************
>>>>> > pat <- do.call(paste,c(as.list(pat), sep="|"))
>>>>> ********************************************
>>>>> > sub(paste0("^[^b]*(",pat,").*$"),"\\1",z)
>>>>> [1] "a.a"   "bb"    "b.bbb"
>>>>>
>>>>>
>>>>> -- Bert
>>>>>
>>>>> On Mon, Sep 5, 2016 at 12:11 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>> > Jun:
>>>>> >
>>>>> > You need to provide a clear specification via regular expressions of
>>>>> > the patterns you wish to match -- at least for me to decipher it.
>>>>> > Others may be smarter than I, though...
>>>>> >
>>>>> > Jeff: Thanks. I have now convinced myself that it can be done (a
>>>>> > "proof" of sorts): If pat1, pat2, ..., patm are m different patterns
>>>>> > (in a vector of patterns) to be matched in a vector of n strings,
>>>>> > where only one of the patterns will match in any string, then use
>>>>> > paste() (probably via do.call()) or otherwise to paste them together
>>>>> > separated by "|" to form the concatenated pattern, pat. Then
>>>>> >
>>>>> > sub(paste0("^.*(",pat, ").*$"),"\\1",thevector)
>>>>> >
>>>>> > should extract the matching pattern in each (perhaps with a little
>>>>> > fiddling due to precedence rules); e.g.
>>>>> >
>>>>> >> z <-c(".fg.h.g.a.a", "bb..dd.ef.tgf.", "foo...b.bbb.tgy")
>>>>> >
>>>>> >> pat1 <- "a+\\.*a+"
>>>>> >> pat2 <-"b+\\.*b+"
>>>>> >> pat <- c(pat1,pat2)
>>>>> >
>>>>> >> pat <- do.call(paste,c(as.list(pat), sep="|"))
>>>>> >> pat
>>>>> > [1] "a+\\.*a+|b+\\.*b+"
>>>>> >
>>>>> >> sub(paste0("^[^b]*(",pat,").*$"), "\\1", z)
>>>>> > [1] "a.a"   "bb"    "b.bbb"
>>>>> >
>>>>> > Cheers,
>>>>> > Bert
>>>>> >
>>>>> >
>>>>> > On Mon, Sep 5, 2016 at 9:56 AM, Jun Shen <jun.shen...@gmail.com> wrote:
>>>>> >> Thanks for the reply, Bert.
>>>>> >>
>>>>> >> Your solution solves the example. I actually have a more general
>>>>> >> situation where I have this dot-concatenated string from multiple
>>>>> >> variables. The problem is that those variables may have values with
>>>>> >> dots in them. The number of dots is not consistent across the values
>>>>> >> of a variable. So I am thinking of defining a vector of patterns for
>>>>> >> the vector of strings and hopefully finding a way to use a pattern
>>>>> >> from the pattern vector for each value of the string vector. The only
>>>>> >> way I can think of is a "for" loop, which can be slow. Also, this is
>>>>> >> happening in a function I am writing. I just wonder if there is
>>>>> >> another, more efficient way. Thanks a lot.
>>>>> >>
>>>>> >> Jun
>>>>> >>
>>>>> >> On Mon, Sep 5, 2016 at 1:41 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Well, he did provide an example, and...
>>>>> >>>
>>>>> >>>
>>>>> >>> > z <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>>> >>>
>>>>> >>> > sub("^.+?\\.(.+)\\.[^.]+$","\\1",z)
>>>>> >>> [1] "WT.CUT" "tx"
>>>>> >>>
>>>>> >>>
>>>>> >>> ## seems to do what was requested.
>>>>> >>>
>>>>> >>> Jeff would have to amplify on his initial statement however: do you
>>>>> >>> mean that separate patterns can always be combined via "|" ?  Or
>>>>> >>> something deeper?
>>>>> >>>
>>>>> >>> Cheers,
>>>>> >>> Bert
>>>>> >>>
>>>>> >>> On Sun, Sep 4, 2016 at 9:30 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
>>>>> >>> > Your opening assertion is false.
>>>>> >>> >
>>>>> >>> > Provide a reproducible example and someone will demonstrate.
>>>>> >>> > --
>>>>> >>> > Sent from my phone. Please excuse my brevity.
>>>>> >>> >
>>>>> >>> > On September 4, 2016 9:06:59 PM PDT, Jun Shen <jun.shen...@gmail.com> wrote:
>>>>> >>> >>Dear list,
>>>>> >>> >>
>>>>> >>> >>I have a vector of strings that cannot be described by one pattern.
>>>>> >>> >>So let's say I construct a vector of patterns of the same length as
>>>>> >>> >>the vector of strings. Can I do element-wise pattern recognition and
>>>>> >>> >>string substitution?
>>>>> >>> >>
>>>>> >>> >>For example,
>>>>> >>> >>
>>>>> >>> >>pattern1 <- "([^.]*)\\.([^.]*\\.[^.]*)\\.(.*)"
>>>>> >>> >>pattern2 <- "([^.]*)\\.([^.]*)\\.(.*)"
>>>>> >>> >>
>>>>> >>> >>patterns <- c(pattern1,pattern2)
>>>>> >>> >>strings <- c('TX.WT.CUT.mean','mg.tx.cv')
>>>>> >>> >>
>>>>> >>> >>Say I want to extract "WT.CUT" from the first string and "tx" from
>>>>> >>> >>the second string. If I do
>>>>> >>> >>
>>>>> >>> >>sub(patterns, '\\2', strings), only the first pattern will be used.
>>>>> >>> >>
>>>>> >>> >>Looping over the patterns doesn't work the way I want. I'd appreciate
>>>>> >>> >>any comments.
>>>>> >>> >>Thanks.
>>>>> >>> >>
>>>>> >>> >>Jun
>>>>> >>> >>
>>>>> >>> >>______________________________________________
>>>>> >>> >>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> >>> >>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> >>> >>PLEASE do read the posting guide
>>>>> >>> >>http://www.R-project.org/posting-guide.html
>>>>> >>> >>and provide commented, minimal, self-contained, reproducible code.

