Look at the discussion of zero-width lookahead assertions in ?regex. Use perl = TRUE, as previously indicated.
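
For what it's worth, here is a rough sketch of how a zero-width assertion might be applied to the E/q version of the problem. It uses a negative lookbehind, (?<!E), which ?regex documents alongside lookahead; whether that is the construct intended here is an assumption, and escape_quotes is just an illustrative name, not something from the thread.

```r
# Hypothetical helper; the name is illustrative only.
# E stands for the escape character and q for the quote, as in the original post.
escape_quotes <- function(x) {
  # (?<!E)     zero-width check: the previous character is not E
  # ((?:EE)*)  an even number of escapes (possibly none), captured as \1
  # q          the unescaped quote itself
  gsub("(?<!E)((?:EE)*)q", "\\1Eq", x, perl = TRUE)
}

escape_quotes(c("Eq", "EEq", "qaq", "qq"))
# expected: "Eq", "EEEq", "EqaEq", "EqEq"
```

Because the lookbehind consumes no characters, the first q in "qq" is not used up as the "non-escape" in front of the second one, so both quotes get escaped. The real backslash-and-quote pattern would need the usual extra layers of escaping, which are not shown here.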

On Sun, Jul 6, 2008 at 7:29 PM, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
> On 06/07/2008 5:37 PM, (Ted Harding) wrote:
>>
>> On 06-Jul-08 21:17:04, Duncan Murdoch wrote:
>>>
>>> I'm trying to write a gsub() call that takes a string and escapes all the
>>> unescaped quote marks in it. So the string
>>>
>>> \"
>>>
>>> would be left unchanged, but
>>>
>>> \\"
>>>
>>> would be changed to
>>>
>>> \\\"
>>>
>>> because the double backslash doesn't act as an escape for the quote,
>>> the first just escapes the second. I have the usual problems of
>>> writing regular expressions involving backslashes which make
>>> everything I write completely unreadable, so I'm going to change
>>> the problem for this post: I will define E to be the escape
>>> character, and q to be the quote; the gsub() call would leave
>>>
>>> Eq
>>>
>>> unchanged, but would change
>>>
>>> EEq
>>>
>>> to EEEq, etc.
>>>
>>> The expression I have come up with after this change is
>>>
>>> gsub( "((^|[^E])(EE)*)q", "\\1Eq", x)
>>>
>>> i.e. "(start of line, or non-escape, followed by an even number of
>>> escapes), all of which we call expression 1, followed by a quote,
>>> is replaced by expression 1 followed by an escape and a quote".
>>>
>>> This works sometimes, but not always:
>>>
>>> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "Eq")
>>> [1] "Eq"
>>> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "EEq")
>>> [1] "EEEq"
>>> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qaq")
>>> [1] "EqaEq"
>>> > gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qq")
>>> [1] "qEq"
>>>
>>> Notice that in the final example, the first quote doesn't get escaped.
>>> Why not????
>>
>> I think (without having done the "experimental diagnostics")
>> that it's because in "qq" the first q matches (^|[^E]) because
>> it matches [^E] (i.e. is a "non-escape"); since it is followed
>> by q, it is the second q which gets the escape. Possibly you
>> need to include "^q" as an additional alternative match at the
>> start of the line.
>
> Thanks, that sounds right, but now I can't see how to fix it. Is there
> syntax to say: match A only if it follows B, but don't match the B part?
>
> Duncan Murdoch
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.