All these methods do assume that you don't have nested <tag> elements, like so:

    <tag><tag>foo</tag>useful stuff</tag>some garbage</tag>

For that you would really need a true parser, so I would double-check that this never happens.

Do you have any control over how those XML files are generated, though? It seems to me that it might be easier to fix the utility that generates them, since it is clearly doing something wrong.

On Feb 24, 2007, at 11:07 AM, Gabor Grothendieck wrote:

> I assume <tag> is known.
>
> This removes any occurrence of </tag>.*</tag> where .* does not
> contain <tag> or </tag>.
>
> The regular expression, re, matches </tag>, then does a non-greedy
> (ungreedy) match (?U) for anything followed by </tag>, but uses a
> zero-width lookahead subexpression (?=...) for the second </tag>
> so that it can be rematched. gsubfn in package gsubfn is like the
> usual gsub except that, instead of replacing the match with a
> string, it passes the match to function f and then replaces the
> match with the output of f. See the gsubfn home page:
> http://code.google.com/p/gsubfn/
> and vignette.

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
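For illustration, here is a rough Python sketch of the lookahead idea described in the quoted message. It is an assumption, not Gabor's code: his actual re and f are not included in the quote, and this uses Python's re.sub instead of R's gsubfn. It assumes the tag name is known and deletes "</tag>junk" wherever it is immediately followed by another </tag>, where "junk" contains neither <tag> nor </tag>. Because the lookahead does not consume the second </tag>, that closer can begin the next match. The last example shows the nesting problem raised above: "useful stuff" is discarded and the outer <tag> is left unclosed.

```python
import re

# Assumption: the tag name is known in advance, as the quoted message says.
TAG = "tag"
CLOSE = f"</{TAG}>"

# </tag>, then any run of characters containing neither <tag> nor </tag>
# (a "tempered" token), then a zero-width lookahead for the next </tag>.
# The lookahead leaves that second </tag> unconsumed, so it can start the
# next match when several stray closing tags appear in a row.
STRAY = re.compile(
    re.escape(CLOSE)
    + r"(?:(?!</?" + re.escape(TAG) + r">).)*"
    + r"(?=" + re.escape(CLOSE) + r")",
    re.DOTALL,  # let "." cross newlines, since XML text may span lines
)


def drop_stray_closers(text: str) -> str:
    """Delete '</tag>junk' wherever it is directly followed by another '</tag>'."""
    return STRAY.sub("", text)


if __name__ == "__main__":
    print(drop_stray_closers("<tag>a</tag>junk</tag><tag>b</tag>"))
    # <tag>a</tag><tag>b</tag>
    print(drop_stray_closers("<tag>a</tag>g1</tag>g2</tag>"))
    # <tag>a</tag>
    print(drop_stray_closers("<tag><tag>foo</tag>useful stuff</tag>some garbage</tag>"))
    # <tag><tag>foo</tag>   (nested input: the useful text is lost)
```

Text between a closer and a new opening tag, as in "<tag>a</tag>x<tag>y</tag>", is left alone, because the tempered token refuses to run past <tag>.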