I am trying to read a number of XML files using xmlTreeParse(). Unfortunately,
some of them are malformed in a way that makes R crash. The problem is that
closing tags are sometimes repeated like this:

<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>

I want to clean up the contents of each XML file with gsub() before passing
them to xmlTreeParse(), but I can't figure out the right pattern. What I need
is something that transforms the example above into:

<tag>value1</tag><tag>value2</tag><tag>value3</tag>

I think I need some kind of "</tag>.*</tag>" pattern that only matches if there
is no "<tag>" in the ".*" part.
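
One way to express "anything except <tag>" is a negative lookahead applied at
every character, which gsub() supports with perl = TRUE. The following is only
a sketch under some assumptions: clean_tags() is a hypothetical helper name,
the tag name contains no regex metacharacters, the opening tag carries no
attributes, and <tag> elements are never legitimately nested inside each other
(a valid "</tag></tag>" would be collapsed too).

```r
# Hypothetical helper: collapse "</tag>...garbage...</tag>" into a single
# "</tag>" whenever no "<tag>" occurs between the two closing tags.
clean_tags <- function(x, tag = "tag") {
  open  <- paste0("<", tag, ">")
  close <- paste0("</", tag, ">")
  # (?s)               let "." also match newlines
  # (?:(?!<tag>).)*    any run of characters in which "<tag>" never starts
  pattern <- paste0("(?s)", close, "(?:(?!", open, ").)*", close)
  gsub(pattern, close, x, perl = TRUE)
}

x <- "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"
cat(clean_tags(x), "\n")
# <tag>value1</tag><tag>value2</tag><tag>value3</tag>

# For a real file, the lines could be joined into one string first, e.g.
#   txt <- paste(readLines("file.xml"), collapse = "\n")
# and the cleaned text passed on with xmlTreeParse(clean_tags(txt), asText = TRUE).
```

The run between the two closing tags is greedy, so several repeated closing
tags in a row are removed in a single match.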

Thanks in advance for your ideas,

Uli

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
