I am trying to read a number of XML files using xmlTreeParse(). Unfortunately, some of them are malformed in a way that makes R crash. The problem is that closing tags are sometimes repeated like this:
<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>

I want to preprocess the contents of the XML file using gsub() before feeding them to xmlTreeParse() to clean them up, but I can't figure out how to do it. What I need is something that transforms the example above into:

<tag>value1</tag><tag>value2</tag><tag>value3</tag>

That is, some kind of "</tag>.*</tag>" pattern that only matches if there is no "<tag>" in the ".*" part.

Thanks in advance for your ideas,
Uli
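
One way this might be done, offered as a sketch rather than a tested solution for the real files: with perl = TRUE, gsub() accepts a negative lookahead, so the ".*" can be replaced by "(?:(?!<tag>).)*", which consumes one character at a time only while no "<tag>" starts at that position. The helper name strip_repeated_close and its tag argument are made up for illustration. The sketch assumes the tag name is plain letters and digits, that the opening tag carries no attributes, and that a <tag> element is never legitimately nested inside another <tag>.

```r
# Hypothetical helper (not part of the XML package): collapse
# "</tag> ...junk... </tag>" into a single "</tag>" whenever no opening
# "<tag>" occurs between the two closing tags.
strip_repeated_close <- function(x, tag = "tag") {
  open  <- paste0("<", tag, ">")
  close <- paste0("</", tag, ">")
  # (?:(?!<tag>).)* matches any run of characters that does not contain "<tag>"
  pattern <- paste0(close, "(?:(?!", open, ").)*", close)
  gsub(pattern, close, x, perl = TRUE)
}

broken <- "<tag>value1</tag><tag>value2</tag>some garbage</tag></tag><tag>value3</tag>"
fixed  <- strip_repeated_close(broken)
print(fixed)

# A well-formed string should pass through unchanged
ok <- "<tag>value1</tag><tag>value2</tag>"
print(identical(strip_repeated_close(ok), ok))
```

By default "." does not match a newline, so the pattern only repairs garbage that sits on a single line. If the stray closing tags can span lines, one option would be to read the file into a single string, e.g. paste(readLines(f), collapse = "\n"), and prefix the pattern with "(?s)" so that "." also matches newlines. The cleaned string could then be handed to xmlTreeParse() with asText = TRUE.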