Yes, but (I think -- I haven't confirmed) this basic escaping is already being done by the DOM streaming. At the least, it is converting characters like 0xC to numeric character references.

I'd have to look at the code and see how the XML is serialized... Most DOM streaming classes will encode entities somehow, so you shouldn't worry about it. But while we're at it, it doesn't make sense to build a DOM tree just to output the XML -- it is much faster to simply serialize it directly to the output stream.
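
To illustrate what I mean by serializing directly, here is a minimal sketch that writes an element straight to a Writer and escapes the text itself, with no DOM tree in between. The class and method names (DirectXmlWriter, escape, writeElement) are made up for this example; this is not the code from the NUTCH-110 patch, only an outline of the approach.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of Nutch: writes XML directly to a Writer.
final class DirectXmlWriter {

    // Escapes the five markup-significant characters; all other
    // characters are passed through unchanged.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Writes <name>escaped text</name> to the stream. The element name is
    // assumed to be a valid XML name already; only the text is escaped.
    static void writeElement(Writer out, String name, String text) throws IOException {
        out.write('<');
        out.write(name);
        out.write('>');
        out.write(escape(text));
        out.write("</");
        out.write(name);
        out.write('>');
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeElement(out, "title", "Tom & Jerry <1940>");
        System.out.println(out); // prints <title>Tom &amp; Jerry &lt;1940&gt;</title>
    }
}
```

Note that this only handles markup characters. Control characters such as 0xC are not legal in XML 1.0 at all, so they have to be filtered separately.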

So, shall I amend the patch in NUTCH-110 so that it uses XMLSerializerHelper#toValidXmlText in place of the #getLegalXml method?

Copy the method's contents. It doesn't really make sense to copy the entire class just for this method. Good luck.
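
For reference, a method of this kind usually amounts to a filter over the character ranges that XML 1.0 allows. The sketch below is my own assumption of what such a filter looks like, based on the Char production in the XML 1.0 specification; it is not the actual body of toValidXmlText or getLegalXml, and the names XmlCharFilter, isLegalXmlChar and stripIllegalXmlChars are hypothetical.

```java
// Hypothetical helper, not copied from Nutch or XMLSerializerHelper.
final class XmlCharFilter {

    // True if the code point matches the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the text with every illegal code point dropped.
    // Iterates by code point so surrogate pairs are kept together and
    // unpaired surrogates are removed.
    static String stripIllegalXmlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

With this, stripIllegalXmlChars("a\fb") returns "ab", since 0xC (form feed) is outside the allowed ranges. Dropping the character is one choice; replacing it with a space is another, if word boundaries matter for the output.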

D.
