Hi,

On Sun, Jul 15, 2012 at 5:29 PM, John M <jfm.apa...@gmail.com> wrote:
> So, is it a bug in the SAX library: that the line
> "super.characters(new char[0], 0, 0);" in the XHTMLContentHandler
> should work (but doesn't)?

Yes, or the SAX library you're using could treat that as a feature
(automatically ignoring empty content). What's the SAX library you're
using to serialize the output from Tika?

You may also want to try the ToXMLContentHandler class in o.a.t.sax.
It can serialize SAX events and doesn't suffer from this problem.

BR,

Jukka Zitting
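
One way to check how a particular serializer treats the empty characters() call is to feed it the same events by hand and compare the output with and without the call. The sketch below is only an illustration: it uses the TransformerHandler that ships with the JDK rather than Tika's own classes, and the class name EmptyCharactersProbe, the serialize helper and the "title" element are made up for this example. What it prints depends on the XSLT/SAX implementation on your classpath, which is the point of running it.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical probe class, not part of Tika.
public class EmptyCharactersProbe {

    // Serializes a single empty <title> element, optionally sending the
    // zero-length characters() event between the start and end tags.
    public static String serialize(boolean sendEmptyCharacters) throws Exception {
        // Assumes the default TransformerFactory supports SAX (true for the JDK's).
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        handler.startDocument();
        handler.startElement("", "title", "title", new AttributesImpl());
        if (sendEmptyCharacters) {
            // The same call quoted above from XHTMLContentHandler.
            handler.characters(new char[0], 0, 0);
        }
        handler.endElement("", "title", "title");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without empty characters(): " + serialize(false));
        System.out.println("with empty characters():    " + serialize(true));
    }
}
```

If both lines print the same thing, the serializer is ignoring the empty content, which matches the "feature" reading above.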
