I'm using the command line and the Tika-app jar, as are the creators of TIKA-895 and TIKA-914, so it's Java 6 with the latest Tika build and nothing more, I suppose. I got the JDK from Oracle's website. Does the problem not exist in an older version of Tika, or with a different version of Java that you know of?
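
To see how a given SAX serializer treats the empty "characters(new char[0], 0, 0)" call on its own, without Tika involved, something like the following could be run against whatever JDK is installed. This is only a minimal sketch: the class name EmptyCharactersCheck and the method serializeEmptyTitle are made up for illustration, and it uses the JDK's default TransformerHandler rather than anything from Tika. Whether the element comes out as a self-closing tag or as an open/close pair depends on the serializer implementation, so no particular output is assumed here.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical standalone check, not part of Tika.
public class EmptyCharactersCheck {

    // Sends <title> + an empty characters event + </title> through the
    // JDK's default SAX serializer and returns what it wrote.
    static String serializeEmptyTitle() throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        handler.startDocument();
        handler.startElement("", "title", "title", new AttributesImpl());
        // Same kind of zero-length event that XHTMLContentHandler emits.
        handler.characters(new char[0], 0, 0);
        handler.endElement("", "title", "title");
        handler.endDocument();

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Prints the serialized element so the two forms can be compared by eye.
        System.out.println(serializeEmptyTitle());
    }
}
```

If the printed element is self-closing, the serializer is dropping the empty content, which would match the "feature" explanation.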
John

On Mon, Jul 16, 2012 at 7:34 AM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:
> Hi,
>
> On Sun, Jul 15, 2012 at 5:29 PM, John M <jfm.apa...@gmail.com> wrote:
>> So, is it a bug in the SAX library: that the line
>> "super.characters(new char[0], 0, 0);" in the XHTMLContentHandler
>> should work (but doesn't)?
>
> Yes, or the SAX library you're using could treat that as a feature
> (automatically ignoring empty content).
>
> What's the SAX library you're using to serialize the output from Tika?
> You may also want to try the ToXMLContentHandler class in o.a.t.sax.
> It can serialize SAX events and doesn't suffer from this problem.
>
> BR,
>
> Jukka Zitting
