XMLParser ContentHandler: multiple endDocument calls
----------------------------------------------------
Key: TIKA-578
URL: https://issues.apache.org/jira/browse/TIKA-578
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.8
Environment: N/A
Reporter: Scott Severtson

When supplying a ContentHandler to an XMLParser instance, the ContentHandler's .endDocument() method is called twice: once by the SAXParser created within XMLParser, and once explicitly by XMLParser itself.

Sample code:
---
InputStream inputStream = ...
XMLParser parser = new DcXMLParser();
ParseContext context = new ParseContext();
Metadata metadata = new Metadata();
DOMResult result = new DOMResult();
TransformerHandler transformerHandler =
    ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
transformerHandler.setResult(result);
parser.parse(inputStream, transformerHandler, metadata, context);
---

The following exception is produced:
---
java.util.EmptyStackException
    at java.util.Stack.peek(Stack.java:85)
    at java.util.Stack.pop(Stack.java:67)
    at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
    at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
    at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
    at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
    ...
---

We have worked around the issue temporarily by passing in a ContentHandler that swallows the first .endDocument() call and lets the second go through. However, we believe XMLParser should suppress the extraneous .endDocument() call internally.
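
For illustration, the workaround mentioned in this report (a ContentHandler that drops the first .endDocument() call and forwards the second) might look roughly like the sketch below. This is not the reporter's actual code: the class name SkipFirstEndDocumentHandler is hypothetical, and building it on the JDK's org.xml.sax.helpers.XMLFilterImpl is an assumption made only to keep the example short. It forwards ContentHandler events only.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical wrapper: forwards every SAX content event to the delegate,
// except that the first endDocument() call is silently dropped.
final class SkipFirstEndDocumentHandler extends XMLFilterImpl {

    // Becomes true once the first endDocument() call has been swallowed.
    private boolean firstEndDocumentSeen = false;

    SkipFirstEndDocumentHandler(ContentHandler delegate) {
        // XMLFilterImpl forwards ContentHandler events to this handler.
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!firstEndDocumentSeen) {
            // First call (per the report, the one from the internal SAXParser): ignore it.
            firstEndDocumentSeen = true;
            return;
        }
        // Any later call is passed through to the delegate.
        super.endDocument();
    }
}
```

With the sample code above, the handler would presumably be wrapped at the call site, e.g. parser.parse(inputStream, new SkipFirstEndDocumentHandler(transformerHandler), metadata, context). Note that this only masks the symptom on the caller's side; it depends on XMLParser continuing to issue exactly two calls, which is why a fix inside XMLParser is preferable.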