[ https://issues.apache.org/jira/browse/TIKA-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-578.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9
         Assignee: Jukka Zitting

Good point, thanks! Fixed in revision 1060818.

> XMLParser ContentHandler: multiple endDocument calls
> ----------------------------------------------------
>
>                 Key: TIKA-578
>                 URL: https://issues.apache.org/jira/browse/TIKA-578
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>         Environment: N/A
>            Reporter: Scott Severtson
>            Assignee: Jukka Zitting
>             Fix For: 0.9
>
>
> When supplying a ContentHandler to a XMLParser instance, the ContentHandler's
> .endDocument() method is called twice; once by the SAXParser created within
> XMLParser, once explicitly by XMLParser itself.
> Sample code:
> ---
> InputStream inputStream = ...
> XMLParser parser = new DcXMLParser();
> ParseContext context = new ParseContext();
> Metadata metadata = new Metadata();
> DOMResult result = new DOMResult();
> TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
> transformerHandler.setResult(result);
> parser.parse(inputStream, transformerHandler, metadata, context);
> ---
> The following exception is produced:
> ---
> java.util.EmptyStackException
> 	at java.util.Stack.peek(Stack.java:85)
> 	at java.util.Stack.pop(Stack.java:67)
> 	at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
> 	at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
> 	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
> 	at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
> 	at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
> 	at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
> 	...
> ---
> We have worked around the issue temporarily by passing in a ContentHandler
> that eats the first .endDocument() call, and allows the second to go through.
> However, we believe XMLParser should hide the extraneous .endDocument() call
> internally.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
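
For readers still on 0.8, the workaround described in the report might look roughly like the sketch below. It is not the reporter's actual code: the class name SkipFirstEndDocumentHandler is hypothetical, and it assumes exactly the double-call behaviour described in the report (the first endDocument() call is swallowed, later ones are forwarded). It relies only on the JDK's SAX classes; XMLFilterImpl forwards every other ContentHandler event to the delegate unchanged.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical wrapper: drops the first endDocument() event and forwards
// every later one. All other SAX events pass through via XMLFilterImpl.
class SkipFirstEndDocumentHandler extends XMLFilterImpl {
    private boolean firstEndSeen = false;

    SkipFirstEndDocumentHandler(ContentHandler delegate) {
        setContentHandler(delegate);
    }

    @Override
    public void endDocument() throws SAXException {
        if (!firstEndSeen) {
            // Swallow the first call (the one triggered by the inner SAX parser).
            firstEndSeen = true;
            return;
        }
        super.endDocument();
    }

    public static void main(String[] args) throws SAXException {
        final int[] calls = {0};
        // Stand-in for the real downstream handler; it only counts endDocument().
        ContentHandler counting = new DefaultHandler() {
            @Override
            public void endDocument() {
                calls[0]++;
            }
        };
        ContentHandler wrapped = new SkipFirstEndDocumentHandler(counting);
        wrapped.startDocument();
        wrapped.endDocument(); // swallowed
        wrapped.endDocument(); // forwarded
        System.out.println("endDocument forwarded " + calls[0] + " time(s)");
        // prints "endDocument forwarded 1 time(s)"
    }
}
```

With the sample code from the report, the wrapper would be passed in place of the bare handler, e.g. parser.parse(inputStream, new SkipFirstEndDocumentHandler(transformerHandler), metadata, context). Once the fix in 0.9 is in place the wrapper should be removed, since it would then swallow the only endDocument() call.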