[ https://issues.apache.org/jira/browse/TIKA-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850416#comment-13850416 ]
Uwe Schindler commented on TIKA-1211:
-------------------------------------

There are multiple ways to fix this:
- Make XHTMLContentHandler prevent multiple startDocument() events. I think that's the easiest and most correct option. XHTMLContentHandler already has some magic in there.
- Add an additional ContentHandler that removes subsequent startDocument() events (this is the same as above, just in a separate handler).

> OpenDocument (ODF) parser produces multiple startDocument() events
> ------------------------------------------------------------------
>
>                 Key: TIKA-1211
>                 URL: https://issues.apache.org/jira/browse/TIKA-1211
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>
> Related to SOLR-4809: Solr receives multiple startDocument events when
> parsing OpenDocument files.
> The parser already prevents multiple endDocument events, but not multiple
> startDocument events.
> The bug was introduced when we added parsing of content.xml and meta.xml
> (TIKA-736); both feed elements to the XHTML output, so we get multiple
> start/endDocument events.
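
For illustration, the second option in the comment (a separate handler that drops repeated startDocument() events) could look roughly like the sketch below. It uses only the standard SAX classes from the JDK; the class name SingleStartDocumentFilter is hypothetical and is not part of Tika, and this is not necessarily how the issue was actually fixed. endDocument() is passed through unchanged, since the issue states that the parser already prevents multiple endDocument events.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical wrapper (not a Tika class): forwards only the first
 * startDocument() event to the wrapped handler and silently drops
 * any later ones. All other SAX events are forwarded unchanged by
 * XMLFilterImpl.
 */
class SingleStartDocumentFilter extends XMLFilterImpl {

    // Set once the first startDocument() has been forwarded.
    private boolean started = false;

    SingleStartDocumentFilter(ContentHandler downstream) {
        // XMLFilterImpl forwards ContentHandler events to this handler.
        setContentHandler(downstream);
    }

    @Override
    public void startDocument() throws SAXException {
        if (!started) {
            started = true;
            super.startDocument();
        }
        // Subsequent startDocument() calls are ignored.
    }
}
```

Under this assumption, a caller would wrap its own handler, e.g. new SingleStartDocumentFilter(handler), and pass the wrapper to the parser instead of the original handler.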