Stephen H created TIKA-4453:
-------------------------------
Summary: ForkParser fails on documents with more than 100 embedded
documents
Key: TIKA-4453
URL: https://issues.apache.org/jira/browse/TIKA-4453
Project: Tika
Issue Type: Bug
Components: core
Affects Versions: 3.2.1
Reporter: Stephen H
Attachments: forkparser-patch.txt
ForkParser uses RecursiveMetadataContentHandlerProxy, which overrides
endEmbeddedDocument() but does not call the superclass method. As a result,
the embeddedDepth counter in AbstractRecursiveParserWrapperHandler is
incremented for each new embedded document but never decremented, so it tracks
the total number of embedded documents seen rather than the current nesting
depth. Once 100 embedded documents have been processed, the counter reaches the
maximum depth and AbstractRecursiveParserWrapperHandler.startEmbeddedDocument()
throws a SAXException.
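To make the failure mode concrete, here is a minimal, self-contained Java model of the bug. It does not use Tika. DepthTrackingHandler, NonChainingProxy and EmbeddedDepthLeakDemo are hypothetical stand-ins for AbstractRecursiveParserWrapperHandler and RecursiveMetadataContentHandlerProxy. The placement of the depth check and the limit of 100 are assumptions based on the behaviour described above, not Tika's actual code. The model feeds 110 sibling (non-nested) embedded documents through a handler whose endEmbeddedDocument() override skips the superclass call.

```java
import org.xml.sax.SAXException;

// Hypothetical driver: simulates a container holding n sibling embedded documents.
class EmbeddedDepthLeakDemo {
    // Returns how many siblings were started before a SAXException aborted the parse.
    static int siblingsProcessed(DepthTrackingHandler handler, int n) {
        int processed = 0;
        try {
            for (int i = 0; i < n; i++) {
                handler.startEmbeddedDocument();
                handler.endEmbeddedDocument();
                processed++;
            }
        } catch (SAXException e) {
            // Parse aborted at the depth limit.
        }
        return processed;
    }

    public static void main(String[] args) {
        int done = siblingsProcessed(new NonChainingProxy(), 110);
        System.out.println("Sibling documents processed: " + done); // prints 100
    }
}

// Simplified stand-in for AbstractRecursiveParserWrapperHandler (assumed limit of 100).
abstract class DepthTrackingHandler {
    static final int MAX_DEPTH = 100;
    private int embeddedDepth = 0;

    void startEmbeddedDocument() throws SAXException {
        embeddedDepth++;
        if (embeddedDepth > MAX_DEPTH) {
            throw new SAXException("Max embedded depth reached: " + embeddedDepth);
        }
    }

    void endEmbeddedDocument() throws SAXException {
        embeddedDepth--;
    }

    int getEmbeddedDepth() {
        return embeddedDepth;
    }
}

// Mirrors the reported bug: the override never calls super.endEmbeddedDocument(),
// so the inherited depth counter only ever goes up.
class NonChainingProxy extends DepthTrackingHandler {
    @Override
    void endEmbeddedDocument() {
        // ... proxy-specific end-of-document handling would go here ...
    }
}
```

In this model only 100 of the 110 siblings are processed before the depth check rejects the next one, even though the real nesting depth never exceeds 1.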
The attached patch adds a new method to AbstractRecursiveParserWrapperHandler
that decrements the depth, and RecursiveMetadataContentHandlerProxy
endEmbeddedDocument() now calls it. The patch also adds a ForkParser test for a
document with 110 embedded documents.
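The following sketch shows the shape of the fix using the same simplified model. The report does not name the new method, so decrementEmbeddedDepth() is a hypothetical name. The check placement and the limit of 100 are again assumptions. The base class exposes the decrement as a protected helper, and the proxy's override calls it after its own handling, so each end event undoes the matching start. The 110-sibling run mirrors the new ForkParser test.

```java
import org.xml.sax.SAXException;

// Hypothetical driver: simulates a container holding n sibling embedded documents.
class FixedDepthDemo {
    static int siblingsProcessed(DepthTrackingHandler handler, int n) {
        int processed = 0;
        try {
            for (int i = 0; i < n; i++) {
                handler.startEmbeddedDocument();
                handler.endEmbeddedDocument();
                processed++;
            }
        } catch (SAXException e) {
            // Parse aborted at the depth limit.
        }
        return processed;
    }

    public static void main(String[] args) {
        FixedProxy proxy = new FixedProxy();
        int done = siblingsProcessed(proxy, 110);
        // All 110 siblings succeed and the counter returns to zero.
        System.out.println("Processed " + done + ", final depth " + proxy.getEmbeddedDepth());
    }
}

// Simplified stand-in for AbstractRecursiveParserWrapperHandler (assumed limit of 100).
abstract class DepthTrackingHandler {
    static final int MAX_DEPTH = 100;
    private int embeddedDepth = 0;

    void startEmbeddedDocument() throws SAXException {
        embeddedDepth++;
        if (embeddedDepth > MAX_DEPTH) {
            throw new SAXException("Max embedded depth reached: " + embeddedDepth);
        }
    }

    void endEmbeddedDocument() throws SAXException {
        decrementEmbeddedDepth();
    }

    // Hypothetical name for the new helper: lets subclasses that override
    // endEmbeddedDocument() keep the depth counter balanced.
    protected void decrementEmbeddedDepth() {
        embeddedDepth--;
    }

    int getEmbeddedDepth() {
        return embeddedDepth;
    }
}

// Stand-in for the patched RecursiveMetadataContentHandlerProxy.
class FixedProxy extends DepthTrackingHandler {
    @Override
    void endEmbeddedDocument() {
        // ... proxy-specific end-of-document handling would go here ...
        decrementEmbeddedDepth();
    }
}
```

In this model the depth limit still applies to genuinely nested documents: 100 nested starts succeed and the 101st throws. Only sibling documents stop accumulating depth.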
--
This message was sent by Atlassian Jira
(v8.20.10#820010)