Tim Allison created TIKA-4692:
---------------------------------

             Summary: Move PPTX and DOCX SAX parsers closer to parity with DOM-based parsers
                 Key: TIKA-4692
                 URL: https://issues.apache.org/jira/browse/TIKA-4692
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


SAX processing of pptx and docx is more robust in cases where elements are
nested more deeply than usual. We should work to promote the SAX parsers so
that they reach parity with the features extracted by the DOM-based parsers.
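
As a rough illustration of the robustness point above (this is not Tika's
parser code), the sketch below uses only the JDK's SAX API to collect text
runs no matter how many wrapper elements sit above them. The class name, the
method name and the sample XML are hypothetical; the element names only mimic
WordprocessingML. The handler reacts to the element it cares about and never
assumes a fixed path from the root, which is one plausible reason a streaming
parser copes with unexpected nesting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of Tika.
class NestedTextSketch {

    // Returns the text of every <w:t> element, regardless of nesting depth.
    static List<String> extractRuns(String xml) throws Exception {
        List<String> runs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <w:t>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("w:t".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text outside <w:t> (e.g. whitespace between tags) is ignored.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("w:t".equals(qName) && current != null) {
                    runs.add(current.toString());
                    current = null;
                }
            }
        };
        // The default factory is not namespace-aware, so qName is "w:t".
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return runs;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample: the second run is wrapped in two extra container levels.
        String xml = "<w:body xmlns:w=\"urn:example\">"
                + "<w:p><w:r><w:t>plain</w:t></w:r></w:p>"
                + "<w:p><w:sdt><w:sdtContent><w:sdt><w:sdtContent>"
                + "<w:r><w:t>nested</w:t></w:r>"
                + "</w:sdtContent></w:sdt></w:sdtContent></w:sdt></w:p>"
                + "</w:body>";
        System.out.println(extractRuns(xml));
    }
}
```

A DOM-style walker that only descends the element paths it was written for
would need an extra case for each new wrapper; the handler above needs none.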

Ideally, we'd move to using the SAX-based parsers as the default in 4.x (on a
separate ticket), but we should backport the updates to 3.x as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
