[ https://issues.apache.org/jira/browse/TIKA-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoni Mylka updated TIKA-791:
------------------------------

    Attachment: tika-791.zip

A ZIP file with the patch and some test documents. They differ from the ones in the test-documents folder in that they are protected by a non-default password. The existing protectedFile.xlsx, for instance, is protected with the default password. I made those example files myself.

> Fix the detection of protected OOXML files
> ------------------------------------------
>
>                 Key: TIKA-791
>                 URL: https://issues.apache.org/jira/browse/TIKA-791
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.1
>         Environment: Windows 7 64 bit
>            Reporter: Antoni Mylka
>         Attachments: tika-791.zip
>
>
> TIKA-437 patch allowed Tika to work with OOXML files protected with the
> default VelvetSweatshop password. I feel there is room for improvement.
> # The POIFSContainerDetector lies when it sees such a file. It should be able
> to mark it as x-tika-ooxml
> # The OOXMLParser can't work with such a file. It should:
> ## If it's protected with the default password - it should be decrypted and
> processed normally.
> ## If it's protected with a non-default password - the file should be marked
> as protected, no weird exceptions should appear.
> Therefore I'd like to add an 'if' to POIFSContainerDetector which returns
> x-tika-ooxml, and some code to OOXMLParser, which would be similar to the
> code currently residing in OfficeParser. After this improvement both the
> OfficeParser and the OOXMLParser will treat such files in the same way.
> When I have that, I can add a hack in my application, which will say "If the
> type is x-tika-ooxml and the name-based detection is a specialization of
> ooxml, then use the name-based detection". This will be a workaround for the
> fact that in MimeTypes, magic always trumps the name. With that, the
> encrypted DOCX files will appear with the normal DOCX mimetype in my app.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
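
The application-side workaround quoted in the issue ("If the type is x-tika-ooxml and the name-based detection is a specialization of ooxml, then use the name-based detection") could look roughly like the sketch below. It is only an illustration and not the code from tika-791.zip: the class OoxmlTypeResolver, the resolve method and the SUPERTYPE table are hypothetical names, and the table stands in for whatever media type hierarchy the real application consults.

```java
import java.util.Map;

public class OoxmlTypeResolver {
    public static final String OOXML = "application/x-tika-ooxml";
    public static final String DOCX =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    public static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // Hypothetical child -> parent table, used here only for illustration.
    // A real application would ask its media type registry instead.
    private static final Map<String, String> SUPERTYPE = Map.of(
        DOCX, OOXML,
        XLSX, OOXML);

    // True if 'type' has 'ancestor' somewhere up its parent chain.
    public static boolean isSpecializationOf(String type, String ancestor) {
        String parent = SUPERTYPE.get(type);
        while (parent != null) {
            if (parent.equals(ancestor)) {
                return true;
            }
            parent = SUPERTYPE.get(parent);
        }
        return false;
    }

    // Prefer the name-based type only when magic detection returned the
    // generic OOXML type and the name-based type specializes it.
    public static String resolve(String magicType, String nameType) {
        if (OOXML.equals(magicType)
                && nameType != null
                && isSpecializationOf(nameType, OOXML)) {
            return nameType;
        }
        return magicType;
    }

    public static void main(String[] args) {
        // Encrypted DOCX: magic says generic OOXML, the file name says DOCX.
        System.out.println(resolve(OOXML, DOCX));
        // Unrelated name-based type: the magic result is kept.
        System.out.println(resolve(OOXML, "text/plain"));
    }
}
```

In every other case the magic-based result is returned unchanged, so the rule only refines the generic x-tika-ooxml answer and never overrides a more specific detection.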