[ https://issues.apache.org/jira/browse/TIKA-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158458#comment-13158458 ]
Nick Burch commented on TIKA-790:
---------------------------------

Fixed in r1207196.

The duplicated detection code between OfficeParser and POIFSContainerDetector
was removed by following the pattern from TIKA-791: a type was added for
OLE10Native, and the rest of the detection work was pushed to
POIFSContainerDetector. POIFSDocumentType in OfficeParser still offers
detection (used mostly when handling embedded files), but it delegates all
the work to POIFSContainerDetector.

> Reduce duplication between POIFSDocumentType (in OfficeParser) and
> POIFSContainerDetector
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-790
>                 URL: https://issues.apache.org/jira/browse/TIKA-790
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>             Fix For: 1.1
>
>
> For historical reasons, we now have two parts of Tika that try to
> identify the type of an OLE2-based file.
> POIFSDocumentType is able to detect a few kinds of files that
> POIFSContainerDetector cannot (e.g. Encrypted and OLE Native), mostly
> ones that may not map well onto mimetypes. POIFSDocumentType also lacks
> some of the logic in the main detector, and only handles the file types
> that the office parser supports.
> We should probably try to reduce the duplication. One option is to add the
> extra few types into the Detector somehow; the other is to use the detector
> first and do additional specific checks afterwards.
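The fix has two parts: the container detector knows every OLE2 type, including OLE10Native, and the parser-side enum simply delegates to it. The Java sketch below shows that arrangement. It assumes the names of the container's root entries are already available as a Set<String>. Real Tika code reads them through Apache POI's POIFS classes.

The class names, method names and media-type strings are hypothetical stand-ins, not Tika's actual API. Only the OLE2 entry names ("WordDocument", "Workbook", "Book", "PowerPoint Document", "EncryptedPackage", "\u0001Ole10Native") are the real stream names used in those formats.

```java
import java.util.Set;

// Hypothetical sketch of the TIKA-790 refactoring: one detector owns all the
// OLE2 entry-name checks, and the parser-side enum only maps its result.
public class Ole2DetectionSketch {

    // Stand-in for POIFSContainerDetector: picks a media type purely from the
    // names of the root-level entries of an OLE2 (POIFS) container.
    // The media-type strings are illustrative; Tika's real constants may differ.
    static final class ContainerDetector {
        static final String WORD = "application/msword";
        static final String EXCEL = "application/vnd.ms-excel";
        static final String POWERPOINT = "application/vnd.ms-powerpoint";
        static final String ENCRYPTED = "application/x-ooxml-encrypted";   // hypothetical
        static final String OLE10_NATIVE = "application/x-ole10-native";   // hypothetical
        static final String GENERIC_OLE2 = "application/x-tika-msoffice";

        static String detect(Set<String> rootEntryNames) {
            // Encrypted OOXML files are stored inside an OLE2 wrapper.
            if (rootEntryNames.contains("EncryptedPackage")) return ENCRYPTED;
            // OLE 1.0 native embedded objects use a stream with a \u0001 prefix.
            if (rootEntryNames.contains("\u0001Ole10Native")) return OLE10_NATIVE;
            if (rootEntryNames.contains("WordDocument")) return WORD;
            if (rootEntryNames.contains("Workbook") || rootEntryNames.contains("Book")) return EXCEL;
            if (rootEntryNames.contains("PowerPoint Document")) return POWERPOINT;
            return GENERIC_OLE2;
        }
    }

    // Stand-in for POIFSDocumentType: keeps an enum API for callers (e.g.
    // embedded-file handling) but contains no detection logic of its own.
    enum DocType {
        WORD(ContainerDetector.WORD),
        EXCEL(ContainerDetector.EXCEL),
        POWERPOINT(ContainerDetector.POWERPOINT),
        ENCRYPTED(ContainerDetector.ENCRYPTED),
        OLE10_NATIVE(ContainerDetector.OLE10_NATIVE),
        UNKNOWN(ContainerDetector.GENERIC_OLE2);

        final String mediaType;

        DocType(String mediaType) {
            this.mediaType = mediaType;
        }

        // Delegate to the detector, then translate its media type to a constant.
        static DocType detect(Set<String> rootEntryNames) {
            String type = ContainerDetector.detect(rootEntryNames);
            for (DocType t : values()) {
                if (t.mediaType.equals(type)) return t;
            }
            return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(DocType.detect(Set.of("WordDocument", "\u0005SummaryInformation"))); // WORD
        System.out.println(DocType.detect(Set.of("\u0001Ole10Native")));                        // OLE10_NATIVE
        System.out.println(DocType.detect(Set.of("EncryptionInfo", "EncryptedPackage")));        // ENCRYPTED
        System.out.println(DocType.detect(Set.of("SomethingElse")));                              // UNKNOWN
    }
}
```

The enum only translates media types. A new OLE2 type, such as the OLE10Native one from TIKA-791, therefore needs to be taught to the detector once; the parser side just gains a matching constant.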