Pascal Essiembre created TIKA-2922:
--------------------------------------

             Summary: Regression issue with detecting .dotx and .xlam MS Office mime-types
                 Key: TIKA-2922
                 URL: https://issues.apache.org/jira/browse/TIKA-2922
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.22
         Environment: N/A
            Reporter: Pascal Essiembre


After upgrading to 1.22, .dotx and .xlam files are no longer detected with the correct mime-types.

They are now detected as:

 
{noformat}
.dotx -> application/vnd.ms-word.template.macroenabled.12
.xlam -> application/x-tika-ooxml{noformat}
 

They should be detected as they were before the upgrade:
{noformat}
.dotx -> application/vnd.openxmlformats-officedocument.wordprocessingml.template
.xlam -> application/vnd.ms-excel.addin.macroenabled.12{noformat}
Reference: 
[https://docs.microsoft.com/en-us/previous-versions/office/office-2007-resource-kit/ee309278(v=office.12)]

It is happening in StreamingZipContainerDetector and ZipContainerDetectorBase.

I will submit a pull request shortly with the correct mapping.
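
To illustrate the kind of mapping involved, here is a minimal standalone sketch. It is not Tika's actual code: the class OoxmlTypeSketch and its detect method are hypothetical, and the main-part content-type strings used as keys are assumptions based on the usual OOXML packaging conventions. The idea is that the content type of the package's main part selects the specific mime-type, with a fallback to the generic OOXML type when there is no match.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the Tika implementation.
public class OoxmlTypeSketch {

    // Generic type reported when no specific mapping is found.
    public static final String GENERIC_OOXML = "application/x-tika-ooxml";

    // Main-part content type (lowercased) -> specific mime-type.
    private static final Map<String, String> MAIN_PART_TO_MIME = new HashMap<>();

    private static void register(String mainPartContentType, String mime) {
        MAIN_PART_TO_MIME.put(mainPartContentType.toLowerCase(Locale.ROOT), mime);
    }

    static {
        // Assumed content-type strings; verify against real files.
        register("application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.template"); // .dotx
        register("application/vnd.ms-word.template.macroEnabledTemplate.main+xml",
                "application/vnd.ms-word.template.macroenabled.12"); // .dotm
        register("application/vnd.ms-excel.addin.macroEnabled.main+xml",
                "application/vnd.ms-excel.addin.macroenabled.12"); // .xlam
    }

    // Case-insensitive lookup with fallback to the generic OOXML type.
    public static String detect(String mainPartContentType) {
        if (mainPartContentType == null) {
            return GENERIC_OOXML;
        }
        String mime = MAIN_PART_TO_MIME.get(mainPartContentType.toLowerCase(Locale.ROOT));
        return mime != null ? mime : GENERIC_OOXML;
    }

    public static void main(String[] args) {
        System.out.println(".dotx -> " + detect(
                "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"));
        System.out.println(".xlam -> " + detect(
                "application/vnd.ms-excel.addin.macroEnabled.main+xml"));
    }
}
```

With a table like this, a missing entry would fall through to the generic type (as seen for .xlam), and an entry pointing at the wrong value would produce a neighbouring type (as seen for .dotx).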

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
