[ 
https://issues.apache.org/jira/browse/TIKA-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166406#comment-15166406
 ] 

Antriksh Saxena commented on TIKA-1873:
---------------------------------------

[~gagravarr] You're right. My changes to tika-mimetypes.xml caused every 
office file to be detected as msword. We based those changes on a pattern 
that we found in the msword files in our dataset. I realized later that this 
may be a very specific case (limited to our dataset), and that the patterns 
already in tika-mimetypes.xml do a much better job of classifying the 
different office files. Closing this issue!
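
For anyone hitting the same failures, here is a minimal sketch of how a 
magic pattern that is too broad produces this result. It is not Tika's 
implementation and not the pattern we actually added: detect_by_magic, 
BROAD_RULES, GENERIC_RULES and the two fake files are hypothetical names 
used only for illustration. It assumes the added pattern matched bytes 
that every OLE2 container shares, such as the 8-byte OLE2 signature.

```python
# The 8-byte signature that opens every OLE2 (Compound File Binary) container,
# regardless of whether the file is a Word, Excel, or Visio document.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_by_magic(data, rules):
    """Hypothetical helper: return the type of the first rule whose byte
    pattern matches at its offset, or a generic fallback."""
    for mime_type, offset, pattern in rules:
        if data[offset:offset + len(pattern)] == pattern:
            return mime_type
    return "application/octet-stream"


# Assumed overly broad rule: maps the shared container signature to msword.
BROAD_RULES = [("application/msword", 0, OLE2_MAGIC)]

# Rule that maps the same signature to the generic OLE2 type instead,
# leaving the specific format to be decided by a later, finer-grained check.
GENERIC_RULES = [("application/x-tika-msoffice", 0, OLE2_MAGIC)]

# Synthetic stand-ins for real files: same signature, different payloads.
fake_doc = OLE2_MAGIC + b"\x00" * 8 + b"word-payload"
fake_xls = OLE2_MAGIC + b"\x00" * 8 + b"excel-payload"

for name, data in (("fake_doc", fake_doc), ("fake_xls", fake_xls)):
    print(name, detect_by_magic(data, BROAD_RULES),
          detect_by_magic(data, GENERIC_RULES))
```

With the broad rule, the non-Word file is also reported as 
application/msword, which is the shape of the failures in the log quoted 
below. With the generic rule, both files fall back to 
application/x-tika-msoffice, the type the tests expected.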

> Test Cases failed when tika-mimetypes.xml is changed
> ----------------------------------------------------
>
>                 Key: TIKA-1873
>                 URL: https://issues.apache.org/jira/browse/TIKA-1873
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Antriksh Saxena
>              Labels: test
>
> The test cases failed when Tika was built after updating 
> tika-mimetypes.xml. The failure logs are as follows.
> {code}
> TestContainerAwareDetector.testTruncatedFiles:395 
> expected:<application/x-tika-msoffice> but was:<application/msword>
>   TestMimeTypes.testOLE2Detection:138->assertTypeByData:1045 
> expected:<application/[x-tika-msoffice]> but was:<application/[msword]>
>   TestMimeTypes.testOldExcel:251->assertTypeByData:1045 
> expected:<application/[x-tika-msoffice]> but was:<application/[msword]>
>   TestMimeTypes.testVisioDetection:305->assertTypeByNameAndData:1071 
> expected:<application/[vnd.visio]> but was:<application/[msword]>
>   ExcelParserTest.testExcel95:320 expected:<application/[vnd.ms-excel]> but 
> was:<application/[msword]>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
