[ https://issues.apache.org/jira/browse/TIKA-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166406#comment-15166406 ]
Antriksh Saxena commented on TIKA-1873:
---------------------------------------

[~gagravarr] You're right. My changes to the mimetypes.xml caused every Office file to be detected as msword. We made those changes based on a pattern that we found in the msword files in our dataset. I realized later that this might be a very specific case (limited to our dataset), and that the patterns already given in the mimetypes.xml do a much better job of classifying the different Office files.

Closing this issue!

> Test Cases failed when tika-mimetypes.xml is changed
> ----------------------------------------------------
>
>                 Key: TIKA-1873
>                 URL: https://issues.apache.org/jira/browse/TIKA-1873
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.13
>            Reporter: Antriksh Saxena
>              Labels: test
>
> The test cases were failing when tika was built after updating the
> tika-mimetypes.xml. The failure logs are as follows.
> {code}
> TestContainerAwareDetector.testTruncatedFiles:395
> expected:<application/x-tika-msoffice> but was:<application/msword>
> TestMimeTypes.testOLE2Detection:138->assertTypeByData:1045
> expected:<application/[x-tika-msoffice]> but was:<application/[msword]>
> TestMimeTypes.testOldExcel:251->assertTypeByData:1045
> expected:<application/[x-tika-msoffice]> but was:<application/[msword]>
> TestMimeTypes.testVisioDetection:305->assertTypeByNameAndData:1071
> expected:<application/[vnd.visio]> but was:<application/[msword]>
> ExcelParserTest.testExcel95:320 expected:<application/[vnd.ms-excel]> but
> was:<application/[msword]>
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)