[ https://issues.apache.org/jira/browse/NUTCH-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860803#comment-13860803 ]

Tejas Patil commented on NUTCH-1454:
------------------------------------

TIKA-1122 is fixed and I have verified that 'parsechecker' works fine with the same. Upgrading to Tika 1.5 (yet to be released) should fix this for Nutch.

> parsing chm failed
> ------------------
>
>                 Key: NUTCH-1454
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1454
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5.1
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.9
>
>
> (reported by Jan Riewe, see http://lucene.472066.n3.nabble.com/CHM-Files-and-Tika-td3999735.html)
> Nutch fails to parse chm files with
> {quote}
>  ERROR tika.TikaParser - Can't retrieve Tika parser for mime-type application/vnd.ms-htmlhelp
> {quote}
> Tested with chm test files from Tika:
> {code}
> % bin/nutch parsechecker file:/.../tika/trunk/tika-parsers/src/test/resources/test-documents/testChm.chm
> {code}
> Tika parses this document (but does not extract any content).

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)