[ https://issues.apache.org/jira/browse/NUTCH-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517997#comment-16517997 ]
Markus Jelsma commented on NUTCH-2606:
--------------------------------------

Ah, this is interesting. Nutch indeed believes it is a Word document, and my browser agrees and opens a word processor. Only the command-line tool "file" correctly identifies it as plain text.

> MIME detection is wrong for plain-text documents send as Content-Type "application/msword"
> ------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2606
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2606
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.14
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> Plain-text documents sent with Content-Type "application/msword" are
> parsed as Word documents, which fails. The MIME detection should be fixed
> so that these are correctly identified as plain-text documents. See
> NUTCH-2603 and
> https://www.atnf.csiro.au/computing/software/gipsy/doc/update.doc

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
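
For illustration only, here is a minimal sketch of the kind of content check the issue asks for: distrust a declared "application/msword" when the body lacks the OLE2 compound-file signature that legacy binary .doc files start with, and fall back to plain text if the leading bytes look textual. This is not Nutch's or Tika's actual detection logic. The class name MimeSniffSketch, its method names, the 512-byte window, and the control-character heuristic are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Nutch or Tika.
final class MimeSniffSketch {
    // OLE2 compound file signature found at the start of legacy binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the content starts with the OLE2 signature.
    static boolean hasOle2Magic(byte[] content) {
        if (content.length < OLE2_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(content, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Crude assumption: non-empty content whose first 512 bytes contain no
    // control characters other than tab, LF, CR and form feed is text.
    static boolean looksLikeText(byte[] content) {
        if (content.length == 0) {
            return false;
        }
        int n = Math.min(content.length, 512);
        for (int i = 0; i < n; i++) {
            int b = content[i] & 0xFF;
            if (b < 0x20 && b != '\t' && b != '\n' && b != '\r' && b != '\f') {
                return false;
            }
        }
        return true;
    }

    // Overrides a declared "application/msword" only when the bytes disagree;
    // every other declared type is returned unchanged.
    static String resolveType(String declaredType, byte[] content) {
        if ("application/msword".equals(declaredType)
                && !hasOle2Magic(content)
                && looksLikeText(content)) {
            return "text/plain";
        }
        return declaredType;
    }

    public static void main(String[] args) {
        byte[] text = "Plain-text release notes\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(resolveType("application/msword", text)); // prints "text/plain"
        System.out.println(resolveType("application/msword", OLE2_MAGIC)); // prints "application/msword"
    }
}
```

The sketch only overrides the server's header when the bytes contradict it, so genuine Word documents keep their declared type.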