[ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158070#comment-13158070 ]
Nick Burch commented on TIKA-697:
---------------------------------

Thanks for this.

I've tweaked the existing mime magic in r1206896, which should now correctly detect the file format (the previous one had an erroneous = at the start, and lacked the \n). I've also added the alternate extension and alternate mimetype.

In r1206898 I've also added mime magic for .deb, based on the working one for archive. Ideally we should also add a very small .deb file to the test suite.

> Tika reports the content type of AR archives as "text/plain"
> ------------------------------------------------------------
>
>                 Key: TIKA-697
>                 URL: https://issues.apache.org/jira/browse/TIKA-697
>             Project: Tika
>          Issue Type: Bug
>         Environment: Linux (CentOS 5.6)
>            Reporter: PNS
>            Priority: Trivial
>             Fix For: 1.1
>
>         Attachments: tika-697.diff
>
>
> The Tika.detect(InputStream) method returns "text/plain" for AR archives
> created with the Linux "Create Archive" option of Nautilus (available via
> right-clicking on a file).
> The Apache Commons Compress "autodetection" code of the ArchiveStreamFactory
> looks at the first 12 bytes of the stream and correctly identifies the type
> as AR.
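
For illustration, the magic-byte check discussed above could be sketched roughly as follows. This is neither the Tika mime magic definition nor the Commons Compress code: the class name ArMagicSketch, the method detect, the returned MIME type strings and the application/octet-stream fallback are all assumptions made for this example. It relies only on the ar global header being the 8 bytes "!<arch>\n", and on a .deb being an ar archive whose first member is named debian-binary.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika or Commons Compress.
final class ArMagicSketch {
    // Every Unix ar archive starts with "!<arch>" followed by a newline.
    // Leaving out the trailing \n makes the pattern less specific.
    private static final byte[] AR_MAGIC =
            "!<arch>\n".getBytes(StandardCharsets.US_ASCII);

    // A Debian package is an ar archive whose first member is "debian-binary",
    // so that name appears directly after the 8-byte global header.
    private static final byte[] DEB_MEMBER =
            "debian-binary".getBytes(StandardCharsets.US_ASCII);

    // Returns an assumed MIME type string for the leading bytes of a stream.
    static String detect(byte[] head) {
        if (!matchesAt(head, 0, AR_MAGIC)) {
            return "application/octet-stream"; // assumed fallback for this sketch
        }
        if (matchesAt(head, AR_MAGIC.length, DEB_MEMBER)) {
            return "application/x-debian-package"; // assumed type name
        }
        return "application/x-archive"; // assumed type name
    }

    // True if data contains pattern starting exactly at offset.
    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        if (data.length < offset + pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (data[offset + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = "!<arch>\nhello.txt/      ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(head));
    }
}
```

The more specific .deb pattern is checked before falling back to the generic archive type, since every .deb also matches the plain ar magic.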