[ 
https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090127#comment-13090127
 ] 

Nick Burch commented on TIKA-697:
---------------------------------

I've added a couple of test documents in r1161038.

Judging from these, I think we want to look for the pattern "!<arch>\n" at the
start of the file, i.e. the bytes 21 3c 61 72 63 68 3e 0a.
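
As a rough illustration of that check, here is a minimal standalone sketch in
Java. It is not Tika's detection code: the class name ArMagicSketch and both
method names are hypothetical, and in Tika itself a pattern like this would
more likely be declared as a magic entry in the MIME type definitions than
written by hand. The sketch only compares the first eight bytes against the
pattern above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tika.
final class ArMagicSketch {

    // "!<arch>\n" as bytes: 21 3c 61 72 63 68 3e 0a
    static final byte[] AR_MAGIC = {0x21, 0x3c, 0x61, 0x72, 0x63, 0x68, 0x3e, 0x0a};

    // Returns true if the given bytes begin with the AR magic pattern.
    static boolean startsWithArMagic(byte[] prefix) {
        if (prefix == null || prefix.length < AR_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < AR_MAGIC.length; i++) {
            if (prefix[i] != AR_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Peeks at the start of a stream without consuming it.
    // Assumes the stream supports mark/reset (e.g. a BufferedInputStream).
    static boolean looksLikeAr(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(AR_MAGIC.length);
        try {
            byte[] head = new byte[AR_MAGIC.length];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            return n == head.length && startsWithArMagic(head);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] ar = "!<arch>\nexample".getBytes(StandardCharsets.US_ASCII);
        byte[] text = "just some text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeAr(new ByteArrayInputStream(ar)));
        System.out.println(looksLikeAr(new ByteArrayInputStream(text)));
    }
}
```

The stream variant resets the stream after peeking, so a later detector or
parser still sees the file from the first byte.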

> Tika reports the content type of AR archives as "text/plain"
> ------------------------------------------------------------
>
>                 Key: TIKA-697
>                 URL: https://issues.apache.org/jira/browse/TIKA-697
>             Project: Tika
>          Issue Type: Bug
>         Environment: Linux (CentOS 5.6)
>            Reporter: PNS
>            Priority: Trivial
>
> The Tika.detect(InputStream) method returns "text/plain" for AR archives 
> created with the Linux "Create Archive" option of Nautilus (available via 
> right-clicking on a file).
> The Apache Commons Compress "autodetection" code of the ArchiveStreamFactory 
> looks at the first 12 bytes of the stream and correctly identifies the type 
> as AR.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
