[ https://issues.apache.org/jira/browse/TIKA-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989999#comment-15989999 ]

ASF GitHub Bot commented on TIKA-2099:
--------------------------------------

theobisproject closed pull request #135: Fix for TIKA-2099
URL: https://github.com/apache/tika/pull/135
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Tar files without magic bytes are sporadically detected as text
> ---------------------------------------------------------------
>
>                 Key: TIKA-2099
>                 URL: https://issues.apache.org/jira/browse/TIKA-2099
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.11
>            Reporter: Robin Schimpf
>            Assignee: Tim Allison
>             Fix For: 1.15
>
>
> When a tar is created with 7-Zip 9.20, the "ustar" magic bytes are not added.
> Everything seems to work fine if the tar contains Microsoft Office files, but
> when it contains only text files, Tika sporadically recognizes it as
> text/plain. It also seems to depend on the size of the first file in the tar,
> which has to be several KB in size.
> The problem was found in version 1.11 and also exists in the latest 
> 1.14-SNAPSHOT.
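
The report turns on one detail. Without the "ustar" magic at header offset 257, a detector has no fixed signature to match and may fall back to content sniffing. This is plausibly why a tar holding only text files can be reported as text/plain. A header-checksum test is one commonly used alternative signal for such headers.

The Java sketch below is a hypothetical illustration of that idea. It is not the patch from pull request #135, and the names `TarHeaderSniffer`, `looksLikeTarHeader`, and `minimalHeader` are invented for this example. It assumes the standard 512-byte tar header with an 8-byte octal checksum field at offset 148, computed with that field counted as spaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: checksum-based sniffing of a tar header that may lack "ustar" magic. */
final class TarHeaderSniffer {
    static final int HEADER_SIZE = 512;    // a tar header occupies one 512-byte block
    static final int CHKSUM_OFFSET = 148;  // offset of the chksum field
    static final int CHKSUM_LENGTH = 8;    // length of the chksum field

    private TarHeaderSniffer() {}

    /** True if the block's stored octal checksum matches the checksum computed over the header. */
    static boolean looksLikeTarHeader(byte[] header) {
        if (header == null || header.length < HEADER_SIZE) {
            return false;
        }
        long stored = parseOctal(header, CHKSUM_OFFSET, CHKSUM_LENGTH);
        if (stored < 0) {
            return false; // field is empty or contains non-octal characters
        }
        long unsignedSum = 0;
        long signedSum = 0; // some historic tar writers summed signed bytes
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inChecksumField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            // The checksum field itself is counted as if it held eight spaces.
            byte b = inChecksumField ? (byte) ' ' : header[i];
            unsignedSum += b & 0xFF;
            signedSum += b;
        }
        return stored == unsignedSum || stored == signedSum;
    }

    /** Parses an octal field with optional leading spaces and NUL/space padding; returns -1 if malformed. */
    static long parseOctal(byte[] buf, int offset, int length) {
        int end = offset + length;
        int i = offset;
        while (i < end && buf[i] == ' ') {
            i++;
        }
        long value = 0;
        int digits = 0;
        while (i < end && buf[i] >= '0' && buf[i] <= '7') {
            value = value * 8 + (buf[i] - '0');
            i++;
            digits++;
        }
        if (digits == 0) {
            return -1;
        }
        for (; i < end; i++) {
            if (buf[i] != 0 && buf[i] != ' ') {
                return -1; // trailing bytes must be terminators only
            }
        }
        return value;
    }

    /** Builds a minimal magic-less header (file name plus checksum only), for demonstration. */
    static byte[] minimalHeader(String name) {
        byte[] header = new byte[HEADER_SIZE];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, Math.min(nameBytes.length, 100));
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inChecksumField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            sum += inChecksumField ? ' ' : (header[i] & 0xFF);
        }
        // Six octal digits, a NUL, then a space: the traditional layout of the field.
        byte[] chk = String.format("%06o\0 ", sum).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(chk, 0, header, CHKSUM_OFFSET, chk.length);
        return header;
    }

    public static void main(String[] args) {
        byte[] text = new byte[HEADER_SIZE];
        Arrays.fill(text, (byte) 'a');
        System.out.println(looksLikeTarHeader(minimalHeader("notes.txt"))); // true
        System.out.println(looksLikeTarHeader(text));                        // false
    }
}
```

Running `main` should print `true` and then `false`. Accepting either the unsigned or the signed sum tolerates archives from older writers. A real detector would likely combine this check with others, because a checksum match on arbitrary data is unlikely but not impossible.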



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
