Robin Schimpf created TIKA-2099:
-----------------------------------

             Summary: Tar files without magic bytes are sporadically detected 
as text
                 Key: TIKA-2099
                 URL: https://issues.apache.org/jira/browse/TIKA-2099
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.11
            Reporter: Robin Schimpf


When a tar is created with 7-Zip 9.20, the magic bytes "ustar" are not added. 
Everything seems to work fine if the tar contains Microsoft Office files. But 
when it contains only text files, Tika sporadically recognizes it as 
text/plain. It also seems to depend on the size of the first file in the tar: 
the misdetection only appears when that file is several KB in size.
The problem was found in version 1.11 and also exists in the latest 
1.14-SNAPSHOT.
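
For illustration, a tar without the "ustar" magic can still be told apart from 
plain text by validating the checksum of the first 512-byte header block. The 
following is only a minimal sketch of that idea and is not Tika's detection 
code; the class name TarHeaderCheck and its helper methods are hypothetical. It 
relies on the standard tar header layout (checksum field at offset 148, 8 bytes 
long; "ustar" magic at offset 257).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks a 512-byte tar header block.
final class TarHeaderCheck {
    static final int HEADER_SIZE = 512;
    static final int CHECKSUM_OFFSET = 148;
    static final int CHECKSUM_LENGTH = 8;
    static final int MAGIC_OFFSET = 257;

    // True if the block carries the "ustar" magic (POSIX/GNU tar variants).
    static boolean hasUstarMagic(byte[] header) {
        if (header.length < MAGIC_OFFSET + 5) {
            return false;
        }
        String magic = new String(header, MAGIC_OFFSET, 5, StandardCharsets.US_ASCII);
        return magic.equals("ustar");
    }

    // True if the octal checksum stored in the header matches the sum of all
    // header bytes, with the checksum field itself counted as eight spaces.
    static boolean hasValidChecksum(byte[] header) {
        if (header.length < HEADER_SIZE) {
            return false;
        }
        int end = CHECKSUM_OFFSET + CHECKSUM_LENGTH;
        int i = CHECKSUM_OFFSET;
        while (i < end && header[i] == ' ') {
            i++; // skip leading spaces
        }
        long stored = 0;
        boolean sawDigit = false;
        for (; i < end; i++) {
            int b = header[i] & 0xFF;
            if (b < '0' || b > '7') {
                break; // NUL or space terminates the octal number
            }
            stored = stored * 8 + (b - '0');
            sawDigit = true;
        }
        if (!sawDigit) {
            return false; // plain text or an empty block lands here
        }
        long computed = 0;
        for (int j = 0; j < HEADER_SIZE; j++) {
            boolean inChecksumField = j >= CHECKSUM_OFFSET && j < end;
            computed += inChecksumField ? ' ' : (header[j] & 0xFF);
        }
        return stored == computed;
    }

    // Builds a minimal old-style (no "ustar") header for an empty file,
    // purely as test data for the checks above.
    static byte[] buildV7Header(String name) {
        byte[] header = new byte[HEADER_SIZE];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, Math.min(nameBytes.length, 99));
        byte[] mode = "0000644".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(mode, 0, header, 100, mode.length);
        byte[] size = "00000000000".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(size, 0, header, 124, size.length);
        // The checksum is computed with the checksum field set to spaces.
        Arrays.fill(header, CHECKSUM_OFFSET, CHECKSUM_OFFSET + CHECKSUM_LENGTH, (byte) ' ');
        int sum = 0;
        for (byte b : header) {
            sum += b & 0xFF;
        }
        byte[] digits = String.format("%06o", sum).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(digits, 0, header, CHECKSUM_OFFSET, 6);
        header[CHECKSUM_OFFSET + 6] = 0;
        header[CHECKSUM_OFFSET + 7] = ' ';
        return header;
    }

    public static void main(String[] args) {
        byte[] tarHeader = buildV7Header("notes.txt");
        byte[] text = new byte[HEADER_SIZE];
        Arrays.fill(text, (byte) 'a');
        System.out.println("tar magic:     " + hasUstarMagic(tarHeader));
        System.out.println("tar checksum:  " + hasValidChecksum(tarHeader));
        System.out.println("text checksum: " + hasValidChecksum(text));
    }
}
```

A header without the magic still passes the checksum test, while a block of 
ordinary text does not, so a check of this kind could serve as a fallback when 
the magic bytes are missing.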



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
