[ https://issues.apache.org/jira/browse/TIKA-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15528526#comment-15528526 ]
ASF GitHub Bot commented on TIKA-2099:
--------------------------------------

GitHub user theobisproject opened a pull request:

    https://github.com/apache/tika/pull/135

    Fix for TIKA-2099

    A test file is provided. A unit test is missing because I don't know where it should be located.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/theobisproject/tika TIKA-2099

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/135.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #135

----
commit c911c3abc6875b4121d773b9ef8bf2d3093a2c35
Author: Robin Schimpf <theobisproj...@gmail.com>
Date:   2016-09-28T05:53:35Z

    Fix for TIKA-2099

    A test file is provided. A unit test is missing because I don't know where it should be located.

----

> Tar files without magic bytes are sporadically detected as text
> ---------------------------------------------------------------
>
>                 Key: TIKA-2099
>                 URL: https://issues.apache.org/jira/browse/TIKA-2099
>             Project: Tika
>          Issue Type: Bug
>   Affects Versions: 1.11
>            Reporter: Robin Schimpf
>
> When a tar is created with 7-Zip 9.20, the magic bytes "ustar" are not added.
> Everything seems to work fine if the tar contains Microsoft Office files. But
> when it contains only text files, Tika sporadically recognizes it as
> text/plain. This also seems to depend on the size of the first file in the
> tar, which has to be several KB.
> The problem was found in version 1.11 and also exists in the latest
> 1.14-SNAPSHOT.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
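
For readers unfamiliar with the problem described in the issue: a tar header block is 512 bytes, and the "ustar" magic (at offset 257) is optional in older-style archives, so a detector cannot rely on it. One common fallback is to validate the header checksum stored as octal text at offset 148. The following is only a minimal sketch of that idea in Java; it is not the code from the pull request, and the class and method names (`TarHeaderCheck`, `looksLikeTarHeader`, `parseOctal`) are hypothetical.

```java
// Hypothetical helper, not part of Tika: checks whether a 512-byte block
// carries a valid tar header checksum, independent of the "ustar" magic.
class TarHeaderCheck {
    static final int HEADER_SIZE = 512;
    static final int CHKSUM_OFFSET = 148; // checksum field position in the header
    static final int CHKSUM_LENGTH = 8;   // octal digits plus NUL/space padding

    static boolean looksLikeTarHeader(byte[] block) {
        if (block == null || block.length < HEADER_SIZE) {
            return false;
        }
        long stored = parseOctal(block, CHKSUM_OFFSET, CHKSUM_LENGTH);
        if (stored < 0) {
            return false; // checksum field is empty or not octal text
        }
        long unsignedSum = 0;
        long signedSum = 0; // some historic writers summed signed bytes
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inChecksum = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            // The checksum is computed with its own field treated as spaces.
            byte b = inChecksum ? (byte) ' ' : block[i];
            unsignedSum += b & 0xFF;
            signedSum += b;
        }
        return stored == unsignedSum || stored == signedSum;
    }

    // Parses an octal number terminated by NUL or space; returns -1 if invalid.
    static long parseOctal(byte[] buf, int offset, int length) {
        int i = offset;
        int end = offset + length;
        while (i < end && buf[i] == ' ') {
            i++; // skip leading spaces
        }
        long value = 0;
        int digits = 0;
        for (; i < end; i++) {
            byte b = buf[i];
            if (b == 0 || b == ' ') {
                break;
            }
            if (b < '0' || b > '7') {
                return -1;
            }
            value = (value << 3) + (b - '0');
            digits++;
        }
        return digits == 0 ? -1 : value;
    }
}
```

A detector could call `TarHeaderCheck.looksLikeTarHeader` on the first 512 bytes of the stream before falling back to text detection. A block of plain text is very unlikely to contain a matching octal checksum at offset 148, which is why this check is useful when the first archive member is itself a text file.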