[ 
https://issues.apache.org/jira/browse/TIKA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704846#comment-14704846
 ] 

Martin Petricek commented on TIKA-1717:
---------------------------------------

I've reported the bug as COMPRESS-321.

I'm not sure whether any action is needed on the Tika side for this bug, beyond 
bumping the Commons Compress dependency version once they fix it.
(Perhaps Tika could catch exceptions from underlying libraries and, in this case, 
either return the default "application/octet-stream" as the detected type or 
rethrow them as some kind of DetectionErrorException?)
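
Until a fix lands, a caller could apply the same fallback idea on its own side. 
The following is only a minimal sketch: the SafeDetect class and the 
detectOrDefault helper are hypothetical names, not part of Tika, and the 
detector is passed in as a plain Supplier so the snippet does not depend on any 
Tika class.

```java
import java.util.function.Supplier;

// Hypothetical caller-side helper; not part of Tika or Commons Compress.
final class SafeDetect {
    // Generic type reported when detection fails or yields nothing.
    static final String FALLBACK_TYPE = "application/octet-stream";

    // Runs the supplied detection call and falls back to the generic type
    // if it throws an unchecked exception or returns null.
    static String detectOrDefault(Supplier<String> detector) {
        try {
            String type = detector.get();
            return type != null ? type : FALLBACK_TYPE;
        } catch (RuntimeException e) {
            // e.g. the ArrayIndexOutOfBoundsException from the stack trace
            return FALLBACK_TYPE;
        }
    }
}
```

Assuming the Tika.detect(byte[], String) call from the report, usage would look 
like SafeDetect.detectOrDefault(() -> tika.detect(content, name)). Swallowing 
every RuntimeException also hides unrelated bugs, so logging the exception 
before falling back is probably worthwhile.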

> Tika throws exception on detecting content-type of a zip file
> -------------------------------------------------------------
>
>                 Key: TIKA-1717
>                 URL: https://issues.apache.org/jira/browse/TIKA-1717
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Martin Petricek
>
> When trying to detect the content type of a zip file with Tika 1.10 
> like this:
> {code}
>         byte[] content = ... // whole zip file.
>         String name = "TR_01.ZIP";
>         Tika tika = new Tika();
>         return tika.detect(content, name);
> {code}
> it throws an exception:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 13
>       at org.apache.commons.compress.archivers.zip.X7875_NewUnix.parseFromLocalFileData(X7875_NewUnix.java:199)
>       at org.apache.commons.compress.archivers.zip.X7875_NewUnix.parseFromCentralDirectoryData(X7875_NewUnix.java:220)
>       at org.apache.commons.compress.archivers.zip.ExtraFieldUtils.parse(ExtraFieldUtils.java:174)
>       at org.apache.commons.compress.archivers.zip.ZipArchiveEntry.setCentralDirectoryExtra(ZipArchiveEntry.java:476)
>       at org.apache.commons.compress.archivers.zip.ZipFile.readCentralDirectoryEntry(ZipFile.java:575)
>       at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:492)
>       at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:216)
>       at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:192)
>       at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:153)
>       at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:141)
>       at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
>       at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
>       at org.apache.tika.Tika.detect(Tika.java:155)
>       at org.apache.tika.Tika.detect(Tika.java:183)
>       at org.apache.tika.Tika.detect(Tika.java:223)
> {code}
> The zip file contains two .jpg images and is not a "special" (JAR, 
> OpenOffice, ...) zip file.
> Unfortunately, the contents of the zip file are confidential, so I cannot 
> attach it to this ticket as is. I can, however, provide the parameters 
> passed to
> org.apache.commons.compress.archivers.zip.X7875_NewUnix.parseFromLocalFileData(X7875_NewUnix.java:199),
> as captured in the debugger:
> {code}
> data = {byte[13]@2103}
>  0 = 85
>  1 = 84
>  2 = 5
>  3 = 0
>  4 = 7
>  5 = -112
>  6 = -108
>  7 = 51
>  8 = 85
>  9 = 117
>  10 = 120
>  11 = 0
>  12 = 0
> offset = 13
> length = 0
> {code}
> It seems the method tries to read more bytes than are actually available 
> in the buffer.
> Note that 7zip and unzip both extract the file without even a warning, so it 
> does not appear to be corrupted.
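
Reading the dump above as little-endian extra-field records, bytes 0-8 form a 
0x5455 record with 5 data bytes, and bytes 9-12 form a 0x7875 record whose 
declared data size is 0. That matches offset = 13 and length = 0 on a 13-byte 
array. The sketch below is a simplified, hypothetical illustration of that 
failure mode and of a length guard; it is not the Commons Compress source, and 
the class and method names are made up for the example.

```java
// Hypothetical illustration only; not the Commons Compress implementation.
final class ExtraFieldSketch {
    // Unguarded read: touches data[offset] even when the record is empty,
    // so offset == data.length throws ArrayIndexOutOfBoundsException.
    static int readFirstByteUnchecked(byte[] data, int offset, int length) {
        return data[offset] & 0xFF;
    }

    // Guarded read: returns -1 when the declared length leaves nothing to
    // read or the requested range does not fit inside the buffer.
    static int readFirstByteChecked(byte[] data, int offset, int length) {
        if (length < 1 || offset < 0 || offset > data.length - length) {
            return -1;
        }
        return data[offset] & 0xFF;
    }
}
```

With the values from the debugger (a 13-byte array, offset 13, length 0), the 
unguarded variant fails at index 13, while the guarded one reports an empty 
record instead.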



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
