[ https://issues.apache.org/jira/browse/TIKA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236923#comment-15236923 ]
Nick Burch commented on TIKA-1717:
----------------------------------

I've opened TIKA-1949 to track the upgrade

> Tika throws exception on detecting content-type of a zip file
> -------------------------------------------------------------
>
>                 Key: TIKA-1717
>                 URL: https://issues.apache.org/jira/browse/TIKA-1717
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Martin Petricek
>
> When trying to detect content type of a zip file with Tika 1.10 in manner like this:
> {code}
> byte[] content = ... // whole zip file.
> String name = "TR_01.ZIP";
> Tika tika = new Tika();
> return tika.detect(content, name);
> {code}
> it throws an exception:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 13
>     at org.apache.commons.compress.archivers.zip.X7875_NewUnix.parseFromLocalFileData(X7875_NewUnix.java:199)
>     at org.apache.commons.compress.archivers.zip.X7875_NewUnix.parseFromCentralDirectoryData(X7875_NewUnix.java:220)
>     at org.apache.commons.compress.archivers.zip.ExtraFieldUtils.parse(ExtraFieldUtils.java:174)
>     at org.apache.commons.compress.archivers.zip.ZipArchiveEntry.setCentralDirectoryExtra(ZipArchiveEntry.java:476)
>     at org.apache.commons.compress.archivers.zip.ZipFile.readCentralDirectoryEntry(ZipFile.java:575)
>     at org.apache.commons.compress.archivers.zip.ZipFile.populateFromCentralDirectory(ZipFile.java:492)
>     at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:216)
>     at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:192)
>     at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:153)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:141)
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
>     at org.apache.tika.Tika.detect(Tika.java:155)
>     at org.apache.tika.Tika.detect(Tika.java:183)
>     at org.apache.tika.Tika.detect(Tika.java:223)
> {code}
> The zip file does contain two .jpg images and is not a "special" (JAR, Openoffice, ... ) zip file.
> Unfortunately, the contents of the zip file is confidential and so I cannot attach it to this ticket as it is, although I can provide the parameters supplied to
> org.apache.commons.compress.archivers.zip.X7875_NewUnix.parseFromLocalFileData(X7875_NewUnix.java:199)
> as caught by the debugger:
> {code}
> data = {byte[13]@2103}
>  0 = 85
>  1 = 84
>  2 = 5
>  3 = 0
>  4 = 7
>  5 = -112
>  6 = -108
>  7 = 51
>  8 = 85
>  9 = 117
>  10 = 120
>  11 = 0
>  12 = 0
> offset = 13
> length = 0
> {code}
> ... it seems the method tries to read more bytes than is actually available in the buffer.
> Note that 7zip and unzip can unzip the file without even a warning, so it does not seem like a corrupted file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
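
For reference, the 13 bytes captured by the debugger can be decoded with the general ZIP extra-field layout: a 2-byte little-endian header ID, a 2-byte little-endian data length, then the data itself. The sketch below is only an illustration of that layout, not the Commons Compress implementation; the class `ExtraFieldDump`, its nested `Field` type and the `parse` helper are hypothetical names introduced here. Under that reading the buffer appears to hold two fields: 0x5455 (extended timestamp) with 5 bytes of data, and 0x7875 (the Info-ZIP "new Unix" field) with a declared length of 0, whose data would start at offset 13. That matches the reported `offset = 13`, `length = 0`, and would explain why reading even one byte at that position runs past the end of the array.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtraFieldDump {

    /** One extra-field record (hypothetical helper type for this sketch). */
    public static final class Field {
        public final int headerId;   // 16-bit header ID
        public final int length;     // declared length of the data part
        public final int dataOffset; // index in the buffer where the data part starts

        Field(int headerId, int length, int dataOffset) {
            this.headerId = headerId;
            this.length = length;
            this.dataOffset = dataOffset;
        }
    }

    // Read an unsigned 16-bit little-endian value starting at pos.
    static int u16(byte[] b, int pos) {
        return (b[pos] & 0xFF) | ((b[pos + 1] & 0xFF) << 8);
    }

    // Walk the buffer as a sequence of (header ID, length, data) records.
    public static List<Field> parse(byte[] extra) {
        List<Field> fields = new ArrayList<>();
        int pos = 0;
        while (pos + 4 <= extra.length) {
            int id = u16(extra, pos);
            int len = u16(extra, pos + 2);
            if (pos + 4 + len > extra.length) {
                throw new IllegalArgumentException("truncated extra field at " + pos);
            }
            fields.add(new Field(id, len, pos + 4));
            pos += 4 + len;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Bytes exactly as listed in the debugger dump in the report.
        byte[] data = {85, 84, 5, 0, 7, -112, -108, 51, 85, 117, 120, 0, 0};
        for (Field f : parse(data)) {
            System.out.printf("id=0x%04x length=%d dataOffset=%d%n",
                    f.headerId, f.length, f.dataOffset);
        }
        // prints:
        // id=0x5455 length=5 dataOffset=4
        // id=0x7875 length=0 dataOffset=13
    }
}
```

A parser for the 0x7875 field that checks the declared length before reading the first data byte would not hit index 13 on this input; whether the upgrade tracked in TIKA-1949 does exactly that is not stated in this thread.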