[ https://issues.apache.org/jira/browse/TIKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Seva Alekseyev updated TIKA-2161: --------------------------------- Description: On the attached Powerpoint file, which opens fine with Powerpoint, the Tika parser throws the following error: org.apache.tika.io.TaggedIOException: Unexpected end of ZLIB input stream at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133) at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103) at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at java.io.FilterInputStream.read(FilterInputStream.java:107) at java.nio.file.Files.copy(Files.java:2908) at java.nio.file.Files.copy(Files.java:3027) at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587) at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615) at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:377) at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:443) at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:140) at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:116) at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources(HSLFExtractor.java:368) at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:138) at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149) at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117) Caused by: java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) ... 22 more EDIT: Tika 1.14 throws EOFException was: On the attached Powerpoint file, which opens fine with Powerpoint, the Tika parser throws the following error: org.apache.tika.io.TaggedIOException: Unexpected end of ZLIB input stream at org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133) at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103) at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at java.io.FilterInputStream.read(FilterInputStream.java:107) at java.nio.file.Files.copy(Files.java:2908) at java.nio.file.Files.copy(Files.java:3027) at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587) at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615) at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:377) at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:443) at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:140) at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:116) at org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources(HSLFExtractor.java:368) at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:138) at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149) at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117) Caused by: java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240) at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) at org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at java.io.BufferedInputStream.read(BufferedInputStream.java:345) at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) ... 22 more > EOFException on a valid Powerpoint file > --------------------------------------- > > Key: TIKA-2161 > URL: https://issues.apache.org/jira/browse/TIKA-2161 > Project: Tika > Issue Type: Bug > Components: parser > Affects Versions: 1.13 > Environment: Windows 7 x64, JVM 1.8.0_101 > Reporter: Seva Alekseyev > Attachments: Erik-LymeChipBranchSeminar.ppt > > > On the attached Powerpoint file, which opens fine with Powerpoint, the Tika > parser throws the following error: > org.apache.tika.io.TaggedIOException: Unexpected end of ZLIB input stream > at > org.apache.tika.io.TaggedInputStream.handleIOException(TaggedInputStream.java:133) > at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:103) > at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at java.io.FilterInputStream.read(FilterInputStream.java:107) > at java.nio.file.Files.copy(Files.java:2908) > at java.nio.file.Files.copy(Files.java:3027) > at org.apache.tika.io.TikaInputStream.getPath(TikaInputStream.java:587) > at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:615) > at > org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:377) > at > org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:443) > at > org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) > at > org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) > at > org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) > at > org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:140) > at > org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedResource(AbstractPOIFSExtractor.java:116) > at > org.apache.tika.parser.microsoft.HSLFExtractor.handleSlideEmbeddedResources(HSLFExtractor.java:368) > at > org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:138) > at > org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149) > at > org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117) > Caused by: java.io.EOFException: Unexpected end of ZLIB input stream > at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240) > at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) > at > org.apache.poi.util.BoundedInputStream.read(BoundedInputStream.java:121) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) > ... 22 more > EDIT: Tika 1.14 throws EOFException -- This message was sent by Atlassian JIRA (v6.3.4#6332)