[ https://issues.apache.org/jira/browse/TIKA-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391404#comment-16391404 ]
Tomasz L edited comment on TIKA-2530 at 3/8/18 3:36 PM:
--------------------------------------------------------

Hi,

I have a similar issue in my project.

Error:
{code:java}
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.executable.ExecutableParser@271f18d3
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
    at com.roche.medmap.search.FileContentExtractor.extract(FileContentExtractor.java:29)
    ... 1 more
Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
    at org.apache.poi.util.LittleEndian.readInt(LittleEndian.java:662)
    at org.apache.tika.parser.executable.ExecutableParser.parsePE(ExecutableParser.java:98)
    at org.apache.tika.parser.executable.ExecutableParser.parse(ExecutableParser.java:74)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}
Here is the file that causes the error:

[^test_file.txt]


was (Author: lenczykt):
    Hi,

I have similar issue in my project

error
{code}
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.executable.ExecutableParser@271f18d3
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
    at com.roche.medmap.search.FileContentExtractor.extract(FileContentExtractor.java:29)
    ... 1 more
Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
    at org.apache.poi.util.LittleEndian.readInt(LittleEndian.java:662)
    at org.apache.tika.parser.executable.ExecutableParser.parsePE(ExecutableParser.java:98)
    at org.apache.tika.parser.executable.ExecutableParser.parse(ExecutableParser.java:74)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)[^test_file.txt]
{code}

> OutlookExtractor "buffer underrun" when parsing .msg with embedded .msg
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2530
>                 URL: https://issues.apache.org/jira/browse/TIKA-2530
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16, 1.17
>         Environment: Reproduced with both Tika 1.16 and Tika 1.17 on Windows, but the problem is likely on all platforms.
>            Reporter: Pascal Essiembre
>            Assignee: Tim Allison
>            Priority: Major
>         Attachments: test_file.txt
>
>
> When parsing certain .msg files containing certain attachments (e.g. other .msg files), I get this error:
> {noformat}
> ...
> Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
>     at org.apache.poi.util.LittleEndian.readInt(LittleEndian.java:662)
>     at org.apache.poi.hmef.CompressedRTF.decompress(CompressedRTF.java:73)
>     at org.apache.poi.util.LZWDecompresser.decompress(LZWDecompresser.java:81)
>     at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:42)
>     at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:270)
> ...
> {noformat}
> I think the issue is with {{MAPIRtfAttribute}} not liking it when receiving an empty byte array from {{OutlookExtractor}}.
> I was able to eliminate the error at around line 269 of {{OutlookExtractor}} with Tika 1.16 code (or around line 322 with Tika 1.17) with the following:
> {code:java}
> //--- START FIX ---
> ByteChunk chunk = (ByteChunk) rtfChunk;
> if (chunk != null && chunk.getValue() != null
>         && chunk.getValue().length > 0 && !doneBody) {
>     //ByteChunk chunk = (ByteChunk) rtfChunk;
> //--- END FIX ---
> {code}
> I am not sure if that is a real fix or more should be done than just getting rid of the error to make sure all is extracted properly from all files.
> I cannot share the sample file I have to test since it was given to me as sensitive content and I could not recreate a faulty msg file.
> Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
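
To illustrate why an empty byte array can end in a "buffer underrun", and what the guard proposed in the description checks, here is a minimal sketch that needs neither Tika nor POI on the classpath. The class {{RtfChunkGuard}} and its methods {{readIntLE}} and {{shouldDecodeRtf}} are hypothetical names made up for this example; they only approximate the behaviour of POI's {{LittleEndian.readInt}} and the condition in the suggested fix, and are not the actual library code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration only; not part of Tika or POI. */
final class RtfChunkGuard {

    /**
     * Reads a 4-byte little-endian int. Fails with an IOException when the
     * stream ends early, which is roughly what a "buffer underrun" reports.
     */
    static int readIntLE(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        // read() returns -1 at end of stream, so any short read makes the OR negative.
        if ((b0 | b1 | b2 | b3) < 0) {
            throw new IOException("buffer underrun");
        }
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    /**
     * Mirrors the condition from the suggested fix, applied to the raw bytes:
     * only attempt RTF decoding when there is data and the body is not done.
     */
    static boolean shouldDecodeRtf(byte[] value, boolean doneBody) {
        return value != null && value.length > 0 && !doneBody;
    }

    public static void main(String[] args) throws IOException {
        byte[] empty = new byte[0];
        byte[] four = {0x01, 0x00, 0x00, 0x00};

        for (byte[] value : new byte[][] {empty, four}) {
            if (shouldDecodeRtf(value, false)) {
                // Safe: at least one byte is available before reading the header.
                System.out.println("header int = " + readIntLE(new ByteArrayInputStream(value)));
            } else {
                // Without the guard, readIntLE would throw on the empty array.
                System.out.println("skipping empty RTF chunk");
            }
        }
    }
}
```

Note that a non-empty check alone would not protect against a chunk with one to three bytes; whether such chunks occur in practice, and whether skipping them loses content, is exactly the open question raised in the description.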