[ https://issues.apache.org/jira/browse/TIKA-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391404#comment-16391404 ]

Tomasz L edited comment on TIKA-2530 at 3/8/18 3:36 PM:
--------------------------------------------------------

Hi,

I have a similar issue in my project. The error is:
{code:java}
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.executable.ExecutableParser@271f18d3
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
 at com.roche.medmap.search.FileContentExtractor.extract(FileContentExtractor.java:29)
 ... 1 more
Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
 at org.apache.poi.util.LittleEndian.readInt(LittleEndian.java:662)
 at org.apache.tika.parser.executable.ExecutableParser.parsePE(ExecutableParser.java:98)
 at org.apache.tika.parser.executable.ExecutableParser.parse(ExecutableParser.java:74)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}

Here is the file that causes the error: [^test_file.txt]


was (Author: lenczykt):
Hi,

I have similar issue in my project

error

{code}

Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.executable.ExecutableParser@271f18d3
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
 at com.roche.medmap.search.FileContentExtractor.extract(FileContentExtractor.java:29)
 ... 1 more
Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
 at org.apache.poi.util.LittleEndian.readInt(LittleEndian.java:662)
 at org.apache.tika.parser.executable.ExecutableParser.parsePE(ExecutableParser.java:98)
 at org.apache.tika.parser.executable.ExecutableParser.parse(ExecutableParser.java:74)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)[^test_file.txt]

{code}

> OutlookExtractor "buffer underrun" when parsing .msg with embedded .msg
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2530
>                 URL: https://issues.apache.org/jira/browse/TIKA-2530
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16, 1.17
>         Environment: Reproduced with both Tika 1.16 and Tika 1.17 on Windows, but the problem likely affects all platforms.
>            Reporter: Pascal Essiembre
>            Assignee: Tim Allison
>            Priority: Major
>         Attachments: test_file.txt
>
>
> When parsing certain .msg files containing certain attachments (e.g. other 
> .msg files), I get this error:
> {noformat}
> ...
> Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
>         at org.apache.poi.util.LittleEndian.readInt(LittleEndian.java:662)
>         at org.apache.poi.hmef.CompressedRTF.decompress(CompressedRTF.java:73)
>         at org.apache.poi.util.LZWDecompresser.decompress(LZWDecompresser.java:81)
>         at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:42)
>         at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:270)
> ...
> {noformat}
> I think the issue is that {{MAPIRtfAttribute}} fails when it receives an 
> empty byte array from {{OutlookExtractor}}.  I was able to eliminate the 
> error by changing {{OutlookExtractor}} at around line 269 in the Tika 1.16 
> code (or around line 322 in Tika 1.17) as follows:
> {code:java}
>             //--- START FIX ---
>             ByteChunk chunk = (ByteChunk) rtfChunk;
>             if (chunk != null && chunk.getValue() != null 
>                     && chunk.getValue().length > 0 && !doneBody) {
>                 //ByteChunk chunk = (ByteChunk) rtfChunk;
>             //--- END FIX ---
> {code}
> I am not sure whether this is a real fix, or whether more is needed beyond 
> suppressing the error to make sure everything is extracted properly from all files.
> I cannot share the sample file I used for testing because it was given to me as 
> sensitive content, and I could not recreate a faulty .msg file.
> Thanks
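
The guard proposed in the quoted description can be illustrated in isolation. The following is only a minimal sketch, not code from POI or Tika: the class `EmptyChunkGuard` and its methods `readLittleEndianInt`, `hasContent` and `readHeaderIfPresent` are hypothetical names invented for this example. It assumes a reader that needs four bytes to decode a little-endian int and fails with "buffer underrun" otherwise, and shows how a null/empty check in front of it avoids that failure for an empty byte array.

```java
public class EmptyChunkGuard {

    /**
     * Hypothetical stand-in for a reader that needs four bytes to decode
     * a little-endian int. This is NOT the POI implementation.
     */
    public static int readLittleEndianInt(byte[] data) {
        if (data.length < 4) {
            throw new IllegalStateException("buffer underrun");
        }
        return (data[0] & 0xFF)
                | ((data[1] & 0xFF) << 8)
                | ((data[2] & 0xFF) << 16)
                | ((data[3] & 0xFF) << 24);
    }

    /** Mirrors the null/empty part of the check in the proposed fix. */
    public static boolean hasContent(byte[] value) {
        return value != null && value.length > 0;
    }

    /** Returns the decoded int, or null when the chunk is skipped. */
    public static Integer readHeaderIfPresent(byte[] value) {
        if (!hasContent(value)) {
            return null; // skip null or empty chunks instead of failing
        }
        return readLittleEndianInt(value);
    }

    public static void main(String[] args) {
        // Empty chunk: skipped by the guard.
        System.out.println(readHeaderIfPresent(new byte[0]));
        // Four bytes 2A 00 00 00 decode to 42 in little-endian order.
        System.out.println(readHeaderIfPresent(new byte[] {0x2A, 0, 0, 0}));
    }
}
```

Note that in this sketch a non-empty but truncated array (one to three bytes) still reaches the reader and fails, which is consistent with the reporter's doubt about whether the guard alone is a complete fix.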



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
