Akash created TIKA-3154:
---------------------------

             Summary: Exception while extracting msg files
                 Key: TIKA-3154
                 URL: https://issues.apache.org/jira/browse/TIKA-3154
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.24.1
            Reporter: Akash


While parsing an .msg file that contains some HTML text, Tika throws an 
exception.

Command : java -jar tika-app-1.24.1.jar html_code.msg

Exception:

See tika-parsers/pom.xml for the correct version.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
    at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
    at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
    at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
    at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
    at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
    at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
    at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 5 more
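
To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of length guard the message describes. It is a simplified model, not POI's actual source: the class AllocationGuard and its methods are hypothetical names chosen for illustration, and only the numbers (1326748 and 1000000) come from the trace above.

```java
// Hypothetical, simplified model of an allocation guard with a global override.
// This is NOT the POI implementation; it only illustrates the behaviour the
// exception message describes.
final class AllocationGuard {
    // A value <= 0 means "no override set"; the per-record default applies.
    private static int maxOverride = -1;

    private AllocationGuard() {}

    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    static byte[] safelyAllocate(long length, int defaultMax) {
        // Use the override when one is set, otherwise the record type's default.
        int limit = (maxOverride > 0) ? maxOverride : defaultMax;
        if (length < 0 || length > limit) {
            throw new IllegalStateException(
                "Tried to allocate an array of length " + length
                + ", but " + limit + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }
}

final class AllocationGuardDemo {
    public static void main(String[] args) {
        try {
            // Same sizes as in the report: the request exceeds the default limit.
            AllocationGuard.safelyAllocate(1_326_748L, 1_000_000);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Raising the override above the requested size lets the allocation through.
        AllocationGuard.setMaxOverride(2_000_000);
        byte[] body = AllocationGuard.safelyAllocate(1_326_748L, 1_000_000);
        System.out.println("Allocated " + body.length + " bytes");
    }
}
```

In a real application, the workaround named in the message would be a call to IOUtils.setByteArrayMaxOverride() before parsing. That presumably requires embedding Tika through its Java API, since the tika-app command line shown above does not obviously expose a way to set it.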



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
