[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174961#comment-17174961 ]
Tim Allison commented on TIKA-3154:
-----------------------------------

Opened: https://bz.apache.org/bugzilla/show_bug.cgi?id=64659

> Exception while extracting msg files
> ------------------------------------
>
> Key: TIKA-3154
> URL: https://issues.apache.org/jira/browse/TIKA-3154
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24.1
> Reporter: Akash
> Priority: Major
>
> While parsing msg file containing some html text inside, we are getting
> exception from Tika.
> Command : java -jar tika-app-1.24.1.jar html_code.msg
> Exception coming :
> {code:java}
> /Aug 07, 2020 10:59:00 PM
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException:
> Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
> at
> org.apache.tikar.CompositeParser.parse.parse(CompositeParser.java:293
> undefined)
> at
> org.apache.tikar.CompositeParser.parse.parse(CompositeParser.java:280
> undefined)
> at
> org.apache.tikar.AutoDetectParser.parse.parse(AutoDetectParser.java:143
> undefined)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209
> undefined)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496 undefined)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149 undefined)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an
> array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request
> increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with
> IOUtils.setByteArrayMaxOverride()
> at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630 undefined)
> at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208 undefined)
> at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610
> undefined)
> at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596
> undefined)
> at
> org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49
> undefined)
> at
> org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328
> undefined)
> at
> org.apache.tikar.microsoft.OutlookExtractor.parse.parse(OutlookExtractor.java:247
> undefined)
> at
> org.apache.tikar.microsoft.OfficeParser.parse.parse(OfficeParser.java:199
> undefined)
> at
> org.apache.tikar.microsoft.OfficeParser.parse.parse(OfficeParser.java:131
> undefined)
> at
> org.apache.tikar.CompositeParser.parse.parse(CompositeParser.java:280
> undefined)/
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
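
For readers following the "Caused by" message above: the failure comes from a maximum-length check that runs before a byte array is allocated, and the message suggests raising the limit with IOUtils.setByteArrayMaxOverride(). The sketch below is a simplified, hypothetical model of such a guard, not POI's actual code. The class name AllocationGuardSketch, its methods, and the exception type are illustrative assumptions; only the two lengths (1326748 and 1000000) come from the stack trace.

```java
/** Hypothetical model of a max-length allocation guard; not POI's implementation. */
final class AllocationGuardSketch {
    // A non-positive value means "no override set" (illustrative convention).
    private static int maxOverride = -1;

    /** Stand-in for a global override; the name is an assumption. */
    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    /** Allocates a byte array only if the requested length is within the active limit. */
    static byte[] safelyAllocate(long length, int defaultMax) {
        // When an override is set it replaces the per-record default limit.
        int limit = maxOverride > 0 ? maxOverride : defaultMax;
        if (length < 0 || length > limit) {
            throw new IllegalStateException("Tried to allocate an array of length "
                    + length + ", but " + limit + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        long requested = 1326748L;   // length reported in the stack trace
        int recordMax = 1000000;     // per-record maximum reported in the stack trace

        try {
            safelyAllocate(requested, recordMax);
        } catch (IllegalStateException e) {
            System.out.println("blocked: " + e.getMessage());
        }

        // Raising the limit above the requested length lets the allocation proceed.
        setMaxOverride(2000000);
        System.out.println("allocated " + safelyAllocate(requested, recordMax).length + " bytes");
    }
}
```

In a real application embedding Tika, the corresponding step would presumably be a call to org.apache.poi.util.IOUtils.setByteArrayMaxOverride(int) before parsing, as the error message recommends. Whether that is practical with the tika-app command line shown in the report is not stated in this thread.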