[ 
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash updated TIKA-3154:
------------------------
    Description: 
While parsing a .msg file that contains HTML text, we get an exception 
from Tika.

Command: java -jar tika-app-1.24.1.jar html_code.msg

Exception:
{code:java}
Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
        at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
        at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
        at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
        at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
        at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
        at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
        at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}

  was:
While parsing msg file containing some html text inside, we are getting 
exception from Tika.

Command : java -jar tika-app-1.24.1.jar html_code.msg

Exception coming : 

See tika-parsers/pom.xml for the correct version.See tika-parsers/pom.xml for 
the correct version.Exception in thread "main" 
org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1 at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293) at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at 
org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209) at 
org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496) at 
org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)Caused by: 
org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 
1326748, but 1000000 is the maximum for this record type.If the file is not 
corrupt, please open an issue on bugzilla to request increasing the maximum 
allowable size for this record type.As a temporary workaround, consider setting 
a higher override value with IOUtils.setByteArrayMaxOverride()
{code:java}
/at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630) at 
org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208) at 
org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610) at 
org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596) at 
org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49) 
at 
org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
 at 
org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
 at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199) 
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131) 
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ... 5 
more/ 
{code}


> Exception while extracting msg files
> ------------------------------------
>
>                 Key: TIKA-3154
>                 URL: https://issues.apache.org/jira/browse/TIKA-3154
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Akash
>            Priority: Major
>
> While parsing a .msg file that contains HTML text, we get an exception 
> from Tika.
> Command: java -jar tika-app-1.24.1.jar html_code.msg
> Exception:
> {code:java}
> Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>       at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>       at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>       at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
>       at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
>       at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
>       at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
>       at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
>       at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
>       at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
>       at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}
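
The exception message reports a requested length of 1326748 against a limit of 1000000 and names IOUtils.setByteArrayMaxOverride() as a temporary workaround. The sketch below is a simplified, stdlib-only model of that kind of guarded allocation, not the POI source: the class RecordLimitSketch, its methods, and the override value 2000000 are all hypothetical and chosen only for illustration. It shows why the allocation is rejected without an override and accepted once a higher limit is set.

```java
// Simplified, stdlib-only model of a guarded allocation; NOT the POI source.
final class RecordLimitSketch {

    // Hypothetical global override; a value <= 0 means "no override set".
    private static int maxOverride = -1;

    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    // Allocates a buffer of the requested length, or throws when the length
    // exceeds the active limit (the override if set, else the per-record limit).
    static byte[] allocate(long length, int recordMax) {
        int limit = maxOverride > 0 ? maxOverride : recordMax;
        if (length < 0 || length > limit) {
            throw new IllegalStateException(
                "Tried to allocate an array of length " + length
                + ", but " + limit + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        long requested = 1_326_748L; // length reported in the exception
        int recordMax = 1_000_000;   // limit reported in the exception

        try {
            allocate(requested, recordMax);
        } catch (IllegalStateException e) {
            System.out.println("Without override: " + e.getMessage());
        }

        // With POI on the classpath, the corresponding call would be
        // org.apache.poi.util.IOUtils.setByteArrayMaxOverride(2_000_000),
        // made once before the file is parsed.
        setMaxOverride(2_000_000); // arbitrary illustrative value
        System.out.println("With override: allocated "
            + allocate(requested, recordMax).length + " bytes");
    }
}
```

Because the override is a Java call, it would presumably have to run in the same JVM before parsing starts, for example from a small wrapper program rather than from the plain java -jar command shown above. Raising the limit also removes a safeguard against corrupt or hostile files, so the value should stay as low as the data allows.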



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
