[ 
https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180782#comment-17180782
 ] 

Akash edited comment on TIKA-3154 at 8/19/20, 7:57 PM:
-------------------------------------------------------

I tried the config below and still get the same error. It seems the property is not being applied.
{code:java}
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
    </parser>
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">50000000</param>
      </params>
    </parser>
  </parsers>
</properties>
{code}

From
https://github.com/apache/tika/blob/7f0394247c8f5a731b258adbd6683449bc5c757b/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
it appears there is no variable named byteArrayMaxOverride.
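
For context, the error message in the quoted stack trace describes a length check against a per-record maximum that a global override can raise. The sketch below is a simplified, self-contained model of that kind of check, written only to show why raising the override would let the 1326748-byte allocation through. AllocationGuard and its members are hypothetical names chosen for illustration and are not Apache POI source code.

```java
// Simplified model of a guarded allocation with a global override.
// Illustration only: AllocationGuard is a hypothetical class, not POI code.
class AllocationGuard {
    // -1 means "no override set", so the per-record default limit applies.
    private static int byteArrayMaxOverride = -1;

    static void setByteArrayMaxOverride(int maxOverride) {
        byteArrayMaxOverride = maxOverride;
    }

    // Returns a new array, or throws if length exceeds the effective limit.
    static byte[] safelyAllocate(long length, int defaultMax) {
        int effectiveMax = byteArrayMaxOverride > 0 ? byteArrayMaxOverride : defaultMax;
        if (length < 0 || length > effectiveMax) {
            throw new IllegalStateException(
                "Tried to allocate an array of length " + length
                + ", but " + effectiveMax + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        int recordDefaultMax = 1_000_000; // limit reported in the stack trace
        long requested = 1_326_748L;      // length reported in the stack trace

        try {
            safelyAllocate(requested, recordDefaultMax);
        } catch (IllegalStateException e) {
            System.out.println("Without override: " + e.getMessage());
        }

        setByteArrayMaxOverride(50_000_000); // same value as in the config above
        byte[] buffer = safelyAllocate(requested, recordDefaultMax);
        System.out.println("With override: allocated " + buffer.length + " bytes");
    }
}
```

In real code that has POI on the classpath, the call named in the error message is IOUtils.setByteArrayMaxOverride(), which would need to run before parsing starts. Whether the tika-app command line offers a way to trigger it is not something this sketch addresses.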


was (Author: akki1607):
I tried the config below and still get the same error. It seems the property is not being applied.
{code:java}
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
    </parser>
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">50000000</param>
      </params>
    </parser>
  </parsers>
</properties>
{code}

> Exception while extracting msg files
> ------------------------------------
>
>                 Key: TIKA-3154
>                 URL: https://issues.apache.org/jira/browse/TIKA-3154
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Akash
>            Priority: Major
>
> While parsing an .msg file that contains HTML text, Tika throws an
> exception.
> Command: java -jar tika-app-1.24.1.jar html_code.msg
> Exception:
> {code:java}
> Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>       at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>       at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>       at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
>       at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
>       at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
>       at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
>       at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
>       at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
>       at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
>       at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
