[ https://issues.apache.org/jira/browse/TIKA-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240795#comment-17240795 ]
Kenneth William Krugler commented on TIKA-3239:
-----------------------------------------------
Hi [~harirehm] - this is the expected behavior. There's no way to communicate
back to the caller that data was dropped because the limit was hit, so an
exception is thrown instead.
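
To illustrate the reasoning, here is a minimal sketch of the "fail instead of
silently truncating" pattern. It is not Tika's actual PSDParser code: the class
name LimitSketch, the BlockTooLargeException type, the readBlock helper, and the
4-byte length prefix are all hypothetical, chosen only to mirror the message in
the report. The return type is a plain byte array, so there is nowhere to say
"this was cut short", which is why the limit surfaces as an exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class LimitSketch {
    // Hypothetical cap, mirroring the 1000000 in the reported message.
    static final int MAX_DATA_LENGTH = 1_000_000;

    // Hypothetical stand-in for a parser-specific checked exception.
    static class BlockTooLargeException extends Exception {
        BlockTooLargeException(String message) {
            super(message);
        }
    }

    // Reads one length-prefixed block (4-byte big-endian length, then payload).
    // The declared length is checked before any payload bytes are read.
    static byte[] readBlock(DataInputStream in)
            throws IOException, BlockTooLargeException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative data length: " + length);
        }
        if (length >= MAX_DATA_LENGTH) {
            // Returning a truncated array here would hide the data loss.
            throw new BlockTooLargeException(
                    "data length must be < " + MAX_DATA_LENGTH + ": " + length);
        }
        byte[] data = new byte[length];
        in.readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Build a header that declares an oversized block; no payload is needed
        // because the check fires before the payload is read.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeInt(7108276);

        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        try {
            readBlock(in);
        } catch (BlockTooLargeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A caller that would rather have partial output has to decide that explicitly,
for example by catching the exception and falling back to whatever metadata it
already has.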
As a side note, please ask questions like this on the mailing list, as
that's a lighter-weight way of handling them, and others can benefit from the
exchange. See
https://tika.apache.org/mail-lists.html#:~:text=The%20user%20mailing%20list%20at,in%20contributing%20to%20Tika%20development.
> TikaException: data length must be < 1000000
> --------------------------------------------
>
> Key: TIKA-3239
> URL: https://issues.apache.org/jira/browse/TIKA-3239
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.24.1
> Reporter: HARI RAM
> Priority: Major
>
> A TikaException is thrown when trying to parse PSD files using the latest Tika
> version (1.24.1).
>
>
> {code:java}
> org.apache.tika.exception.TikaException: data length must be < 1000000: 7108276
>     at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
>     at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
>     at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.Tika.parseToString(Tika.java:527)
>     at org.apache.tika.Tika.parseToString(Tika.java:602)
> {code}
>
> Is this limit configurable? Shouldn't the parser parse up to the limit and
> return the parsed data?
>