Shunfei Chen created TIKA-3243:
----------------------------------
Summary: PSDParser MAX_DATA_LENGTH_BYTES check causes TikaException
Key: TIKA-3243
URL: https://issues.apache.org/jira/browse/TIKA-3243
Project: Tika
Issue Type: Bug
Reporter: Shunfei Chen
We are using the Tika library's AutoDetectParser to extract metadata from a
variety of files. Over the past month, since we upgraded to Tika 1.24.1, we
have been seeing some TikaExceptions (stack trace below).
Caused by: org.apache.tika.exception.TikaException: data length must be < 1000000: 17777730
at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
However, I think the PSD file we are parsing is valid: I can view it and open
it with Photoshop. After some digging, I believe the check was introduced as
part of https://issues.apache.org/jira/browse/TIKA-3050 in this commit
(line 232):
https://github.com/apache/tika/commit/ab8a9ed830ec710a32e4ffdf4989aea3aaea92ef
The largest data length we have seen so far in the files our users uploaded is
161548458, which is far above the 1000000 limit.
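
For illustration, here is a minimal sketch of the kind of length check that
seems to be involved. It is not the actual PSDParser code: the class, method
and helper names are hypothetical, the limit is copied from the exception
message above, and the header layout (8BIM signature, resource ID, padded
Pascal name, 4-byte big-endian length) is assumed from the PSD image resource
block format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch only; not the actual Tika PSDParser implementation.
class ResourceBlockLengthCheck {

    // Limit taken from the exception message; the real constant may differ.
    static final int MAX_DATA_LENGTH_BYTES = 1_000_000;

    // Reads one resource block header and returns its declared data length,
    // throwing if the length is negative or larger than maxLength.
    static int readDataLength(InputStream in, int maxLength) throws IOException {
        DataInputStream data = new DataInputStream(in);

        byte[] signature = new byte[4];
        data.readFully(signature);
        if (!"8BIM".equals(new String(signature, StandardCharsets.US_ASCII))) {
            throw new IOException("not an image resource block");
        }

        data.readUnsignedShort(); // resource ID, not needed for this check

        // Pascal string: one length byte plus the name, padded to an even size.
        int nameLength = data.readUnsignedByte();
        int toSkip = nameLength + ((nameLength + 1) % 2);
        data.readFully(new byte[toSkip]);

        int length = data.readInt(); // 4-byte big-endian data length
        if (length < 0 || length > maxLength) {
            throw new IOException("data length " + length + " exceeds limit " + maxLength);
        }
        return length;
    }

    // Builds a 12-byte header with an empty name and the given declared length.
    static byte[] header(int dataLength) {
        return ByteBuffer.allocate(12)
                .put("8BIM".getBytes(StandardCharsets.US_ASCII))
                .putShort((short) 0x0404)   // arbitrary resource ID
                .put((byte) 0).put((byte) 0) // empty name plus one pad byte
                .putInt(dataLength)
                .array();
    }

    public static void main(String[] args) throws IOException {
        byte[] block = header(17777730); // length from the stack trace above

        // A larger, caller-chosen limit accepts the same header.
        System.out.println(readDataLength(new ByteArrayInputStream(block), 200_000_000));

        try {
            readDataLength(new ByteArrayInputStream(block), MAX_DATA_LENGTH_BYTES);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a fixed limit of 1000000, a header that declares 17777730 bytes is
rejected before any of the block data is read, whereas a configurable or
larger limit would let the same block through.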
Thanks
Shunfei.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)