[ https://issues.apache.org/jira/browse/TIKA-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shunfei Chen updated TIKA-3243:
-------------------------------
    Description: 
We are using the Tika library's AutoDetectParser to extract metadata from a
variety of files. We have been seeing some TikaExceptions (stack trace below)
in the past month, since we upgraded to Tika 1.24.1.
  
{code:java}
Caused by: org.apache.tika.exception.TikaException: data length must be < 1000000: 17777730
 at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
 at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
 at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
  {code}

 However, I think the PSD file we are parsing is valid: I can view it and
open it with Photoshop. After some digging, I believe the change was
introduced as part of https://issues.apache.org/jira/browse/TIKA-3050
in this commit:
https://github.com/apache/tika/commit/ab8a9ed830ec710a32e4ffdf4989aea3aaea92ef
(line 232).
  
 The biggest size we have seen so far in the files our users uploaded is
161548458, which is way above the 1000000 limit in the exception message.
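
 To make the failure mode concrete, here is a minimal sketch of how a hard
length limit like this one rejects the sizes mentioned in this report. It is
not the actual PSDParser source: the class, the stand-in exception, and the
method names are hypothetical, and the limit value and the exact comparison
are assumptions taken from the exception message ("data length must be <
1000000").

```java
// Hypothetical model of the length check; NOT the actual PSDParser source.
class DataLengthCheckSketch {

    // Assumed limit, taken from the exception message in the stack trace.
    static final int MAX_DATA_LENGTH_BYTES = 1_000_000;

    /** Stand-in for org.apache.tika.exception.TikaException (hypothetical). */
    static class SketchException extends Exception {
        SketchException(String message) {
            super(message);
        }
    }

    /** Throws if the declared data length is not below the assumed limit. */
    static void checkDataLength(long dataLen) throws SketchException {
        if (dataLen >= MAX_DATA_LENGTH_BYTES) {
            throw new SketchException(
                    "data length must be < " + MAX_DATA_LENGTH_BYTES + ": " + dataLen);
        }
    }

    /** Convenience wrapper: true if the check would reject this length. */
    static boolean isRejected(long dataLen) {
        try {
            checkDataLength(dataLen);
            return false;
        } catch (SketchException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // One value under the limit, plus the two lengths from this report.
        long[] lengths = {999_999L, 17_777_730L, 161_548_458L};
        for (long len : lengths) {
            System.out.println(len + " rejected: " + isRejected(len));
        }
    }
}
```

 Under these assumptions, both lengths from our files are rejected even
though the files themselves open fine, which is why a fixed cutoff of this
size looks too strict for real-world PSD files.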
  
 Thanks
 Shunfei. 



> PSDParser MAX_DATA_LENGTH_BYTES check causes TikaException
> ----------------------------------------------------------
>
>                 Key: TIKA-3243
>                 URL: https://issues.apache.org/jira/browse/TIKA-3243
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Shunfei Chen
>            Priority: Major
>
> We are using the Tika library's AutoDetectParser to extract metadata from a
> variety of files. We have been seeing some TikaExceptions (stack trace
> below) in the past month, since we upgraded to Tika 1.24.1.
>   
> {code:java}
> Caused by: org.apache.tika.exception.TikaException: data length must be < 1000000: 17777730
>  at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
>  at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
>  at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
>   {code}
>  However, I think the PSD file we are parsing is valid: I can view it and
> open it with Photoshop. After some digging, I believe the change was
> introduced as part of https://issues.apache.org/jira/browse/TIKA-3050
> in this commit:
> https://github.com/apache/tika/commit/ab8a9ed830ec710a32e4ffdf4989aea3aaea92ef
> (line 232).
>   
>  The biggest size we have seen so far in the files our users uploaded is
> 161548458, which is way above the 1000000 limit in the exception message.
>   
>  Thanks
>  Shunfei. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
