[ 
https://issues.apache.org/jira/browse/TIKA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141390#comment-16141390
 ] 

ASF GitHub Bot commented on TIKA-2447:
--------------------------------------

bjrke commented on issue #200: TIKA-2447 reduce memory consumption of PSDParser
URL: https://github.com/apache/tika/pull/200#issuecomment-324862327

and it is fixed with
https://github.com/apache/poi/commit/c7db66a30dfb6cbbd5812ff3ae4c90ed2d9b9a27
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> PSDParser creates unnecessary large byte array and discards it
> --------------------------------------------------------------
>
>                 Key: TIKA-2447
>                 URL: https://issues.apache.org/jira/browse/TIKA-2447
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15, 1.16
>         Environment: openjdk version "1.8.0_131"
> limited memory (currently running with -Xmx256M)
>            Reporter: Jan Burkhardt
>            Priority: Critical
>             Fix For: 1.17
>
>
> PSD files (Adobe Photoshop) are split into ResourceBlocks, which contain 
> different kinds of data, but only caption blocks are currently extracted 
> into the description.
> When a file with very large blocks (e.g. for image data) is parsed, a byte 
> array of the size of the block is allocated:
> https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191
> even though it is discarded afterwards:
> https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L116
>  and following lines
> This causes huge memory consumption and eventually killed the application 
> with an OutOfMemoryError.
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
>         at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
>         at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
> {noformat}
> I cannot provide a file that reproduces this, since the file that 
> caused the issue is owned by one of our customers.
> I will prepare a pull request to fix this.
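
The general idea described in the issue (skip the payload of blocks that will be discarded instead of reading it into a byte array) might look roughly like the sketch below. This is not the actual PSDParser code or the fix from the pull request. The class name BlockSkipSketch, the methods readOrSkip and skipFully, and the simplified block layout (2-byte ID, 4-byte length, payload) are all assumptions made for illustration; real PSD resource blocks carry additional fields.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: not the real PSDParser, and the block layout is simplified.
public class BlockSkipSketch {

    /**
     * Reads one block (assumed layout: 2-byte ID, 4-byte length, payload).
     * The payload is only materialized when the ID matches wantedId;
     * otherwise it is skipped and null is returned.
     */
    static byte[] readOrSkip(DataInputStream in, int wantedId) throws IOException {
        int id = in.readUnsignedShort();
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative block length: " + length);
        }
        if (id == wantedId) {
            // Only here is a buffer of the block's size allocated.
            byte[] data = new byte[length];
            in.readFully(data);
            return data;
        }
        // Unwanted block: advance the stream without allocating a payload buffer.
        skipFully(in, length);
        return null;
    }

    /** Skips exactly n bytes, or throws EOFException if the stream ends early. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF; probe with a single read.
                if (in.read() < 0) {
                    throw new EOFException("Stream ended with " + n + " bytes left to skip");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

With this shape, memory use is bounded by the size of the blocks that are actually wanted rather than by the largest block in the file. A wanted block with a very large declared length would still trigger a large allocation, so a real parser would probably also want an upper bound on the length it accepts.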



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
