[ https://issues.apache.org/jira/browse/TIKA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Burkhardt updated TIKA-2447:
--------------------------------
    Description:
PSD files (Adobe Photoshop) are split into ResourceBlocks which contain different kinds of data, but currently only Caption blocks are extracted into the description.
When a file with very large blocks (e.g. image data) is parsed, a byte array the size of the block is allocated:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191
even though the array is discarded afterwards:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L117 and following lines
This causes huge memory consumption and eventually killed the application with an OutOfMemoryError.
{noformat}
java.lang.OutOfMemoryError: Java heap space
	at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
	at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
	at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
{noformat}
I cannot provide a file to reproduce this, since the file that caused the issue is owned by one of our customers.
I will prepare a pull request to fix it.

> PSDParser creates unnecessary large byte array and discards it
> --------------------------------------------------------------
>
>                 Key: TIKA-2447
>                 URL: https://issues.apache.org/jira/browse/TIKA-2447
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15, 1.16
>         Environment: openjdk version "1.8.0_131"
> little memory (currently using 256M Xmx)
>            Reporter: Jan Burkhardt
>            Priority: Critical
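The problem reported here is that every resource block's payload is read into a byte array even when only the caption is used. A minimal sketch of the obvious alternative, assuming the image resource block layout from Adobe's Photoshop File Formats specification ('8BIM' signature, 2-byte ID, Pascal-string name padded to an even size, 4-byte length, data padded to an even size), is to buffer only the block that is needed and skip everything else. This is not Tika's PSDParser code and not the patch from the announced pull request: the class `PsdResourceSkipSketch`, its methods, the caption ID value `0x03F0`, and the `MAX_CAPTION_BYTES` limit are assumptions for illustration. The `main` method builds a synthetic stream containing a 512 MiB block whose payload is generated lazily, so the large block is skipped without ever being allocated, which also works around not having the customer's file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;

public class PsdResourceSkipSketch {

    static final int SIGNATURE_8BIM = 0x3842494D;   // "8BIM"
    static final int ID_CAPTION = 0x03F0;           // caption resource ID (assumed)
    static final long MAX_CAPTION_BYTES = 1 << 20;  // hypothetical safety limit

    /** Skips exactly n bytes; InputStream.skip() alone may skip fewer. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {   // no progress: probe for EOF
                throw new EOFException("truncated resource block");
            } else {
                n--;
            }
        }
    }

    /** Reads one block; returns the caption payload, or null after skipping any other block. */
    static byte[] readBlock(DataInputStream in) throws IOException {
        if (in.readInt() != SIGNATURE_8BIM) {
            throw new IOException("missing 8BIM signature");
        }
        int id = in.readUnsignedShort();
        int nameLen = in.readUnsignedByte();            // Pascal string: length byte + chars,
        skipFully(in, nameLen + (nameLen + 1) % 2);     // padded to an even total size
        long dataLen = in.readInt() & 0xFFFFFFFFL;      // unsigned 32-bit payload size
        long padded = dataLen + (dataLen % 2);          // payload is padded to an even size
        if (id == ID_CAPTION && dataLen <= MAX_CAPTION_BYTES) {
            byte[] data = new byte[(int) dataLen];      // only the small, needed block is buffered
            in.readFully(data);
            skipFully(in, padded - dataLen);
            return data;
        }
        skipFully(in, padded);                          // everything else is skipped, never allocated
        return null;
    }

    /** Builds a block header with an empty name (length byte 0 plus one pad byte). */
    static byte[] header(int id, long dataLen) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(SIGNATURE_8BIM);
        out.writeShort(id);
        out.writeShort(0);
        out.writeInt((int) dataLen);
        out.flush();
        return bos.toByteArray();
    }

    /** Yields n zero bytes lazily, standing in for a huge image-data payload. */
    static InputStream zeros(long n) {
        return new InputStream() {
            private long left = n;
            @Override public int read() {
                if (left <= 0) return -1;
                left--;
                return 0;
            }
            @Override public int read(byte[] b, int off, int len) {
                if (len == 0) return 0;
                if (left <= 0) return -1;
                int k = (int) Math.min(len, left);
                Arrays.fill(b, off, off + k, (byte) 0);
                left -= k;
                return k;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        long bigLen = 512L * 1024 * 1024;   // larger than the 256M heap from the report
        byte[] caption = "Hello".getBytes(StandardCharsets.US_ASCII); // odd length: 1 pad byte
        InputStream raw = new SequenceInputStream(Collections.enumeration(Arrays.asList(
                new ByteArrayInputStream(header(0x040C, bigLen)),  // any non-caption ID
                zeros(bigLen),
                new ByteArrayInputStream(header(ID_CAPTION, caption.length)),
                new ByteArrayInputStream(caption),
                new ByteArrayInputStream(new byte[1]))));
        DataInputStream in = new DataInputStream(raw);
        System.out.println(readBlock(in) == null);                                 // true
        System.out.println(new String(readBlock(in), StandardCharsets.US_ASCII));  // Hello
    }
}
```

Skipping keeps memory bounded regardless of the declared block size. The size check on the one block that is still buffered matters too, because a corrupt or hostile file can declare an arbitrarily large caption length.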
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)