[jira] [Updated] (TIKA-2447) PSDParser creates unnecessary large byte array and discards it

Jan Burkhardt (JIRA) Thu, 24 Aug 2017 07:34:43 -0700

     [ https://issues.apache.org/jira/browse/TIKA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jan Burkhardt updated TIKA-2447:
--------------------------------
    Description: 
PSD (Adobe Photoshop) files are split into ResourceBlocks, which contain 
different data, but only caption blocks are currently extracted into the 
description.
When parsing a file with very big blocks, e.g. for image data, a byte array of 
the size of the block is allocated:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191

even though it is discarded afterwards:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L117
 and following lines

I am not able to provide a file to reproduce this, since the file that caused 
the issue is owned by one of our customers.
I will prepare a pull request to fix this.

  was:
PSD (Adobe Photoshop) files are split into ResourceBlocks, which contain 
different data, but only caption blocks are currently extracted into the 
description.
When parsing a file with very big blocks, e.g. for image data, a byte array of 
the size of the block is allocated:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191

even though it is discarded afterwards:
https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L117
 and following lines

I will prepare a pull request to fix this.


> PSDParser creates unnecessary large byte array and discards it
> --------------------------------------------------------------
>
>                 Key: TIKA-2447
>                 URL: https://issues.apache.org/jira/browse/TIKA-2447
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15, 1.16
>         Environment: openjdk version "1.8.0_131"
> limited memory (currently running with -Xmx256M)
>            Reporter: Jan Burkhardt
>            Priority: Critical
>
> PSD (Adobe Photoshop) files are split into ResourceBlocks, which contain 
> different data, but only caption blocks are currently extracted into the 
> description.
> When parsing a file with very big blocks, e.g. for image data, a byte array 
> of the size of the block is allocated:
> https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191
> even though it is discarded afterwards:
> https://github.com/justsocialapps/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L117
>  and following lines
> I am not able to provide a file to reproduce this, since the file that 
> caused the issue is owned by one of our customers.
> I will prepare a pull request to fix this.
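
The problem described above can be illustrated with a minimal sketch. This is
not the actual PSDParser code and not the announced pull request. It assumes a
simplified block layout (2-byte ID, 4-byte length, then data) rather than the
real PSD resource block format, and the names ResourceBlockSketch,
readBlockData, skipFully and wantedId are hypothetical. The idea is to check
the block ID first and skip unwanted data in the stream instead of reading it
into a byte array of the block's size:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the Tika implementation.
final class ResourceBlockSketch {

    // Skips exactly n bytes. InputStream.skip may skip fewer bytes than
    // requested, so fall back to single-byte reads to detect end of stream.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() < 0) {
                throw new EOFException("stream ended inside a block");
            } else {
                n--;
            }
        }
    }

    // Reads one block in the assumed layout: 2-byte ID, 4-byte length, data.
    // Returns the data only if the ID matches wantedId. Otherwise the data is
    // skipped without allocating a buffer, and null is returned.
    static byte[] readBlockData(DataInputStream in, int wantedId) throws IOException {
        int id = in.readUnsignedShort();
        long length = in.readInt() & 0xFFFFFFFFL; // treat the length as unsigned
        if (id != wantedId) {
            skipFully(in, length);
            return null;
        }
        if (length > Integer.MAX_VALUE) {
            throw new IOException("block too large to buffer: " + length);
        }
        byte[] data = new byte[(int) length];
        in.readFully(data);
        return data;
    }
}
```

With this approach the memory needed for an unwanted block is independent of
its size, which matters most on small heaps such as the 256M one mentioned in
the environment field.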



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
