[ 
https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138744#comment-14138744
 ] 

Andreas Lehmkühler commented on PDFBOX-2301:
--------------------------------------------

IMHO we should start at the innermost point: the handling of 
compressed/uncompressed stream within COSStream

> RandomAccessBuffer consumes too much memory.
> --------------------------------------------
>
>                 Key: PDFBOX-2301
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2301
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.6, 2.0.0
>            Reporter: gee
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: clone.diff, clone2.diff, clone3.diff
>
>
> RandomAccessBuffer holds uncompressed image during operation because it is 
> what exactly pdfbox ExtractImages do.
> but holding uncompressed image instead of compressed one in memory consumes 
> too much memory, not excluding many PDF XObjects that can use filter to 
> compress itself. It would be good if pdfbox provides option that reverts to 
> COSObject state just before the RandomAccess object created(the state that 
> pdf XObject stream parsed and COSDictionary objects haven't created because 
> user doesn't requested it using get____() method.) It is crucial feature so 
> that pdfbox can analyze huge pdf file(>100MB).
> In current source, one must close COSStream unless required(and I know closed 
> stream cannot reopened again.)
> Class Name                                                                    
>                                                                               
>                                                                               
>                                                                          | 
> Shallow Heap | Retained Heap
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> org.apache.pdfbox.cos.COSObject @ 0x5ad4940                                   
>                                                                               
>                                                                               
>                                                                          |    
>        24 |     8,187,264
> |- <class> class org.apache.pdfbox.cos.COSObject @ 0x58c4020                  
>                                                                               
>                                                                               
>                                                                          |    
>         0 |             0
> |- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080              
>                                                                               
>                                                                               
>                                                                          |    
>        24 |            24
> |- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0                     
>                                                                               
>                                                                               
>                                                                          |    
>        32 |     8,187,216
> |  |- <class> class org.apache.pdfbox.cos.COSStream @ 0x58c3e00               
>                                                                               
>                                                                               
>                                                                          |    
>         8 |             8
> |  |- items java.util.LinkedHashMap @ 0x5b2a0f0                               
>                                                                               
>                                                                               
>                                                                          |    
>        56 |           552
> |  |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128                
>                                                                               
>                                                                               
>                                                                          |    
>        48 |     8,186,528
> |  |  |- <class> class org.apache.pdfbox.io.RandomAccessBuffer @ 0x5ad2b00    
>                                                                               
>                                                                               
>                                                                          |    
>         8 |             8
> |  |  |- currentBuffer byte[16384] @ 0x590f360      16,400 |        16,400
> |  |  |- bufferList java.util.ArrayList @ 0x5b2e200                           
>                                                                               
>                                                                               
>                                                                          |    
>        24 |     8,170,080
> |  |  '- Total: 3 entries                                                     
>                                                                               
>                                                                               
>                                                                          |    
>           |              
> |  |- filteredStream org.apache.pdfbox.io.RandomAccessFileOutputStream @ 
> 0x5b2a158                                                                     
>                                                                               
>                                                                               
> |           32 |            32
> |  |- decodeResult org.apache.pdfbox.filter.DecodeResult @ 0xa65f618          
>                                                                               
>                                                                               
>                                                                          |    
>        16 |            16
> |  |- unFilteredStream org.apache.pdfbox.io.RandomAccessFileOutputStream @ 
> 0xa71ab18                                                                     
>                                                                               
>                                                                             | 
>           32 |            32
> |  '- Total: 6 entries                                                        
>                                                                               
>                                                                               
>                                                                          |    
>           |              
> |- objectNumber org.apache.pdfbox.cos.COSInteger @ 0x5b25ec0                  
>                                                                               
>                                                                               
>                                                                          |    
>        24 |            24
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to