[ https://issues.apache.org/jira/browse/PDFBOX-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575819#comment-14575819 ]
Andreas Lehmkühler commented on PDFBOX-2301: -------------------------------------------- I've added Jesses proposed patch with some slight modifications and reformatting. I've added a license header to both new files too. Thanks for the contribution! > RandomAccessBuffer consumes too much memory. > -------------------------------------------- > > Key: PDFBOX-2301 > URL: https://issues.apache.org/jira/browse/PDFBOX-2301 > Project: PDFBox > Issue Type: Bug > Components: PDModel > Affects Versions: 1.8.6, 2.0.0 > Reporter: gee > Assignee: Andreas Lehmkühler > Priority: Blocker > Fix For: 2.0.0 > > Attachments: clone.diff, clone2.diff, clone3.diff, clone4.diff, > pdfbox-scratchfile.patch > > > RandomAccessBuffer holds uncompressed image during operation because it is > what exactly pdfbox ExtractImages do. > but holding uncompressed image instead of compressed one in memory consumes > too much memory, not excluding many PDF XObjects that can use filter to > compress itself. It would be good if pdfbox provides option that reverts to > COSObject state just before the RandomAccess object created(the state that > pdf XObject stream parsed and COSDictionary objects haven't created because > user doesn't requested it using get____() method.) It is crucial feature so > that pdfbox can analyze huge pdf file(>100MB). > In current source, one must close COSStream unless required(and I know closed > stream cannot reopened again.) > Class Name > > > | > Shallow Heap | Retained Heap > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > org.apache.pdfbox.cos.COSObject @ 0x5ad4940 > > > | > 24 | 8,187,264 > |- <class> class org.apache.pdfbox.cos.COSObject @ 0x58c4020 > > > | > 0 | 0 > |- generationNumber org.apache.pdfbox.cos.COSInteger @ 0x5ad0080 > > > | > 24 | 24 > |- baseObject org.apache.pdfbox.cos.COSStream @ 0x5b25ea0 > > > | > 32 | 8,187,216 > | |- <class> class org.apache.pdfbox.cos.COSStream @ 0x58c3e00 > > > | > 8 | 8 > | |- items java.util.LinkedHashMap @ 0x5b2a0f0 > > > | > 56 | 552 > | |- file org.apache.pdfbox.io.RandomAccessBuffer @ 0x5b2a128 > > > | > 48 | 8,186,528 > | | |- <class> class org.apache.pdfbox.io.RandomAccessBuffer @ 0x5ad2b00 > > > | > 8 | 8 > | | |- currentBuffer byte[16384] @ 0x590f360 16,400 | 16,400 > | | |- bufferList java.util.ArrayList @ 0x5b2e200 > > > | > 24 | 8,170,080 > | | '- Total: 3 entries > > > | > | > | |- filteredStream org.apache.pdfbox.io.RandomAccessFileOutputStream @ > 0x5b2a158 > > > | 32 | 32 > | |- decodeResult org.apache.pdfbox.filter.DecodeResult @ 0xa65f618 > > > | > 16 | 16 > | |- unFilteredStream org.apache.pdfbox.io.RandomAccessFileOutputStream @ > 0xa71ab18 > > | > 32 | 32 > | '- Total: 6 entries > > > | > | > |- objectNumber org.apache.pdfbox.cos.COSInteger @ 0x5b25ec0 > > > | > 24 | 24 > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org