[ 
https://issues.apache.org/jira/browse/PDFBOX-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433577#comment-17433577
 ] 

Andreas Lehmkühler commented on PDFBOX-5286:
--------------------------------------------

Yes, I agree, compressed object streams are one of the main reasons. My last
commit fixes an issue where the COSStream itself was recreated again and
again. When it comes to caching, there are always two sides to the coin: on
the one hand it is (hopefully) faster, on the other hand it consumes more
memory. Let's start with caching the decompressed stream itself. In the long
run we should cache the COS objects rather than the stream, to avoid caching
the data twice. I'm not sure yet how to do that: by storing those objects
within the COSStream, by storing them in the object pool of COSDocument, or by
introducing a new class for compressed streams which inherits from COSStream.
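
To make the first step concrete, here is a minimal sketch of caching the
decompressed stream. It is not PDFBox code: DecompressedStreamCache, the
stream-number key and inflateCalls() are hypothetical names chosen for
illustration, and java.util.zip stands in for the real stream filter.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical cache for decompressed stream data; not part of PDFBox. */
public class DecompressedStreamCache {
    // Decoded bytes, keyed by an assumed object-stream number.
    private final Map<Long, byte[]> decoded = new HashMap<>();
    // Counts real decompressions so the effect of the cache is visible.
    private int inflateCalls = 0;

    /** Returns the decoded bytes, inflating only on the first request per key. */
    public byte[] get(long streamNumber, byte[] compressed) {
        byte[] cached = decoded.get(streamNumber);
        if (cached == null) {
            cached = inflate(compressed);
            decoded.put(streamNumber, cached);
        }
        return cached;
    }

    public int inflateCalls() {
        return inflateCalls;
    }

    private byte[] inflate(byte[] compressed) {
        inflateCalls++;
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    /** Helper that produces compressed test data for the demo below. */
    public static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "1 0 obj << /Type /Example >> endobj".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = deflate(raw);
        DecompressedStreamCache cache = new DecompressedStreamCache();
        // Simulate resolving three objects that live in the same object stream.
        for (int i = 0; i < 3; i++) {
            cache.get(42L, packed);
        }
        System.out.println("inflate calls: " + cache.inflateCalls());
    }
}
```

The map keeps every decoded buffer alive for as long as the cache exists,
which is the memory side of the coin. Caching the parsed objects instead of
the bytes would avoid holding the same data in two forms.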

> Runtime degradation in RC1 and alpha2
> -------------------------------------
>
>                 Key: PDFBOX-5286
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5286
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Maruan Sahyoun
>            Priority: Critical
>
> While working on and reviewing PDFBOX-5068 and PDFBOX-5263 I've been
> experiencing runtime issues with both 3.0.0-RC1 and 3.0.0-alpha2 when
> loading and saving a large PDF:
> https://crossasia-books.ub.uni-heidelberg.de/xasia/reader/download/506/506-42-86246-2-10-20190822.pdf
>  
> ||version||runtime in millis||
> |2.0.24 |2076|
> |3.0.0-RC1 |219472|
> |3.0.0-alpha2 |282284|
> Basic test:
> {code:java}
> long start = System.currentTimeMillis();
> PDDocument pdf = Loader.loadPDF(new File("506-42-86246-2-10-20190822.pdf"));
> pdf.save(new NullOutputStream());
> pdf.close();        
> long end = System.currentTimeMillis();      
> System.out.println("Elapsed Time in milliseconds: "+ (end-start));     
> {code}
> with NullOutputStream
> {code:java}
> package org.apache.pdfbox;
> import java.io.IOException;
> import java.io.OutputStream;
> public class NullOutputStream extends OutputStream {
>     @Override
>     public void write(byte[] b) throws IOException {
>         // don't write anything
>     }
>     @Override
>     public void write(byte[] b, int off, int len) throws IOException {
>         // don't write anything
>     }
>     @Override
>     public void write(int b) throws IOException {
>         // don't write anything
>     }
> }
> {code}
> I've also run tests using JMH, and they support these numbers. The
> difference between the RC1 and alpha2 numbers is within normal variation.



