[ 
https://issues.apache.org/jira/browse/PDFBOX-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433233#comment-17433233
 ] 

Maruan Sahyoun commented on PDFBOX-5286:
----------------------------------------

[~lehmi] I further looked into where the time is spent. COSStream.createView 
takes a lot of it which is if I'm not mistaken because for every object 
contained in a stream PDFStreamParser creates a new view which is reading the 
content, uncompresses it again and again. (I was printing the COSStream hash to 
the console for every createView call and see that indead this should be what 
happens)

I did a very quick hack and stored the initially parsed uncompressed content 
byte[] in a COSStream member as a cache and returning that contained in a new 
RandomAccesReadBuffer  when available instead of rereading/uncompressing.

That brings the numbers further down to 7s for the large file and 1.4s for the 
medium file.

I didn't look into possible caveats but was only looking for the performance 
gains so there might be side effects.

WDYT?

> Runtime degredation in RC1 and alpha2
> -------------------------------------
>
>                 Key: PDFBOX-5286
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5286
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Maruan Sahyoun
>            Priority: Critical
>
> working/reviewing PDFBOX-5068 and PDFBOX-5263 I've experiencing runtime 
> issues for both 3.0.0-RC1 and 3.0.0-alpha2 when loading and saving a large PDF
> https://crossasia-books.ub.uni-heidelberg.de/xasia/reader/download/506/506-42-86246-2-10-20190822.pdf
>  
> ||version||runtime in millis||
> |2.0.24 |2076|
> |3.0.0-RC1 |219472|
> |3.0.0-alpha2 |282284|
> Basic test:
> {code:java}
> long start = System.currentTimeMillis();
> PDDocument pdf = Loader.loadPDF(new File("506-42-86246-2-10-20190822.pdf"));
> pdf.save(new NullOutputStream());
> pdf.close();        
> long end = System.currentTimeMillis();      
> System.out.println("Elapsed Time in milliseconds: "+ (end-start));     
> {code}
> with NullOuputStream
> {code:java}
> package org.apache.pdfbox;
> import java.io.IOException;
> import java.io.OutputStream;
> public class NullOutputStream extends OutputStream {
>     @Override
>     public void write(byte[] b) throws IOException {
>         // don't write anything
>     }
>     @Override
>     public void write(byte[] b, int off, int len) throws IOException {
>         // don't write anything
>     }
>     @Override
>     public void write(int b) throws IOException {
>         // don't write anything
>     }
> }
> {code}
> I've also running tests using JMH - they support these numbers. The 
> difference in numbers for RC1/alpha2 are within a regular variation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

Reply via email to