[
https://issues.apache.org/jira/browse/PDFBOX-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433233#comment-17433233
]
Maruan Sahyoun commented on PDFBOX-5286:
----------------------------------------
[~lehmi] I looked further into where the time is spent. COSStream.createView
takes a lot of it. If I'm not mistaken, that is because PDFStreamParser creates
a new view for every object contained in a stream, and each view reads and
uncompresses the content again. (I printed the COSStream hash to the console
for every createView call, and the output suggests that this is indeed what
happens.)
I did a very quick hack: I stored the initially parsed, uncompressed content
as a byte[] in a COSStream member acting as a cache and, when it is available,
return it wrapped in a new RandomAccessReadBuffer instead of
rereading/uncompressing.
That brings the numbers further down to 7s for the large file and 1.4s for the
medium file.
I didn't look into possible caveats but was only looking for the performance
gains so there might be side effects.
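To illustrate the idea (this is not the actual change), here is a minimal
sketch. CachingStream is a hypothetical stand-in for COSStream, java.util.zip
stands in for the stream filter, and a ByteArrayInputStream stands in for
RandomAccessReadBuffer; all names are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for a compressed stream object; NOT PDFBox's COSStream.
final class CachingStream {
    private final byte[] encoded;   // compressed bytes as stored in the file
    private byte[] decodedCache;    // filled by the first createView() call
    private int decodeCount;        // how often decompression actually ran

    CachingStream(byte[] encoded) {
        this.encoded = encoded.clone();
    }

    // Returns a fresh, independently positioned view on the decoded content.
    InputStream createView() throws IOException {
        if (decodedCache == null) {
            // Decompress only once; later calls reuse the cached bytes.
            try (InputStream in =
                    new InflaterInputStream(new ByteArrayInputStream(encoded))) {
                decodedCache = readAll(in);
            }
            decodeCount++;
        }
        // Every view wraps the same array; reading never modifies it.
        return new ByteArrayInputStream(decodedCache);
    }

    int getDecodeCount() {
        return decodeCount;
    }

    // Reads a stream to its end and returns the bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Demo helper: compresses raw bytes so there is something to decode.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream def = new DeflaterOutputStream(out)) {
            def.write(raw);
        }
        return out.toByteArray();
    }
}
```

Calling createView() several times on one CachingStream decompresses only
once; getDecodeCount() stays at 1. Two caveats that come to mind: the cache
would have to be dropped whenever the stream content is rewritten, and it
keeps the decoded bytes in memory for as long as the object lives.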
WDYT?
> Runtime degradation in RC1 and alpha2
> -------------------------------------
>
> Key: PDFBOX-5286
> URL: https://issues.apache.org/jira/browse/PDFBOX-5286
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Maruan Sahyoun
> Priority: Critical
>
> While working on/reviewing PDFBOX-5068 and PDFBOX-5263 I've been experiencing
> runtime issues with both 3.0.0-RC1 and 3.0.0-alpha2 when loading and saving a
> large PDF:
> https://crossasia-books.ub.uni-heidelberg.de/xasia/reader/download/506/506-42-86246-2-10-20190822.pdf
>
> ||version||runtime in millis||
> |2.0.24 |2076|
> |3.0.0-RC1 |219472|
> |3.0.0-alpha2 |282284|
> Basic test:
> {code:java}
> long start = System.currentTimeMillis();
> PDDocument pdf = Loader.loadPDF(new File("506-42-86246-2-10-20190822.pdf"));
> pdf.save(new NullOutputStream());
> pdf.close();
> long end = System.currentTimeMillis();
> System.out.println("Elapsed Time in milliseconds: "+ (end-start));
> {code}
> with NullOutputStream
> {code:java}
> package org.apache.pdfbox;
>
> import java.io.IOException;
> import java.io.OutputStream;
>
> public class NullOutputStream extends OutputStream {
>     @Override
>     public void write(byte[] b) throws IOException {
>         // don't write anything
>     }
>
>     @Override
>     public void write(byte[] b, int off, int len) throws IOException {
>         // don't write anything
>     }
>
>     @Override
>     public void write(int b) throws IOException {
>         // don't write anything
>     }
> }
> {code}
> I've also been running tests using JMH; they support these numbers. The
> difference between the RC1 and alpha2 numbers is within normal variation.