[ https://issues.apache.org/jira/browse/PDFBOX-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433233#comment-17433233 ]
Maruan Sahyoun commented on PDFBOX-5286:
----------------------------------------

[~lehmi] I looked further into where the time is spent. COSStream.createView takes a large share of it. If I'm not mistaken, this is because PDFStreamParser creates a new view for every object contained in a stream, and each new view reads and uncompresses the content again. (I printed the COSStream hash to the console for every createView call, and it indeed looks like this is what happens.)

As a very quick hack, I stored the initially parsed, uncompressed content as a byte[] in a COSStream member as a cache. When the cache is available, createView returns it wrapped in a new RandomAccessReadBuffer instead of rereading/uncompressing. That brings the numbers further down to 7s for the large file and 1.4s for the medium file.

I didn't look into possible caveats and was only looking for the performance gains, so there might be side effects.

WDYT?

> Runtime degredation in RC1 and alpha2
> -------------------------------------
>
>                 Key: PDFBOX-5286
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5286
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Maruan Sahyoun
>            Priority: Critical
>
> While working on/reviewing PDFBOX-5068 and PDFBOX-5263, I've been experiencing
> runtime issues with both 3.0.0-RC1 and 3.0.0-alpha2 when loading and saving a large PDF:
> https://crossasia-books.ub.uni-heidelberg.de/xasia/reader/download/506/506-42-86246-2-10-20190822.pdf
>
> ||version||runtime in millis||
> |2.0.24 |2076|
> |3.0.0-RC1 |219472|
> |3.0.0-alpha2 |282284|
>
> Basic test:
> {code:java}
> long start = System.currentTimeMillis();
> PDDocument pdf = Loader.loadPDF(new File("506-42-86246-2-10-20190822.pdf"));
> pdf.save(new NullOutputStream());
> pdf.close();
> long end = System.currentTimeMillis();
> System.out.println("Elapsed Time in milliseconds: "+ (end-start));
> {code}
> with NullOutputStream
> {code:java}
> package org.apache.pdfbox;
>
> import java.io.IOException;
> import java.io.OutputStream;
>
> public class NullOutputStream extends OutputStream {
>     @Override
>     public void write(byte[] b) throws IOException {
>         // don't write anything
>     }
>
>     @Override
>     public void write(byte[] b, int off, int len) throws IOException {
>         // don't write anything
>     }
>
>     @Override
>     public void write(int b) throws IOException {
>         // don't write anything
>     }
> }
> {code}
> I've also run tests using JMH; they support these numbers. The
> difference between the RC1 and alpha2 numbers is within normal variation.
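
For illustration, the caching idea from the comment can be sketched independently of PDFBox. This is only a minimal sketch under assumptions: the class CachingStreamSketch and all of its methods are hypothetical and are not the COSStream API, a plain ByteArrayInputStream stands in for RandomAccessReadBuffer, and Deflate stands in for whatever filters the real stream uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for a compressed stream object; NOT PDFBox's COSStream.
class CachingStreamSketch {
    private final byte[] compressed; // encoded bytes as they would be stored in the file
    private byte[] decodedCache;     // filled on the first view, reused afterwards
    private int decodeCount;         // instrumentation only: how often we really inflated

    CachingStreamSketch(byte[] compressed) {
        this.compressed = compressed.clone();
    }

    // Helper for the example: builds a Deflate-compressed stream from text.
    static CachingStreamSketch fromPlainText(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(sink)) {
            deflater.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return new CachingStreamSketch(sink.toByteArray());
    }

    // Returns a fresh, independently positioned view of the decoded content.
    // Only the first call inflates; later calls wrap the cached byte[].
    InputStream createView() throws IOException {
        if (decodedCache == null) {
            try (InputStream in =
                    new InflaterInputStream(new ByteArrayInputStream(compressed))) {
                decodedCache = in.readAllBytes(); // requires Java 9+
            }
            decodeCount++;
        }
        return new ByteArrayInputStream(decodedCache);
    }

    int getDecodeCount() {
        return decodeCount;
    }
}
```

All views share the same cached byte[], so the sketch trades memory for time, and the cache would have to be dropped whenever the stream content is rewritten. Those are the kind of side effects that would still need checking.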