[jira] [Commented] (PDFBOX-5809) PDDocument#importPage slowed down by factor 1300

Tilman Hausherr (Jira) Thu, 02 May 2024 00:53:05 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842871#comment-17842871
 ]


Tilman Hausherr commented on PDFBOX-5809:
-----------------------------------------

(1), (2) and (4) are fixed once again, but not (3) due to the different 
approach. Let's wait and see whether others have better ideas. If we can't 
solve this, then you should stay with 2.0 for splitting. It does not have the 
slowness problem, but it produces bigger result files because compressed 
object streams aren't supported, which is mostly relevant for files with a 
structure tree (accessibility).

> PDDocument#importPage slowed down by factor 1300
> ------------------------------------------------
>
>                 Key: PDFBOX-5809
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5809
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.31, 3.0.2 PDFBox
>            Reporter: Marcus Korinth
>            Priority: Major
>             Fix For: 2.0.32, 4.0.0, 3.0.3 PDFBox
>
>         Attachments: image-2024-04-27-18-50-19-199.png
>
>
> We are using the *PDDocument#importPage* method in our own splitter where we 
> split pages from a _SourceDocument_ to a _TargetDocument_. In order to do so 
> we first extract the page by using the following code:
> {code:java}
> final PDPage sourcePage = sourceDocument.getPage(pageNumber);
> {code}
> Immediately afterwards we are calling:
> {code:java}
> final PDPage targetPage = targetDocument.importPage(sourcePage);
> {code}
> This approach worked just fine with *pdfbox 2.0.26*.
> We decided to upgrade to version *3.0.2* since it tackles a lot of the 
> problems.
> Unfortunately the *PDDocument#importPage* method slowed down by a factor of 
> around 1300. In version *2.0.26* it took 15 ms on average. With the latest 
> *3.0.2* it takes 20000 ms on average. That is a huge deal breaker as we 
> usually have to split documents which have several thousand pages.
> Note: The same applies when using *PDDocument#addPage*.
> Note: The problem does not appear in *3.0.1*. But we can't use that since it 
> has other major problems which break our application.
> I have prepared an example document with which you can replicate the issue. 
> Due to the file size limitation I had to prepare a WeTransfer-Link for you: 
> https://we.tl/t-lfN2wz7cAs
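
To put the reported numbers in perspective, here is a minimal sketch of the
arithmetic behind the "factor 1300" figure and what it means for a whole
document. It uses only the per-page timings quoted above (15 ms and 20000 ms).
The page count of 3000 is an assumption standing in for "several thousand
pages", the class and method names are hypothetical, and the estimate assumes
every page costs the same, which the report does not state.

```java
import java.time.Duration;

// Hypothetical helper, not part of PDFBox: plain arithmetic on the reported timings.
final class SplitTimeEstimate {

    // Ratio of the slow per-page time to the fast per-page time.
    static double slowdownFactor(double fastMillis, double slowMillis) {
        return slowMillis / fastMillis;
    }

    // Total time for a document, assuming a constant cost per imported page.
    static Duration totalTime(long pages, long millisPerPage) {
        return Duration.ofMillis(Math.multiplyExact(pages, millisPerPage));
    }

    public static void main(String[] args) {
        long pages = 3000; // assumed page count, not taken from the report

        System.out.printf("slowdown factor: %.0f%n", slowdownFactor(15, 20000));
        System.out.println("2.0.26 total: " + totalTime(pages, 15));
        System.out.println("3.0.2 total:  " + totalTime(pages, 20000));
    }
}
```

Under these assumptions a 3000-page split goes from about 45 seconds to
roughly 16 hours 40 minutes, which is why the reporter calls it a deal
breaker.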



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

