Daniel Persson created PDFBOX-4296:
--------------------------------------

             Summary: Question: Performance
                 Key: PDFBOX-4296
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4296
             Project: PDFBox
          Issue Type: Improvement
          Components: Rendering
    Affects Versions: 2.0.11
            Reporter: Daniel Persson


Hi Team.

We use a tool we built using PDFBox to extract text for about 10k pages per 
day. Then we have another tool to extract images using Poppler.

We would like to use PDFBox for both tasks, but sadly we see a performance hit
of roughly 3 times when using PDFBox.
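A figure like "3 times" is easier to act on when both tools are measured the same way: same pages, same DPI, and the JIT warmed up before timing starts. The sketch below is a minimal, JDK-only timing harness. The class and method names (`RenderTiming`, `meanMillis`, `slowdown`) are hypothetical. The task passed in stands for whatever work is being measured, for example a call to PDFBox's `PDFRenderer.renderImageWithDPI(pageIndex, dpi)`.

```java
// Hypothetical helper class for comparing two rendering pipelines.
final class RenderTiming {

    // Runs `task` `warmup` times untimed (class loading, JIT), then `iterations`
    // timed runs, and returns the mean wall-clock time per run in milliseconds.
    static double meanMillis(Runnable task, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be > 0");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    // Ratio of candidate to baseline mean time; 3.0 means "3 times slower".
    static double slowdown(double candidateMillis, double baselineMillis) {
        if (baselineMillis <= 0) {
            throw new IllegalArgumentException("baseline must be > 0");
        }
        return candidateMillis / baselineMillis;
    }
}
```

For example, `RenderTiming.slowdown(RenderTiming.meanMillis(pdfboxTask, 5, 50), RenderTiming.meanMillis(popplerTask, 5, 50))` would give the ratio. Here `pdfboxTask` and `popplerTask` are placeholder `Runnable`s the caller supplies. The warmup runs keep one-time startup costs from being charged to the first pages.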

Do you have any backlog / technical debt / ideas on how to improve performance?

We have tried -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true and 
that made image generation much slower.
We have set System.setProperty("sun.java2d.cmm", 
"sun.java2d.cmm.kcms.KcmsServiceProvider") in code.

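Both settings above are ordinary Java system properties, so the command-line flag and the in-code call can live in one place. This is a minimal sketch that assumes the properties are set at the very start of `main`, before any page is rendered or any ICC color space is used. The class name `RenderingProperties` is hypothetical. The KCMS provider class ships only with some JDKs (Oracle JDK 8, for instance), so that setting may have no effect on other runtimes.

```java
// Hypothetical holder for the rendering-related system properties named in this report.
final class RenderingProperties {
    static final String CMM_KEY = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";
    static final String PURE_JAVA_CMYK_KEY =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Call first thing in main(). Assumption: the JDK picks its color management
    // module once, so changing sun.java2d.cmm after color conversion has started
    // is not expected to take effect.
    static void configure(boolean pureJavaCmyk) {
        System.setProperty(CMM_KEY, KCMS_PROVIDER);
        // Same effect as -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=<value>
        System.setProperty(PURE_JAVA_CMYK_KEY, Boolean.toString(pureJavaCmyk));
    }
}
```

Calling `RenderingProperties.configure(false)` matches the observation above that the pure-Java CMYK conversion made image generation slower.
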
We use image libraries from TwelveMonkeys, PDFBox and the standard JAI project.

I've read in the code that images with transparency are written twice, which
might be a culprit.

I have been allowed to put some time into the project if we can find solid
leads or a roadmap toward better performance.

I hope it's okay to track this issue here instead of asking on the mailing
list.

Best regards

Daniel



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
