[ 
https://issues.apache.org/jira/browse/PDFBOX-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133221#comment-17133221
 ] 

Andreas Lehmkühler edited comment on PDFBOX-4877 at 6/11/20, 12:25 PM:
-----------------------------------------------------------------------

{quote}Oh, and another question: why do we do the following?

if (!Float.isFinite(c[0]) || !Float.isFinite(c[1]) || !Float.isFinite(c[2]) ...
{quote}
It is just a dream that all PDFs are well-formed. We do a lot of checks 
and magic repairs to avoid issues while parsing and, of course, to avoid 
questions like "Why can't PDFBox parse this PDF when Adobe (or any other 
popular PDF tool) can?" See PDFBOX-4778 for further information.
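
To illustrate the kind of check being discussed, here is a minimal sketch, not PDFBox's actual code. The class name FiniteCheckSketch, the helpers allFinite and multiplyChecked, and the row-major 9-element layout are all assumptions made for this example. It shows how a product can be validated with Float.isFinite so that a malformed PDF with absurd values fails early instead of propagating NaN or Infinity:

```java
// Hypothetical sketch; names and layout are illustrative, not taken from PDFBox.
class FiniteCheckSketch {

    // Returns true only if no element is NaN or +/-Infinity.
    static boolean allFinite(float[] values) {
        for (float v : values) {
            if (!Float.isFinite(v)) {
                return false;
            }
        }
        return true;
    }

    // Multiplies two 3x3 matrices stored row-major in 9-element arrays
    // and rejects a result that contains non-finite values.
    static float[] multiplyChecked(float[] a, float[] b) {
        float[] c = new float[9];
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                float sum = 0f;
                for (int k = 0; k < 3; k++) {
                    sum += a[row * 3 + k] * b[k * 3 + col];
                }
                c[row * 3 + col] = sum;
            }
        }
        if (!allFinite(c)) {
            throw new IllegalArgumentException("matrix product contains non-finite values");
        }
        return c;
    }
}
```

Whether to throw, clamp, or silently repair at this point is a design choice; the sketch throws only to make the failure visible.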


> Matrix class performance improvements
> -------------------------------------
>
>                 Key: PDFBOX-4877
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4877
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing, Text extraction
>    Affects Versions: 2.0.20, 3.0.0 PDFBox
>            Reporter: Alfred
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>              Labels: Optimization
>         Attachments: PDFBOX-4877.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I am testing text extraction from PDF and profiling the execution.
> I found that the third-largest time consumer is matrix multiplication.
> The Matrix class spends large amounts of time copying results to new 
> instances. 
> Also, the if statements slow down execution, as conditional branches can 
> hurt performance on modern CPUs.
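
As a rough illustration of the allocation concern raised in the description, the following sketch writes a 3x3 product into a caller-supplied array instead of creating a new instance per multiplication. It is not the attached patch and not PDFBox's Matrix API. The class MatrixMultiplySketch, the method multiplyInto, and the row-major 9-element layout are assumptions for this example:

```java
// Hypothetical sketch; not the PDFBOX-4877 patch and not PDFBox's Matrix class.
class MatrixMultiplySketch {

    // Computes dest = a * b for 3x3 row-major matrices without allocating.
    // All nine results are held in locals before storing, so dest may be
    // the same array as a or b.
    static void multiplyInto(float[] a, float[] b, float[] dest) {
        float c0 = a[0] * b[0] + a[1] * b[3] + a[2] * b[6];
        float c1 = a[0] * b[1] + a[1] * b[4] + a[2] * b[7];
        float c2 = a[0] * b[2] + a[1] * b[5] + a[2] * b[8];
        float c3 = a[3] * b[0] + a[4] * b[3] + a[5] * b[6];
        float c4 = a[3] * b[1] + a[4] * b[4] + a[5] * b[7];
        float c5 = a[3] * b[2] + a[4] * b[5] + a[5] * b[8];
        float c6 = a[6] * b[0] + a[7] * b[3] + a[8] * b[6];
        float c7 = a[6] * b[1] + a[7] * b[4] + a[8] * b[7];
        float c8 = a[6] * b[2] + a[7] * b[5] + a[8] * b[8];
        dest[0] = c0; dest[1] = c1; dest[2] = c2;
        dest[3] = c3; dest[4] = c4; dest[5] = c5;
        dest[6] = c6; dest[7] = c7; dest[8] = c8;
    }
}
```

Whether this actually helps depends on the JIT and on how often callers can reuse a destination, so it would need to be confirmed by profiling.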



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
