[jira] [Updated] (PDFBOX-4869) Reading standard 14 fonts is slow

Alfred (Jira) Tue, 09 Jun 2020 02:19:10 -0700


     [ https://issues.apache.org/jira/browse/PDFBOX-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alfred updated PDFBOX-4869:
---------------------------
    Description: 
I am testing text extraction from PDF and profiling the execution.

I found that the second biggest time consumer is the static initialization code in
Standard14Fonts that loads the fonts from the PDFBox jar.

The culprit seems to be the direct use of the stream returned by
getResourceAsStream().
That would be a ZipInputStream when using PDFBox as a jar.

Wrapping it in a BufferedInputStream reduces the load time considerably.
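For illustration, here is a minimal sketch of the fix described above. It assumes the font data is opened with Class.getResourceAsStream() and then consumed one byte at a time by a parser. The class BufferedResourceDemo and its methods are hypothetical names, not PDFBox API, and this is not the actual PDFBox patch. The demo reads Object.class only because that resource is present on every classpath. In PDFBox the resources would presumably be the bundled AFM metric files.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of PDFBox: shows wrapping a resource stream in a buffer.
public final class BufferedResourceDemo {

    private BufferedResourceDemo() {
    }

    /**
     * Opens a classpath resource and wraps it in a BufferedInputStream, so that
     * single-byte read() calls are served from an in-memory buffer instead of
     * going to the underlying (possibly zip-backed, decompressing) stream each time.
     */
    public static InputStream openBuffered(Class<?> anchor, String name) throws IOException {
        InputStream raw = anchor.getResourceAsStream(name);
        if (raw == null) {
            throw new FileNotFoundException("resource not found: " + name);
        }
        return new BufferedInputStream(raw);
    }

    /** Reads a whole resource byte by byte, the access pattern that suffers most without buffering. */
    public static byte[] readAll(Class<?> anchor, String name) throws IOException {
        try (InputStream in = openBuffered(anchor, name)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Object.class stands in for a font resource because it always exists.
        byte[] data = readAll(Object.class, "Object.class");
        System.out.println("read " + data.length + " bytes");
    }
}
```

BufferedInputStream uses an 8 KiB buffer by default, so most read() calls become cheap array accesses and the decompressing stream underneath is only asked for large chunks.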

 

  was:
I am testing text extraction from PDF and profiling the execution.

I found that the second biggest time consumer is the static initialization code in
Standard14Fonts that loads the fonts from the PDFBox jar.

The culprit seems to be the direct use of the stream returned by
getResourceAsStream().

Wrapping it in a BufferedInputStream reduces the load time considerably.

 


> Reading standard 14 fonts is slow
> ---------------------------------
>
>                 Key: PDFBOX-4869
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4869
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing, Text extraction
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Alfred
>            Priority: Major
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I am testing text extraction from PDF and profiling the execution.
> I found that the second biggest time consumer is the static initialization code in
> Standard14Fonts that loads the fonts from the PDFBox jar.
> The culprit seems to be the direct use of the stream returned by
> getResourceAsStream().
> That would be a ZipInputStream when using PDFBox as a jar.
> Wrapping it in a BufferedInputStream reduces the load time considerably.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
