[jira] [Commented] (PDFBOX-2860) NonSeq parser slower than Seq parser

JIRA Mon, 13 Jul 2015 08:52:00 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624833#comment-14624833
 ]


Andreas Lehmkühler commented on PDFBOX-2860:
--------------------------------------------

The non-sequential parser in PDFBox needs random access to the PDF, so in 
1.8.9 an input stream is first copied to a file before it is parsed. I guess 
that copy is one reason, or perhaps the reason, for the difference in performance.

BTW: in 2.0.0 the user can decide whether the stream is copied to memory or to 
a file (scratchfile = true).
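
To illustrate the extra work described above, here is a minimal sketch that uses 
only the JDK. It is not PDFBox code: the class and method names 
(RandomAccessBuffering, copyToTempFile, copyToMemory, byteAt) are hypothetical. 
It only shows the general idea that a plain InputStream has to be copied in 
full, to a temporary file or to memory, before a parser can seek to arbitrary 
offsets in it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox: shows why a stream must be
// buffered in full before random access is possible.
class RandomAccessBuffering {

    // Copy the whole stream to a temporary file (the file-backed variant).
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("buffered-pdf", ".tmp");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Copy the whole stream into a byte array (the in-memory variant).
    static byte[] copyToMemory(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Read a single byte at an arbitrary offset of the file copy.
    static int byteAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for a real PDF.
        byte[] data = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);

        Path tmp = copyToTempFile(new ByteArrayInputStream(data));
        try {
            // Seeking works on the file copy, not on the original stream.
            System.out.println("byte at offset 1: " + (char) byteAt(tmp, 1));
        } finally {
            Files.deleteIfExists(tmp);
        }

        byte[] inMemory = copyToMemory(new ByteArrayInputStream(data));
        System.out.println("bytes held in memory: " + inMemory.length);
    }
}
```

In a loop like the one in the report, a copy of this kind would be repeated for 
every load, which fits the guess above that the copy accounts for at least part 
of the measured difference.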

> NonSeq parser slower than Seq parser
> ------------------------------------
>
>                 Key: PDFBOX-2860
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2860
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: simon steiner
>
> PDF from PDFBOX-797
>         for (int i=0; i<1000; i++) {
>             PDDocument.load(new FileInputStream(
>                     "4218.pdf")).close();
>         }
> Nonseq:
> real  0m23.691s
> Seq:
> real  0m9.705s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

