[jira] [Commented] (PDFBOX-4215) Get pages from a HTTP stream of a large pdf file

Tilman Hausherr (JIRA) Wed, 09 May 2018 09:46:34 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469065#comment-16469065 ]


Tilman Hausherr commented on PDFBOX-4215:
-----------------------------------------

If you don't have enough memory and can't use the disk for a scratch file, then 
you'll be limited. "Parse on demand" may come in the future, but we don't 
know when. You might try https://github.com/torakiki/sambox; it is a fork of 
PDFBox.
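
In PDFBox 2.0.x (the issue lists 2.0.9), "no scratch file" corresponds to loading with a main-memory-only `MemoryUsageSetting`. The following is a minimal, non-authoritative sketch under that assumption: it parses the stream with `MemoryUsageSetting.setupMainMemoryOnly(long)` capped at a caller-chosen byte limit and splits the result into one-page documents with `Splitter`. The class `MemoryOnlySplit` and method `splitPages` are hypothetical names. It also shows the limitation described above: the whole file is still buffered in memory, so input larger than the cap is expected to fail with an `IOException` instead of being parsed page by page. `setupTempFileOnly()` or `setupMixed(long)` would lift that limit only if disk use is allowed. The loading API changed in PDFBox 3.0, so this applies to 2.0.x only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

/** Hypothetical helper: split a PDF stream into one-page PDFs without a temp scratch file (PDFBox 2.0.x). */
public class MemoryOnlySplit {

    public static List<byte[]> splitPages(InputStream in, long maxMainMemoryBytes) throws IOException {
        // Main memory only: PDFBox buffers the input in RAM, never on disk, up to the given cap.
        MemoryUsageSetting memoryOnly = MemoryUsageSetting.setupMainMemoryOnly(maxMainMemoryBytes);
        List<byte[]> pages = new ArrayList<>();
        try (PDDocument document = PDDocument.load(in, memoryOnly)) {
            Splitter splitter = new Splitter();
            splitter.setSplitAtPage(1); // one page per output document
            for (PDDocument single : splitter.split(document)) {
                // Serialize each single-page document and release it right away.
                try (PDDocument part = single) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    part.save(out);
                    pages.add(out.toByteArray());
                }
            }
        }
        return pages;
    }
}
```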

> Get pages from a HTTP stream of a large pdf file
> ------------------------------------------------
>
>                 Key: PDFBOX-4215
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4215
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Parsing
>    Affects Versions: 2.0.9
>            Reporter: Alexandre
>            Priority: Minor
>
> Hi Apache contributors,
> Suppose I have a very big PDF file and I want to split it into 
> chunks (e.g. one file per page). I cannot load the entire file into memory 
> and I cannot use the hard disk of the computer as described in the doc for 
> large files... :D. But I still have the stream of the file, line by line.
> I read that it is not feasible to get the pages of the PDF in order (because 
> of the PDF specs), but is it feasible to load arbitrary pages with PDFBox if I 
> read line by line and look for page breaks?
> Hagd, A.
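
Why a forward-only, line-by-line read does not get you pages: a PDF is a binary file without "page breaks"; page objects are reached through the page tree, and the byte offset of the cross-reference table (or stream) that locates those objects is written at the very end of the file, after the `startxref` keyword and before `%%EOF`. The following is a minimal sketch, assuming you can obtain the last few kilobytes of the file (for example with an HTTP Range request, which is an assumption about the server), that extracts that offset. `StartXrefLocator` and `findStartXref` are hypothetical names, and the sketch ignores details such as hybrid-reference files.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: read the cross-reference offset from the tail bytes of a PDF. */
public class StartXrefLocator {

    /** Returns the offset written after the last "startxref" keyword, or -1 if none is found. */
    public static long findStartXref(byte[] tail) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte indexes.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        // Use the last occurrence: incremental updates append newer trailers at the end.
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int i = idx + "startxref".length();
        // Skip the end-of-line / whitespace between the keyword and the number.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            i++;
        }
        return start == i ? -1 : Long.parseLong(text.substring(start, i));
    }
}
```

Only after fetching the cross-reference data at that offset can a reader find the catalog, the page tree, and then individual pages, which is why the bytes cannot simply be consumed in order.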



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
