[ https://issues.apache.org/jira/browse/PDFBOX-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469065#comment-16469065 ]
Tilman Hausherr commented on PDFBOX-4215:
-----------------------------------------
If you don't have enough memory and can't use the disk for a scratch file, then
you'll be limited. "Parse on demand" may be coming in the future, but we don't
know when. You might try SAMBox (https://github.com/torakiki/sambox), which is
a fork of PDFBox.
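
For context on why reading the stream front to back is limiting: in a conventional (non-linearized) PDF, the offset of the cross-reference table is written near the very end of the file, after the `startxref` keyword, so a reader usually needs the tail of the file before it can locate any page object. The following is a minimal sketch in plain Java, not the PDFBox or SAMBox API; the class and method names are hypothetical, and it assumes the last bytes of the file are already available in a buffer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
final class StartXrefSketch {

    // Returns the byte offset written after the last "startxref" keyword
    // in the given tail buffer, or -1 if no such offset can be found.
    static long findStartXref(byte[] tail) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is safe here.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int pos = idx + "startxref".length();
        // Skip the end-of-line characters between the keyword and the number.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            return -1; // keyword present but no digits followed it
        }
        return Long.parseLong(text.substring(start, pos));
    }

    public static void main(String[] args) {
        // A made-up file tail; real files carry the actual xref offset here.
        byte[] tail = "trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(tail)); // prints 1234
    }
}
```

Because this offset only becomes known once the end of the file has arrived, scanning the stream line by line for page boundaries is not reliable in general; the objects that make up a page can sit anywhere in the file and may be compressed.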
> Get pages from a HTTP stream of a large pdf file
> ------------------------------------------------
>
> Key: PDFBOX-4215
> URL: https://issues.apache.org/jira/browse/PDFBOX-4215
> Project: PDFBox
> Issue Type: Wish
> Components: Parsing
> Affects Versions: 2.0.9
> Reporter: Alexandre
> Priority: Minor
>
> Hi Apache contributors,
> Suppose I have a very big PDF file and I want to split this file into file
> chunks (e.g. one file per page). I cannot load the entire file into memory
> and I cannot use the hard disk of the computer as described in the doc for
> large files... :D. But I still have the stream of the file, line by line. (on)
> I read that it is not feasible to get the pages of the PDF in order (because
> of the PDF specs), but is it feasible to load random pages if you read line
> by line and look for page breaks in PDFBox?
> Hagd, A.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)