[jira] [Updated] (PDFBOX-4215) Get pages from a HTTP stream of a large pdf file

Alexandre (JIRA) Wed, 09 May 2018 09:23:24 -0700

     [ https://issues.apache.org/jira/browse/PDFBOX-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexandre updated PDFBOX-4215:
------------------------------
    Description: 
Hi Apache contributors,

Suppose I have a very large PDF file and I want to split it into smaller 
files (e.g. one file per page). I cannot load the entire file into memory, and 
I cannot use the computer's hard disk as described in the documentation for 
large files... :D But I can still read the file as a stream, line by line.

I read that it is not feasible to get the pages of the PDF in order (because of 
the PDF specification), but is it feasible to load arbitrary pages by reading 
line by line and looking for page breaks?
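
For context, here is a minimal sketch of why in-order reading is hard. It is 
plain Java, not PDFBox API, and the class and method names are hypothetical. 
A PDF's cross-reference section is located through the "startxref" entry at 
the very end of the file, so a reader needs the tail of the file before it can 
find any page object. Assuming the last bytes of the file are available (for 
example through an HTTP Range request, if the server supports one), the offset 
could be extracted roughly like this:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
public class StartXrefFinder {

    /**
     * Returns the byte offset that follows the last "startxref" keyword
     * in the given tail of a PDF file, or -1 if none can be read.
     */
    public static long findStartXref(byte[] tail) {
        // ISO-8859-1 maps each byte to exactly one char, so string indexes
        // correspond to byte positions within the tail.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int pos = keyword + "startxref".length();
        // Skip the whitespace (space, CR, LF) between the keyword and the number.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        // Consume the ASCII decimal digits of the offset.
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            return -1; // keyword present but no number after it
        }
        return Long.parseLong(text.substring(start, pos));
    }

    public static void main(String[] args) {
        // Made-up file tail, used only as sample input.
        byte[] tail = "trailer\n<< /Size 22 /Root 1 0 R >>\nstartxref\n18799\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(tail)); // prints 18799
    }
}
```

This only locates the cross-reference section; it does not parse it, and it 
does not follow the chain of earlier sections in incrementally updated files.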

Is this implemented in PDFBox?

Hagd, A.

  was:
Hi Apache contributors,

Suppose I have a very large PDF file and I want to split it into smaller 
files (e.g. one file per page). I cannot load the entire file into memory, and 
I cannot use the computer's hard disk as described in the documentation for 
large files... :D But I can still read the file as a stream, line by line.

I read that it is not possible to get in-order pages from a stream, but it is 
feasible to load random pages if you read line by line and look for page 
breaks. 

Is this implemented in pdfbox?

Hagd, A.


> Get pages from a HTTP stream of a large pdf file
> ------------------------------------------------
>
>                 Key: PDFBOX-4215
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4215
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Parsing
>    Affects Versions: 2.0.9
>            Reporter: Alexandre
>            Priority: Minor


