[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220923#comment-17220923 ]

Michael Klink commented on PDFBOX-4297:
---------------------------------------
You cannot guarantee that you need less than 5 MB.
For example, the *Catalog* object alone can be inflated beyond 5 MB simply by
adding a large number of simple entries.
This is not a common case, but if you have to handle arbitrary input from the
wild, you have to keep this possibility in mind as the basis of a possible
DoS attack.
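
To make the size argument concrete, here is a minimal sketch that serializes a
catalog dictionary padded with filler entries and compares its size to the
5 MB budget. It uses only the Java standard library, not PDFBox. The class
name CatalogBloat, the filler keys /X0, /X1, ... and the count of 400,000
entries are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CatalogBloat {
    // The 5 MB memory budget proposed in the issue description.
    static final int BUDGET_BYTES = 5 * 1024 * 1024;

    // Serializes a catalog dictionary with the given number of filler entries.
    // The keys /X0, /X1, ... are hypothetical padding with small integer values.
    static byte[] buildCatalog(int fillerEntries) {
        StringBuilder sb = new StringBuilder();
        sb.append("1 0 obj\n<< /Type /Catalog /Pages 2 0 R\n");
        for (int i = 0; i < fillerEntries; i++) {
            sb.append("/X").append(i).append(' ').append(i).append('\n');
        }
        sb.append(">>\nendobj\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] catalog = buildCatalog(400_000);
        System.out.println("catalog bytes: " + catalog.length);
        System.out.println("exceeds budget: " + (catalog.length > BUDGET_BYTES));
    }
}
```

Each entry costs only a few bytes, so the object is trivial to generate, yet a
parser that has to materialize the whole catalog dictionary needs at least
that much memory before it can look up any other entry.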
> Allow to space efficiently analyse large PDFs
> ---------------------------------------------
>
> Key: PDFBOX-4297
> URL: https://issues.apache.org/jira/browse/PDFBOX-4297
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Reporter: Ralf Hauser
> Priority: Major
>
> Assume you get a 300+MB large pdf and need to know
> 1) the file names of embedded files if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) certification level (and whether it is signed)
> This should not use more than 5 MB (extra) memory
>
> P.S.: seems to be an example of https://pdfbox.apache.org/ideas.html "Handle
> large PDF files"
>