[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220923#comment-17220923 ]

Michael Klink commented on PDFBOX-4297:
---------------------------------------

You cannot guarantee that you need less than 5 MB. For example, one can simply blow up the *Catalog* object alone to more than 5 MB by adding a lot of simple entries whose sizes add up to more than 5 MB.

This example is not a common case, but if you have to handle arbitrary inputs from the wild, you have to keep this possibility in mind as the basis of a possible DoS attack.

> Allow to space efficiently analyse large PDFs
> ---------------------------------------------
>
>                 Key: PDFBOX-4297
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4297
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>            Reporter: Ralf Hauser
>            Priority: Major
>
> Assume you get a 300+ MB PDF and need to know
> 1) the file names of embedded files, if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) certification level (and whether it is signed)
> This should not use more than 5 MB (extra) memory
>
> P.S.: seems to be an example of https://pdfbox.apache.org/ideas.html "Handle large PDF files"