[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257744#comment-17257744 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1885065 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885065 ]

PDFBOX-4297: Sonar fix

> Allow to space efficiently analyse large PDFs
>
> Key: PDFBOX-4297
> URL: https://issues.apache.org/jira/browse/PDFBOX-4297
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Reporter: Ralf Hauser
> Priority: Major
> Attachments: programWinter2015_20210103_091853-sig_LTV.pdf
>
> Assume you get a 300+MB large pdf and need to know
> 1) the file names of embedded files if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) certification level (and whether it is signed)
> This should not use more than 5 MB (extra) memory
>
> P.S.: seems to be an example of https://pdfbox.apache.org/ideas.html "Handle large PDF files"

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257722#comment-17257722 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1885054 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885054 ]

PDFBOX-4297: use stream instead of byte buffer to handle huge files with a small memory footprint
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257709#comment-17257709 ]

Tilman Hausherr commented on PDFBOX-4297:

I was able to check your file with streams in the same amount of time. Btw, the proposed method doesn't work because it returns a closed input stream.

There is more work to do: the method you mentioned, and then the check for "adbe.x509.rsa_sha1".
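The closed-stream problem mentioned in this comment can be reproduced without PDFBox. The following is a minimal sketch that assumes only the JDK: `ClosedStreamDemo`, `closedOnReturn`, `openForCaller` and `canRead` are hypothetical names, and a `BufferedInputStream` stands in for `COSFilterInputStream`. It illustrates that returning a stream from inside a try-with-resources block hands the caller a stream that has already been closed.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; BufferedInputStream is used because it rejects reads after close().
class ClosedStreamDemo {

    // Same shape as the proposed method: the resource is closed as the method returns.
    static InputStream closedOnReturn(byte[] data) {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            return in; // close() runs before the caller can use the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Alternative: return the open stream and leave closing to the caller.
    static InputStream openForCaller(byte[] data) {
        return new BufferedInputStream(new ByteArrayInputStream(data));
    }

    // Returns true if a byte can be read without an IOException.
    static boolean canRead(InputStream in) {
        try {
            in.read();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};
        System.out.println("closedOnReturn readable: " + canRead(closedOnReturn(data)));
        try (InputStream in = openForCaller(data)) {
            System.out.println("openForCaller readable: " + canRead(in));
        }
    }
}
```

With the second variant the caller owns the stream and would typically wrap the call in its own try-with-resources block, as `main` does here.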
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257700#comment-17257700 ]

Tilman Hausherr commented on PDFBOX-4297:

I've tried with your file. It takes almost two minutes to download the file; however, after that the content is there. The rest is done in 1 second. I'll see what happens when going with streams.
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257693#comment-17257693 ]

Ralf Hauser commented on PDFBOX-4297:

Please find attached a bigger test file, programWinter2015_20210103_091853-sig_LTV.pdf.

> We could change the code of ShowSignature, but then we'd probably get criticism for being slow.

If streams are properly implemented, they might even be quicker, as the operating system will not need to swap memory pages.
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257685#comment-17257685 ]

Tilman Hausherr commented on PDFBOX-4297:

This is sample code, so nothing prevents you from writing your own. I assume you mean the "buf" parameter. This is used only once, to calculate the digest. To calculate the digest from a stream, see here:
https://stackoverflow.com/a/304350/535646

We could change the code of ShowSignature, but then we'd probably get criticism for being slow.

About the missing method, did you try to implement it yourself? Yes, maybe we could do that, but I think getSignedContentAsStream() would be better.
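Computing a digest from a stream, as suggested in this comment, needs only a small fixed-size buffer regardless of the input size. The following is a minimal sketch using the JDK's `MessageDigest`; the class name `StreamDigest`, its methods and the 8 KiB buffer size are assumptions made for illustration and are not taken from ShowSignature.java.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: digests a stream chunk by chunk instead of loading it into a byte[].
class StreamDigest {

    // Reads the stream to its end and returns the raw digest bytes.
    static byte[] digest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192]; // memory use stays constant for any input size
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
        return md.digest();
    }

    // Convenience wrapper returning the digest as lowercase hex.
    static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest(in, algorithm)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "abc".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(hexDigest(in, "SHA-256"));
        }
    }
}
```

In a signature check, the stream passed in would be the signed content of the PDF rather than the in-memory sample used here.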
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257682#comment-17257682 ]

Ralf Hauser commented on PDFBOX-4297:

Re 3b) "whether it is signed" [correctly]

Looking at ShowSignature.java, it is unfortunately not yet memory-efficient. For example, in ShowSignature.checkContentValueWithFile(File file, int[] byteRange, byte[] contents), the memory usage grows linearly with the file size due to the contents byte array.

But there is hope, since
a) in showSignature(), when switch (subFilter) is executed, the "adbe.*" cases convert the "byte[] contents" back into a stream (although I do not see that it is verified in this case whether the document has been altered or not)
b) verifyPKCS7() could probably work with a stream instead of "byte[] contents", because the bouncycastle classes also have stream approaches (CMSSignedData has constructors with streams instead of byte[])

So to begin,
i) PDSignature.getContents(InputStream pdfFile) should be amended with a sibling

    public InputStream getSignedContentStream(InputStream pdfFile) throws IOException
    {
        try (COSFilterInputStream fis = new COSFilterInputStream(pdfFile, getByteRange()))
        {
            return fis;
        }
    }

ii) verifyETSIdotRFC3161() should be refactored to work with streams and not the content byte[]
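The idea behind a stream restricted to the signature's byte range can be sketched with the JDK alone. The following is an illustration only, not PDFBox's `COSFilterInputStream`: `ByteRangeInputStream` and `readAll` are hypothetical names, and the sketch assumes the range is given as ascending (offset, length) pairs, as in a PDF signature's /ByteRange array. Bytes outside the ranges are read and discarded, so memory use does not depend on the file size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: passes through only the bytes that lie inside the given ranges.
class ByteRangeInputStream extends FilterInputStream {
    private final int[] byteRange; // pairs of (offset, length)
    private long position = 0;     // bytes consumed from the underlying stream so far

    ByteRangeInputStream(InputStream in, int[] byteRange) {
        super(in);
        this.byteRange = byteRange.clone();
    }

    private boolean inRange(long pos) {
        for (int i = 0; i + 1 < byteRange.length; i += 2) {
            long start = byteRange[i];
            if (pos >= start && pos < start + byteRange[i + 1]) {
                return true;
            }
        }
        return false;
    }

    @Override
    public int read() throws IOException {
        int b;
        while ((b = super.read()) != -1) {
            if (inRange(position++)) {
                return b; // byte belongs to the signed content
            }
        }
        return -1;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        int b;
        while (count < len && (b = read()) != -1) {
            buf[off + count++] = (byte) b;
        }
        return count == 0 ? -1 : count;
    }

    // Test helper: drains a stream into a byte array (only sensible for small inputs).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteRangeInputStream(new ByteArrayInputStream(data), new int[] {0, 3, 7, 3})) {
            System.out.println(new String(readAll(in), StandardCharsets.US_ASCII));
        }
    }
}
```

The sketch reads byte by byte for clarity and does not override `skip()` or `available()`; wrapping the underlying file stream in a `BufferedInputStream` would be the obvious first step if speed matters.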
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220923#comment-17220923 ]

Michael Klink commented on PDFBOX-4297:

You cannot guarantee that you need less than 5 MB. For example, one can simply blow up the *Catalog* object alone to more than 5 MB by adding a lot of simple entries whose sizes add up to more than 5 MB.

This example is not a common case, but if you have to handle arbitrary inputs from the wild, you have to keep this possibility in mind as the basis of a possible DoS attack.
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220916#comment-17220916 ]

Tilman Hausherr commented on PDFBOX-4297:

1) no test method. The EmbeddedFiles.java example shows how to add embedded files. This can be used to see what files there are.
2) there are test methods in the main project: TestSymmetricKeyEncryption.java and TestPublicKeyEncryption.java
3) no test method, but ShowSignature.java has most of this
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220789#comment-17220789 ]

Ralf Hauser commented on PDFBOX-4297:

Re comment-17211657: what are the test methods for points 1) to 3)?
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211657#comment-17211657 ]

Tilman Hausherr commented on PDFBOX-4297:

[~hau...@acm.org] now that this is done, are you able to analyze a big file with only 5 MB?
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211649#comment-17211649 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882388 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882388 ]

PDFBOX-4297: remove unused import
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211648#comment-17211648 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882387 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882387 ]

PDFBOX-4297: remove unused import
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211645#comment-17211645 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882386 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882386 ]

PDFBOX-4297: use the new method to get the signature contents
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211644#comment-17211644 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882385 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882385 ]

PDFBOX-4297: use the new method to get the signature contents
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211641#comment-17211641 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882384 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882384 ]

PDFBOX-4297: introduce new method to get the signature contents
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211640#comment-17211640 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882383 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882383 ]

PDFBOX-4297: introduce new method to get the signature contents; remove line forgotten in a previous refactoring
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211626#comment-17211626 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882382 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882382 ]

PDFBOX-4297: introduce fast method to get the signature contents, as suggested by Ralf Hauser
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211625#comment-17211625 ]

ASF subversion and git services commented on PDFBOX-4297:

Commit 1882381 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882381 ]

PDFBOX-4297: introduce fast method to get the signature contents, as suggested by Ralf Hauser
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208852#comment-17208852 ]

Andreas Lehmkühler commented on PDFBOX-4297:

It looks like those methods got the wrong name; it should rather be something like "getSigContentFromPDF".

How about adding a "getContents" method without parameters, using the code Tilman proposed?
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208198#comment-17208198 ]

Tilman Hausherr commented on PDFBOX-4297:

{code}
COSString contents = (COSString) sig.getCOSObject().getDictionaryObject(COSName.CONTENTS);
byte[] ba = contents.getBytes();
{code}

But I guess you'd like a direct PDSignature method. I have no idea why the two methods we are offering read the whole file.
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207887#comment-17207887 ]

Ralf Hauser commented on PDFBOX-4297:

4) it would be great to obtain the PKCS7 signature-block (e.g. as byte-array) as return object
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164537#comment-17164537 ]

Tilman Hausherr commented on PDFBOX-4297:

[~hau...@acm.org] the trunk now has an on-demand parser.
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164485#comment-17164485 ]

Ralf Hauser commented on PDFBOX-4297:

see also the inverse [https://github.com/bcgit/bc-java/issues/326]
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018907#comment-17018907 ]

Andreas Lehmkühler commented on PDFBOX-4297:

PDFBOX-4569 implements an on-demand parser, which might help. It is still under development but IMHO a huge step forward, especially in cases like this.
[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs
[ https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589087#comment-16589087 ]

Tilman Hausherr commented on PDFBOX-4297:

I think that what you want is parsing on demand. If this existed, then creating 1+2+3 would just be a tool like many others.