[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257744#comment-17257744
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1885065 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885065 ]

PDFBOX-4297: Sonar fix

> Allow to space efficiently analyse large PDFs
> -
>
> Key: PDFBOX-4297
> URL: https://issues.apache.org/jira/browse/PDFBOX-4297
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Reporter: Ralf Hauser
>Priority: Major
> Attachments: programWinter2015_20210103_091853-sig_LTV.pdf
>
>
> Assume you get a 300+MB large pdf and need to know
> 1) the file names of embedded files if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) certification level (and whether it is signed)
> This should not use more than 5 MB (extra) memory
>  
> P.S.: this seems to be an example of https://pdfbox.apache.org/ideas.html "Handle 
> large PDF files"
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257722#comment-17257722
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1885054 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885054 ]

PDFBOX-4297: use stream instead of byte buffer to handle huge files with a 
small memory footprint


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257709#comment-17257709
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

I was able to check your file with streams in the same amount of time. By the way, 
the proposed method doesn't work because it returns a closed input stream. There 
is more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1".

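Here is a minimal sketch of the closed-stream problem, using only java.io. BufferedInputStream stands in for COSFilterInputStream, and the class and method names are hypothetical, not PDFBox API. A try-with-resources block closes its resource whenever the block exits, including through `return`. The caller therefore receives a stream that is already closed. The fix is to return the open stream and make the caller responsible for closing it.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical demo of the closed-stream pitfall; not PDFBox API. */
final class StreamReturnDemo
{
    // Broken: try-with-resources closes 'in' on the way out of the method,
    // so the caller receives a stream that is already closed.
    static InputStream openBroken(InputStream source) throws IOException
    {
        try (InputStream in = new BufferedInputStream(source))
        {
            return in;
        }
    }

    // Fixed: hand out the open stream; the caller owns it and must close it.
    static InputStream openFixed(InputStream source)
    {
        return new BufferedInputStream(source);
    }

    // True if reading fails, e.g. with BufferedInputStream's "Stream closed".
    static boolean isUnreadable(InputStream in)
    {
        try
        {
            in.read();
            return false;
        }
        catch (IOException e)
        {
            return true;
        }
    }
}
```

Applied to the proposal, this would presumably mean returning `new COSFilterInputStream(pdfFile, getByteRange())` directly. The caller would then close it, for example with its own try-with-resources block.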

[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257700#comment-17257700
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

I've tried your file; it takes almost two minutes to download it. After that, 
however, the content is there, and the rest is done in 1 second.

I'll see what happens when going with streams.


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Ralf Hauser (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257693#comment-17257693
 ] 

Ralf Hauser commented on PDFBOX-4297:
-

Please find a bigger test file attached: 
programWinter2015_20210103_091853-sig_LTV.pdf.

> We could change the code of ShowSignature, but then we'd probably get 
> criticism for being slow.

If streams are properly implemented, they might even be quicker, as there will 
not be any memory-page swapping by the operating system.


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257685#comment-17257685
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

This is sample code, so nothing prevents you from writing your own. I assume you 
mean the "buf" parameter. It is used only once, to calculate the digest. To 
calculate the digest from a stream, see here:
https://stackoverflow.com/a/304350/535646

We could change the code of ShowSignature, but then we'd probably get criticism 
for being slow.

About the missing method: did you try to implement it yourself? Yes, maybe we 
could do that, but I think getSignedContentAsStream() would be better.
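The point that "buf" is only needed for the digest can be sketched as follows. The digest is updated while the stream is read in fixed-size chunks, so memory use does not grow with the input size. This is a minimal, hypothetical helper, not ShowSignature code, built on the JDK's DigestInputStream. Closing the DigestInputStream also closes the underlying stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: digest a stream of any length with a fixed-size buffer. */
final class StreamDigest
{
    static byte[] digest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException
    {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        // DigestInputStream feeds every byte that passes through it into md.
        try (DigestInputStream dis = new DigestInputStream(in, md))
        {
            byte[] buffer = new byte[8192]; // memory use is bounded by this buffer
            while (dis.read(buffer) != -1)
            {
                // nothing else to do: reading updates the digest as a side effect
            }
        }
        return md.digest();
    }
}
```

The same helper would work on a stream that yields only the signed byte ranges, so the whole file never has to be held as a byte[].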



[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-02 Thread Ralf Hauser (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257682#comment-17257682
 ] 

Ralf Hauser commented on PDFBOX-4297:
-

Re 3b) "whether it is signed" [correctly]

Unfortunately, ShowSignature.java is not yet memory-efficient.
For example, in ShowSignature.checkContentValueWithFile(File file, int[] 
byteRange, byte[] contents), memory usage grows linearly with the file size 
because of the contents byte array.

But there is hope, since
a) in showSignature(), when
       switch (subFilter)
   is executed, the "adbe.*" cases convert the "byte[] contents" back into a 
   stream (although I do not see that it is verified, in this case, whether the 
   document has been altered);
b) verifyPKCS7() could probably work with a stream instead of "byte[] contents", 
   because the Bouncy Castle classes also support streams
   (CMSSignedData has constructors that take streams instead of byte[]).

So, to begin with:
i) PDSignature.getContents(InputStream pdfFile) should get a sibling method:

    public InputStream getSignedContentStream(InputStream pdfFile) throws IOException
    {
        try (COSFilterInputStream fis = new COSFilterInputStream(pdfFile, getByteRange()))
        {
            return fis;
        }
    }

ii) verifyETSIdotRFC3161() should be refactored to work with streams instead of 
the content byte[].
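Below is a stdlib-only sketch of the idea behind such a filtered stream. It yields only the bytes covered by a /ByteRange array of (offset, length) pairs and skips the gap in between. The source is read sequentially, so the whole file is never buffered. `ByteRangeInputStream` is a hypothetical class, not PDFBox's COSFilterInputStream. It assumes ascending, non-overlapping ranges, as in a typical signature dictionary.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical stream (not PDFBox API) that yields only the bytes covered by a
 * /ByteRange array [offset1, length1, offset2, length2, ...]. The source is read
 * sequentially, so memory use stays constant regardless of the file size.
 * Assumes ascending, non-overlapping ranges.
 */
final class ByteRangeInputStream extends InputStream
{
    private final InputStream in;
    private final int[] byteRange;
    private long position = 0; // absolute offset in the source stream
    private int range = 0;     // index of the current (offset, length) pair

    ByteRangeInputStream(InputStream in, int[] byteRange)
    {
        if (byteRange.length % 2 != 0)
        {
            throw new IllegalArgumentException("byteRange must contain (offset, length) pairs");
        }
        this.in = in;
        this.byteRange = byteRange.clone();
    }

    @Override
    public int read() throws IOException
    {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException
    {
        if (len == 0)
        {
            return 0;
        }
        while (range < byteRange.length)
        {
            long start = byteRange[range];
            long end = start + byteRange[range + 1];
            if (position >= end)
            {
                range += 2; // current range used up, move to the next pair
                continue;
            }
            skipTo(start); // jump over the gap, e.g. the /Contents hex string
            if (position < start)
            {
                return -1; // source ended before the range began
            }
            int n = in.read(b, off, (int) Math.min(len, end - position));
            if (n == -1)
            {
                return -1;
            }
            position += n;
            return n;
        }
        return -1; // all ranges delivered
    }

    // Advances the source to 'target'; skip() may skip less than requested.
    private void skipTo(long target) throws IOException
    {
        while (position < target)
        {
            long skipped = in.skip(target - position);
            if (skipped <= 0)
            {
                if (in.read() == -1)
                {
                    return; // end of source
                }
                skipped = 1;
            }
            position += skipped;
        }
    }

    @Override
    public void close() throws IOException
    {
        in.close();
    }
}
```

Such a stream could then be passed to a digest or verification routine that accepts an InputStream. The caller still owns it and must close it.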


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-26 Thread Michael Klink (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220923#comment-17220923
 ] 

Michael Klink commented on PDFBOX-4297:
---

You cannot guarantee that you need less than 5 MB.

For example, one can simply blow up the *Catalog* object alone to more than 5 
MB by adding a lot of simple entries whose sizes add up to more than 5 MB.

This example is not a common case, but if you have to handle arbitrary input 
from the wild, you have to keep this possibility in mind as the basis of a 
possible DoS attack.


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-26 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220916#comment-17220916
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

1) No test method. The EmbeddedFiles.java example shows how to add embedded 
files; the same structures can be used to see which files there are.
2) There are test methods in the main project: TestSymmetricKeyEncryption.java 
and TestPublicKeyEncryption.java.
3) No test method, but ShowSignature.java has most of this.
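For points 2) and 3), the PDF specification (ISO 32000) defines the raw values to look at:

- For encryption, the /Filter entry of the encryption dictionary: "Standard" for the password-based handler, "Adobe.PubSec" for the most common public-key handler.
- For the certification level, the /P entry of the DocMDP transform parameters of a certification signature.

In both encryption cases the content itself is encrypted with a symmetric cipher. "Symmetric vs. asymmetric" really describes whether the key comes from a password or is made available to recipients through their public keys. The sketch below only interprets those values; how to read them with PDFBox is not shown. `SecurityInfo` and its methods are hypothetical names.

```java
/**
 * Hypothetical helpers (not PDFBox API) that interpret raw PDF values for
 * points 2) and 3). Reading the values from the document is not shown.
 */
final class SecurityInfo
{
    // 'filter' is the /Filter name of the /Encrypt dictionary,
    // or null if the trailer has no /Encrypt entry.
    static String describeEncryptionFilter(String filter)
    {
        if (filter == null)
        {
            return "not encrypted";
        }
        switch (filter)
        {
            case "Standard":
                return "password-based (standard security handler)";
            case "Adobe.PubSec":
                return "certificate-based (public-key security handler)";
            default:
                return "other security handler: " + filter;
        }
    }

    // 'p' is the /P entry of the DocMDP transform parameters of a
    // certification signature; the specification's default is 2.
    static String describeCertificationLevel(Integer p)
    {
        int level = (p == null) ? 2 : p;
        switch (level)
        {
            case 1:
                return "no changes permitted";
            case 2:
                return "form filling, page template instantiation and signing permitted";
            case 3:
                return "as 2, plus annotation creation, deletion and modification";
            default:
                return "invalid /P value: " + level;
        }
    }
}
```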


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-26 Thread Ralf Hauser (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220789#comment-17220789
 ] 

Ralf Hauser commented on PDFBOX-4297:
-

RE comment-17211657 

What are the test methods for points 1) - 3)?


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211657#comment-17211657
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

[~hau...@acm.org] now that this is done, are you able to analyse a big file 
with only 5 MB?


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211649#comment-17211649
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882388 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882388 ]

PDFBOX-4297: remove unused import


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211648#comment-17211648
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882387 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882387 ]

PDFBOX-4297: remove unused import


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211645#comment-17211645
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882386 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882386 ]

PDFBOX-4297: use the new method to get the signature contents


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211644#comment-17211644
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882385 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882385 ]

PDFBOX-4297: use the new method to get the signature contents


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211641#comment-17211641
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882384 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882384 ]

PDFBOX-4297: introduce new method to get the signature contents


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211640#comment-17211640
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882383 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882383 ]

PDFBOX-4297: introduce new method to get the signature contents; remove line 
forgotten in a previous refactoring


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211626#comment-17211626
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882382 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1882382 ]

PDFBOX-4297: introduce fast method to get the signature contents, as suggested 
by Ralf Hauser


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-10 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211625#comment-17211625
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1882381 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1882381 ]

PDFBOX-4297: introduce fast method to get the signature contents, as suggested 
by Ralf Hauser


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-06 Thread Andreas Lehmkühler (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208852#comment-17208852
 ] 

Andreas Lehmkühler commented on PDFBOX-4297:


It looks like those methods got the wrong name; it should rather be something 
like "getSigContentFromPDF". How about adding a "getContents" method without a 
parameter, using the code Tilman proposed?


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-05 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208198#comment-17208198
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

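{code}
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;

// sig is the PDSignature to inspect
COSString contents = (COSString) sig.getCOSObject().getDictionaryObject(COSName.CONTENTS);
byte[] ba = contents.getBytes();
{code}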
But I guess you'd like a direct PDSignature method. I have no idea why the two 
methods we are offering read the whole file.
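The snippet above reads /Contents from the parsed dictionary. Sometimes only the raw file and the /ByteRange values are at hand. In a signed PDF the /ByteRange covers the whole file except the /Contents hex string, so the signature value sits in the gap between the first and second range. A minimal stdlib sketch could skip straight to that gap and decode only it. `SignatureGap` and `readContents` are hypothetical names, not PDFBox API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not PDFBox API): read only the signature gap of a PDF. */
final class SignatureGap
{
    /**
     * Returns the decoded signature value that lies between the first and the
     * second (offset, length) pair of a /ByteRange array. Only the gap is held
     * in memory, never the whole file.
     */
    static byte[] readContents(InputStream pdf, int[] byteRange) throws IOException
    {
        long gapStart = (long) byteRange[0] + byteRange[1];
        long gapLength = byteRange[2] - gapStart;
        if (gapLength < 0 || gapLength > Integer.MAX_VALUE)
        {
            throw new IOException("implausible /ByteRange");
        }
        skipFully(pdf, gapStart);
        byte[] gap = pdf.readNBytes((int) gapLength); // "<...hex digits...>"
        if (gap.length != gapLength)
        {
            throw new IOException("file ends inside the signature gap");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(gap.length / 2);
        int high = -1; // pending high nibble, -1 if none
        for (byte b : gap)
        {
            int digit = Character.digit(b, 16);
            if (digit < 0)
            {
                continue; // ignore '<', '>' and whitespace
            }
            if (high < 0)
            {
                high = digit;
            }
            else
            {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0)
        {
            out.write(high << 4); // odd digit count: missing final digit counts as 0
        }
        return out.toByteArray();
    }

    // skip() may skip fewer bytes than requested, so loop (falling back to read()).
    private static void skipFully(InputStream in, long n) throws IOException
    {
        while (n > 0)
        {
            long skipped = in.skip(n);
            if (skipped <= 0)
            {
                if (in.read() == -1)
                {
                    throw new IOException("unexpected end of file");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

With a FileInputStream, skip() can usually seek rather than read, so the bytes before the gap need not be read at all. The decoded value typically ends in zero bytes, because signers usually reserve more space than the signature needs and pad it with zeros.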


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-10-05 Thread Ralf Hauser (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207887#comment-17207887
 ] 

Ralf Hauser commented on PDFBOX-4297:
-

4) It would be great to obtain the PKCS7 signature block (e.g. as a byte array) 
as a return value.


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-07-24 Thread Tilman Hausherr (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164537#comment-17164537
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

[~hau...@acm.org] the trunk now has an on-demand parser.


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-07-24 Thread Ralf Hauser (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164485#comment-17164485
 ] 

Ralf Hauser commented on PDFBOX-4297:
-

See also the inverse: [https://github.com/bcgit/bc-java/issues/326]


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2020-01-19 Thread Andreas Lehmkühler (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018907#comment-17018907
 ] 

Andreas Lehmkühler commented on PDFBOX-4297:


PDFBOX-4569 implements an on-demand parser, which might help. It is still under 
development, but IMHO it is a huge step forward, especially in cases like this.


[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2018-08-22 Thread Tilman Hausherr (JIRA)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589087#comment-16589087
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

I think that what you want is parsing on demand. If that existed, then 
1+2+3 could be implemented as a tool like many others.
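
Parsing on demand can be illustrated without PDFBox. A full parser builds the whole object graph up front. An on-demand reader instead keeps only a map from object number to byte offset, as listed in the cross-reference table, and reads an object from disk at the moment it is requested. The sketch below is hypothetical: the class `LazyObjectReader` and its `maxObjectBytes` limit are not the PDFBox API. It is also deliberately simplified, and it ignores compressed object streams, incremental updates, and stream data that happens to contain the bytes `endobj`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the PDFBox API: reads PDF objects lazily by byte offset.
final class LazyObjectReader implements AutoCloseable {
    private final RandomAccessFile raf;
    // Object number -> byte offset; assumed to have been read from the xref table.
    private final Map<Integer, Long> offsets;
    // Upper bound on the bytes read for a single object, capping memory per lookup.
    private final int maxObjectBytes;

    LazyObjectReader(Path pdf, Map<Integer, Long> offsets, int maxObjectBytes)
            throws IOException {
        this.raf = new RandomAccessFile(pdf.toFile(), "r");
        this.offsets = new HashMap<>(offsets);
        this.maxObjectBytes = maxObjectBytes;
    }

    /** Returns the text from "N G obj" through "endobj", or null if the object is unknown. */
    public String getObject(int objNum) throws IOException {
        Long offset = offsets.get(objNum);
        if (offset == null || offset >= raf.length()) {
            return null;
        }
        // Only this window of the file is read; nothing else is loaded.
        int n = (int) Math.min(maxObjectBytes, raf.length() - offset);
        byte[] buf = new byte[n];
        raf.seek(offset);
        raf.readFully(buf);
        // ISO-8859-1 keeps a 1:1 byte-to-char mapping, so binary bytes survive decoding.
        String text = new String(buf, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("endobj");
        if (end < 0) {
            throw new IOException("object " + objNum + " exceeds " + maxObjectBytes
                    + " bytes or is malformed");
        }
        return text.substring(0, end + "endobj".length());
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

With such a reader, each of items 1-3 would follow a handful of references starting from the trailer's /Root entry. Memory use then depends on the objects visited rather than on the size of the file.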

