[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257682#comment-17257682
 ] 

Ralf Hauser commented on PDFBOX-4297:
-------------------------------------

Re 3b) "whether it is signed" [correctly]

Unfortunately, looking at ShowSignature.java, it is not yet memory-efficient.
For example, in ShowSignature.checkContentValueWithFile(File file, int[] 
byteRange, byte[] contents), memory usage grows linearly with the file size 
because of the contents byte array.
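
To illustrate the difference, here is a minimal sketch (JDK only, not PDFBox code) of hashing a stream through a fixed-size buffer, so that memory use stays constant regardless of file size. The class name StreamingDigestSketch and the 8 KiB buffer size are my own assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox.
class StreamingDigestSketch {

    /**
     * Computes the SHA-256 digest of everything readable from the stream.
     * Only one 8 KiB buffer is allocated, however large the input is.
     */
    static byte[] sha256(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Feed each chunk to the digest instead of collecting it in a byte[].
            digest.update(buffer, 0, n);
        }
        return digest.digest();
    }
}
```

The same loop shape would apply to any consumer that accepts data incrementally; whether the signature verification in ShowSignature can be restructured this way is exactly the open question here.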

But there is hope, since
a) in showSignature(), when
        switch (subFilter)
    is executed, the "adbe.*" cases convert the "byte[] contents" back into a
    stream (although I do not see where, in this case, it is verified whether
    the document has been altered), and
b) verifyPKCS7() could probably work with a stream instead of "byte[] contents",
    because the Bouncy Castle classes also offer stream-based approaches
    (CMSSignedData has constructors that take streams instead of byte[]).

So to begin,
i) PDSignature.getContents(InputStream pdfFile) should get a sibling method:

    public InputStream getSignedContentStream(InputStream pdfFile) throws IOException
    {
        // Do not use try-with-resources here: it would close the stream
        // before the caller can read from it. The caller must close it.
        return new COSFilterInputStream(pdfFile, getByteRange());
    }

ii) verifyETSIdotRFC3161() should be refactored to work with streams rather 
than the content byte[].
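
As a rough illustration of what a stream over the signed byte ranges involves, here is a JDK-only sketch. It is not PDFBox's COSFilterInputStream; the class name ByteRangeInputStream is hypothetical, and it assumes the byte range is given as ascending, non-overlapping (offset, length) pairs, as in a PDF /ByteRange array.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not the PDFBox implementation.
class ByteRangeInputStream extends InputStream {
    private final InputStream in;
    private final int[] byteRange; // (offset, length) pairs, assumed ascending
    private long position = 0;     // current offset in the underlying stream
    private int rangeIndex = 0;    // index of the current pair's offset

    ByteRangeInputStream(InputStream in, int[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("byteRange must hold (offset, length) pairs");
        }
        this.in = in;
        this.byteRange = byteRange.clone();
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (rangeIndex < byteRange.length) {
            long start = byteRange[rangeIndex];
            long end = start + byteRange[rangeIndex + 1];
            // Discard the bytes in the gap before the current range.
            if (position < start && !skipFully(start - position)) {
                return -1;
            }
            if (position >= end) {
                rangeIndex += 2; // current range exhausted (or empty)
                continue;
            }
            int n = in.read(b, off, (int) Math.min(len, end - position));
            if (n == -1) {
                return -1; // underlying stream ended early
            }
            position += n;
            return n;
        }
        return -1; // all ranges consumed
    }

    // Skips exactly count bytes; returns false if the stream ends first.
    private boolean skipFully(long count) throws IOException {
        while (count > 0) {
            long skipped = in.skip(count);
            if (skipped <= 0) {
                if (in.read() == -1) {
                    return false;
                }
                skipped = 1;
            }
            position += skipped;
            count -= skipped;
        }
        return true;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

A caller would wrap the PDF's input stream in this filter and feed it to a digest or verifier chunk by chunk, so only the read buffer is held in memory rather than the whole signed content.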

> Allow to space efficiently analyse large PDFs
> ---------------------------------------------
>
>                 Key: PDFBOX-4297
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4297
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>            Reporter: Ralf Hauser
>            Priority: Major
>
> Assume you get a 300+MB large pdf and need to know
> 1) the file names of embedded files if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) certification level (and whether it is signed)
> This should not use more than 5 MB (extra) memory
>  
> P.S.: this seems to be an example of https://pdfbox.apache.org/ideas.html  "Handle 
> large PDF files"
>  



