[jira] [Commented] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-03 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258030#comment-17258030
 ] 

Ralf Hauser commented on PDFBOX-5067:
-

Thanks for committing the "uncontroversial" parts.

I tried to make the 2nd half acceptable too with [^patch_PDFBOX-5067.txt]

What do you think?

> make PDVisibleSignDesigner memory aware
> ---
>
> Key: PDFBOX-5067
> URL: https://issues.apache.org/jira/browse/PDFBOX-5067
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Major
> Attachments: patch_PDFBOX-2512.txt, patch_PDFBOX-5067.txt
>
>
> PDFBOX-2512 might have failed earlier if I hadn't used
>   MemoryUsageSetting.setupMixed(1500)
> to limit the memory usage of PDDocument document to 15 MB in 
> CreateVisibleSignature in
>  
> a) setVisibleSignDesigner() and used the now memory-aware constructor of 
> PDVisibleSignDesigner
>     and
> b) in signPDF(), reused PDDocument
>    setTsaUrl(tsaUrl);
>    PDDocument doc = null;
>    if (null != visibleSignDesigner) {
>        doc = visibleSignDesigner.getDocument();
>    }
>    if (null == doc) {
>        doc = Loader.loadPDF(inputFile, memoryUsageSetting);
>    }
>    // creating output document and prepare the IO streams.
>    ...
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-03 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-5067:

Attachment: patch_PDFBOX-5067.txt




[jira] [Commented] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257950#comment-17257950
 ] 

Tilman Hausherr commented on PDFBOX-5067:
-

I wanted to commit the "uncontroversial" parts before the new-year "inbox 
madness" at work starts. Does this help?

> make PDVisibleSignDesigner memory aware
> ---
>
> Key: PDFBOX-5067
> URL: https://issues.apache.org/jira/browse/PDFBOX-5067
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Major
> Attachments: patch_PDFBOX-2512.txt
>
>
> PDFBOX-2512 might have failed earlier if I hadn't used
>   MemoryUsageSetting.setupMixed(1500)
> to limit the memory usage of PDDocument document to 15 MB in 
> CreateVisibleSignature in
>  
> a) setVisibleSignDesigner() and used the now memory-aware constructor of 
> PDVisibleSignDesigner
>     and
> b) in signPDF(), reused PDDocument
>    setTsaUrl(tsaUrl);
>    PDDocument doc = null;
>    if (null != visibleSignDesigner) {
>        doc = visibleSignDesigner.getDocument();
>    }
>    if (null == doc) {
>        doc = Loader.loadPDF(inputFile, memoryUsageSetting);
>    }
>    // creating output document and prepare the IO streams.
>    ...
>  






[jira] [Commented] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257949#comment-17257949
 ] 

ASF subversion and git services commented on PDFBOX-5067:
-

Commit 1885091 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885091 ]

PDFBOX-5067: allow the passing of a MemoryUsageSetting, as suggested by Ralf 
Hauser




[jira] [Commented] (PDFBOX-5059) java.io.IOException: expected number, actual=COSFloat{18446744073521659909} at offset 4932600

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257937#comment-17257937
 ] 

Tilman Hausherr commented on PDFBOX-5059:
-

It means that your PDF does not respect the PDF specification, due to a bug in 
the software that created the file. The related issue PDFBOX-4495 has an 
explanation. You can also open your file with Notepad++, go to byte offset (not 
line offset!) 4932600 and see how it looks.

The Adobe viewer may display the file, but this only means that it has better 
error recovery.
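The same check can also be scripted instead of done in an editor. A plain-Java sketch (ByteOffsetPeek and the 40-byte window are illustrative, not part of PDFBox) that seeks directly to a byte offset with RandomAccessFile, so the file is never loaded as a whole, and prints the surrounding bytes. ISO-8859-1 is used because it maps every byte to exactly one character, so binary content cannot break the decoding:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class ByteOffsetPeek
{
    /** Returns up to 'radius' bytes on each side of 'offset' as ISO-8859-1 text. */
    static String around(String path, long offset, int radius) throws IOException
    {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r"))
        {
            long end = Math.min(raf.length(), offset + radius);
            long start = Math.min(Math.max(0, offset - radius), end);
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);   // a byte offset, not a line number
            raf.readFully(buf);
            return new String(buf, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException
    {
        // e.g. java ByteOffsetPeek broken.pdf 4932600
        System.out.println(around(args[0], Long.parseLong(args[1]), 40));
    }
}
```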

> java.io.IOException: expected number, actual=COSFloat{18446744073521659909} 
> at offset 4932600
> -
>
> Key: PDFBOX-5059
> URL: https://issues.apache.org/jira/browse/PDFBOX-5059
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.3
> Environment: linux
>Reporter: Ling Hock Hin, Daniel
>Priority: Major
>
> Encountered this error while trying to upload pdf. Seems to apply only for 
> certain pdfs. Can't share more due to confidentiality.
>  
> java.io.IOException: expected number, actual=COSFloat{18446744073521659909} at offset 4932600
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:162)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:274)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:207)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:854)
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:757)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:726)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:657)
>     at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:617)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:215)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:249)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1093)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)






[jira] [Commented] (PDFBOX-5059) java.io.IOException: expected number, actual=COSFloat{18446744073521659909} at offset 4932600

2021-01-03 Thread Ling Hock Hin, Daniel (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257883#comment-17257883
 ] 

Ling Hock Hin, Daniel commented on PDFBOX-5059:
---

May I have a brief description of what this error means, so that I can 
explain it to the people I am working with?
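The number in the message can be examined directly. This is an observation about the value, not a diagnosis from this thread: 18446744073521659909 has 20 digits and exceeds Long.MAX_VALUE (9223372036854775807), so it cannot be held in a signed 64-bit integer, and it equals 2^64 - 187891707, i.e. the bit pattern of the negative 64-bit value -187891707 printed as unsigned. That would be consistent with a sign or overflow bug in the producing software, but this is an assumption. Plain Java (UnsignedCheck is a hypothetical name) confirms the arithmetic:

```java
class UnsignedCheck
{
    /** True if the decimal string parses as a signed 64-bit long. */
    static boolean fitsInSignedLong(String digits)
    {
        try
        {
            Long.parseLong(digits);
            return true;
        }
        catch (NumberFormatException ex)
        {
            return false; // not a valid long, e.g. larger than Long.MAX_VALUE
        }
    }

    /** Reads the digits as an unsigned 64-bit value; returns the same bits as a signed long. */
    static long unsignedAsSigned(String digits)
    {
        return Long.parseUnsignedLong(digits);
    }

    public static void main(String[] args)
    {
        String token = "18446744073521659909"; // value from the error message
        System.out.println("fits in signed long: " + fitsInSignedLong(token));
        System.out.println("same bits as signed long: " + unsignedAsSigned(token));
    }
}
```

Running main prints false for the first check and -187891707 for the second.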




[jira] [Commented] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-03 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257857#comment-17257857
 ] 

Ralf Hauser commented on PDFBOX-5067:
-

Thanks for the feedback

 

I tested it on the basis of CreateVisibleSignature2.java (also with setupMixed, 
15 MB).

It needed 8 MB more Xmx (==> 69m). But the PDDocument was only loaded once, so in 
this case the getter is not used. In any case, when you close the 
PDVisibleSignDesigner, the document would be deallocated anyway.
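The ownership question behind this exchange (reuse the designer's document or load a fresh one, and who closes it) can be sketched without PDFBox. In this hypothetical sketch, Doc stands in for PDDocument and the Supplier for a call such as Loader.loadPDF(inputFile, memoryUsageSetting); it only shows the reuse-or-load decision, not how the patch implements it:

```java
import java.io.Closeable;
import java.util.function.Supplier;

class ReuseOrLoad
{
    /** Hypothetical stand-in for a loaded PDF document. */
    static final class Doc implements Closeable
    {
        boolean closed;

        @Override
        public void close()
        {
            closed = true;
        }
    }

    /** Returns the already-loaded document if there is one, otherwise loads a new one. */
    static Doc reuseOrLoad(Doc fromDesigner, Supplier<Doc> loader)
    {
        return fromDesigner != null ? fromDesigner : loader.get();
    }

    public static void main(String[] args)
    {
        Doc shared = new Doc();
        Doc reused = reuseOrLoad(shared, Doc::new); // no second load
        Doc loaded = reuseOrLoad(null, Doc::new);   // nothing to reuse: load once
        System.out.println((reused == shared) + " " + (loaded != shared));
        loaded.close();  // this code loaded it, so this code closes it
        shared.close();  // whoever owns the designer closes the shared document
    }
}
```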

 

> Some of the constructors set that field, some don't (the one that calls 
> {{calculatePageSizeFromStream}})
I only added it where the test did a Loader.loadPDF(); there may well be more 
places where it could be added. Still, I assume it is faster, and not much worse 
for memory, if the document is loaded only once.




[jira] [Commented] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257834#comment-17257834
 ] 

Tilman Hausherr commented on PDFBOX-5067:
-

- The old PDVisibleSignDesigner constructor should call the new one to avoid duplicate code
- The new field should be on top
- Why the new "PDDocument" field? You're exposing internals, and this prevents 
closing of the document.
- Some of the constructors set that field, some don't (the one that calls 
{{calculatePageSizeFromStream}})
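The first review point is plain Java constructor chaining. A hypothetical sketch, not the actual PDVisibleSignDesigner code: DesignerSketch, its fields and the "unrestricted" default are stand-ins, and a String replaces MemoryUsageSetting so the example runs without PDFBox:

```java
class DesignerSketch
{
    // new field declared at the top, next to the existing ones
    private final String memorySetting;
    private final String fileName;

    /** Old constructor: keeps its signature and only delegates, so nothing is duplicated. */
    DesignerSketch(String fileName)
    {
        this(fileName, "unrestricted"); // hypothetical default memory setting
    }

    /** New constructor: the single place where the fields are initialised. */
    DesignerSketch(String fileName, String memorySetting)
    {
        this.fileName = fileName;
        this.memorySetting = memorySetting;
    }

    String memorySetting()
    {
        return memorySetting;
    }

    String fileName()
    {
        return fileName;
    }
}
```

Any later change to initialisation then only has to be made in the new constructor.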

Why not use CreateVisibleSignature2.java as the starting point? It is easier 
to understand.




[jira] [Updated] (PDFBOX-5068) OutOfMemory while signing large documents - continued

2021-01-03 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-5068:

Description: 
Continuation of PDFBOX-2512

 

in COSWriter.prepareIncrement(), for the test case 
cosDoc.getXrefTable().keySet() has size 5925. For each of these keys, 
cosDoc.getObjectFromPool() gets an object that is not just referencing some 
part of the input document, but duplicates it (which is unavoidable when they 
are decompressed with FlateFilter, although this could possibly be done 
lazily)

-Xmx20m  746/5925
-Xmx25m 1615/5925
-Xmx30m 2800/5925
-Xmx40m 3872/5925
-Xmx55m 5773/5925

With -Xmx60m, it gets through all of them, but dies later with the less telling

   java.lang.OutOfMemoryError: GC overhead limit exceeded

 

This assumes the patch of PDFBOX-5067 is already in place

  was:
Continuation of PDFBOX-2512

 

in COSWriter.prepareIncrement(), for the test case 
cosDoc.getXrefTable().keySet() has size 5925. For each of these keys, 
cosDoc.getObjectFromPool() gets an object that is not just referencing some 
part of the input document, but duplicates it (which is unavoidable when they 
are decompressed with FlateFilter, although this could possibly be done 
lazily)

-Xmx20m  746/5925
-Xmx25m 1615/5925
-Xmx30m 2800/5925
-Xmx40m 3872/5925
-Xmx55m 5773/5925

With -Xmx60m, it gets through all of them, but dies later with the less telling

   java.lang.OutOfMemoryError: GC overhead limit exceeded

 


> OutOfMemory while signing large documents - continued
> -
>
> Key: PDFBOX-5068
> URL: https://issues.apache.org/jira/browse/PDFBOX-5068
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Major
>
> Continuation of PDFBOX-2512
>  
> in COSWriter.prepareIncrement(), for the test case 
> cosDoc.getXrefTable().keySet() has size 5925. For each of these keys, 
> cosDoc.getObjectFromPool() gets an object that is not just referencing some 
> part of the input document, but duplicates it (which is unavoidable when 
> they are decompressed with FlateFilter, although this could possibly be 
> done lazily)
> -Xmx20m  746/5925
> -Xmx25m 1615/5925
> -Xmx30m 2800/5925
> -Xmx40m 3872/5925
> -Xmx55m 5773/5925
> With -Xmx60m, it gets through all of them, but dies later with the less telling
>    java.lang.OutOfMemoryError: GC overhead limit exceeded
>  
> This assumes the patch of PDFBOX-5067 already in place






[jira] [Updated] (PDFBOX-5068) OutOfMemory while signing large documents - continued

2021-01-03 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-5068:

Summary: OutOfMemory while signing large documents - continued  (was: 
OutOfMemory while signing large documents)

> OutOfMemory while signing large documents - continued
> -
>
> Key: PDFBOX-5068
> URL: https://issues.apache.org/jira/browse/PDFBOX-5068
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Major
>
> Continuation of PDFBOX-2512
>  
> in COSWriter.prepareIncrement(), for the test case 
> cosDoc.getXrefTable().keySet() has size 5925. For each of these keys, 
> cosDoc.getObjectFromPool() gets an object that is not just referencing some 
> part of the input document, but duplicates it (which is unavoidable when 
> they are decompressed with FlateFilter, although this could possibly be 
> done lazily)
> -Xmx20m  746/5925
> -Xmx25m 1615/5925
> -Xmx30m 2800/5925
> -Xmx40m 3872/5925
> -Xmx55m 5773/5925
> With -Xmx60m, it gets through all of them, but dies later with the less telling
>    java.lang.OutOfMemoryError: GC overhead limit exceeded
>  






[jira] [Created] (PDFBOX-5068) OutOfMemory while signing large documents

2021-01-03 Thread Ralf Hauser (Jira)
Ralf Hauser created PDFBOX-5068:
---

 Summary: OutOfMemory while signing large documents
 Key: PDFBOX-5068
 URL: https://issues.apache.org/jira/browse/PDFBOX-5068
 Project: PDFBox
  Issue Type: Improvement
  Components: Signing
Affects Versions: 2.0.23
Reporter: Ralf Hauser


Continuation of PDFBOX-2512

 

in COSWriter.prepareIncrement(), for the test case 
cosDoc.getXrefTable().keySet() has size 5925. For each of these keys, 
cosDoc.getObjectFromPool() gets an object that is not just referencing some 
part of the input document, but duplicates it (which is unavoidable when they 
are decompressed with FlateFilter, although this could possibly be done 
lazily)

-Xmx20m  746/5925
-Xmx25m 1615/5925
-Xmx30m 2800/5925
-Xmx40m 3872/5925
-Xmx55m 5773/5925

With -Xmx60m, it gets through all of them, but dies later with the less telling

   java.lang.OutOfMemoryError: GC overhead limit exceeded
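The "lazy" option can be illustrated with java.util.zip alone: keep only the compressed bytes and decode them on first access. This is a sketch of the technique under that assumption, not how COSStream or COSWriter work today; LazyInflated is a hypothetical name. FlateDecode data is zlib-wrapped, which is what the no-argument Inflater constructor expects:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

class LazyInflated
{
    private final byte[] compressed; // the only copy held until someone asks
    private byte[] decoded;          // filled on first access only

    LazyInflated(byte[] compressed)
    {
        this.compressed = compressed;
    }

    boolean isDecoded()
    {
        return decoded != null;
    }

    byte[] get() throws DataFormatException
    {
        if (decoded == null)
        {
            Inflater inflater = new Inflater(); // zlib format, as used by FlateDecode
            try
            {
                inflater.setInput(compressed);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                while (!inflater.finished())
                {
                    int n = inflater.inflate(buf);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    {
                        throw new DataFormatException("truncated or dictionary-based stream");
                    }
                    out.write(buf, 0, n);
                }
                decoded = out.toByteArray();
            }
            finally
            {
                inflater.end(); // release native zlib resources
            }
        }
        return decoded;
    }
}
```

Objects that are written out unchanged would then never pay for decompression.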

 






[jira] [Commented] (PDFBOX-2602) Enhance command line tools

2021-01-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257765#comment-17257765
 ] 

ASF subversion and git services commented on PDFBOX-2602:
-

Commit 1885066 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885066 ]

PDFBOX-2602: remove version override

> Enhance command line tools
> --
>
> Key: PDFBOX-2602
> URL: https://issues.apache.org/jira/browse/PDFBOX-2602
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 1.8.8, 2.0.0
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Minor
> Fix For: 3.0.0 PDFBox
>
>
> The command line tools shall be enhanced to have the same behavior across all 
> tools.
> From the discussion on the dev mailing list:
> - add an -h option to print the usage
> - print the usage to System.err and use an exit code of 1 if there was an 
> invalid command line parameter
> - print messages on exceptions to System.err
> - rethrow the exception so that Java can handle it if the tool will terminate 
> afterwards anyway
> - use an exit code of 1 if rethrowing doesn't make sense
> Additional input:
> https://clig.dev/
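A minimal plain-Java sketch of these conventions; ExampleTool, its -i option and run(...) are hypothetical, not PDFBox's command-line classes. Printing the -h output to System.out with exit code 0 is an assumption, since the list only says to add the option. Returning the exit code from run() instead of calling System.exit inside it keeps the logic testable:

```java
import java.io.PrintStream;

class ExampleTool
{
    static final String USAGE = "Usage: ExampleTool [-h] -i <input.pdf>";

    /** Returns the process exit code; main() hands it to System.exit. */
    static int run(String[] args, PrintStream out, PrintStream err)
    {
        String input = null;
        for (int i = 0; i < args.length; i++)
        {
            if ("-h".equals(args[i]))
            {
                out.println(USAGE);   // explicit help request: not an error
                return 0;
            }
            else if ("-i".equals(args[i]) && i + 1 < args.length)
            {
                input = args[++i];
            }
            else
            {
                err.println("Unknown or incomplete option: " + args[i]);
                err.println(USAGE);   // invalid parameter: usage to stderr, exit code 1
                return 1;
            }
        }
        if (input == null)
        {
            err.println(USAGE);       // required parameter missing
            return 1;
        }
        out.println("processing " + input);
        return 0;
    }

    public static void main(String[] args)
    {
        System.exit(run(args, System.out, System.err));
    }
}
```

The exception rules would sit in main(): print the message to System.err, then either rethrow or exit with 1.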






[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257744#comment-17257744
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1885065 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885065 ]

PDFBOX-4297: Sonar fix

> Allow to space efficiently analyse large PDFs
> -
>
> Key: PDFBOX-4297
> URL: https://issues.apache.org/jira/browse/PDFBOX-4297
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Reporter: Ralf Hauser
>Priority: Major
> Attachments: programWinter2015_20210103_091853-sig_LTV.pdf
>
>
> Assume you get a PDF of 300+ MB and need to know
> 1) the file names of embedded files, if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) the certification level (and whether it is signed)
> This should not use more than 5 MB (extra) memory
>  
> P.S.: this seems to be an example of https://pdfbox.apache.org/ideas.html "Handle 
> large PDF files"
>  
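One building block for this kind of low-memory inspection: per the PDF specification, a file ends with the startxref keyword, the byte offset of the last cross-reference section, and %%EOF, so a reader can start from the last few hundred bytes rather than the whole file. A stdlib-only sketch under that assumption (StartXref and the 1024-byte tail are illustrative choices, not PDFBox code); incremental updates, cross-reference streams and damaged files need more work than this:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class StartXref
{
    /** Returns the number after the last "startxref" keyword in the text, or -1. */
    static long parse(String tail)
    {
        int k = tail.lastIndexOf("startxref");
        if (k < 0)
        {
            return -1;
        }
        String rest = tail.substring(k + "startxref".length()).trim();
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end)))
        {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(rest.substring(0, end));
    }

    /** Reads at most the last 1024 bytes, so memory use does not grow with file size. */
    static long fromFile(String path) throws IOException
    {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r"))
        {
            int n = (int) Math.min(1024, raf.length());
            byte[] buf = new byte[n];
            raf.seek(raf.length() - n);
            raf.readFully(buf);
            return parse(new String(buf, StandardCharsets.ISO_8859_1));
        }
    }
}
```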






[jira] [Commented] (PDFBOX-5030) Create Migration guide for 3.0.0

2021-01-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257728#comment-17257728
 ] 

Andreas Lehmkühler commented on PDFBOX-5030:


(y) thanks for starting this

> Create Migration guide for 3.0.0
> 
>
> Key: PDFBOX-5030
> URL: https://issues.apache.org/jira/browse/PDFBOX-5030
> Project: PDFBox
>  Issue Type: Task
>  Components: Documentation
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> To start educating users about the migration effort needed to get to 3.0.0, 
> there should be a migration guide (evolving over time) to prepare for the release






[jira] [Commented] (PDFBOX-5066) ShowSignature: say which digest algorithm was used, detect forged content

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257725#comment-17257725
 ] 

Tilman Hausherr commented on PDFBOX-5066:
-

1. {{certFromSignedData.getSigAlgName()}} returns "SHA256withRSA". I can change 
the success line to
{code}
System.out.println(certFromSignedData.getSigAlgName() + " signature verified");
{code}

2. The check is missing because this is based on code from another project. 
Here's the current segment:
{code}
case "adbe.x509.rsa_sha1":
{
    // example: PDFBOX-2693.pdf
    COSString certString = (COSString) sigDict.getDictionaryObject(COSName.CERT);
    //TODO this could also be an array.
    if (certString == null)
    {
        System.err.println("The /Cert certificate string is missing in the signature dictionary");
        return;
    }
    byte[] certData = certString.getBytes();
    CertificateFactory factory = CertificateFactory.getInstance("X.509");
    ByteArrayInputStream certStream = new ByteArrayInputStream(certData);
    Collection<? extends Certificate> certs = factory.generateCertificates(certStream);
    System.out.println("certs=" + certs);

    X509Certificate cert = (X509Certificate) certs.iterator().next();

    // https://forums.adobe.com/thread/530277
    // Contents = contains the crypted message digest
    // Cert = contains the X509 certificate

    // to verify signature, see code at
    // https://stackoverflow.com/questions/43383859/

    // inspired by:
    // https://www.programcreek.com/java-api-examples/index.php?source_dir=pades_signing_2.1.5-master/src/main/java/com/opentrust/spi/pdf/PDFEnvelopedSignature.java
    // https://github.com/OpenTrust/pades_signing_2.1.5/blob/master/src/main/java/com/opentrust/spi/pdf/PDFEnvelopedSignature.java
    ASN1InputStream asn1IS = new ASN1InputStream(new ByteArrayInputStream(contents));
    ASN1Primitive asn1prim = asn1IS.readObject();
    if (!(asn1prim instanceof ASN1OctetString))
    {
        // 276434.pdf
        throw new IOException("ASN1 octet string expected, but got " + asn1prim.getClass().getSimpleName());
    }
    ASN1OctetString oct = (ASN1OctetString) asn1prim;
    Signature signature = Signature.getInstance("SHA1withRSA");
    signature.initVerify(cert.getPublicKey());
    int by;
    while ((by = signedContentAsStream.read()) != -1)
    {
        signature.update((byte) by);
    }
    System.out.println("Verification result: " + signature.verify(oct.getOctets()));

    // get digest algorithm
    Cipher c = Cipher.getInstance("RSA/NONE/PKCS1Padding", SecurityProvider.getProvider());
    c.init(Cipher.DECRYPT_MODE, cert.getPublicKey());
    byte[] raw = c.doFinal(oct.getOctets());
    DigestInfo di = DigestInfo.getInstance(raw);
    String algID = di.getAlgorithmId().getAlgorithm().getId(); // computed, but not checked yet

    try
    {
        if (sig.getSignDate() != null)
        {
            cert.checkValidity(sig.getSignDate().getTime());
            System.out.println("Certificate valid at signing time");
        }
        else
        {
            System.err.println("Certificate cannot be verified without signing time");
        }
    }
    catch (CertificateExpiredException ex)
    {
        System.err.println("Certificate expired at signing time");
    }
    catch (CertificateNotYetValidException ex)
    {
        System.err.println("Certificate not yet valid at signing time");
    }
    if (CertificateVerifier.isSelfSigned(cert))
    {
        System.err.println("Certificate for " + cert.getSubjectX500Principal().getName() + " is self-signed, LOL!");
    }
    else
    {
        System.out.println("Certificate is not self-signed");

        if (sig.getSignDate() != null)
        {
            @SuppressWarnings("unchecked")
            Store<X509CertificateHolder> store = new JcaCertStore(certs);
            SigUtils.verifyCertificateChain(store, cert, sig.getSignDate().getTime());
        }
    }
    break;
}
{code}
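The loop in this segment feeds Signature.update one byte at a time. A buffered variant, shown as a self-contained sketch that uses a freshly generated RSA key instead of the certificate from /Cert (so it runs without BouncyCastle or PDFBox; ChunkedVerify is a hypothetical name), gives the same verification result with far fewer calls. SHA1withRSA matches the adbe.x509.rsa_sha1 case:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;

class ChunkedVerify
{
    /** Streams the content through the verifier in 8 KB chunks instead of single bytes. */
    static boolean verify(InputStream content, PublicKey key, byte[] sig, String alg)
            throws IOException, GeneralSecurityException
    {
        Signature signature = Signature.getInstance(alg);
        signature.initVerify(key);
        byte[] buf = new byte[8192];
        int n;
        while ((n = content.read(buf)) != -1)
        {
            signature.update(buf, 0, n);
        }
        return signature.verify(sig);
    }

    public static void main(String[] args) throws Exception
    {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();
        byte[] data = "signed content".getBytes(StandardCharsets.US_ASCII);

        Signature signer = Signature.getInstance("SHA1withRSA");
        signer.initSign(pair.getPrivate());
        signer.update(data);
        byte[] sig = signer.sign();

        System.out.println(verify(new ByteArrayInputStream(data), pair.getPublic(), sig, "SHA1withRSA"));
    }
}
```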

> ShowSignature: say which digest algorithm was used, detect forged content
> -
>
> Key: PDFBOX-5066
> URL: https://issues.apache.org/jira/browse/PDFBOX-5066
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Minor
>
> 1) SHA256 was used by the signer to compute the content digest of 
> target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
> mentioned, like 
>      System.out.println("Signature found");
>  so maybe 
>      System.out.println("Signature algorithm: " + algo);
>  where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
> [http://oidref.com/1.2.840.113549.1.1.11])
> 2) for subFilter="adbe.x509.rsa_sha1" it is not detected if the PDF content 
> is altered.
>  
> See also PDFBOX-4297
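For point 2, the segment in the comment computes algID from the DigestInfo but does not use it. One possible shape for such a check, as a plain-Java sketch (DigestCheck and its OID table are illustrative and cover only SHA-1 and SHA-2): map the digest OID to a JCA algorithm name, hash the signed content, and compare with the digest carried in the DigestInfo. Note that 1.2.840.113549.1.1.11 (sha256WithRSAEncryption) is a signature-algorithm OID, whereas the DigestInfo carries a digest OID such as 2.16.840.1.101.3.4.2.1 for SHA-256:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class DigestCheck
{
    // digest algorithm OIDs, as returned by AlgorithmIdentifier.getAlgorithm().getId()
    private static final Map<String, String> JCA_NAMES = new HashMap<>();
    static
    {
        JCA_NAMES.put("1.3.14.3.2.26", "SHA-1");
        JCA_NAMES.put("2.16.840.1.101.3.4.2.1", "SHA-256");
        JCA_NAMES.put("2.16.840.1.101.3.4.2.2", "SHA-384");
        JCA_NAMES.put("2.16.840.1.101.3.4.2.3", "SHA-512");
    }

    static String jcaName(String oid)
    {
        String name = JCA_NAMES.get(oid);
        if (name == null)
        {
            throw new IllegalArgumentException("unsupported digest OID " + oid);
        }
        return name;
    }

    /** True if the content hashes to the digest recovered from the signature. */
    static boolean matches(String oid, byte[] content, byte[] expectedDigest)
            throws NoSuchAlgorithmException
    {
        byte[] actual = MessageDigest.getInstance(jcaName(oid)).digest(content);
        return MessageDigest.isEqual(actual, expectedDigest);
    }
}
```

jcaName(algID) would also give the human-readable algorithm name asked for in point 1.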





[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257722#comment-17257722
 ] 

ASF subversion and git services commented on PDFBOX-4297:
-

Commit 1885054 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1885054 ]

PDFBOX-4297: use stream instead of byte buffer to handle huge files with a 
small memory footprint
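The idea of this commit, streaming instead of buffering, can be illustrated on the bytes a PDF signature covers: /ByteRange is an array of offset/length pairs (usually two, around the /Contents value). A stdlib-only sketch, not the PDFBox code, that digests those ranges through one fixed 8 KB buffer, so memory use does not grow with file size; ByteRangeDigest is a hypothetical name:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ByteRangeDigest
{
    /** byteRange holds offset/length pairs, laid out like a /ByteRange array. */
    static byte[] digest(String path, long[] byteRange, String alg)
            throws IOException, NoSuchAlgorithmException
    {
        MessageDigest md = MessageDigest.getInstance(alg);
        byte[] buf = new byte[8192]; // the only buffer, whatever the file size
        try (RandomAccessFile raf = new RandomAccessFile(path, "r"))
        {
            for (int i = 0; i + 1 < byteRange.length; i += 2)
            {
                raf.seek(byteRange[i]);
                long remaining = byteRange[i + 1];
                while (remaining > 0)
                {
                    int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                    if (n < 0)
                    {
                        throw new IOException("byte range extends past end of file");
                    }
                    md.update(buf, 0, n);
                    remaining -= n;
                }
            }
        }
        return md.digest();
    }
}
```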




[jira] [Commented] (PDFBOX-2512) OutOfMemory while signing large documents

2021-01-03 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257721#comment-17257721
 ] 

Ralf Hauser commented on PDFBOX-2512:
-

ok, let's address this in PDFBOX-5067

> OutOfMemory while signing large documents
> -
>
> Key: PDFBOX-2512
> URL: https://issues.apache.org/jira/browse/PDFBOX-2512
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing, Signing
>Affects Versions: 1.8.7
>Reporter: Thomas Chojecki
>Assignee: Thomas Chojecki
>Priority: Major
> Fix For: 1.8.8
>
> Attachments: keystore.p12
>
>
> While working with large documents, we found some memory issues.
> 1. The method close() in COSDocument clones the object pool and does not 
> clean it properly. The cloning in getObjects() causes an OutOfMemory exception.
> 2. The COSWriter copies the whole PDF into memory for signing and does not 
> use a BufferedInputStream for the FileInputStream, which also has a big 
> performance impact. (PDFBOX-1798)
> 3. The cloning of COSStreams causes an OutOfMemory exception
> I used the CreateSignature example with a document of about 150 MB from here:
> https://cdn-reichelt.de/bilder/downloads/reichelt_01-2015_DE_B_HQ.pdf
> Additionally, I added a RandomAccessFile to the PDDocument.load in the 
> CreateSignature class:
> PDDocument doc = PDDocument.load(document, new RandomAccessFile(new File("d:\\temp.bin"), "rw"));
> (this prevents the OOM for the third case)
> Using a BufferedInputStream in case two reduces the signing time 
> from more than 5 minutes to less than 1 minute. 






[jira] [Created] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-03 Thread Ralf Hauser (Jira)
Ralf Hauser created PDFBOX-5067:
---

 Summary: make PDVisibleSignDesigner memory aware
 Key: PDFBOX-5067
 URL: https://issues.apache.org/jira/browse/PDFBOX-5067
 Project: PDFBox
  Issue Type: Improvement
  Components: Signing
Affects Versions: 2.0.23
Reporter: Ralf Hauser
 Attachments: patch_PDFBOX-2512.txt

PDFBOX-2512 might have failed earlier if I hadn't used

  MemoryUsageSetting.setupMixed(1500)

to limit the memory usage of the PDDocument to 15 MB in 
CreateVisibleSignature:

a) in setVisibleSignDesigner(), by using the now memory-aware constructor of 
PDVisibleSignDesigner, and

b) in signPDF(), by reusing the PDDocument:

   setTsaUrl(tsaUrl);
   PDDocument doc = null;
   if (null != visibleSignDesigner) {
       doc = visibleSignDesigner.getDocument();
   }
   if (null == doc) {
       doc = Loader.loadPDF(inputFile, memoryUsageSetting);
   }
   // create the output document and prepare the IO streams
   ...






[jira] [Comment Edited] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257709#comment-17257709
 ] 

Tilman Hausherr edited comment on PDFBOX-4297 at 1/3/21, 10:55 AM:
---

I was able to check your file with streams in the same amount of time. Btw the 
proposed method doesn't work because it returns a closed input stream. There is 
more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1" (update: it turns out that this one isn't part of our 
repository, because it contains code from another project).
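One way around the closed-stream problem is to build the signed content lazily from the signature's /ByteRange and hand the caller a stream that stays open until the caller closes it. The sketch below assumes the /ByteRange values (offset/length pairs, typically four integers) have already been read from the signature dictionary; `ByteRangeStream` is a hypothetical class for illustration, not PDFBox API, and it needs Java 9 or later.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.Objects;

// Hypothetical helper (not PDFBox API): streams the bytes covered by a signature's
// /ByteRange, given as offset/length pairs such as {0, a, b, c}, without loading the file.
final class ByteRangeStream extends InputStream {
    private final RandomAccessFile raf;
    private final long[] byteRange;
    private int pair = -1;      // index of the current offset/length pair; -1 before the first read
    private long remaining = 0; // bytes still to read from the current pair

    ByteRangeStream(File pdf, long[] byteRange) throws IOException {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must contain offset/length pairs");
        }
        for (long v : byteRange) {
            if (v < 0) {
                throw new IllegalArgumentException("negative ByteRange entry: " + v);
            }
        }
        this.byteRange = byteRange.clone();
        this.raf = new RandomAccessFile(pdf, "r"); // opened last, so a validation failure leaks no handle
    }

    // Seeks to the next non-empty range; returns false once every range is consumed.
    private boolean nextRange() throws IOException {
        while (remaining == 0) {
            pair++;
            if (2 * pair >= byteRange.length) {
                return false;
            }
            raf.seek(byteRange[2 * pair]);
            remaining = byteRange[2 * pair + 1];
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        if (!nextRange()) {
            return -1;
        }
        int b = raf.read();
        if (b == -1) {
            throw new IOException("ByteRange extends past the end of the file");
        }
        remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, buf.length);
        if (len == 0) {
            return 0;
        }
        if (!nextRange()) {
            return -1;
        }
        int n = raf.read(buf, off, (int) Math.min(len, remaining));
        if (n == -1) {
            throw new IOException("ByteRange extends past the end of the file");
        }
        remaining -= n;
        return n;
    }

    @Override
    public void close() throws IOException {
        raf.close(); // the consumer, not the producer, decides when the content is no longer needed
    }
}
```

A digest or signature check can then read from this stream in a loop; memory use is bounded by the caller's buffer rather than by the file size, and closing the stream releases the file handle.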


was (Author: tilman):
I was able to check your file with streams in the same amount of time. Btw the 
proposed method doesn't work because it returns a closed input stream. There is 
more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1".

> Allow to space efficiently analyse large PDFs
> -
>
> Key: PDFBOX-4297
> URL: https://issues.apache.org/jira/browse/PDFBOX-4297
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Reporter: Ralf Hauser
>Priority: Major
> Attachments: programWinter2015_20210103_091853-sig_LTV.pdf
>
>
> Assume you receive a 300+ MB PDF and need to know
> 1) the file names of embedded files, if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) the certification level (and whether it is signed)
> This should not use more than 5 MB of (extra) memory.
>  
> P.S.: This seems to be an example of https://pdfbox.apache.org/ideas.html "Handle 
> large PDF files"
>  






[jira] [Commented] (PDFBOX-2512) OutOfMemory while signing large documents

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257718#comment-17257718
 ] 

Tilman Hausherr commented on PDFBOX-2512:
-

This is a closed issue, please don't write there.




[jira] [Commented] (PDFBOX-2512) OutOfMemory while signing large documents

2021-01-03 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257716#comment-17257716
 ] 

Ralf Hauser commented on PDFBOX-2512:
-

Did a quick test with [^programWinter2015_20210103_091853-sig_LTV.pdf] (35 MB).

With -Xmx70m, the signature works.

With -Xmx50m:

  java.lang.OutOfMemoryError: GC overhead limit exceeded
   at java.lang.StringBuilder.toString(StringBuilder.java:407)
   at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1281)
   at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1212)
   at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.privateReadObjectNumbers(PDFObjectStreamParser.java:104)
   at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parseObject(PDFObjectStreamParser.java:77)
   at org.apache.pdfbox.pdfparser.COSParser.parseObjectStreamObject(COSParser.java:779)
   at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:637)
   at org.apache.pdfbox.pdfparser.COSParser.dereferenceCOSObject(COSParser.java:586)
   at org.apache.pdfbox.cos.COSObject.getObject(COSObject.java:115)
   at org.apache.pdfbox.pdfwriter.COSWriter.prepareIncrement(COSWriter.java:327)
   at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1425)
   at org.apache.pdfbox.pdmodel.PDDocument.saveIncremental(PDDocument.java:997)
   ...

With -Xmx30m:

  java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf(Arrays.java:3236)
   at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
   at org.apache.pdfbox.cos.COSStream.createView(COSStream.java:218)
   at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:48)
   at org.apache.pdfbox.pdfparser.COSParser.parseObjectStreamObject(COSParser.java:778)
   at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:637)
   at org.apache.pdfbox.pdfparser.COSParser.dereferenceCOSObject(COSParser.java:586)
   at org.apache.pdfbox.cos.COSObject.getObject(COSObject.java:115)
   at org.apache.pdfbox.pdfwriter.COSWriter.prepareIncrement(COSWriter.java:327)
   at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1425)
   at org.apache.pdfbox.pdmodel.PDDocument.saveIncremental(PDDocument.java:997)
   at org.apache.pdfbox.examples.signature.CreateVisibleSignature.signPDF(CreateVisibleSignature.java:...
   ...




[jira] [Comment Edited] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257709#comment-17257709
 ] 

Tilman Hausherr edited comment on PDFBOX-4297 at 1/3/21, 10:37 AM:
---

I was able to check your file with streams in the same amount of time. Btw the 
proposed method doesn't work because it returns a closed input stream. There is 
more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1".


was (Author: tilman):
I was able to check your file with streams in the same amount of time. Btw the 
proposed method doesn't work because it returns a closed input stream. There is 
more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1", and {{Signature.update()}} doesn't work with input 
streams.




[jira] [Comment Edited] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257709#comment-17257709
 ] 

Tilman Hausherr edited comment on PDFBOX-4297 at 1/3/21, 9:58 AM:
--

I was able to check your file with streams in the same amount of time. Btw the 
proposed method doesn't work because it returns a closed input stream. There is 
more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1", and {{Signature.update()}} doesn't work with input 
streams.
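Although `java.security.Signature` has no `InputStream` overload, it can be fed from a stream in chunks through `update(byte[], int, int)`, so the signed content never has to be held in memory as one array. A minimal sketch, assuming the raw PKCS#1 signature bytes have already been unwrapped from /Contents and the signer's public key obtained from /Cert (neither step is shown); `StreamingVerify` is a hypothetical helper name, and SHA-1 is assumed because the adbe.x509.rsa_sha1 subfilter name suggests it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper: verifies an RSA signature over content read from a stream.
final class StreamingVerify {
    private StreamingVerify() {}

    static boolean verify(InputStream content, byte[] rsaSignature, PublicKey key)
            throws IOException, GeneralSecurityException {
        // Assumption: SHA-1 digest, as the adbe.x509.rsa_sha1 name suggests.
        Signature sig = Signature.getInstance("SHA1withRSA");
        sig.initVerify(key);
        byte[] buf = new byte[8192];
        int n;
        while ((n = content.read(buf)) != -1) {
            sig.update(buf, 0, n); // only this buffer is held in memory at a time
        }
        return sig.verify(rsaSignature); // false if the content was altered after signing
    }
}
```

Any stream over the signed byte ranges can be passed as `content`; a result of `false` is what would indicate forged or altered content.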


was (Author: tilman):
I was able to check your file with streams in the same amount of time. Btw the 
proposed method doesn't work because it returns a closed input stream. There is 
more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1".




[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257709#comment-17257709
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

I was able to check your file with streams in the same amount of time. Btw the 
proposed method doesn't work because it returns a closed input stream. There is 
more work to do: the method you mentioned, and then the check for 
"adbe.x509.rsa_sha1".




[jira] [Updated] (PDFBOX-5066) ShowSignature: say which digest algorithm was used, detect forged content

2021-01-03 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-5066:

Description: 
1) SHA256 was used by the signer to compute the content digest of 
target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
reported in the same way as
     System.out.println("Signature found");
 so maybe 
     System.out.println("Signature algorithm: " + algo);
 where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
[http://oidref.com/1.2.840.113549.1.1.11])

2) For subFilter="adbe.x509.rsa_sha1", it is not detected if the PDF content is 
altered.

 

See also PDFBOX-4297
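For item 1, the reported name could come from the algorithm OIDs stored in the CMS signature (BouncyCastle's `SignerInformation`, for example, exposes `getDigestAlgOID()` and `getEncryptionAlgOID()`). A minimal JDK-only sketch of turning such an OID into a readable name; `SignatureAlgorithmNames` and its deliberately small table are hypothetical, and unknown OIDs are printed unchanged.

```java
import java.util.Map;

// Hypothetical helper: maps a few well-known algorithm OIDs to readable names.
final class SignatureAlgorithmNames {
    private SignatureAlgorithmNames() {}

    // Deliberately incomplete table; extend as needed.
    private static final Map<String, String> NAMES = Map.of(
            "1.3.14.3.2.26", "sha1",
            "2.16.840.1.101.3.4.2.1", "sha256",
            "2.16.840.1.101.3.4.2.2", "sha384",
            "2.16.840.1.101.3.4.2.3", "sha512",
            "1.2.840.113549.1.1.1", "rsaEncryption",
            "1.2.840.113549.1.1.5", "sha1WithRSAEncryption",
            "1.2.840.113549.1.1.11", "sha256WithRSAEncryption",
            "1.2.840.113549.1.1.12", "sha384WithRSAEncryption",
            "1.2.840.113549.1.1.13", "sha512WithRSAEncryption");

    // Returns a readable name, or the OID itself if it is not in the table.
    static String describe(String oid) {
        return NAMES.getOrDefault(oid, oid);
    }

    public static void main(String[] args) {
        String algo = describe("1.2.840.113549.1.1.11");
        System.out.println("Signature algorithm: " + algo); // prints "Signature algorithm: sha256WithRSAEncryption"
    }
}
```

Falling back to the raw OID keeps the output useful for algorithms the table does not cover.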

  was:
1) SHA256 was used by the signer to compute the content digest of 
target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
reported in the same way as
    System.out.println("Signature found");
 so maybe 
    System.out.println("Signature algorithm: " + algo);
 where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
[http://oidref.com/1.2.840.113549.1.1.11])

2) For subFilter="adbe.x509.rsa_sha1", it is not detected if the PDF content 
is altered.


> ShowSignature: say which digest algorithm was used, detect forged content
> -
>
> Key: PDFBOX-5066
> URL: https://issues.apache.org/jira/browse/PDFBOX-5066
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Minor
>
> 1) SHA256 was used by the signer to compute the content digest of 
> target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
> reported in the same way as
>      System.out.println("Signature found");
>  so maybe 
>      System.out.println("Signature algorithm: " + algo);
>  where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
> [http://oidref.com/1.2.840.113549.1.1.11])
> 2) For subFilter="adbe.x509.rsa_sha1", it is not detected if the PDF content 
> is altered.
>  
> See also PDFBOX-4297






[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257700#comment-17257700
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

I've tried your file; it takes almost two minutes to download. After that, the 
content is there, and the rest is done in 1 second.

I'll see what happens when going with streams.




[jira] [Updated] (PDFBOX-5066) ShowSignature: say which digest algorithm was used, detect forged content

2021-01-03 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-5066:

Description: 
1) SHA256 was used by the signer to compute the content digest of 
target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
reported in the same way as
    System.out.println("Signature found");
 so maybe 
    System.out.println("Signature algorithm: " + algo);
 where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
[http://oidref.com/1.2.840.113549.1.1.11])

2) For subFilter="adbe.x509.rsa_sha1", it is not detected if the PDF content 
is altered.

  was:
1) SHA256 was used by the signer to compute the content digest of 
target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
reported in the same way as
   System.out.println("Signature found");
so maybe 
   System.out.println("Signature algorithm: " + algo);
where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
http://oidref.com/1.2.840.113549.1.1.11)
2) For subFilter="adbe.x509.rsa_sha1", it is not detected if the PDF content is 
altered.


> ShowSignature: say which digest algorithm was used, detect forged content
> -
>
> Key: PDFBOX-5066
> URL: https://issues.apache.org/jira/browse/PDFBOX-5066
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Minor
>
> 1) SHA256 was used by the signer to compute the content digest of 
> target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
> reported in the same way as
>     System.out.println("Signature found");
>  so maybe 
>     System.out.println("Signature algorithm: " + algo);
>  where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
> [http://oidref.com/1.2.840.113549.1.1.11])
> 2) For subFilter="adbe.x509.rsa_sha1", it is not detected if the PDF content 
> is altered.






[jira] [Created] (PDFBOX-5066) ShowSignature: say which digest algorithm was used, detect forged content

2021-01-03 Thread Ralf Hauser (Jira)
Ralf Hauser created PDFBOX-5066:
---

 Summary: ShowSignature: say which digest algorithm was used, 
detect forged content
 Key: PDFBOX-5066
 URL: https://issues.apache.org/jira/browse/PDFBOX-5066
 Project: PDFBox
  Issue Type: Improvement
  Components: Signing
Affects Versions: 2.0.23
Reporter: Ralf Hauser


1) SHA256 was used by the signer to compute the content digest of 
target/pdfs/notCertified_368835_Sig_en_201026090509.pdf; this should be 
reported in the same way as
   System.out.println("Signature found");
so maybe 
   System.out.println("Signature algorithm: " + algo);
where 'algo' is, for example, "sha256WithRSAEncryption" (as per 
http://oidref.com/1.2.840.113549.1.1.11)
2) For subFilter="adbe.x509.rsa_sha1", it is not detected if the PDF content is 
altered.






[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257693#comment-17257693
 ] 

Ralf Hauser commented on PDFBOX-4297:
-

Please find a bigger test file attached: programWinter2015_20210103_091853-sig_LTV.pdf.

> We could change the code of ShowSignature, but then we'd probably get 
> criticism for being slow.

If streams are implemented properly, they might even be quicker, as the operating 
system would not need to swap memory pages.




[jira] [Updated] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-4297:

Attachment: programWinter2015_20210103_091853-sig_LTV.pdf




[jira] [Commented] (PDFBOX-4297) Allow to space efficiently analyse large PDFs

2021-01-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257685#comment-17257685
 ] 

Tilman Hausherr commented on PDFBOX-4297:
-

This is sample code, so nothing prevents you from writing your own. I assume you 
mean the "buf" parameter. It is used only once, to calculate the digest. To 
calculate the digest from a stream, see
https://stackoverflow.com/a/304350/535646
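The linked approach boils down to feeding the digest in fixed-size chunks instead of passing one large byte array. A minimal JDK-only sketch; `StreamDigest.digest` is a hypothetical helper, and the `InputStream` would be whatever supplies the signed content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes an InputStream without buffering it completely.
final class StreamDigest {
    private StreamDigest() {}

    // Reads in 8 KiB chunks, so memory use does not depend on the input size.
    static byte[] digest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm); // e.g. "SHA-256"
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return md.digest();
    }
}
```

The result is identical to `MessageDigest.digest(byte[])` over the same bytes; only the peak memory differs.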

We could change the code of ShowSignature, but then we'd probably get criticism 
for being slow.

About the missing method, did you try to implement it yourself? Yes, maybe we 
could do that, but I think getSignedContentAsStream() would be better.


> Allow to space efficiently analyse large PDFs
> -
>
> Key: PDFBOX-4297
> URL: https://issues.apache.org/jira/browse/PDFBOX-4297
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Reporter: Ralf Hauser
>Priority: Major
>
> Assume you receive a 300+ MB PDF and need to know
> 1) the file names of embedded files, if any
> 2) whether it is encrypted (symmetric or asymmetric)
> 3) the certification level (and whether it is signed)
> This should not use more than 5 MB of (extra) memory.
>  
> P.S.: This seems to be an example of https://pdfbox.apache.org/ideas.html "Handle 
> large PDF files"
>  


