[jira] [Updated] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-07 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-5067:

Attachment: patch_PDFBOX-5067_CreateVisibleSignature.a.txt

> make PDVisibleSignDesigner memory aware
> ---
>
> Key: PDFBOX-5067
> URL: https://issues.apache.org/jira/browse/PDFBOX-5067
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Major
> Attachments: patch5067_CreateVisibleSignature.txt, 
> patch_PDFBOX-2512.txt, patch_PDFBOX-5067.txt, 
> patch_PDFBOX-5067_CreateVisibleSignature.a.txt
>
>
> PDFBOX-2512 might have failed earlier if I hadn't used
>   MemoryUsageSetting.setupMixed(1500)
> to limit the memory usage of PDDocument document to 15 MB in 
> CreateVisibleSignature in
>  
> a) setVisibleSignDesigner() and used the now memory-aware constructor of 
> PDVisibleSignDesigner
>     and
> b) in signPDF(), reused PDDocument
>    setTsaUrl(tsaUrl);
>    PDDocument doc = null;
>    if (null != visibleSignDesigner) {
>        doc = visibleSignDesigner.getDocument();
>    }
>    if (null == doc) {
>        doc = Loader.loadPDF(inputFile, memoryUsageSetting);
>    }
>    // creating output document and prepare the IO streams.
>    ...
>  
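The reuse described in b) can be sketched without PDFBox on the classpath. This is only a minimal illustration of the "reuse the designer's document, otherwise load once" idea. `Doc`, `Designer`, `ReuseOrLoad`, `load` and `obtain` are hypothetical stand-ins invented for this sketch, not PDFBox classes or methods, and the load counter exists only to make the effect visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ReuseOrLoad {
    // Counts how often the (assumed expensive) load path is taken.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Hypothetical stand-in for the real loading call.
    static Doc load(String inputFile) {
        LOADS.incrementAndGet();
        return new Doc(inputFile);
    }

    // Reuse the designer's document when there is one, otherwise load the file once.
    static Doc obtain(Designer designer, String inputFile) {
        Doc doc = null;
        if (designer != null) {
            doc = designer.getDocument();
        }
        if (doc == null) {
            doc = load(inputFile);
        }
        return doc;
    }

    public static void main(String[] args) {
        Designer withDoc = new Designer(new Doc("already-open.pdf"));
        System.out.println(obtain(withDoc, "input.pdf").source); // reused, no load
        System.out.println(obtain(null, "input.pdf").source);    // loaded once
        System.out.println("loads: " + LOADS.get());
    }
}

// Hypothetical stand-in for a loaded document; not a PDFBox class.
final class Doc {
    final String source;
    Doc(String source) { this.source = source; }
}

// Hypothetical stand-in for the sign designer, which may already hold a loaded document.
final class Designer {
    private final Doc doc;
    Designer(Doc doc) { this.doc = doc; }
    Doc getDocument() { return doc; }
}
```

Separately, the PDFBox Javadoc describes the argument of MemoryUsageSetting.setupMixed as a byte count (maxMainMemoryBytes). If that applies to the version used here, a 15 MB limit would be written as 15L * 1024 * 1024.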



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-07 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261037#comment-17261037
 ] 

Ralf Hauser commented on PDFBOX-5067:
-

Sorry, the second use of "doc" was forgotten. It is there now and should speed up 
the execution: [^patch5067_CreateVisibleSignature.txt]




Jenkins build is back to stable : PDFBox » PDFBox-Trunk-jdk15 #173

2021-01-07 Thread Apache Jenkins Server



Jenkins build is back to stable : PDFBox » PDFBox-Trunk-jdk15 » Apache PDFBox examples #173

2021-01-07 Thread Apache Jenkins Server



Jenkins build is back to stable : PDFBox » PDFBox-Trunk-jdk16 » Apache PDFBox examples #168

2021-01-07 Thread Apache Jenkins Server



Jenkins build is back to stable : PDFBox » PDFBox-Trunk-jdk16 #168

2021-01-07 Thread Apache Jenkins Server



[jira] [Commented] (PDFBOX-5073) Enhance website to utilize new build tool

2021-01-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260823#comment-17260823
 ] 

ASF subversion and git services commented on PDFBOX-5073:
-

Commit 3fd6fc32180bd542a5eea72065c81e2ef3de9696 in pdfbox-docs's branch 
refs/heads/master from Maruan Sahyoun
[ https://gitbox.apache.org/repos/asf?p=pdfbox-docs.git;h=3fd6fc3 ]

PDFBOX-5073: build references dynamically; update to https where supported


> Enhance website to utilize new build tool
> -
>
> Key: PDFBOX-5073
> URL: https://issues.apache.org/jira/browse/PDFBOX-5073
> Project: PDFBox
>  Issue Type: Task
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Major
>
> This shall track some enhancements to the website made possible using the new 
> build tools
> - utilize data files
> - update CSS/SASS
> - simplify page structure






[jira] [Created] (PDFBOX-5073) Enhance website to utilize new build tool

2021-01-07 Thread Maruan Sahyoun (Jira)
Maruan Sahyoun created PDFBOX-5073:
--

 Summary: Enhance website to utilize new build tool
 Key: PDFBOX-5073
 URL: https://issues.apache.org/jira/browse/PDFBOX-5073
 Project: PDFBox
  Issue Type: Task
Reporter: Maruan Sahyoun
Assignee: Maruan Sahyoun


This shall track some enhancements to the website made possible using the new 
build tools

- utilize data files
- update CSS/SASS
- simplify page structure






[jira] [Resolved] (PDFBOX-3330) Enhance and update PDFBox website & documentation

2021-01-07 Thread Maruan Sahyoun (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun resolved PDFBOX-3330.

Resolution: Fixed

Resolving this ticket. Further updates to the website will be tracked in newer 
tickets

> Enhance and update PDFBox website & documentation
> -
>
> Key: PDFBOX-3330
> URL: https://issues.apache.org/jira/browse/PDFBOX-3330
> Project: PDFBox
>  Issue Type: Task
>  Components: Documentation
>Reporter: Maruan Sahyoun
>Priority: Major
> Attachments: Bildschirmfoto von »2018-03-14 22-59-10«.png, 
> Bildschirmfoto von »2018-03-14 22-59-21«.png, PDFBox.Logo-0.1.0.png, 
> pdfbox-topbar.pdf, screenshot-1.png, toolbox.svg, topbar.png
>
>
> General purpose ticket to track enhancements to the website and documentation






[jira] [Commented] (PDFBOX-5029) Tika - Issues extracting Arabic script from pdf

2021-01-07 Thread Christian (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260786#comment-17260786
 ] 

Christian  commented on PDFBOX-5029:


Hi Tilman, first of all Happy New Year. I have been very busy over the past 
weeks and am only now back on the issue of scraping PDF files using Tika. I 
tried all the possible combinations: the only way to get the correct text is 
to copy and paste the PDF content into a txt file and then run the script on it. 
If I do it with Word there are still mistakes. In any case that won't solve the 
issue, because I want to extract the text from the original PDF.

> Tika - Issues extracting Arabic script from pdf
> ---
>
> Key: PDFBOX-5029
> URL: https://issues.apache.org/jira/browse/PDFBOX-5029
> Project: PDFBox
>  Issue Type: Bug
> Environment: Windows - Anaconda / Spyder
>Reporter: Christian 
>Priority: Major
> Attachments: PDFBOX-5029-not-sorted-2.0.21.txt, 
> PDFBOX-5029-not-sorted-trunk.txt, PDFBOX-5029-sorted-2.0.21.txt, 
> PDFBOX-5029-sorted-trunk.txt, extracting_text_asian_pdf.py, test.pdf, 
> test_scraped.utf8
>
>
> I'm working on building a corpus of Uygur texts and some of the content is 
> coming from pdf files. I wrote a short python script to scrape text from pdf 
> using tika-python. The script is Arabic, and the output looks good but there 
> is one major problem: there are many missing spaces between words and I 
> really do not know how to address this issue. I am attaching a pdf file, the 
> script to scrape its text and the output (test_scraped.utf8). Thanks in 
> advance for your help.






[jira] [Comment Edited] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-07 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260713#comment-17260713
 ] 

Tilman Hausherr edited comment on PDFBOX-5067 at 1/7/21, 6:34 PM:
--

I looked at this and I don't get it: you load it into a local variable "doc" and 
then do nothing with that object. (The old code isn't much better because there, 
too, the file to be signed is opened twice.) And while looking at this, I'm 
wondering why the "setVisibleSignDesigner" methods are all public. It makes no 
sense unless the author had bigger plans, e.g. passing a different file than 
the one to sign.






[jira] [Commented] (PDFBOX-5067) make PDVisibleSignDesigner memory aware

2021-01-07 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260713#comment-17260713
 ] 

Tilman Hausherr commented on PDFBOX-5067:
-

I looked at this and I don't get it - you load it into a local field "doc" and 
then do nothing with that one. (The old code isn't much better because in both 
cases, the file to be signed is opened twice) And while looking at this, I'm 
wondering why the "setVisibleSignDesigner" methods are all public. It makes no 
sense unless the author had bigger plans, e.g. passing a different file than 
the one to sign.




[jira] [Resolved] (PDFBOX-5072) java.lang.IndexOutOfBoundsException

2021-01-07 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5072.
-
  Assignee: Tilman Hausherr
Resolution: Fixed

> java.lang.IndexOutOfBoundsException
> ---
>
> Key: PDFBOX-5072
> URL: https://issues.apache.org/jira/browse/PDFBOX-5072
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel, Utilities
>Affects Versions: 2.0.22
>Reporter: Thomas B.
>Assignee: Tilman Hausherr
>Priority: Critical
> Fix For: 2.0.23, 3.0.0 PDFBox
>
> Attachments: image-2021-01-06-14-56-41-433.png
>
>
> I'm having an issue similar to the one that was fixed in PDFBOX-4969.
> In my case, the IndexOutOfBoundsException occurs inside PDNameTreeNode, and 
> not inside PDNumberTreeNode.
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 81, Size: 81
>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>     at java.util.ArrayList.get(ArrayList.java:429)
>     at org.apache.pdfbox.cos.COSArray.getObject(COSArray.java:188)
>     at org.apache.pdfbox.pdmodel.common.PDNameTreeNode.getNames(PDNameTreeNode.java:272)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.getIDTreeAsMap(PDFMergerUtility.java:1036)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.getIDTreeAsMap(PDFMergerUtility.java:1051)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeIDTree(PDFMergerUtility.java:1008)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.appendDocument(PDFMergerUtility.java:877)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.legacyMergeDocuments(PDFMergerUtility.java:459)
>     at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:346)
> {code}
> Unfortunately, I can't share the document that reproduces the issue.
> I tried the same approach to fix it:
> {code:java}
> public Map getNames() throws IOException
> {
>     COSArray namesArray = node.getCOSArray(COSName.NAMES);
>     if (namesArray != null)
>     {
>         Map names = new LinkedHashMap();
>         if (namesArray.size() % 2 != 0)
>         {
>             LOG.warn("Numbers array has odd size: " + namesArray.size());
>         }
>         for (int i = 0; i + 1 < namesArray.size(); i += 2)
>         {
>             COSBase base = namesArray.getObject(i);
>             if (!(base instanceof COSString))
>             {
>                 throw new IOException("Expected string, found " + base + " in name tree at index " + i);
> {code}
> But I'm getting the IOException:
> {code:java}
> Caused by: java.io.IOException: Expected string, found COSDictionary{[...] in name tree at index 0
> {code}
> And indeed, _namesArray_ contains a COSObject at the first index:
> !image-2021-01-06-14-56-41-433.png!
>  
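The loop guard quoted in the report (i + 1 < size) is what prevents reading past the end of an odd-sized key/value array. The following is a minimal sketch of that guard using plain Java collections. `PairwiseNames` and `toMap` are hypothetical names, a `List` stands in for the COSArray and `String` for COSString, so this is not the PDFBox implementation or the committed fix.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PairwiseNames {
    // Reads a flat [key0, value0, key1, value1, ...] list into an ordered map.
    // The guard i + 1 < size means a trailing key without a value is never dereferenced.
    static Map<String, Object> toMap(List<?> flat) throws IOException {
        Map<String, Object> names = new LinkedHashMap<>();
        for (int i = 0; i + 1 < flat.size(); i += 2) {
            Object key = flat.get(i);
            if (!(key instanceof String)) {
                // Same kind of check as in the report: a non-string key is rejected.
                throw new IOException("Expected string, found " + key + " in name tree at index " + i);
            }
            names.put((String) key, flat.get(i + 1));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Odd-sized input: the dangling "c" is skipped instead of causing an IndexOutOfBoundsException.
        System.out.println(toMap(Arrays.asList("a", 1, "b", 2, "c"))); // prints {a=1, b=2}
    }
}
```

With a plain `i < size` guard, the same input would try to read index 5 of a five-element list, which is the kind of failure shown in the stack trace above.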






[jira] [Commented] (PDFBOX-4848) Automate building website without local install

2021-01-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260682#comment-17260682
 ] 

ASF subversion and git services commented on PDFBOX-4848:
-

Commit a4cbbfdbad85f6d466ac09b660989d4dac8b50fd in pdfbox-docs's branch 
refs/heads/master from Maruan Sahyoun
[ https://gitbox.apache.org/repos/asf?p=pdfbox-docs.git;h=a4cbbfd ]

PDFBOX-4848: bump node/npm version to latest LTS


> Automate building website without local install
> ---
>
> Key: PDFBOX-4848
> URL: https://issues.apache.org/jira/browse/PDFBOX-4848
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Minor
>
> As discussed on the dev mailing list we are looking to utilize the [git - 
> .asf.yaml 
> features|https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features]
>  and/or other capabilities to simplify building the website without the need 
> to install the site generation locally.






[jira] [Commented] (PDFBOX-5030) Create Migration guide for 3.0.0

2021-01-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260656#comment-17260656
 ] 

ASF subversion and git services commented on PDFBOX-5030:
-

Commit 5c65c5354a2ba298ef2ea4d5c0320404c5e4a8c5 in pdfbox-docs's branch 
refs/heads/master from Maruan Sahyoun
[ https://gitbox.apache.org/repos/asf?p=pdfbox-docs.git;h=5c65c53 ]

PDFBOX-5030: fix typo


> Create Migration guide for 3.0.0
> 
>
> Key: PDFBOX-5030
> URL: https://issues.apache.org/jira/browse/PDFBOX-5030
> Project: PDFBox
>  Issue Type: Task
>  Components: Documentation
>Reporter: Maruan Sahyoun
>Assignee: Maruan Sahyoun
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> To start educating users about the migration effort needed to get to 3.0.0, there 
> should be a migration guide (evolving over time) to prepare for the release.






[jira] [Commented] (PDFBOX-5030) Create Migration guide for 3.0.0

2021-01-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260654#comment-17260654
 ] 

ASF subversion and git services commented on PDFBOX-5030:
-

Commit 1ff145b0862fea1aca774b71fca40cc57b44d881 in pdfbox-docs's branch 
refs/heads/master from Maruan Sahyoun
[ https://gitbox.apache.org/repos/asf?p=pdfbox-docs.git;h=1ff145b ]

PDFBOX-5030: details for AcroForm fix up; add info for PDFBox app





[jira] [Commented] (PDFBOX-5068) OutOfMemory while signing large documents - continued

2021-01-07 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260435#comment-17260435
 ] 

Ralf Hauser commented on PDFBOX-5068:
-

The -Xmx12m figure was obtained with CreateVisibleSignature.java.

With CreateVisibleSignature2.java, it dies with:

java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:3236)
 at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
 at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
 at org.apache.fontbox.pfb.PfbParser.readPfbInput(PfbParser.java:171)
 at org.apache.fontbox.pfb.PfbParser.<init>(PfbParser.java:101)
 at org.apache.fontbox.type1.Type1Font.createWithPFB(Type1Font.java:54)
 at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo.getType1Font(FileSystemFontProvider.java:267)
 at org.apache.pdfbox.pdmodel.font.FileSystemFontProvider$FSFontInfo.getFont(FileSystemFontProvider.java:131)
 at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFont(FontMapperImpl.java:452)
 at org.apache.pdfbox.pdmodel.font.FontMapperImpl.findFontBoxFont(FontMapperImpl.java:392)
 at org.apache.pdfbox.pdmodel.font.FontMapperImpl.getFontBoxFont(FontMapperImpl.java:366)
 at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:146)
 at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:68)
 at org.apache.pdfbox.examples.signature.CreateVisibleSignature2.createVisualSignatureTemplate(CreateVisibleSignature2.java:398)
 at org.apache.pdfbox.examples.signature.CreateVisibleSignature2.signPDF(CreateVisibleSignature2.java:253)

For CreateVisibleSignature2.java, I had to increase the heap to -Xmx19m.
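The top frames of the trace show the heap running out while a ByteArrayOutputStream grows: during Arrays.copyOf the old and the new buffer are both alive, so the peak demand exceeds the final data size. A minimal sketch of one common mitigation, pre-sizing the stream when the length is known in advance, follows. `PresizedRead` and `readAll` are hypothetical names for illustration only; this is not how PfbParser is written and not a proposed patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class PresizedRead {
    // Copies a stream into memory. Passing the expected size up front lets the
    // ByteArrayOutputStream allocate its buffer once instead of growing repeatedly.
    static byte[] readAll(InputStream in, int expectedSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(expectedSize, 32));
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        // toByteArray() still makes one final copy of the buffer.
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        byte[] copy = readAll(new ByteArrayInputStream(data), data.length);
        System.out.println(copy.length); // prints 100000
    }
}
```

If the expected size is wrong, the stream simply falls back to growing as usual, so the hint affects memory behaviour but not correctness.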

 

> OutOfMemory while signing large documents - continued
> -
>
> Key: PDFBOX-5068
> URL: https://issues.apache.org/jira/browse/PDFBOX-5068
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Signing
>Affects Versions: 2.0.23
>Reporter: Ralf Hauser
>Priority: Major
> Attachments: RandomAccessReadBufferDiag.java, minimum.pdf
>
>
> Continuation of PDFBOX-2512
>  
> In COSWriter.prepareIncrement(), for the test case 
> cosDoc.getXrefTable().keySet() has size 5925. For each of these keys, 
> cosDoc.getObjectFromPool() gets an object that does not just reference some 
> part of the input document, but duplicates it (which is unavoidable in the 
> case where they are decompressed with FlateFilter, although this could 
> possibly be done lazily). Number of objects it gets for a given heap size:
>  -Xmx20m  746/5925
>  -Xmx25m 1615/5925
>  -Xmx30m 2800/5925
>  -Xmx40m 3872/5925
>  -Xmx55m 5773/5925
> With 60m, it gets them all, but dies later with the less telling
>    java.lang.OutOfMemoryError: GC overhead limit exceeded
>  
> This assumes the patch of PDFBOX-5067 is already in place, or that 
> CreateVisibleSignature2.java is used as the starting point.
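The "lazy" idea mentioned in the description can be sketched as a holder that keeps only the compressed bytes and inflates them on first access. This is a sketch under stated assumptions: java.util.zip stands in for PDFBox's FlateFilter, and `LazyInflated` with its methods is a hypothetical class written for illustration, not part of COSDocument or COSWriter.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class LazyInflated {
    private final byte[] compressed; // always held
    private byte[] inflated;         // filled on first access only

    LazyInflated(byte[] compressed) { this.compressed = compressed; }

    boolean isInflated() { return inflated != null; }

    // Decompresses on the first call and caches the result for later calls.
    byte[] get() throws DataFormatException {
        if (inflated == null) {
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(compressed);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && !inflater.finished()
                            && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new DataFormatException("truncated or unsupported deflate data");
                    }
                    out.write(buf, 0, n);
                }
                inflated = out.toByteArray();
            } finally {
                inflater.end();
            }
        }
        return inflated;
    }

    // Helper for the demo: zlib-compress a byte array.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        LazyInflated stream = new LazyInflated(deflate(new byte[50_000]));
        System.out.println(stream.isInflated()); // false: nothing decompressed yet
        System.out.println(stream.get().length); // 50000
        System.out.println(stream.isInflated()); // true
    }
}
```

Note that this holder only defers the cost. Once get() has been called, the decompressed bytes stay cached, so bounding total memory would additionally need some form of eviction.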






[jira] [Commented] (PDFBOX-5072) java.lang.IndexOutOfBoundsException

2021-01-07 Thread Michael Klink (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260313#comment-17260313
 ] 

Michael Klink commented on PDFBOX-5072:
---

{quote}your clients will complain "but it opens with Adobe Reader!!1!"{quote}

That actually depends on one's clientele. In the context of PDF signatures I've 
had positive experiences with telling clients that while Adobe Reader displays 
broken documents, signatures on them are likely to be easier to manipulate; 
in fact, the act of signing may already change the way the contents are 
displayed. That usually silences any complaints.




[jira] [Commented] (PDFBOX-5068) OutOfMemory while signing large documents - continued

2021-01-07 Thread Ralf Hauser (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260305#comment-17260305
 ] 

Ralf Hauser commented on PDFBOX-5068:
-

Just for reference, I did a test with [^minimum.pdf]. It still creates a 
correctly signed PDF with -Xmx as low as 12m.

Conclusion: although probably reachable in theory, PDFBox is a long way away 
from "constant memory signing".




[jira] [Updated] (PDFBOX-5068) OutOfMemory while signing large documents - continued

2021-01-07 Thread Ralf Hauser (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ralf Hauser updated PDFBOX-5068:

Attachment: minimum.pdf
