[jira] [Commented] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
[ https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061436#comment-15061436 ] Tilman Hausherr commented on PDFBOX-3166: - {quote} But it cannot eliminate space before the 1 , if I add setSpacingTolerance value. {quote} Because the space is really there, see the image. The spacing tolerance helps to decide whether characters are separated or not. You can play with that one if you have some special documents where words appear split or always together. Try it on a document with English text; there it will be more obvious because one word = several glyphs: depending on the value, the sentence I just wrote would be extracted as "Tryitonadocumentwithwesterntext" or "Tr y it on a doc ume nt w ith wes te rn te xt". No, there is no API to remove the space before the "1" because it really exists in the PDF. PDF files are created by a wide variety of software and there are often surprises. As I said, just add a trim() to each line. If you need more help, tell us what your application is about and why the space is a problem. > Unwanted spaces before number in chinese text extraction > > > Key: PDFBOX-3166 > URL: https://issues.apache.org/jira/browse/PDFBOX-3166 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 2.0.0 > Environment: Windows >Reporter: Gang Luo > Labels: test > Attachments: 1201830823-marked-1.png > > Original Estimate: 72h > Remaining Estimate: 72h > > Unwanted spaces before number in chinese date text . > such as this pdf file > http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
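The "add a trim() to each line" advice above could look roughly like the sketch below. It assumes the extracted text is already available as one String (for example, the value returned by PDFTextStripper.getText()); the class and method names are hypothetical and are not part of PDFBox.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of PDFBox: post-processes already extracted text. */
final class ExtractedTextCleaner {
    private ExtractedTextCleaner() {
    }

    /**
     * Trims leading and trailing whitespace from every line.
     * Note: String.trim() only removes characters up to U+0020, so a
     * full-width (ideographic) space U+3000 would be left in place.
     */
    static String trimLines(String extractedText) {
        // "\\R" matches any line break (\n, \r\n, \r); -1 keeps trailing empty lines.
        return Arrays.stream(extractedText.split("\\R", -1))
                .map(String::trim)
                .collect(Collectors.joining("\n"));
    }
}
```

This only changes the extracted String; the space stays in the PDF itself, which matches the point made in the comment above.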
[jira] [Issue Comment Deleted] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
[ https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Luo updated PDFBOX-3166: - Comment: was deleted (was: Text extraction is very sensitive to changes. Yes ,I see. Is there API can adjust space char to appear or not? I try PDFTextStripper.setSpacingTolerance(). But it cannot eliminate space before the 1 , if I add setSpacingTolerance value. PDFTextStripper stripper = new PDFTextStripper(); stripper.setSpacingTolerance(800.0f); //0.08f If I reduce the setSpacingTolerance value , it did add space after date number. The rest is pretty good.) > Unwanted spaces before number in chinese text extraction > > > Key: PDFBOX-3166 > URL: https://issues.apache.org/jira/browse/PDFBOX-3166 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 2.0.0 > Environment: Windows >Reporter: Gang Luo > Labels: test > Attachments: 1201830823-marked-1.png > > Original Estimate: 72h > Remaining Estimate: 72h > > Unwanted spaces before number in chinese date text . > such as this pdf file > http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Reopened] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
[ https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Luo reopened PDFBOX-3166: -- Text extraction is very sensitive to changes. Yes, I see. Is there an API to control whether a space character appears or not? I tried PDFTextStripper.setSpacingTolerance(). But it cannot eliminate the space before the 1 if I increase the setSpacingTolerance value. PDFTextStripper stripper = new PDFTextStripper(); stripper.setSpacingTolerance(800.0f); //0.08f If I reduce the setSpacingTolerance value, it adds a space after the date number. The rest is pretty good. > Unwanted spaces before number in chinese text extraction > > > Key: PDFBOX-3166 > URL: https://issues.apache.org/jira/browse/PDFBOX-3166 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 2.0.0 > Environment: Windows >Reporter: Gang Luo > Labels: test > Attachments: 1201830823-marked-1.png > > Original Estimate: 72h > Remaining Estimate: 72h > > Unwanted spaces before number in chinese date text . > such as this pdf file > http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF
[jira] [Commented] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
[ https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061254#comment-15061254 ] Gang Luo commented on PDFBOX-3166: -- Text extraction is very sensitive to changes. Yes, I see. Is there an API to control whether a space character appears or not? I tried PDFTextStripper.setSpacingTolerance(). But it cannot eliminate the space before the 1 if I increase the setSpacingTolerance value. PDFTextStripper stripper = new PDFTextStripper(); stripper.setSpacingTolerance(800.0f); //0.08f If I reduce the setSpacingTolerance value, it adds a space after the date number. The rest is pretty good. > Unwanted spaces before number in chinese text extraction > > > Key: PDFBOX-3166 > URL: https://issues.apache.org/jira/browse/PDFBOX-3166 > Project: PDFBox > Issue Type: Bug > Components: Text extraction >Affects Versions: 2.0.0 > Environment: Windows >Reporter: Gang Luo > Labels: test > Attachments: 1201830823-marked-1.png > > Original Estimate: 72h > Remaining Estimate: 72h > > Unwanted spaces before number in chinese date text . > such as this pdf file > http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF
[jira] [Commented] (PDFBOX-3169) SaveIncremental does not work without signature
[ https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061206#comment-15061206 ] ASF subversion and git services commented on PDFBOX-3169: - Commit 1720482 from [~tchojecki] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1720482 ] PDFBOX-3169 add an additional method that handles the saveIncremental write for non-signature cases. It will copy the original document and the incremental update into the given OutputStream. > SaveIncremental does not work without signature > --- > > Key: PDFBOX-3169 > URL: https://issues.apache.org/jira/browse/PDFBOX-3169 > Project: PDFBox > Issue Type: Bug > Components: Writing >Affects Versions: 2.0.0 >Reporter: Thomas Chojecki >Assignee: Thomas Chojecki > Fix For: 2.0.0 > > Attachments: saveIncremental.patch > > > I know this feature is ongoing, but with the 2.0.0-RC builds the > saveIncremental (without signature) stopped working at all. A > ByteArrayOutputStream is used in the COSWriter for output. This OutputStream > will only be handled when we write a signature. Otherwise the > whole content will be discarded. > As I wrote some time ago on the mailing list, incremental updates work in a > limited way. At the moment we use it for augmenting signatures and this works > with the old 1.8.x but not with trunk after the patch PDFBOX-1847 was applied.
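As a rough illustration of what "copy the original document and the incremental update into the given OutputStream" means, here is a minimal sketch. It is not the code from commit 1720482; the class, method and parameter names are hypothetical, and it only shows the ordering of the bytes (original file first, appended update second), not how PDFBox builds the update section.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch, not PDFBox code: an incremental save appends to the original bytes. */
final class IncrementalAppendSketch {
    private IncrementalAppendSketch() {
    }

    /**
     * Writes the untouched original document followed by the incremental
     * update section to the given stream. The caller keeps ownership of
     * the stream and is responsible for closing it.
     */
    static void writeIncremental(byte[] originalDocument, byte[] incrementalUpdate, OutputStream out) {
        try {
            out.write(originalDocument);   // the existing file content stays byte-for-byte identical
            out.write(incrementalUpdate);  // new objects, xref section and trailer are appended
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the original bytes unchanged is what lets an existing signature over the earlier revision remain valid, which is why the issue mentions using incremental updates for augmenting signatures.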
[jira] [Commented] (PDFBOX-3165) Tab characters in PDTextField cause error when using .flatten()
[ https://issues.apache.org/jira/browse/PDFBOX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061139#comment-15061139 ] Aaron Eischeid commented on PDFBOX-3165: Is there a specific way, maybe, to use a font file that has more characters in it? What we are doing is using LibreOffice to work on the doc and export to PDF. What I have noticed is that if we do this on an Ubuntu machine, then after PDFBox handles the PDF some Windows users will see black circles instead of letters. That problem goes away if we do the export from a Windows machine. The error we encountered with the missing tab char was on a PDF that had been generated on a Windows machine. So the question is: is there a way to tell LibreOffice, or whatever tool generates the PDF, to use/declare/embed a font that has a broader set of characters? Or should I use a different font setting in the form elements? Arial as a font choice feels pretty safe really. If that one doesn't work it is hard to imagine what would do better. > Tab characters in PDTextField cause error when using .flatten() > --- > > Key: PDFBOX-3165 > URL: https://issues.apache.org/jira/browse/PDFBOX-3165 > Project: PDFBox > Issue Type: Bug > Components: AcroForm, FontBox >Affects Versions: 2.0.0 > Environment: Ubuntu, JDK7 >Reporter: Aaron Eischeid > Attachments: Sample_Template.pdf > > > The PDF form gets filled in, then I call .flatten(fields, true), which last I > knew was undocumented, but anyway I needed the refreshAppearances for > PDF viewers that don't support AcroForms, like pdf.js. > If a tab character somehow gets entered into the PDTextField it chokes. I am > worried other somewhat common characters might have similar issues, but > haven't experimented so far. > Using RC2 of PDFBox and FontBox, and fonts in the PDF form elements were all set > to Arial. > Relevant stacktrace: > U+0009 is not available in this font's Encoding.
Stacktrace follows: > java.lang.IllegalArgumentException: U+0009 is not available in this font's > Encoding > at > org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:358) > at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:283) > at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:312) > at > org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:193) > at > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:373) > at > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:237) > at > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:144) > at > org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:287) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.flatten(PDAcroForm.java:211) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
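One possible workaround on the application side, not something suggested in this thread, is to replace characters such as U+0009 before the value is put into the text field, so the font never has to encode them. The sketch below uses only the JDK; the class and method names are hypothetical, and the assumption is that the caller passes the result on to the form field afterwards.

```java
/** Hypothetical helper, not part of PDFBox: cleans a value before it is set on a form field. */
final class FieldValueSanitizer {
    private FieldValueSanitizer() {
    }

    /**
     * Replaces control characters (for example the tab, U+0009) with a
     * single space. Line breaks are kept so that multiline values still
     * keep their line structure.
     */
    static String replaceControlChars(String value) {
        StringBuilder cleaned = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean lineBreak = c == '\n' || c == '\r';
            cleaned.append(Character.isISOControl(c) && !lineBreak ? ' ' : c);
        }
        return cleaned.toString();
    }
}
```

This does not address glyphs that are genuinely missing from the font; it only avoids the control characters that typically have no glyph in any encoding.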
[jira] [Updated] (PDFBOX-3165) Tab characters in PDTextField cause error when using .flatten()
[ https://issues.apache.org/jira/browse/PDFBOX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Eischeid updated PDFBOX-3165: --- Attachment: Sample_Template.pdf I haven't had time to test this file specifically, but it is just a very stripped-down version of the one we were having the issue with. > Tab characters in PDTextField cause error when using .flatten() > --- > > Key: PDFBOX-3165 > URL: https://issues.apache.org/jira/browse/PDFBOX-3165 > Project: PDFBox > Issue Type: Bug > Components: AcroForm, FontBox >Affects Versions: 2.0.0 > Environment: Ubuntu, JDK7 >Reporter: Aaron Eischeid > Attachments: Sample_Template.pdf > > > The PDF form gets filled in, then I call .flatten(fields, true), which last I > knew was undocumented, but anyway I needed the refreshAppearances for > PDF viewers that don't support AcroForms, like pdf.js. > If a tab character somehow gets entered into the PDTextField it chokes. I am > worried other somewhat common characters might have similar issues, but > haven't experimented so far. > Using RC2 of PDFBox and FontBox, and fonts in the PDF form elements were all set > to Arial. > Relevant stacktrace: > U+0009 is not available in this font's Encoding.
Stacktrace follows: > java.lang.IllegalArgumentException: U+0009 is not available in this font's > Encoding > at > org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:358) > at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:283) > at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:312) > at > org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:193) > at > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:373) > at > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:237) > at > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:144) > at > org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:287) > at > org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.flatten(PDAcroForm.java:211) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3169) SaveIncremental does not work without signature
[ https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-3169: Fix Version/s: 2.0.0 > SaveIncremental does not work without signature > --- > > Key: PDFBOX-3169 > URL: https://issues.apache.org/jira/browse/PDFBOX-3169 > Project: PDFBox > Issue Type: Bug > Components: Writing >Affects Versions: 2.0.0 >Reporter: Thomas Chojecki >Assignee: Thomas Chojecki > Fix For: 2.0.0 > > Attachments: saveIncremental.patch > > > I know this feature is ongoing, but with the 2.0.0-RC builds the > saveIncremental (without signature) stopped working at all. A > ByteArrayOutputStream is used in the COSWriter for output. This OutputStream > will only be handled when we write a signature. Otherwise the > whole content will be discarded. > As I wrote some time ago on the mailing list, incremental updates work in a > limited way. At the moment we use it for augmenting signatures and this works > with the old 1.8.x but not with trunk after the patch PDFBOX-1847 was applied.
[jira] [Updated] (PDFBOX-3170) Created PDF does not open in Adobe Reader DC
[ https://issues.apache.org/jira/browse/PDFBOX-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-3170: Priority: Minor (was: Major) > Created PDF does not open in Adobe Reader DC > > > Key: PDFBOX-3170 > URL: https://issues.apache.org/jira/browse/PDFBOX-3170 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 2.0.0 > Environment: 2.0.0-RC2 - current SNAPSHOT >Reporter: Philip Helger >Priority: Minor > Attachments: MainIssue3170.java, issue-3170.pdf > > > When creating a PDF with a single very long line, the resulting PDF cannot > be opened in Adobe Reader DC. > The code is the same as in PDFBOX-3168 except that the string is 300 times > the length.
[jira] [Updated] (PDFBOX-2941) Improve PDFDebugger (2)
[ https://issues.apache.org/jira/browse/PDFBOX-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2941: Description: This is a follow-up issue to PDFBOX-2530 to implement extra ideas that came up in GSoC2015, ideas that were not implemented due to lack of time, and new ideas. - save modified PDFs - refactor PDFDebugger.java - render glyphs of fonts - editing in hex viewer - ✓ refactor StreamPane to share stream filtering among Text view and hex view - password dialog when hitting protected PDF - remove nodes (e.g. elements from a COSDictionary) - show "pretty" XML - delete array or dictionary elements - edit & keep content streams - load content streams - display filtered streams even if the unfiltered stream is corrupt (PDFBOX-2976) - ✓ display the "caused by" part exception stack trace (nested exceptions) - keep zoom - integrate DrawPrintTextLocations into rendering - integrate area text extraction with a mouse-created rectangle that shows the coordinates in a status line was: This is a follow-up issue to PDFBOX-2530 to implement extra ideas that came up in GSoC2015, ideas that were not implemented due to lack of time, and new ideas. - save modified PDFs - refactor PDFDebugger.java - render glyphs of fonts - editing in hex viewer - ✓ refactor StreamPane to share stream filtering among Text view and hex view - password dialog when hitting protected PDF - remove nodes (e.g. 
elements from a COSDictionary) - show "pretty" XML - delete array or dictionary elements - edit & keep content streams - load content streams - display filtered streams even if the unfiltered stream is corrupt (PDFBOX-2976) - ✓ display the "caused by" part exception stack trace (nested exceptions) > Improve PDFDebugger (2) > --- > > Key: PDFBOX-2941 > URL: https://issues.apache.org/jira/browse/PDFBOX-2941 > Project: PDFBox > Issue Type: Improvement > Components: Utilities >Affects Versions: 2.0.0 >Reporter: Tilman Hausherr > Attachments: gs-bugzilla694570.pdf, osx-tabs.png, > screenshot_debugger_new.png, screenshot_debugger_not_aligned.png, > screenshot_debugger_old.png, screenshot_w7_fontsize.png, > separate_filter_choice_from_text_hex_views.diff, sonar_qube_resolve.diff, > sonar_qube_resolve_25_08.diff > > > This is a follow-up issue to PDFBOX-2530 to implement extra ideas that came > up in GSoC2015, ideas that were not implemented due to lack of time, and new > ideas. > - save modified PDFs > - refactor PDFDebugger.java > - render glyphs of fonts > - editing in hex viewer > - ✓ refactor StreamPane to share stream filtering among Text view and hex view > - password dialog when hitting protected PDF > - remove nodes (e.g. elements from a COSDictionary) > - show "pretty" XML > - delete array or dictionary elements > - edit & keep content streams > - load content streams > - display filtered streams even if the unfiltered stream is corrupt > (PDFBOX-2976) > - ✓ display the "caused by" part exception stack trace (nested exceptions) > - keep zoom > - integrate DrawPrintTextLocations into rendering > - integrate area text extraction with a mouse-created rectangle that shows > the coordinates in a status line -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed
[ https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060435#comment-15060435 ] Tilman Hausherr commented on PDFBOX-3168: - In your file, object 12 is also compressed.
{code}
12 0 obj
<<
/Filter /FlateDecode
/Length 71
/Length1 186
>>
stream
xœ}‹W €0ÇnÔ5öûßÓ%Ñ·åA¢ O)V7MÝ6jjú=†Qò$ZžÀÌÂJdcçàäâNÿÏ&ú{
endstream
endobj
{code}
> Embedded TTF subsets are not compressed > --- > > Key: PDFBOX-3168 > URL: https://issues.apache.org/jira/browse/PDFBOX-3168 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.0 > Environment: 2.0.0-RC2 >Reporter: Philip Helger > Attachments: MainIssue3168.java, example.pdf, issue-3168.pdf > > > When embedding font subsets, these subsets are included uncompressed in the > PDF. > I assume it would make sense to flate-encode them for space reasons.
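For readers unfamiliar with /FlateDecode: the stream data is zlib/deflate output, which is why the dump above starts with the bytes "xœ" (0x78 0x9C, a zlib header). The following sketch uses only java.util.zip to show the round trip; it is not PDFBox's stream-writing code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch, not PDFBox code: flate round trip with the JDK's zlib bindings. */
final class FlateSketch {
    private FlateSketch() {
    }

    /** Compresses raw bytes into zlib-wrapped deflate data. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes from zlib-wrapped deflate data. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished: the input is truncated or needs a preset dictionary.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported flate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not valid flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

In the object shown above, /Length 71 is the size of the compressed stream and /Length1 186 is the size of the decoded font program, so that subset shrank to well under half its size.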
[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed
[ https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060432#comment-15060432 ] Tilman Hausherr commented on PDFBOX-3168: - Ignore my earlier numbers, my file was different. However, I reverted all my test changes. 12 is compressed. 5,6,7,8 are not streams. Or do you mean compressed object streams? We do read these, but I don't think we write them. > Embedded TTF subsets are not compressed > --- > > Key: PDFBOX-3168 > URL: https://issues.apache.org/jira/browse/PDFBOX-3168 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.0 > Environment: 2.0.0-RC2 >Reporter: Philip Helger > Attachments: MainIssue3168.java, example.pdf, issue-3168.pdf > > > When embedding font subsets, these subsets are included uncompressed in the > PDF. > I assume it would make sense to flate-encode them for space reasons.
[jira] [Updated] (PDFBOX-3168) Embedded TTF subsets are not compressed
[ https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-3168: Attachment: example.pdf > Embedded TTF subsets are not compressed > --- > > Key: PDFBOX-3168 > URL: https://issues.apache.org/jira/browse/PDFBOX-3168 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.0 > Environment: 2.0.0-RC2 >Reporter: Philip Helger > Attachments: MainIssue3168.java, example.pdf, issue-3168.pdf > > > When embedding font subsets, these subsets are included uncompressed in the > PDF. > I assume it would make sense to flate-encode them for space reasons.
[jira] [Updated] (PDFBOX-3170) Created PDF does not open in Adobe Reader DC
[ https://issues.apache.org/jira/browse/PDFBOX-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Helger updated PDFBOX-3170: -- Attachment: MainIssue3170.java issue-3170.pdf The source file to reproduce the output and the created output. > Created PDF does not open in Adobe Reader DC > > > Key: PDFBOX-3170 > URL: https://issues.apache.org/jira/browse/PDFBOX-3170 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 2.0.0 > Environment: 2.0.0-RC2 - current SNAPSHOT >Reporter: Philip Helger > Attachments: MainIssue3170.java, issue-3170.pdf > > > When creating a PDF with a single very long line, the resulting PDF cannot > be opened in Adobe Reader DC. > The code is the same as in PDFBOX-3168 except that the string is 300 times > the length.
[jira] [Created] (PDFBOX-3170) Created PDF does not open in Adobe Reader DC
Philip Helger created PDFBOX-3170: - Summary: Created PDF does not open in Adobe Reader DC Key: PDFBOX-3170 URL: https://issues.apache.org/jira/browse/PDFBOX-3170 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 2.0.0 Environment: 2.0.0-RC2 - current SNAPSHOT Reporter: Philip Helger When creating a PDF with a single very long line, the resulting PDF cannot be opened in Adobe Reader DC. The code is the same as in PDFBOX-3168 except that the string is 300 times the length.
[jira] [Updated] (PDFBOX-3168) Embedded TTF subsets are not compressed
[ https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Helger updated PDFBOX-3168: -- Attachment: issue-3168.pdf MainIssue3168.java Source file to reproduce + created PDF file > Embedded TTF subsets are not compressed > --- > > Key: PDFBOX-3168 > URL: https://issues.apache.org/jira/browse/PDFBOX-3168 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.0 > Environment: 2.0.0-RC2 >Reporter: Philip Helger > Attachments: MainIssue3168.java, issue-3168.pdf > > > When embedding font subsets, these subsets are included uncompressed in the > PDF. > I assume it would make sense to flate-encode them for space reasons.
[jira] [Commented] (PDFBOX-3131) Reduce amount of intermediate data and objects to reduce memory footprint/complexity
[ https://issues.apache.org/jira/browse/PDFBOX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060367#comment-15060367 ] ASF subversion and git services commented on PDFBOX-3131: - Commit 1720397 from [~lehmi] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1720397 ] PDFBOX-3131: reduce the amount of data when parsing AFM files as PDFBox uses char metrics only > Reduce amount of intermediate data and objects to reduce memory > footprint/complexity > > > Key: PDFBOX-3131 > URL: https://issues.apache.org/jira/browse/PDFBOX-3131 > Project: PDFBox > Issue Type: Improvement > Components: FontBox >Affects Versions: 2.0.0 >Reporter: Andreas Lehmkühler >Assignee: Andreas Lehmkühler > Fix For: 2.0.0 > > > The CFFParser holds a lot of intermediate data and produces a lot of objects > to do so. The idea is to reduce the amount of such objects and data to reduce > the memory footprint and the complexity. > - the class IndexData holds intermediate data and creates a byte array every time > getBytes is called. I'm going to replace the class with a simple list to > reduce the memory footprint and the complexity > - remove unused members of private classes > - create a list of strings instead of a list of byte arrays which is used to > create those strings
[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed
[ https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060398#comment-15060398 ] Philip Helger commented on PDFBOX-3168: --- Objects 5,6,7,8,12 are not compressed > Embedded TTF subsets are not compressed > --- > > Key: PDFBOX-3168 > URL: https://issues.apache.org/jira/browse/PDFBOX-3168 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.0 > Environment: 2.0.0-RC2 >Reporter: Philip Helger > Attachments: MainIssue3168.java, issue-3168.pdf > > > When embedding font subsets, these subsets are included uncompressed in the > PDF. > I assume it would make sense to flate-encode them for space reasons.
[jira] [Resolved] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter
[ https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved PDFBOX-3143. - Resolution: Fixed Assignee: Tilman Hausherr Fix Version/s: 2.0.0 > Added PDEmbeddedFile constructor with COSName parameter > --- > > Key: PDFBOX-3143 > URL: https://issues.apache.org/jira/browse/PDFBOX-3143 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 2.0.0 > Environment: Version 2.0.0-RC2 >Reporter: Philip Helger >Assignee: Tilman Hausherr > Fix For: 2.0.0 > > Attachments: 3143.patch > > > Since the "addCompression" method from PDStream got deprecated and instead > the "PDStream" constructor with "COSName" parameter should be used, please > also provide this constructor in all classes derived from "PDStream" where it > makes sense (especially in "PDEmbeddedFile") -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-2852) Improve code quality (2)
[ https://issues.apache.org/jira/browse/PDFBOX-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060369#comment-15060369 ] ASF subversion and git services commented on PDFBOX-2852: - Commit 1720400 from [~lehmi] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1720400 ] PDFBOX-2852: make the data of exposed collections unmodifiable > Improve code quality (2) > > > Key: PDFBOX-2852 > URL: https://issues.apache.org/jira/browse/PDFBOX-2852 > Project: PDFBox > Issue Type: Task >Affects Versions: 2.0.0 >Reporter: Tilman Hausherr > Attachments: winansiencoding.patch, winansiencoding2.patch > > > This is a longterm issue for the task to improve code quality, by using the > [SonarQube > report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor], > hints in different IDEs, the FindBugs tool and other code quality tools. > This is a follow-up of PDFBOX-2576, which was getting too long. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
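The idea behind "make the data of exposed collections unmodifiable" can be sketched as follows. This is a generic illustration, not the code changed in commit 1720400; the class name and its contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical class, not from PDFBox: exposes its internal list as a read-only view. */
final class GlyphNameTable {
    private final List<String> names = new ArrayList<>();

    /** Only the owning class may change the underlying list. */
    void add(String name) {
        names.add(name);
    }

    /**
     * Returns a read-only view: callers can iterate and query it, but any
     * attempt to modify it throws UnsupportedOperationException.
     */
    List<String> getNames() {
        return Collections.unmodifiableList(names);
    }
}
```

The returned list is a view rather than a copy, so later calls to add() are still visible through it; a defensive copy would be the alternative when callers need a snapshot.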
[jira] [Commented] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter
[ https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060364#comment-15060364 ] Tilman Hausherr commented on PDFBOX-3143: - No it doesn't :-) > Added PDEmbeddedFile constructor with COSName parameter > --- > > Key: PDFBOX-3143 > URL: https://issues.apache.org/jira/browse/PDFBOX-3143 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 2.0.0 > Environment: Version 2.0.0-RC2 >Reporter: Philip Helger > Attachments: 3143.patch > > > Since the "addCompression" method from PDStream got deprecated and instead > the "PDStream" constructor with "COSName" parameter should be used, please > also provide this constructor in all classes derived from "PDStream" where it > makes sense (especially in "PDEmbeddedFile") -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter
[ https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060362#comment-15060362 ] ASF subversion and git services commented on PDFBOX-3143: - Commit 1720395 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1720395 ] PDFBOX-3143: add PDEmbeddedFile constructor with COSName filter parameter, as suggested by Philip Helger > Added PDEmbeddedFile constructor with COSName parameter > --- > > Key: PDFBOX-3143 > URL: https://issues.apache.org/jira/browse/PDFBOX-3143 > Project: PDFBox > Issue Type: Bug > Components: PDModel >Affects Versions: 2.0.0 > Environment: Version 2.0.0-RC2 >Reporter: Philip Helger > Attachments: 3143.patch > > > Since the "addCompression" method from PDStream got deprecated and instead > the "PDStream" constructor with "COSName" parameter should be used, please > also provide this constructor in all classes derived from "PDStream" where it > makes sense (especially in "PDEmbeddedFile") -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed
[ https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060338#comment-15060338 ]

Tilman Hausherr commented on PDFBOX-3168:
-----------------------------------------

Could you please attach an example and some code? I ask because the example file you'll find after building in "PDFBox reactor/examples/example.pdf" does have compressed subsets (see objects 19 and 21). See also TrueTypeEmbedder.buildFontFile2(); it does compress.

> Embedded TTF subsets are not compressed
> ---------------------------------------
>
> Key: PDFBOX-3168
> URL: https://issues.apache.org/jira/browse/PDFBOX-3168
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.0
> Environment: 2.0.0-RC2
> Reporter: Philip Helger
>
> When embedding font subsets, these subsets are included uncompressed in the
> PDF.
> I assume it would make sense to flate-encode them for space reasons.
[jira] [Closed] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
[ https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-3166.
-----------------------------------
Resolution: Not A Problem

> Unwanted spaces before number in chinese text extraction
> --------------------------------------------------------
>
> Key: PDFBOX-3166
> URL: https://issues.apache.org/jira/browse/PDFBOX-3166
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Environment: Windows
> Reporter: Gang Luo
> Labels: test
> Attachments: 1201830823-marked-1.png
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Unwanted spaces appear before a number in Chinese date text,
> such as in this PDF file:
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF
[jira] [Commented] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
[ https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060317#comment-15060317 ]

Tilman Hausherr commented on PDFBOX-3166:
-----------------------------------------

I assume you mean the two spaces before the "1" at the beginning. There really is one space before and one after the 1 in the PDF (look for Tj):
{code}
/Artifact << /Attached [ /Bottom ] /Type /Pagination >> BDC
BT
/TT0 9 Tf
90.02 51.72 Td
( ) Tj
ET
q
295.42 49.62 4.5 10.32 re
W* n
q
295.42 49.62 4.5 10.32 re
W* n
BT
/TT0 9 Tf
295.42 51.78 Td
(1) Tj
ET
Q
q
295.42 49.62 4.5 10.32 re
W* n
BT
/TT0 9 Tf
299.92 51.78 Td
( ) Tj
ET
EMC
Q
Q
{code}
The second space is there because the real space is too far away from the 1. See also the attached image, which shows where the space is.

I am aware that Adobe Reader and PDF.js do not output that space. But I don't consider this to be important, and fixing this "problem" might bring new problems; text extraction is very sensitive to changes. You can eliminate leading or trailing spaces with trim, or eliminate double spaces with replace.

The good side of your issue is that if the only thing you're complaining about is a space, it means that the rest is pretty good :-)

> Unwanted spaces before number in chinese text extraction
> --------------------------------------------------------
>
> Key: PDFBOX-3166
> URL: https://issues.apache.org/jira/browse/PDFBOX-3166
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Environment: Windows
> Reporter: Gang Luo
> Labels: test
> Attachments: 1201830823-marked-1.png
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Unwanted spaces appear before a number in Chinese date text,
> such as in this PDF file:
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF
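Editorial illustration of the trim/replace suggestion in the comment above. This is only a minimal sketch in plain Java: it assumes the text has already been extracted into a String (for example via PDFTextStripper), and the class and method names (ExtractedTextCleanup, clean) are hypothetical and not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: post-processes text that has
// already been extracted from a PDF.
final class ExtractedTextCleanup {

    private ExtractedTextCleanup() {
    }

    /** Trims every line and collapses runs of two or more spaces into one. */
    static String clean(String extracted) {
        List<String> cleaned = new ArrayList<>();
        // The -1 limit keeps trailing empty lines, so the line count is preserved.
        for (String line : extracted.split("\\r?\\n", -1)) {
            cleaned.add(line.trim().replaceAll(" {2,}", " "));
        }
        return String.join("\n", cleaned);
    }
}
```

Calling ExtractedTextCleanup.clean(text) on the extracted page text would drop the leading and trailing spaces around the "1". Note that String.trim() only strips characters up to U+0020, so an ideographic space (U+3000) in Chinese text would need separate handling.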
[jira] [Updated] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
[ https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-3166:
------------------------------------
Attachment: 1201830823-marked-1.png

> Unwanted spaces before number in chinese text extraction
> --------------------------------------------------------
>
> Key: PDFBOX-3166
> URL: https://issues.apache.org/jira/browse/PDFBOX-3166
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Environment: Windows
> Reporter: Gang Luo
> Labels: test
> Attachments: 1201830823-marked-1.png
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Unwanted spaces appear before a number in Chinese date text,
> such as in this PDF file:
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF
[jira] [Resolved] (PDFBOX-3167) IllegalArgumentException: dash lengths all zero
[ https://issues.apache.org/jira/browse/PDFBOX-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-3167.
-------------------------------------
Resolution: Fixed
Assignee: Tilman Hausherr
Fix Version/s: 2.0.0

Fixed - thanks!

> IllegalArgumentException: dash lengths all zero
> -----------------------------------------------
>
> Key: PDFBOX-3167
> URL: https://issues.apache.org/jira/browse/PDFBOX-3167
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.0
> Reporter: simon steiner
> Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> PDF from PDFBOX-624
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage documenta_math.pdf
> Exception in thread "main" java.lang.IllegalArgumentException: dash lengths all zero
>   at java.awt.BasicStroke.<init>(BasicStroke.java:220)
>   at org.apache.pdfbox.rendering.PageDrawer.drawAnnotationLinkBorder(PageDrawer.java:929)
>   at org.apache.pdfbox.rendering.PageDrawer.showAnnotation(PageDrawer.java:858)
>   at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:191)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
>   at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)
[jira] [Commented] (PDFBOX-3167) IllegalArgumentException: dash lengths all zero
[ https://issues.apache.org/jira/browse/PDFBOX-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060297#comment-15060297 ]

ASF subversion and git services commented on PDFBOX-3167:
---------------------------------------------------------

Commit 1720390 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1720390 ]

PDFBOX-3167: check for invalid dash

> IllegalArgumentException: dash lengths all zero
> -----------------------------------------------
>
> Key: PDFBOX-3167
> URL: https://issues.apache.org/jira/browse/PDFBOX-3167
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.0
> Reporter: simon steiner
>
> PDF from PDFBOX-624
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage documenta_math.pdf
> Exception in thread "main" java.lang.IllegalArgumentException: dash lengths all zero
>   at java.awt.BasicStroke.<init>(BasicStroke.java:220)
>   at org.apache.pdfbox.rendering.PageDrawer.drawAnnotationLinkBorder(PageDrawer.java:929)
>   at org.apache.pdfbox.rendering.PageDrawer.showAnnotation(PageDrawer.java:858)
>   at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:191)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
>   at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)
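Editorial illustration of what "check for invalid dash" can look like. This is not the code from r1720390; it is a minimal sketch that assumes the caller prefers a solid stroke when the dash array would be rejected. The names SafeDashStroke, isValidDash and create are hypothetical. java.awt.BasicStroke rejects an empty dash array, negative dash lengths and dash arrays that are all zero.

```java
import java.awt.BasicStroke;

// Hypothetical helper, not the actual PDFBox fix.
final class SafeDashStroke {

    private SafeDashStroke() {
    }

    /** Returns true if java.awt.BasicStroke would accept this dash array. */
    static boolean isValidDash(float[] dash) {
        if (dash == null || dash.length == 0) {
            return false;
        }
        boolean allZero = true;
        for (float d : dash) {
            if (d < 0 || Float.isNaN(d)) {
                return false; // negative or undefined dash length
            }
            if (d > 0) {
                allZero = false;
            }
        }
        return !allZero; // "dash lengths all zero" is rejected by BasicStroke
    }

    /** Builds a dashed stroke, or a solid one when the dash array is unusable. */
    static BasicStroke create(float width, float[] dash, float phase) {
        if (!isValidDash(dash)) {
            return new BasicStroke(width);
        }
        // A negative dash phase is also rejected, so clamp it to zero.
        return new BasicStroke(width, BasicStroke.CAP_BUTT, BasicStroke.JOIN_MITER,
                10.0f, dash, Math.max(phase, 0f));
    }
}
```

With a guard like this, a link border whose dash array is {0, 0} is drawn solid instead of aborting the whole page rendering. Whether falling back to a solid line is the right choice is a design decision of this sketch, not something stated in the commit message.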
[jira] [Commented] (PDFBOX-3169) SaveIncremental does not work without signature
[ https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060015#comment-15060015 ]

Thomas Chojecki commented on PDFBOX-3169:
-----------------------------------------

Yes. This will not solve the main need for easy incremental updates of PDFs, but it makes it possible to do some minor incremental updates, like augmenting signatures or something else that isn't complicated. After applying this patch at home, the saveIncremental method should work as it does in the 1.8 branch.

> SaveIncremental does not work without signature
> -----------------------------------------------
>
> Key: PDFBOX-3169
> URL: https://issues.apache.org/jira/browse/PDFBOX-3169
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Thomas Chojecki
> Assignee: Thomas Chojecki
> Attachments: saveIncremental.patch
>
> I know this feature is ongoing, but with the 2.0.0-RC builds
> saveIncremental (without signature) stopped working altogether. A
> ByteArrayOutputStream is used in the COSWriter for output. This OutputStream
> is only handled when we write a signature. Otherwise the
> whole content is discarded.
> As I wrote some time ago on the mailing list, incremental update works in a
> limited way. At the moment we use it for augmenting signatures, and this works
> with the old 1.8.x but not with trunk after the patch PDFBOX-1847 was applied.
[jira] [Commented] (PDFBOX-3169) SaveIncremental does not work without signature
[ https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059994#comment-15059994 ]

Maruan Sahyoun commented on PDFBOX-3169:
----------------------------------------

If I understand correctly, that patch is necessary because of the current (limited) way PDFBox handles incremental updates? So we still need full incremental update support.

> SaveIncremental does not work without signature
> -----------------------------------------------
>
> Key: PDFBOX-3169
> URL: https://issues.apache.org/jira/browse/PDFBOX-3169
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Thomas Chojecki
> Assignee: Thomas Chojecki
> Attachments: saveIncremental.patch
>
> I know this feature is ongoing, but with the 2.0.0-RC builds
> saveIncremental (without signature) stopped working altogether. A
> ByteArrayOutputStream is used in the COSWriter for output. This OutputStream
> is only handled when we write a signature. Otherwise the
> whole content is discarded.
> As I wrote some time ago on the mailing list, incremental update works in a
> limited way. At the moment we use it for augmenting signatures, and this works
> with the old 1.8.x but not with trunk after the patch PDFBOX-1847 was applied.
[jira] [Updated] (PDFBOX-3169) SaveIncremental does not work without signature
[ https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Chojecki updated PDFBOX-3169:
------------------------------------
Attachment: saveIncremental.patch

The patch adds an additional method that handles the saveIncremental write for non-signature cases. It copies the original document and the incremental update into the given OutputStream.

> SaveIncremental does not work without signature
> -----------------------------------------------
>
> Key: PDFBOX-3169
> URL: https://issues.apache.org/jira/browse/PDFBOX-3169
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 2.0.0
> Reporter: Thomas Chojecki
> Assignee: Thomas Chojecki
> Attachments: saveIncremental.patch
>
> I know this feature is ongoing, but with the 2.0.0-RC builds
> saveIncremental (without signature) stopped working altogether. A
> ByteArrayOutputStream is used in the COSWriter for output. This OutputStream
> is only handled when we write a signature. Otherwise the
> whole content is discarded.
> As I wrote some time ago on the mailing list, incremental update works in a
> limited way. At the moment we use it for augmenting signatures, and this works
> with the old 1.8.x but not with trunk after the patch PDFBOX-1847 was applied.
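Editorial illustration of the copy step described in the update above. This is not the content of saveIncremental.patch; it is a minimal sketch in plain Java that assumes the incremental section has already been serialized into a byte array (as with the ByteArrayOutputStream mentioned in the issue). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not the actual patch: an incremental update leaves the
// original bytes untouched and appends the new section after them.
final class IncrementalAppendSketch {

    private IncrementalAppendSketch() {
    }

    /** Copies the original document, then the incremental update, to out. */
    static void writeIncremental(InputStream original, byte[] increment, OutputStream out)
            throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        // Step 1: the original document, byte for byte.
        while ((read = original.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        // Step 2: the buffered incremental section.
        out.write(increment);
        out.flush();
    }
}
```

Because the original bytes are copied unchanged, byte ranges covered by an existing signature stay intact, which is what makes this usable for augmenting signatures.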
[jira] [Created] (PDFBOX-3169) SaveIncremental does not work without signature
Thomas Chojecki created PDFBOX-3169:
---------------------------------------

Summary: SaveIncremental does not work without signature
Key: PDFBOX-3169
URL: https://issues.apache.org/jira/browse/PDFBOX-3169
Project: PDFBox
Issue Type: Bug
Components: Writing
Affects Versions: 2.0.0
Reporter: Thomas Chojecki
Assignee: Thomas Chojecki

I know this feature is ongoing, but with the 2.0.0-RC builds saveIncremental (without signature) stopped working altogether. A ByteArrayOutputStream is used in the COSWriter for output. This OutputStream is only handled when we write a signature. Otherwise the whole content is discarded.

As I wrote some time ago on the mailing list, incremental update works in a limited way. At the moment we use it for augmenting signatures, and this works with the old 1.8.x but not with trunk after the patch PDFBOX-1847 was applied.
[jira] [Created] (PDFBOX-3168) Embedded TTF subsets are not compressed
Philip Helger created PDFBOX-3168:
-------------------------------------

Summary: Embedded TTF subsets are not compressed
Key: PDFBOX-3168
URL: https://issues.apache.org/jira/browse/PDFBOX-3168
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.0
Environment: 2.0.0-RC2
Reporter: Philip Helger

When embedding font subsets, these subsets are included uncompressed in the PDF.
I assume it would make sense to flate-encode them for space reasons.
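Editorial illustration of what "flate-encode" means here. This sketch does not use PDFBox; it uses java.util.zip to produce and read back zlib/deflate data, the same format as PDF's FlateDecode filter. The class and method names are hypothetical, and the input byte array merely stands in for a font subset.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only.
final class FlateSketch {

    private FlateSketch() {
    }

    /** Compresses raw bytes into zlib format (what FlateDecode expects). */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); throws IllegalArgumentException on corrupt input. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

How much space this saves depends on the data; font tables with repetitive content usually compress well, but no particular ratio is claimed here.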
[jira] [Updated] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter
[ https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Helger updated PDFBOX-3143:
----------------------------------
Attachment: 3143.patch

Voilà, a small patch to fulfil the task.

> Added PDEmbeddedFile constructor with COSName parameter
> -------------------------------------------------------
>
> Key: PDFBOX-3143
> URL: https://issues.apache.org/jira/browse/PDFBOX-3143
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.0
> Environment: Version 2.0.0-RC2
> Reporter: Philip Helger
> Attachments: 3143.patch
>
> Since the "addCompression" method from PDStream got deprecated and instead
> the "PDStream" constructor with "COSName" parameter should be used, please
> also provide this constructor in all classes derived from "PDStream" where it
> makes sense (especially in "PDEmbeddedFile")
[jira] [Commented] (PDFBOX-3162) IllegalStateException in TTFSubsetter
[ https://issues.apache.org/jira/browse/PDFBOX-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059811#comment-15059811 ]

Philip Helger commented on PDFBOX-3162:
---------------------------------------

It happens sometimes, but I couldn't find more information. I assume it is a problem with 2.0.0-RC2. I will update to the latest SNAPSHOT and see whether it might originate from PDFBOX-2945.

> IllegalStateException in TTFSubsetter
> -------------------------------------
>
> Key: PDFBOX-3162
> URL: https://issues.apache.org/jira/browse/PDFBOX-3162
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.0
> Environment: 2.0.0-RC2
> Reporter: Philip Helger
>
> Hi, I encountered a rare exception with an empty TTF subset:
> {code}
> ==> [1] caused by java.lang.IllegalStateException: subset is empty
> 1.: org.apache.fontbox.ttf.TTFSubsetter.writeToStream(TTFSubsetter.java:921)
> 2.: org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:304)
> 3.: org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:162)
> 4.: org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1102)
> 5.: com.helger.pdflayout.PageLayoutPDF.renderTo(PageLayoutPDF.java:276)
> {code}
> Unfortunately I don't know yet what was causing the problem, but I will
> provide you with more details on Monday (if necessary).
> If there is nothing to subset, I think the call should simply be ignored.
> Or maybe this is a problem because of the "uni" name bug (in 2.0.0-RC2) I
> reopened recently?
[jira] [Created] (PDFBOX-3167) IllegalArgumentException: dash lengths all zero
simon steiner created PDFBOX-3167:
-------------------------------------

Summary: IllegalArgumentException: dash lengths all zero
Key: PDFBOX-3167
URL: https://issues.apache.org/jira/browse/PDFBOX-3167
Project: PDFBox
Issue Type: Bug
Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner

PDF from PDFBOX-624
java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage documenta_math.pdf
Exception in thread "main" java.lang.IllegalArgumentException: dash lengths all zero
  at java.awt.BasicStroke.<init>(BasicStroke.java:220)
  at org.apache.pdfbox.rendering.PageDrawer.drawAnnotationLinkBorder(PageDrawer.java:929)
  at org.apache.pdfbox.rendering.PageDrawer.showAnnotation(PageDrawer.java:858)
  at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:191)
  at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
  at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
  at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
  at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)
[jira] [Created] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction
Gang Luo created PDFBOX-3166:
--------------------------------

Summary: Unwanted spaces before number in chinese text extraction
Key: PDFBOX-3166
URL: https://issues.apache.org/jira/browse/PDFBOX-3166
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.0
Environment: Windows
Reporter: Gang Luo

Unwanted spaces appear before a number in Chinese date text, such as in this PDF file:
http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF