[jira] [Commented] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061436#comment-15061436
 ] 

Tilman Hausherr commented on PDFBOX-3166:
-

{quote}
But it cannot eliminate space before the 1 , if I add setSpacingTolerance value.
{quote}
Because the space is really there, see the image. The spacing tolerance helps 
to decide whether adjacent characters are separated or not. You can play with 
that one if you have some special documents where words appear split or always 
together. Try it on a document with western text, there it will be more 
obvious because one word = several glyphs: depending on the value, the 
sentence I just wrote would be extracted as "Tryitonadocumentwithwesterntext" 
or "Tr y it on a doc ume nt w ith wes te rn te xt".
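
As a rough illustration of what such a tolerance does, here is a minimal sketch of gap-based splitting. It is a simplified model and not PDFBox's actual implementation: the GapSplitter class, its parameters and the rule "insert a space when the gap exceeds spaceWidth * tolerance" are assumptions made for this example only.

```java
/** Hypothetical, simplified model of gap-based word splitting (not PDFBox code). */
final class GapSplitter {

    /**
     * Joins glyphs from left to right. A space is inserted whenever the gap
     * between the end of the previous glyph and the start of the current one
     * is larger than spaceWidth * tolerance.
     */
    static String join(String[] glyphs, float[] x, float[] width,
                       float spaceWidth, float tolerance) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < glyphs.length; i++) {
            if (i > 0) {
                float previousEnd = x[i - 1] + width[i - 1];
                float gap = x[i] - previousEnd;
                if (gap > spaceWidth * tolerance) {
                    out.append(' ');
                }
            }
            out.append(glyphs[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] glyphs = {"i", "t", "o", "n"};
        float[] x = {0f, 2.5f, 7f, 10.5f};   // left edge of each glyph
        float[] width = {2f, 2f, 3f, 3f};    // gaps are 0.5, 2.5 and 0.5
        System.out.println(join(glyphs, x, width, 3f, 0.5f)); // "it on"
        System.out.println(join(glyphs, x, width, 3f, 2.0f)); // "iton"
        System.out.println(join(glyphs, x, width, 3f, 0.1f)); // "i t o n"
    }
}
```

A large tolerance glues everything together, a small one splits every glyph. A space that is really present in the PDF is not a gap at all, so no tolerance value removes it.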

No, there is no API to remove the space before the "1" because it really exists 
in the PDF. PDF files are created by a wide variety of software and there are 
often surprises.

As I said, just add a trim() to each line.
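
A minimal sketch of that post-processing step, assuming the extracted text is already available as a String (for example from PDFTextStripper.getText()). The LineTrimmer class is a hypothetical helper for illustration, not part of PDFBox:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper: trims every line of already extracted text. */
final class LineTrimmer {

    /** Splits on any line break, trims each line and joins the lines with "\n". */
    static String trimLines(String extractedText) {
        return Arrays.stream(extractedText.split("\\R", -1))
                .map(String::trim)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(trimLines("2015\r\n 12 \n 16"));
    }
}
```

Note that String.trim() only removes characters up to U+0020, so an ideographic space (U+3000) would survive. It also only touches the start and end of a line, not spaces in the middle.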

If you need more help, tell us what your application is about and why the space 
is a problem.

> Unwanted spaces before number in chinese text extraction
> 
>
> Key: PDFBOX-3166
> URL: https://issues.apache.org/jira/browse/PDFBOX-3166
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
> Environment: Windows
>Reporter: Gang Luo
>  Labels: test
> Attachments: 1201830823-marked-1.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Unwanted spaces before number in chinese date text .
> such as this pdf file
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Issue Comment Deleted] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Gang Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Luo updated PDFBOX-3166:
-
Comment: was deleted

(was: Text extraction is very sensitive to changes. Yes, I see. Is there an 
API that can control whether the space char appears or not?
I tried PDFTextStripper.setSpacingTolerance(), but it cannot eliminate the 
space before the 1, even if I increase the setSpacingTolerance value.

PDFTextStripper stripper = new PDFTextStripper();
stripper.setSpacingTolerance(800.0f); //0.08f

If I reduce the setSpacingTolerance value, it adds a space after the date number.

The rest is pretty good.)




[jira] [Reopened] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Gang Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Luo reopened PDFBOX-3166:
--

Text extraction is very sensitive to changes. Yes, I see. Is there an API that 
can control whether the space char appears or not?
I tried PDFTextStripper.setSpacingTolerance(), but it cannot eliminate the 
space before the 1, even if I increase the setSpacingTolerance value.

PDFTextStripper stripper = new PDFTextStripper();
stripper.setSpacingTolerance(800.0f); //0.08f

If I reduce the setSpacingTolerance value, it adds a space after the date number.

The rest is pretty good.





[jira] [Commented] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Gang Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061254#comment-15061254
 ] 

Gang Luo commented on PDFBOX-3166:
--

Text extraction is very sensitive to changes. Yes, I see. Is there an API that 
can control whether the space char appears or not?
I tried PDFTextStripper.setSpacingTolerance(), but it cannot eliminate the 
space before the 1, even if I increase the setSpacingTolerance value.

PDFTextStripper stripper = new PDFTextStripper();
stripper.setSpacingTolerance(800.0f); //0.08f

If I reduce the setSpacingTolerance value, it adds a space after the date number.

The rest is pretty good.




[jira] [Commented] (PDFBOX-3169) SaveIncremental does not work without signature

2015-12-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061206#comment-15061206
 ] 

ASF subversion and git services commented on PDFBOX-3169:
-

Commit 1720482 from [~tchojecki] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1720482 ]

PDFBOX-3169
Add an additional method that handles the saveIncremental write for 
non-signature cases. It copies the original document and the incremental 
update into the given OutputStream.
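
The idea described in the commit message (original bytes first, then the incremental update appended) can be sketched with plain java.io. This is only an illustration under assumptions: the IncrementalCopy class and its write method are hypothetical names and are not the method added in r1720482.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: output = unchanged original file + appended update. */
final class IncrementalCopy {

    /** Copies the original document unchanged, then appends the incremental update. */
    static void write(InputStream original, byte[] incrementalUpdate, OutputStream out)
            throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = original.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        out.write(incrementalUpdate);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "%PDF-1.4 original content\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] update = "appended objects, xref and trailer\n".getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(new ByteArrayInputStream(original), update, out);
        System.out.println(out.size() == original.length + update.length);
    }
}
```

Because the original bytes are never rewritten, anything that depends on them (such as an existing signature) keeps referring to the same byte ranges.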

> SaveIncremental does not work without signature
> ---
>
> Key: PDFBOX-3169
> URL: https://issues.apache.org/jira/browse/PDFBOX-3169
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.0
>Reporter: Thomas Chojecki
>Assignee: Thomas Chojecki
> Fix For: 2.0.0
>
> Attachments: saveIncremental.patch
>
>
> I know this feature is ongoing, but with the 2.0.0-RC builds 
> saveIncremental (without signature) stopped working at all. A 
> ByteArrayOutputStream is used in the COSWriter for output. This OutputStream 
> is only handled when we write a signature. Otherwise the whole content is 
> discarded.
> As I wrote some time ago on the mailing list, incremental update works in a 
> limited way. At the moment we use it for augmenting signatures and this works 
> with the old 1.8.x but not with trunk after the patch PDFBOX-1847 was applied.






[jira] [Commented] (PDFBOX-3165) Tab characters in PDTextField cause error when using .flatten()

2015-12-16 Thread Aaron Eischeid (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061139#comment-15061139
 ] 

Aaron Eischeid commented on PDFBOX-3165:


Is there maybe a specific way to use a font file that has more characters in 
it? What we are doing is using LibreOffice to work on the doc and export to 
PDF. What I have noticed is that if we do this on an Ubuntu machine, then after 
PDFBox handles the PDF some Windows users will see black circles instead of 
letters. That problem goes away if we do the export from a Windows machine. The 
error we encountered with the missing tab char was on a PDF that had been 
generated on a Windows machine.

So the question is: is there a way to tell LibreOffice, or whatever generates 
the PDF, to use/declare/embed a font that has a broader set of characters? Or 
should I use a different font setting in the form elements? Arial as a font 
choice feels pretty safe really. If that one doesn't work, it is hard to 
imagine what would do better.

> Tab characters in PDTextField cause error when using .flatten()
> ---
>
> Key: PDFBOX-3165
> URL: https://issues.apache.org/jira/browse/PDFBOX-3165
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm, FontBox
>Affects Versions: 2.0.0
> Environment: Ubuntu, JDK7
>Reporter: Aaron Eischeid
> Attachments: Sample_Template.pdf
>
>
> The PDF form gets filled in, then I call .flatten(fields, true), which last I 
> knew was undocumented, but anyway I needed the refreshAppearances for 
> PDF viewers that don't support AcroForms, like pdf.js.
> If a tab character somehow gets entered into the PDTextField, it chokes. I am 
> worried other somewhat common characters might have similar issues, but 
> haven't experimented so far.
> Using RC2 of PDFBox and FontBox, and the fonts in the PDF form elements were 
> all set to Arial.
> Relevant stacktrace:
> U+0009 is not available in this font's Encoding. Stacktrace follows:
> java.lang.IllegalArgumentException: U+0009 is not available in this font's Encoding
>   at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:358)
>   at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:283)
>   at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:312)
>   at org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:193)
>   at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:373)
>   at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:237)
>   at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:144)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:287)
>   at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.flatten(PDAcroForm.java:211)
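
One possible workaround on the application side, not something proposed in this issue, is to clean the value before it is put into the text field, so that U+0009 never reaches the font encoder. A minimal sketch, assuming that replacing a tab with a single space is acceptable for the form; FieldValueSanitizer is a hypothetical helper, not a PDFBox class:

```java
/** Hypothetical helper: removes characters a simple font encoding is unlikely to contain. */
final class FieldValueSanitizer {

    /**
     * Replaces each tab with a single space and drops all other control
     * characters, except line breaks, which are kept as they are.
     */
    static String sanitize(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\t') {
                out.append(' ');
            } else if (c == '\n' || c == '\r' || !Character.isISOControl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("first\tsecond"));
    }
}
```

This only avoids the exception for control characters. It does not help with printable characters that are missing from the font's encoding.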






[jira] [Updated] (PDFBOX-3165) Tab characters in PDTextField cause error when using .flatten()

2015-12-16 Thread Aaron Eischeid (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Eischeid updated PDFBOX-3165:
---
Attachment: Sample_Template.pdf

I haven't had time to test this file specifically, but it is just a very 
stripped down version of the one we were having the issue with




[jira] [Updated] (PDFBOX-3169) SaveIncremental does not work without signature

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3169:

Fix Version/s: 2.0.0




[jira] [Updated] (PDFBOX-3170) Created PDF does not open in Adobe Reader DC

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3170:

Priority: Minor  (was: Major)

> Created PDF does not open in Adobe Reader DC
> 
>
> Key: PDFBOX-3170
> URL: https://issues.apache.org/jira/browse/PDFBOX-3170
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: 2.0.0-RC2 - current SNAPSHOT
>Reporter: Philip Helger
>Priority: Minor
> Attachments: MainIssue3170.java, issue-3170.pdf
>
>
> When creating a PDF with a single very long line, the resulting PDF cannot 
> be opened in Adobe Reader DC.
> The code is the same as in PDFBOX-3168 except that the string is 300 times 
> the length.






[jira] [Updated] (PDFBOX-2941) Improve PDFDebugger (2)

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2941:

Description: 
This is a follow-up issue to PDFBOX-2530 to implement extra ideas that came up 
in GSoC2015, ideas that were not implemented due to lack of time, and new ideas.
- save modified PDFs
- refactor PDFDebugger.java
- render glyphs of fonts
- editing in hex viewer
- ✓ refactor StreamPane to share stream filtering among Text view and hex view
- password dialog when hitting protected PDF
- remove nodes (e.g. elements from a COSDictionary)
- show "pretty" XML
- delete array or dictionary elements
- edit & keep content streams
- load content streams
- display filtered streams even if the unfiltered stream is corrupt 
(PDFBOX-2976)
- ✓ display the "caused by" part exception stack trace (nested exceptions)
- keep zoom
- integrate DrawPrintTextLocations into rendering
- integrate area text extraction with a mouse-created rectangle that shows the 
coordinates in a status line


  was:
This is a follow-up issue to PDFBOX-2530 to implement extra ideas that came up 
in GSoC2015, ideas that were not implemented due to lack of time, and new ideas.
- save modified PDFs
- refactor PDFDebugger.java
- render glyphs of fonts
- editing in hex viewer
- ✓ refactor StreamPane to share stream filtering among Text view and hex view
- password dialog when hitting protected PDF
- remove nodes (e.g. elements from a COSDictionary)
- show "pretty" XML
- delete array or dictionary elements
- edit & keep content streams
- load content streams
- display filtered streams even if the unfiltered stream is corrupt 
(PDFBOX-2976)
- ✓ display the "caused by" part exception stack trace (nested exceptions)



> Improve PDFDebugger (2)
> ---
>
> Key: PDFBOX-2941
> URL: https://issues.apache.org/jira/browse/PDFBOX-2941
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
> Attachments: gs-bugzilla694570.pdf, osx-tabs.png, 
> screenshot_debugger_new.png, screenshot_debugger_not_aligned.png, 
> screenshot_debugger_old.png, screenshot_w7_fontsize.png, 
> separate_filter_choice_from_text_hex_views.diff, sonar_qube_resolve.diff, 
> sonar_qube_resolve_25_08.diff
>
>
> This is a follow-up issue to PDFBOX-2530 to implement extra ideas that came 
> up in GSoC2015, ideas that were not implemented due to lack of time, and new 
> ideas.
> - save modified PDFs
> - refactor PDFDebugger.java
> - render glyphs of fonts
> - editing in hex viewer
> - ✓ refactor StreamPane to share stream filtering among Text view and hex view
> - password dialog when hitting protected PDF
> - remove nodes (e.g. elements from a COSDictionary)
> - show "pretty" XML
> - delete array or dictionary elements
> - edit & keep content streams
> - load content streams
> - display filtered streams even if the unfiltered stream is corrupt 
> (PDFBOX-2976)
> - ✓ display the "caused by" part exception stack trace (nested exceptions)
> - keep zoom
> - integrate DrawPrintTextLocations into rendering
> - integrate area text extraction with a mouse-created rectangle that shows 
> the coordinates in a status line






[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed

2015-12-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060435#comment-15060435
 ] 

Tilman Hausherr commented on PDFBOX-3168:
-

In your file, object 12 is also compressed.
{code}
12 0 obj
<<
/Filter /FlateDecode
/Length 71
/Length1 186
>>
stream
xœ}‹W
€0ÇnÔ5öûßÓ%с·åA¢ O)V7MÝ6jjú=†Qò$ZžÀÌÂJdcçàäâNÿÏ&ú{
endstream
endobj
{code}
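
For readers who want to check such a stream themselves: /Filter /FlateDecode means the stream body is zlib/deflate data, and /Length 71 against /Length1 186 is the compressed size against the decoded font program size. The body above starts with "x" (0x78), which is what a zlib header usually looks like. A minimal round-trip sketch with java.util.zip; the FlateDemo class is a hypothetical name and this is not how PDFBox itself encodes streams:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical demo of zlib compression as used by the FlateDecode filter. */
final class FlateDemo {

    /** Compresses raw bytes into zlib format (header + deflate data + checksum). */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses zlib data; throws IllegalArgumentException on bad or truncated input. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = new byte[186];              // stand-in for a small font program
        byte[] packed = deflate(raw);
        System.out.println(raw.length + " -> " + packed.length + " bytes");
        System.out.println(inflate(packed).length == raw.length);
    }
}
```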

> Embedded TTF subsets are not compressed
> ---
>
> Key: PDFBOX-3168
> URL: https://issues.apache.org/jira/browse/PDFBOX-3168
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.0
> Environment: 2.0.0-RC2
>Reporter: Philip Helger
> Attachments: MainIssue3168.java, example.pdf, issue-3168.pdf
>
>
> When embedding font subsets, these subsets are included uncompressed in the 
> PDF.
> I assume it would make sense to flate-encode them for space reasons.






[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed

2015-12-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060432#comment-15060432
 ] 

Tilman Hausherr commented on PDFBOX-3168:
-

Ignore my earlier numbers, my file was different.

However, I reverted all my test changes. 12 is compressed. 5,6,7,8 are not 
streams. Or do you mean compressed object streams? We do read these, but I 
don't think we write them.




[jira] [Updated] (PDFBOX-3168) Embedded TTF subsets are not compressed

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3168:

Attachment: example.pdf




[jira] [Updated] (PDFBOX-3170) Created PDF does not open in Adobe Reader DC

2015-12-16 Thread Philip Helger (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Helger updated PDFBOX-3170:
--
Attachment: MainIssue3170.java
issue-3170.pdf

The source file to reproduce the output and the created output.




[jira] [Created] (PDFBOX-3170) Created PDF does not open in Adobe Reader DC

2015-12-16 Thread Philip Helger (JIRA)
Philip Helger created PDFBOX-3170:
-

 Summary: Created PDF does not open in Adobe Reader DC
 Key: PDFBOX-3170
 URL: https://issues.apache.org/jira/browse/PDFBOX-3170
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
 Environment: 2.0.0-RC2 - current SNAPSHOT
Reporter: Philip Helger


When creating a PDF with a single very long line, the resulting PDF cannot 
be opened in Adobe Reader DC.
The code is the same as in PDFBOX-3168 except that the string is 300 times the 
length.






[jira] [Updated] (PDFBOX-3168) Embedded TTF subsets are not compressed

2015-12-16 Thread Philip Helger (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Helger updated PDFBOX-3168:
--
Attachment: issue-3168.pdf
MainIssue3168.java

Source file to reproduce + created PDF file




[jira] [Commented] (PDFBOX-3131) Reduce amount of intermediate data and objects to reduce memory footprint/complexity

2015-12-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060367#comment-15060367
 ] 

ASF subversion and git services commented on PDFBOX-3131:
-

Commit 1720397 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1720397 ]

PDFBOX-3131: reduce the amount of data when parsing AFM files as PDFBox uses 
char metrics only

> Reduce amount of intermediate data and objects to reduce memory 
> footprint/complexity
> 
>
> Key: PDFBOX-3131
> URL: https://issues.apache.org/jira/browse/PDFBOX-3131
> Project: PDFBox
>  Issue Type: Improvement
>  Components: FontBox
>Affects Versions: 2.0.0
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
> Fix For: 2.0.0
>
>
> The CFFParser holds a lot of intermediate data and produces a lot of objects 
> to do so. The idea is to reduce the amount of such objects and data to reduce 
> the memory footprint and the complexity.
> - the class IndexData holds intermediate data and creates a byte array every 
> time getBytes is called. I'm going to replace the class with a simple list to 
> reduce the memory footprint and the complexity
> - remove unused members of private classes
> - create a list of strings instead of a list of byte arrays which is used to 
> create those strings
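
The last point can be pictured roughly as follows: decode every entry once while reading, and keep only the resulting strings. This is a sketch under assumptions and not the CFFParser code: the IndexStrings class, the 0-based offsets (a real CFF INDEX stores its offsets differently) and the ISO-8859-1 charset are all chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: decode index entries once instead of keeping byte arrays. */
final class IndexStrings {

    /**
     * Entry i spans data[offsets[i]] (inclusive) to data[offsets[i + 1]]
     * (exclusive), so offsets holds one more element than there are entries.
     */
    static List<String> decode(byte[] data, int[] offsets) {
        List<String> result = new ArrayList<>(Math.max(0, offsets.length - 1));
        for (int i = 0; i + 1 < offsets.length; i++) {
            int length = offsets[i + 1] - offsets[i];
            // No intermediate byte[] per entry: the String is built straight from the slice.
            result.add(new String(data, offsets[i], length, StandardCharsets.ISO_8859_1));
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        byte[] data = "RegularBold".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(data, new int[] {0, 7, 11}));
    }
}
```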






[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed

2015-12-16 Thread Philip Helger (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060398#comment-15060398
 ] 

Philip Helger commented on PDFBOX-3168:
---

Objects 5,6,7,8,12 are not compressed




[jira] [Resolved] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-3143.
-
   Resolution: Fixed
 Assignee: Tilman Hausherr
Fix Version/s: 2.0.0

> Added PDEmbeddedFile constructor with COSName parameter
> ---
>
> Key: PDFBOX-3143
> URL: https://issues.apache.org/jira/browse/PDFBOX-3143
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Version 2.0.0-RC2
>Reporter: Philip Helger
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
> Attachments: 3143.patch
>
>
> Since the "addCompression" method from PDStream got deprecated and instead 
> the "PDStream" constructor with "COSName" parameter should be used, please 
> also provide this constructor in all classes derived from "PDStream" where it 
> makes sense (especially in "PDEmbeddedFile")






[jira] [Commented] (PDFBOX-2852) Improve code quality (2)

2015-12-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060369#comment-15060369
 ] 

ASF subversion and git services commented on PDFBOX-2852:
-

Commit 1720400 from [~lehmi] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1720400 ]

PDFBOX-2852: make the data of exposed collections unmodifiable
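
For context, "unmodifiable" here usually means that a getter hands out a read-only view instead of the internal collection itself. A minimal sketch of that pattern; the GlyphNameRegistry class is a hypothetical example and not one of the classes changed in r1720400:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical class showing a getter that exposes a read-only view. */
final class GlyphNameRegistry {

    private final List<String> names = new ArrayList<>();

    /** Only the owning class may add entries. */
    void register(String name) {
        names.add(name);
    }

    /** Returns a read-only view; modification attempts throw UnsupportedOperationException. */
    List<String> getNames() {
        return Collections.unmodifiableList(names);
    }

    public static void main(String[] args) {
        GlyphNameRegistry registry = new GlyphNameRegistry();
        registry.register("space");
        try {
            registry.getNames().add("bogus");
        } catch (UnsupportedOperationException e) {
            System.out.println("callers cannot modify the exposed list");
        }
    }
}
```

The view is not a copy, so later register() calls are still visible through it. If callers need a snapshot, the getter would have to copy the list instead.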

> Improve code quality (2)
> 
>
> Key: PDFBOX-2852
> URL: https://issues.apache.org/jira/browse/PDFBOX-2852
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.0
>Reporter: Tilman Hausherr
> Attachments: winansiencoding.patch, winansiencoding2.patch
>
>
> This is a long-term issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2576, which was getting too long.






[jira] [Commented] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter

2015-12-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060364#comment-15060364
 ] 

Tilman Hausherr commented on PDFBOX-3143:
-

No it doesn't :-)

> Added PDEmbeddedFile constructor with COSName parameter
> ---
>
> Key: PDFBOX-3143
> URL: https://issues.apache.org/jira/browse/PDFBOX-3143
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Version 2.0.0-RC2
>Reporter: Philip Helger
> Attachments: 3143.patch
>
>
> Since the "addCompression" method from PDStream got deprecated and instead 
> the "PDStream" constructor with "COSName" parameter should be used, please 
> also provide this constructor in all classes derived from "PDStream" where it 
> makes sense (especially in "PDEmbeddedFile")






[jira] [Commented] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter

2015-12-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060362#comment-15060362
 ] 

ASF subversion and git services commented on PDFBOX-3143:
-

Commit 1720395 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1720395 ]

PDFBOX-3143: add PDEmbeddedFile constructor with COSName filter parameter, as 
suggested by Philip Helger

> Added PDEmbeddedFile constructor with COSName parameter
> ---
>
> Key: PDFBOX-3143
> URL: https://issues.apache.org/jira/browse/PDFBOX-3143
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Version 2.0.0-RC2
>Reporter: Philip Helger
> Attachments: 3143.patch
>
>
> Since the "addCompression" method from PDStream got deprecated and instead 
> the "PDStream" constructor with "COSName" parameter should be used, please 
> also provide this constructor in all classes derived from "PDStream" where it 
> makes sense (especially in "PDEmbeddedFile")






[jira] [Commented] (PDFBOX-3168) Embedded TTF subsets are not compressed

2015-12-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060338#comment-15060338
 ] 

Tilman Hausherr commented on PDFBOX-3168:
-

Could you please attach an example and some code? I ask because the example 
file you'll find after building in "PDFBox reactor/examples/example.pdf" does 
have compressed subsets (see objects 19 and 21). See also 
TrueTypeEmbedder.buildFontFile2(), it does compress.
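
For readers unfamiliar with the term: "flate" is the zlib/deflate compression behind the PDF FlateDecode filter. The following is only a sketch using java.util.zip, not the code in TrueTypeEmbedder; the class and method names are hypothetical and chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration class, not part of PDFBox or FontBox.
public class FlateSketch {

    // Compresses raw bytes (e.g. a font subset) with zlib/deflate.
    static byte[] flateEncode(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        }
        return out.toByteArray();
    }

    // Reverses flateEncode, so the round trip can be verified.
    static byte[] flateDecode(byte[] encoded) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(encoded))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return out.toByteArray();
    }
}
```

To check whether a stream in a generated PDF is compressed, look for a /Filter /FlateDecode entry in its stream dictionary.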

> Embedded TTF subsets are not compressed
> ---
>
> Key: PDFBOX-3168
> URL: https://issues.apache.org/jira/browse/PDFBOX-3168
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.0
> Environment: 2.0.0-RC2
>Reporter: Philip Helger
>
> When embedding font subsets, these subsets are included uncompressed in the PDF.
> I assume it would make sense to flate-encode them to save space.






[jira] [Closed] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-3166.
---
Resolution: Not A Problem

> Unwanted spaces before number in chinese text extraction
> 
>
> Key: PDFBOX-3166
> URL: https://issues.apache.org/jira/browse/PDFBOX-3166
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
> Environment: Windows
>Reporter: Gang Luo
>  Labels: test
> Attachments: 1201830823-marked-1.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Unwanted spaces appear before numbers in Chinese date text,
> such as in this PDF file:
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF






[jira] [Commented] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060317#comment-15060317
 ] 

Tilman Hausherr commented on PDFBOX-3166:
-

I assume you mean the two spaces before the "1" at the beginning. There really 
is one space before and after the 1 in the PDF (look for Tj):
{code}
/Artifact << /Attached [ /Bottom ] /Type /Pagination >> BDC
  BT
/TT0 9 Tf
90.02 51.72 Td
( ) Tj
  ET
  q
295.42 49.62 4.5 10.32 re
W*
n
q
  295.42 49.62 4.5 10.32 re
  W*
  n
  BT
/TT0 9 Tf
295.42 51.78 Td
(1) Tj
  ET
Q
q
  295.42 49.62 4.5 10.32 re
  W*
  n
  BT
/TT0 9 Tf
299.92 51.78 Td
( ) Tj
  ET
EMC
  Q
Q
{code}
The second space is because the real space is too far away from the 1. See also 
the attached image which shows where the space is.

I am aware that Adobe Reader and PDF.js do not output that space. But I don't 
consider this to be important, and fixing this "problem" might introduce new 
problems, because text extraction is very sensitive to changes. You can eliminate 
leading or trailing spaces with trim, or eliminate double spaces with replace.
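
A minimal sketch of that cleanup in plain Java, assuming the extracted text is already available as a String (for example from PDFTextStripper.getText). The class and method names are hypothetical and not part of PDFBox, and the sketch uses a regex-based replaceAll to collapse longer runs of spaces rather than a plain replace.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of PDFBox.
public class ExtractedTextCleaner {

    // Trims every line and collapses runs of two or more spaces into one.
    static String clean(String extracted) {
        return Arrays.stream(extracted.split("\\R"))
                .map(line -> line.trim().replaceAll(" {2,}", " "))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // The leading and trailing spaces around "1" are removed.
        System.out.println(clean("  1 \nfoo  bar"));
    }
}
```

Note that trim() only removes characters up to U+0020, so an ideographic space (U+3000) would need separate handling.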

The good side of your issue is that if the only thing you're complaining about is a 
space, it means that the rest is pretty good :-)

> Unwanted spaces before number in chinese text extraction
> 
>
> Key: PDFBOX-3166
> URL: https://issues.apache.org/jira/browse/PDFBOX-3166
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
> Environment: Windows
>Reporter: Gang Luo
>  Labels: test
> Attachments: 1201830823-marked-1.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Unwanted spaces appear before numbers in Chinese date text,
> such as in this PDF file:
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF






[jira] [Updated] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-3166:

Attachment: 1201830823-marked-1.png

> Unwanted spaces before number in chinese text extraction
> 
>
> Key: PDFBOX-3166
> URL: https://issues.apache.org/jira/browse/PDFBOX-3166
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 2.0.0
> Environment: Windows
>Reporter: Gang Luo
>  Labels: test
> Attachments: 1201830823-marked-1.png
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Unwanted spaces appear before numbers in Chinese date text,
> such as in this PDF file:
> http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF






[jira] [Resolved] (PDFBOX-3167) IllegalArgumentException: dash lengths all zero

2015-12-16 Thread Tilman Hausherr (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-3167.
-
   Resolution: Fixed
 Assignee: Tilman Hausherr
Fix Version/s: 2.0.0

Fixed - thanks!

> IllegalArgumentException: dash lengths all zero
> ---
>
> Key: PDFBOX-3167
> URL: https://issues.apache.org/jira/browse/PDFBOX-3167
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: simon steiner
>Assignee: Tilman Hausherr
> Fix For: 2.0.0
>
>
> PDF from PDFBOX-624
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage documenta_math.pdf
> Exception in thread "main" java.lang.IllegalArgumentException: dash lengths all zero
>   at java.awt.BasicStroke.<init>(BasicStroke.java:220)
>   at org.apache.pdfbox.rendering.PageDrawer.drawAnnotationLinkBorder(PageDrawer.java:929)
>   at org.apache.pdfbox.rendering.PageDrawer.showAnnotation(PageDrawer.java:858)
>   at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:191)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
>   at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)






[jira] [Commented] (PDFBOX-3167) IllegalArgumentException: dash lengths all zero

2015-12-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060297#comment-15060297
 ] 

ASF subversion and git services commented on PDFBOX-3167:
-

Commit 1720390 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1720390 ]

PDFBOX-3167: check for invalid dash
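
The kind of check this commit message describes can be sketched as follows. This is not the code from r1720390; it is a hypothetical guard, under the assumption that falling back to a solid stroke is acceptable when the dash array is unusable. java.awt.BasicStroke rejects dash arrays that are empty, contain negative values, or are all zero.

```java
import java.awt.BasicStroke;

// Hypothetical illustration class, not the actual PageDrawer change.
public class SafeDashStroke {

    // True if BasicStroke would accept this dash array.
    static boolean isValidDash(float[] dash) {
        if (dash == null || dash.length == 0) {
            return false;
        }
        boolean allZero = true;
        for (float d : dash) {
            if (d < 0f || Float.isNaN(d)) {
                return false;
            }
            if (d != 0f) {
                allZero = false;
            }
        }
        return !allZero;
    }

    // Builds a dashed stroke when possible, otherwise a solid one
    // (the solid fallback is an assumption of this sketch).
    static BasicStroke createStroke(float width, float[] dash, float phase) {
        if (isValidDash(dash)) {
            return new BasicStroke(width, BasicStroke.CAP_BUTT,
                    BasicStroke.JOIN_MITER, 10f, dash, Math.max(0f, phase));
        }
        return new BasicStroke(width, BasicStroke.CAP_BUTT,
                BasicStroke.JOIN_MITER, 10f);
    }
}
```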

> IllegalArgumentException: dash lengths all zero
> ---
>
> Key: PDFBOX-3167
> URL: https://issues.apache.org/jira/browse/PDFBOX-3167
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: simon steiner
>
> PDF from PDFBOX-624
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage documenta_math.pdf
> Exception in thread "main" java.lang.IllegalArgumentException: dash lengths all zero
>   at java.awt.BasicStroke.<init>(BasicStroke.java:220)
>   at org.apache.pdfbox.rendering.PageDrawer.drawAnnotationLinkBorder(PageDrawer.java:929)
>   at org.apache.pdfbox.rendering.PageDrawer.showAnnotation(PageDrawer.java:858)
>   at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:191)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
>   at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
>   at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)






[jira] [Commented] (PDFBOX-3169) SaveIncremental does not work without signature

2015-12-16 Thread Thomas Chojecki (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060015#comment-15060015
 ] 

Thomas Chojecki commented on PDFBOX-3169:
-

Yes. This will not solve the main need for easy incremental updates of PDFs, but 
it makes it possible to do some minor incremental updates, such as augmenting 
signatures or something else that isn't complicated. 

After applying this patch at home, the saveIncremental method should work as it 
does in the 1.8 branch. 

> SaveIncremental does not work without signature
> ---
>
> Key: PDFBOX-3169
> URL: https://issues.apache.org/jira/browse/PDFBOX-3169
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.0
>Reporter: Thomas Chojecki
>Assignee: Thomas Chojecki
> Attachments: saveIncremental.patch
>
>
> I know this feature is still a work in progress, but with the 2.0.0-RC builds 
> saveIncremental (without a signature) stopped working entirely. A 
> ByteArrayOutputStream is used in the COSWriter for output. This OutputStream 
> is only handled when we write a signature; otherwise the 
> whole content is discarded.
> As I wrote some time ago on the mailing list, incremental update works in a 
> limited way. At the moment we use it for augmenting signatures, and this works 
> with the old 1.8.x but not with trunk after the patch for PDFBOX-1847 was applied.






[jira] [Commented] (PDFBOX-3169) SaveIncremental does not work without signature

2015-12-16 Thread Maruan Sahyoun (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059994#comment-15059994
 ] 

Maruan Sahyoun commented on PDFBOX-3169:


If I understand correctly, that patch is necessary because of the current 
(limited) way PDFBox handles incremental updates? So we still need full 
incremental update support.

> SaveIncremental does not work without signature
> ---
>
> Key: PDFBOX-3169
> URL: https://issues.apache.org/jira/browse/PDFBOX-3169
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.0
>Reporter: Thomas Chojecki
>Assignee: Thomas Chojecki
> Attachments: saveIncremental.patch
>
>
> I know this feature is still a work in progress, but with the 2.0.0-RC builds 
> saveIncremental (without a signature) stopped working entirely. A 
> ByteArrayOutputStream is used in the COSWriter for output. This OutputStream 
> is only handled when we write a signature; otherwise the 
> whole content is discarded.
> As I wrote some time ago on the mailing list, incremental update works in a 
> limited way. At the moment we use it for augmenting signatures, and this works 
> with the old 1.8.x but not with trunk after the patch for PDFBOX-1847 was applied.






[jira] [Updated] (PDFBOX-3169) SaveIncremental does not work without signature

2015-12-16 Thread Thomas Chojecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Chojecki updated PDFBOX-3169:

Attachment: saveIncremental.patch

The patch adds an additional method that handles the saveIncremental write for 
non-signature cases. It copies the original document and the incremental 
update into the given OutputStream. 
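
The copy step described here can be sketched in plain Java. This is not the attached saveIncremental.patch and does not use COSWriter; the class name, method name, and parameters are hypothetical, and the sketch assumes the incremental part has already been serialized into a byte array (as with the ByteArrayOutputStream mentioned in the issue).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration class, not part of PDFBox.
public class IncrementalCopySketch {

    // Writes the unchanged original document first, then appends
    // the incremental update, so the original bytes stay intact.
    static void writeIncremental(InputStream original,
                                 byte[] incrementalPart,
                                 OutputStream target) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = original.read(buffer)) != -1) {
            target.write(buffer, 0, read);
        }
        target.write(incrementalPart);
        target.flush();
    }
}
```

Appending rather than rewriting is what keeps byte ranges covered by existing signatures valid, which is why this matters for the signature-augmenting use case.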

> SaveIncremental does not work without signature
> ---
>
> Key: PDFBOX-3169
> URL: https://issues.apache.org/jira/browse/PDFBOX-3169
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.0
>Reporter: Thomas Chojecki
>Assignee: Thomas Chojecki
> Attachments: saveIncremental.patch
>
>
> I know this feature is still a work in progress, but with the 2.0.0-RC builds 
> saveIncremental (without a signature) stopped working entirely. A 
> ByteArrayOutputStream is used in the COSWriter for output. This OutputStream 
> is only handled when we write a signature; otherwise the 
> whole content is discarded.
> As I wrote some time ago on the mailing list, incremental update works in a 
> limited way. At the moment we use it for augmenting signatures, and this works 
> with the old 1.8.x but not with trunk after the patch for PDFBOX-1847 was applied.






[jira] [Created] (PDFBOX-3169) SaveIncremental does not work without signature

2015-12-16 Thread Thomas Chojecki (JIRA)
Thomas Chojecki created PDFBOX-3169:
---

 Summary: SaveIncremental does not work without signature
 Key: PDFBOX-3169
 URL: https://issues.apache.org/jira/browse/PDFBOX-3169
 Project: PDFBox
  Issue Type: Bug
  Components: Writing
Affects Versions: 2.0.0
Reporter: Thomas Chojecki
Assignee: Thomas Chojecki


I know this feature is still a work in progress, but with the 2.0.0-RC builds 
saveIncremental (without a signature) stopped working entirely. A 
ByteArrayOutputStream is used in the COSWriter for output. This OutputStream 
is only handled when we write a signature; otherwise the 
whole content is discarded.

As I wrote some time ago on the mailing list, incremental update works in a 
limited way. At the moment we use it for augmenting signatures, and this works 
with the old 1.8.x but not with trunk after the patch for PDFBOX-1847 was applied.






[jira] [Created] (PDFBOX-3168) Embedded TTF subsets are not compressed

2015-12-16 Thread Philip Helger (JIRA)
Philip Helger created PDFBOX-3168:
-

 Summary: Embedded TTF subsets are not compressed
 Key: PDFBOX-3168
 URL: https://issues.apache.org/jira/browse/PDFBOX-3168
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
 Environment: 2.0.0-RC2
Reporter: Philip Helger


When embedding font subsets, these subsets are included uncompressed in the PDF.
I assume it would make sense to flate-encode them to save space.






[jira] [Updated] (PDFBOX-3143) Added PDEmbeddedFile constructor with COSName parameter

2015-12-16 Thread Philip Helger (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Helger updated PDFBOX-3143:
--
Attachment: 3143.patch

Voilà, a small patch to fulfil the task.

> Added PDEmbeddedFile constructor with COSName parameter
> ---
>
> Key: PDFBOX-3143
> URL: https://issues.apache.org/jira/browse/PDFBOX-3143
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.0
> Environment: Version 2.0.0-RC2
>Reporter: Philip Helger
> Attachments: 3143.patch
>
>
> Since the "addCompression" method from PDStream got deprecated and instead 
> the "PDStream" constructor with "COSName" parameter should be used, please 
> also provide this constructor in all classes derived from "PDStream" where it 
> makes sense (especially in "PDEmbeddedFile")






[jira] [Commented] (PDFBOX-3162) IllegalStateException in TTFSubsetter

2015-12-16 Thread Philip Helger (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059811#comment-15059811
 ] 

Philip Helger commented on PDFBOX-3162:
---

It happens sometimes, but I couldn't find more information.
I assume it is a problem with 2.0.0-RC2. I will update to the latest SNAPSHOT 
and see whether it might originate from PDFBOX-2945.

> IllegalStateException in TTFSubsetter
> -
>
> Key: PDFBOX-3162
> URL: https://issues.apache.org/jira/browse/PDFBOX-3162
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.0
> Environment: 2.0.0-RC2
>Reporter: Philip Helger
>
> Hi, I encountered a rare exception with an empty TTF subset:
> {code}
> ==> [1] caused by java.lang.IllegalStateException: subset is empty
> 1.: org.apache.fontbox.ttf.TTFSubsetter.writeToStream(TTFSubsetter.java:921)
> 2.: org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:304)
> 3.: org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:162)
> 4.: org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1102)
> 5.: com.helger.pdflayout.PageLayoutPDF.renderTo(PageLayoutPDF.java:276)
> {code}
> Unfortunately I don't know yet what caused the problem, but I will 
> provide you with more details on Monday (if necessary).
> If there is nothing to subset, I think the call should simply be ignored?
> Or maybe this is a problem caused by the "uni" name bug (in 2.0.0-RC2) that I 
> reopened recently?






[jira] [Created] (PDFBOX-3167) IllegalArgumentException: dash lengths all zero

2015-12-16 Thread simon steiner (JIRA)
simon steiner created PDFBOX-3167:
-

 Summary: IllegalArgumentException: dash lengths all zero
 Key: PDFBOX-3167
 URL: https://issues.apache.org/jira/browse/PDFBOX-3167
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner


PDF from PDFBOX-624

java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage documenta_math.pdf

Exception in thread "main" java.lang.IllegalArgumentException: dash lengths all zero
at java.awt.BasicStroke.<init>(BasicStroke.java:220)
at org.apache.pdfbox.rendering.PageDrawer.drawAnnotationLinkBorder(PageDrawer.java:929)
at org.apache.pdfbox.rendering.PageDrawer.showAnnotation(PageDrawer.java:858)
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:191)
at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:236)







[jira] [Created] (PDFBOX-3166) Unwanted spaces before number in chinese text extraction

2015-12-16 Thread Gang Luo (JIRA)
Gang Luo created PDFBOX-3166:


 Summary: Unwanted spaces before number in chinese text extraction
 Key: PDFBOX-3166
 URL: https://issues.apache.org/jira/browse/PDFBOX-3166
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 2.0.0
 Environment: Windows
Reporter: Gang Luo


Unwanted spaces appear before numbers in Chinese date text,
such as in this PDF file:
http://www.cninfo.com.cn/finalpage/2015-12-12/1201830823.PDF


