[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266689#comment-15266689
 ] 

Juraj Lonc commented on PDFBOX-3334:


Yes, I read data from TextPosition.
As I mentioned, I use an extended PDFTextStripper, in which I read each 
character together with its TextPosition and glyph.

However, no TextPosition objects remain in memory after GC, so this problem is 
not related to TextPosition.
See attached screenshot.

> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: .pdfbox.cache, screenshot-1.png, screenshot-2.png, 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-3334:
---
Attachment: screenshot-2.png

> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: .pdfbox.cache, screenshot-1.png, screenshot-2.png, 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.






[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-3334:
---
Attachment: .pdfbox.cache

> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: .pdfbox.cache, screenshot-1.png, 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.






[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266216#comment-15266216
 ] 

Juraj Lonc commented on PDFBOX-3334:


I have double-checked it. None of my classes (related to PDFBox) remain in 
memory, so my objects do not keep references to those PDFBox/TTF objects.

I gave an incomplete description, sorry for that. I don't just open and render 
that file; I also use an extended PDFTextStripper.


My theory (based on data from VisualVM) is this:
TrueTypeFont contains a HashMap "tables".
This HashMap contains TTFTable objects.
These objects have a reference back to TrueTypeFont.

This is the loop that prevents the GC from disposing of those objects: they all 
reference each other.
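For context on the theory above: Java's garbage collector traces reachability from GC roots, so a reference cycle such as font -> tables -> table -> font is collectable on its own once nothing reachable points into it; it stays in memory only while some live object (for example a long-lived cache) still references it. The sketch below illustrates that with hypothetical stand-in classes (Font, Table, CACHE), not the FontBox ones, and uses a WeakReference to observe collection. System.gc() is only a hint, so the check is best-effort.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class CycleGcDemo {
    // Hypothetical stand-ins for the structure described above; these are not the FontBox classes.
    static class Font {
        final Map<String, Table> tables = new HashMap<>();
    }

    static class Table {
        final Font font; // back-reference that closes the cycle Font -> tables -> Table -> Font

        Table(Font font) {
            this.font = font;
        }
    }

    // Stands in for a long-lived cache, i.e. something reachable from a GC root.
    static final Map<String, Font> CACHE = new HashMap<>();

    static Font buildCycle() {
        Font font = new Font();
        font.tables.put("glyf", new Table(font));
        return font;
    }

    // Best effort: System.gc() is only a hint, so this may report false even for garbage.
    static boolean collectedAfterGc(WeakReference<?> ref) {
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Only the weak reference points at this cycle, so it is eligible for collection.
        WeakReference<Font> unreferenced = new WeakReference<>(buildCycle());
        System.out.println("unreferenced cycle collected: " + collectedAfterGc(unreferenced));

        // The static cache still reaches this cycle, so it must stay in memory.
        CACHE.put("cached", buildCycle());
        WeakReference<Font> cached = new WeakReference<>(CACHE.get("cached"));
        System.out.println("cached cycle collected: " + collectedAfterGc(cached));
    }
}
```

In other words, when such objects survive GC, a heap dump's "path to GC root" view is usually more telling than the cycle itself.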



> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: screenshot-1.png, 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.






[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266204#comment-15266204
 ] 

Juraj Lonc commented on PDFBOX-3334:


Well, I am going to track it down again to check if the memory leak is 
somewhere in my code.

> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: screenshot-1.png, 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.






[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266188#comment-15266188
 ] 

Juraj Lonc commented on PDFBOX-3334:


Nope. When I open the document multiple times, the count of objects remains the 
same.

> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: screenshot-1.png, 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.






[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-3334:
---
Attachment: screenshot-1.png

> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: screenshot-1.png, 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.






[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-3334:
---
Attachment: skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf

> TrueType fonts memory leak
> --
>
> Key: PDFBOX-3334
> URL: https://issues.apache.org/jira/browse/PDFBOX-3334
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.1
>Reporter: Juraj Lonc
> Attachments: 
> skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
>
>
> I open this PDF document, read all pages and render to images, close document.
> After running GC there are still TrueTypeFont objects in memory.






[jira] [Created] (PDFBOX-3334) TrueType fonts memory leak

2016-05-02 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-3334:
--

 Summary: TrueType fonts memory leak
 Key: PDFBOX-3334
 URL: https://issues.apache.org/jira/browse/PDFBOX-3334
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.1
Reporter: Juraj Lonc


I open this PDF document, read all pages, render them to images and close the 
document. After running GC, there are still TrueTypeFont objects in memory.






[jira] [Created] (PDFBOX-2721) Invalid ToUnicode CMap in font

2015-03-20 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2721:
--

 Summary: Invalid ToUnicode CMap in font 
 Key: PDFBOX-2721
 URL: https://issues.apache.org/jira/browse/PDFBOX-2721
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc


The attached PDF file works fine in Adobe Reader, but PDFBox logs warnings such as:
2015-03-20 15:48:57,573 WARN  [org.apache.pdfbox.pdmodel.font.PDFont] 
(http-0.0.0.0-8080-7) Invalid ToUnicode CMap in font HPDFAA+Thoth-Unicode

It seems that you require beginbfchar or beginbfrange in the CMap. But should 
they be required?
The CMap definition contains beginnotdefrange, and this is ignored by PDFBox.

PDF Reference says:
beginnotdefchar, endnotdefchar, beginnotdefrange, and endnotdefrange
define notdef mappings from character codes to CIDs. As described in the
section “Handling Undefined Characters” on page 355, a notdef mapping is
used if the normal mapping produces a CID for which no glyph is present in
the associated CIDFont.
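To make the quoted notdef semantics concrete, here is a small, hypothetical parser (not PDFBox's CMapParser) that reads only beginnotdefrange blocks. It assumes each entry has the form "<lo> <hi> CID" and that every code in the range maps to that single CID. Note that, per the quoted text, the result of a notdef mapping is a CID, not a Unicode value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NotdefRanges {
    // One "srcCodeLo srcCodeHi dstCID" entry; every code in [lo, hi] maps to the same CID.
    static final class Range {
        final int lo, hi, cid;

        Range(int lo, int hi, int cid) {
            this.lo = lo;
            this.hi = hi;
            this.cid = cid;
        }
    }

    private static final Pattern BLOCK =
            Pattern.compile("beginnotdefrange(.*?)endnotdefrange", Pattern.DOTALL);
    private static final Pattern ENTRY =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>\\s*(\\d+)");

    // Collects all notdef ranges; every other CMap section is ignored by this sketch.
    static List<Range> parse(String cmap) {
        List<Range> ranges = new ArrayList<>();
        Matcher block = BLOCK.matcher(cmap);
        while (block.find()) {
            Matcher entry = ENTRY.matcher(block.group(1));
            while (entry.find()) {
                ranges.add(new Range(Integer.parseInt(entry.group(1), 16),
                        Integer.parseInt(entry.group(2), 16),
                        Integer.parseInt(entry.group(3))));
            }
        }
        return ranges;
    }

    /** Returns the notdef CID for a code, or -1 if no notdef range covers it. */
    static int notdefCid(List<Range> ranges, int code) {
        for (Range r : ranges) {
            if (code >= r.lo && code <= r.hi) {
                return r.cid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String cmap = "1 beginnotdefrange\n<00> <1f> 1\nendnotdefrange";
        List<Range> ranges = parse(cmap);
        System.out.println(notdefCid(ranges, 0x05)); // 1
        System.out.println(notdefCid(ranges, 0x41)); // -1
    }
}
```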







[jira] [Updated] (PDFBOX-2721) Invalid ToUnicode CMap in font

2015-03-20 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2721:
---
Attachment: cmap_beginnotdefrange.pdf

 Invalid ToUnicode CMap in font 
 ---

 Key: PDFBOX-2721
 URL: https://issues.apache.org/jira/browse/PDFBOX-2721
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: cmap_beginnotdefrange.pdf


 Attached PDF file works fine in Adobe Reader, but PDFBox logs warnings:
 2015-03-20 15:48:57,573 WARN  [org.apache.pdfbox.pdmodel.font.PDFont] 
 (http-0.0.0.0-8080-7) Invalid ToUnicode CMap in font HPDFAA+Thoth-Unicode
 It seems that you require beginbfchar or beginbfrange in CMap. But should 
 it be required?
 CMap definition contains beginnotdefrange and this is ignored in PDFBox.
 PDF Reference says:
 beginnotdefchar, endnotdefchar, beginnotdefrange, and endnotdefrange
 define notdef mappings from character codes to CIDs. As described in the
 section “Handling Undefined Characters” on page 355, a notdef mapping is
 used if the normal mapping produces a CID for which no glyph is present in
 the associated CIDFont.






[jira] [Created] (PDFBOX-2110) Font not found: CourierNew

2014-06-03 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2110:
--

 Summary: Font not found: CourierNew
 Key: PDFBOX-2110
 URL: https://issues.apache.org/jira/browse/PDFBOX-2110
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc


PDF uses non-embedded font CourierNew.
OS contains font:
{code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
FontManager is not able to find it and warns:
{code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font not 
found: CourierNew{code}

It seems that the problem is in that space in fotn name CourierNew vs. 
Courier New
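A minimal sketch of the kind of name matching the report points at: compare names with whitespace removed and case ignored, so that "CourierNew" from the PDF matches the installed family "Courier New". The helpers normalize and findInstalled are hypothetical, not FontManager's API, and real font lookup has more to consider (subset prefixes, style suffixes, PostScript vs. family names).

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class FontNameMatch {
    // Hypothetical normalization: drop all whitespace and ignore case.
    static String normalize(String name) {
        return name.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // Returns the first installed family whose normalized name equals the PDF font name.
    static Optional<String> findInstalled(String pdfFontName, List<String> installedFamilies) {
        String wanted = normalize(pdfFontName);
        for (String family : installedFamilies) {
            if (normalize(family).equals(wanted)) {
                return Optional.of(family);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> installed = List.of("Courier New", "DejaVu Sans Mono");
        System.out.println(findInstalled("CourierNew", installed)); // Optional[Courier New]
    }
}
```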



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PDFBOX-2110) Font not found: CourierNew

2014-06-03 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2110:
---

Description: 
PDF uses non-embedded font CourierNew.
OS contains font:
{code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
FontManager is not able to find it and warns:
{code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font not 
found: CourierNew{code}

It seems that the problem is in that space in font name CourierNew vs. 
Courier New

  was:
PDF uses non-embedded font CourierNew.
OS contains font:
{code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
FontManager is not able to find it and warns:
{code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font not 
found: CourierNew{code}

It seems that the problem is in that space in fotn name CourierNew vs. 
Courier New


 Font not found: CourierNew
 --

 Key: PDFBOX-2110
 URL: https://issues.apache.org/jira/browse/PDFBOX-2110
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc

 PDF uses non-embedded font CourierNew.
 OS contains font:
 {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
 New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
 FontManager is not able to find it and warns:
 {code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font 
 not found: CourierNew{code}
 It seems that the problem is in that space in font name CourierNew vs. 
 Courier New





[jira] [Updated] (PDFBOX-2110) Font not found: CourierNew

2014-06-03 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2110:
---

Attachment: testpdf_monospace_DPH_032014.pdf

 Font not found: CourierNew
 --

 Key: PDFBOX-2110
 URL: https://issues.apache.org/jira/browse/PDFBOX-2110
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: testpdf_monospace_DPH_032014.pdf


 PDF uses non-embedded font CourierNew.
 OS contains font:
 {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
 New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
 FontManager is not able to find it and warns:
 {code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font 
 not found: CourierNew{code}
 It seems that the problem is in that space in font name CourierNew vs. 
 Courier New





[jira] [Commented] (PDFBOX-2110) Font not found: CourierNew

2014-06-03 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016890#comment-14016890
 ] 

Juraj Lonc commented on PDFBOX-2110:


I tested on CentOS and Ubuntu.

 Font not found: CourierNew
 --

 Key: PDFBOX-2110
 URL: https://issues.apache.org/jira/browse/PDFBOX-2110
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: testpdf_monospace_DPH_032014.pdf


 PDF uses non-embedded font CourierNew.
 OS contains font:
 {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
 New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
 FontManager is not able to find it and warns:
 {code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font 
 not found: CourierNew{code}
 It seems that the problem is in that space in font name CourierNew vs. 
 Courier New





[jira] [Commented] (PDFBOX-2110) Font not found: CourierNew

2014-06-03 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016904#comment-14016904
 ] 

Juraj Lonc commented on PDFBOX-2110:


I am using PDFBox with my own modifications, and that PDF is rendered correctly 
on Windows.

 Font not found: CourierNew
 --

 Key: PDFBOX-2110
 URL: https://issues.apache.org/jira/browse/PDFBOX-2110
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: testpdf_monospace_DPH_032014.pdf


 PDF uses non-embedded font CourierNew.
 OS contains font:
 {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
 New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
 FontManager is not able to find it and warns:
 {code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font 
 not found: CourierNew{code}
 It seems that the problem is in that space in font name CourierNew vs. 
 Courier New





[jira] [Comment Edited] (PDFBOX-2110) Font not found: CourierNew

2014-06-03 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016912#comment-14016912
 ] 

Juraj Lonc edited comment on PDFBOX-2110 at 6/3/14 5:47 PM:


I don't think your hardcoded mapping solves the problem.
I am not using Liberation fonts (they are not installed either). Courier New 
is installed on that system, so it should be used, right?

In addition to that, Liberation Mono is visually very different from 
Courier New.


was (Author: chupacabras):
I don't think your hardcoded mapping solves problem.
I am not using Liberation fonts (they are not installed either). Courier New 
is installed on that system, so it should be used, right?

 Font not found: CourierNew
 --

 Key: PDFBOX-2110
 URL: https://issues.apache.org/jira/browse/PDFBOX-2110
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: testpdf_monospace_DPH_032014.pdf


 PDF uses non-embedded font CourierNew.
 OS contains font:
 {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
 New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
 FontManager is not able to find it and warns:
 {code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font 
 not found: CourierNew{code}
 It seems that the problem is in that space in font name CourierNew vs. 
 Courier New





[jira] [Updated] (PDFBOX-2110) Font not found: CourierNew

2014-06-03 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2110:
---

Attachment: PDFBOX-2110_FontManager.diff

Look at this fix.

 Font not found: CourierNew
 --

 Key: PDFBOX-2110
 URL: https://issues.apache.org/jira/browse/PDFBOX-2110
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2110_FontManager.diff, 
 testpdf_monospace_DPH_032014.pdf


 PDF uses non-embedded font CourierNew.
 OS contains font:
 {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier 
 New:style=Regular,Normal,obyèejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code}
 FontManager is not able to find it and warns:
 {code}WARN  [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font 
 not found: CourierNew{code}
 It seems that the problem is in that space in font name CourierNew vs. 
 Courier New





[jira] [Commented] (PDFBOX-1713) [PATCH] Bullet character not rendered

2014-05-26 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008683#comment-14008683
 ] 

Juraj Lonc commented on PDFBOX-1713:


Is this fix considered to be permanent or temporary?
I consider it ugly :(

That bullet character is properly defined in the /ToUnicode mapping, and IMHO 
this mapping is ignored by PDFBox.
I tried to explain the proper way of handling this situation in PDFBOX-2093.

Replacing all unknown characters with a bullet is not a good idea, as that 
/ToUnicode mapping could contain any Unicode character.

 [PATCH] Bullet character not rendered
 -

 Key: PDFBOX-1713
 URL: https://issues.apache.org/jira/browse/PDFBOX-1713
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 1.8.5, 2.0.0
Reporter: Vincent Hennebert
Assignee: Andreas Lehmkühler
 Fix For: 1.8.6, 2.0.0

 Attachments: bullet.patch, bullet.pdf


 See attached file. In WinAnsiEncoding, any unused code greater than 040 maps 
 to the bullet character.
 The attached patch takes that into account to render characters that don't 
 use the standard encoding for bullet.
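The rule in this description can be sketched as a fallback: a code the encoding actually assigns keeps its glyph name, and any other code above octal 040 (decimal 32, the space character) resolves to "bullet". This is a hypothetical helper, not PDFBox's WinAnsiEncoding class; the caller is assumed to supply the full table of assigned codes, and the three-entry map in main is only an excerpt for demonstration.

```java
import java.util.Map;

public class WinAnsiBullet {
    /**
     * Resolves a single-byte code to a glyph name. {@code assigned} is expected to hold
     * every code the encoding actually assigns; any other code above octal 040 falls back
     * to "bullet", following the rule quoted in the issue.
     */
    static String glyphName(int code, Map<Integer, String> assigned) {
        String name = assigned.get(code);
        if (name != null) {
            return name;
        }
        if (code > 040 && code <= 0377) {
            return "bullet";
        }
        return ".notdef";
    }

    public static void main(String[] args) {
        // Tiny excerpt for demonstration only; 0x95 (octal 225) is the code assigned to bullet.
        Map<Integer, String> excerpt = Map.of(0x41, "A", 0x95, "bullet", 0x80, "Euro");
        System.out.println(glyphName(0x41, excerpt)); // A
        System.out.println(glyphName(0x7F, excerpt)); // bullet (unused code 127)
        System.out.println(glyphName(0x1F, excerpt)); // .notdef (not above octal 040)
    }
}
```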





[jira] [Commented] (PDFBOX-1713) [PATCH] Bullet character not rendered

2014-05-26 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008814#comment-14008814
 ] 

Juraj Lonc commented on PDFBOX-1713:


I had to verify that ;)
You are absolutely right.

I modified values in /ToUnicode. None of the changes had any impact on how those 
characters were displayed; they only affected text copied from Adobe Reader into 
a text editor (Word).

 [PATCH] Bullet character not rendered
 -

 Key: PDFBOX-1713
 URL: https://issues.apache.org/jira/browse/PDFBOX-1713
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 1.8.5, 2.0.0
Reporter: Vincent Hennebert
Assignee: Andreas Lehmkühler
 Fix For: 1.8.6, 2.0.0

 Attachments: bullet.patch, bullet.pdf


 See attached file. In WinAnsiEncoding, any unused code greater than 040 maps 
 to the bullet character.
 The attached patch takes that into account to render characters that don't 
 use the standard encoding for bullet.





[jira] [Created] (PDFBOX-2093) bullet character is not rendered

2014-05-23 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2093:
--

 Summary: bullet character is not rendered
 Key: PDFBOX-2093
 URL: https://issues.apache.org/jira/browse/PDFBOX-2093
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: missing_bullet.pdf, output_missing_bullet.png

This PDF contains a bullet character that is not rendered.
There is some problem with translating the character code to a glyph.

The character has code 127 (0x7F), but no glyph mapping is found for it:
{code}
14:33:17,966 DEBUG Type1Glyph2D:127 - FKOYIT+MyriadPro-Cond: glyph mapping for 
127 not found
{code}

The embedded font contains a definition of the bullet character,
but in its mapping table the bullet has code 183 (from StandardEncoding, I 
suppose).





[jira] [Updated] (PDFBOX-2093) bullet character is not rendered

2014-05-23 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2093:
---

Attachment: output_missing_bullet.png
missing_bullet.pdf

 bullet character is not rendered
 --

 Key: PDFBOX-2093
 URL: https://issues.apache.org/jira/browse/PDFBOX-2093
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: missing_bullet.pdf, output_missing_bullet.png


 In this PDF is a bullet character which is not rendered.
 There is some problem with translating code to glyph.
 That character has code 127 (0x7F), but mapping for it is not found
 {code}
 14:33:17,966 DEBUG Type1Glyph2D:127 - FKOYIT+MyriadPro-Cond: glyph mapping 
 for 127 not found
 {code}
 embedded font contains definition for bullet character.
 But bullet character has code 183 in mapping table (from StandardEncoding, 
 I suppose).





[jira] [Commented] (PDFBOX-2093) bullet character is not rendered

2014-05-23 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007111#comment-14007111
 ] 

Juraj Lonc commented on PDFBOX-2093:


The sample PDF contains only one line of text. The original PDF contains more 
lines with that bullet (a regular list).
I was playing around with it but was not able to figure out how to translate 
that character properly :(

 bullet character is not rendered
 --

 Key: PDFBOX-2093
 URL: https://issues.apache.org/jira/browse/PDFBOX-2093
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: missing_bullet.pdf, output_missing_bullet.png


 In this PDF is a bullet character which is not rendered.
 There is some problem with translating code to glyph.
 That character has code 127 (0x7F), but mapping for it is not found
 {code}
 14:33:17,966 DEBUG Type1Glyph2D:127 - FKOYIT+MyriadPro-Cond: glyph mapping 
 for 127 not found
 {code}
 embedded font contains definition for bullet character.
 But bullet character has code 183 in mapping table (from StandardEncoding, 
 I suppose).





[jira] [Commented] (PDFBOX-2093) bullet character is not rendered

2014-05-23 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007131#comment-14007131
 ] 

Juraj Lonc commented on PDFBOX-2093:


There is a Unicode mapping for that character:
7F 2022
http://www.charbase.com/2022-unicode-bullet

So shouldn't it work like this?
1. translate the code to a Unicode value (using the /ToUnicode mapping)
2. translate the Unicode value to a character name
3. find the glyph with that name

I think /ToUnicode is currently ignored in this case.
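The three proposed steps can be sketched as a lookup chain. All inputs here are hypothetical plain maps standing in for the font's /ToUnicode CMap, a Unicode-to-glyph-name list (such as the Adobe Glyph List, where U+2022 is "bullet") and the set of glyph names in the embedded font; this is not how PDFBox resolves glyphs. Note also that a later comment in this archive (on PDFBOX-1713) reports that editing /ToUnicode did not change how Adobe Reader displays the characters.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class ToUnicodeGlyphLookup {
    // Hypothetical inputs; real data would come from the PDF and its embedded font.
    static Optional<String> findGlyph(int code,
                                      Map<Integer, String> toUnicode,
                                      Map<String, String> unicodeToName,
                                      Set<String> glyphsInFont) {
        String unicode = toUnicode.get(code);          // step 1: code -> Unicode
        if (unicode == null) {
            return Optional.empty();
        }
        String name = unicodeToName.get(unicode);      // step 2: Unicode -> glyph name
        if (name == null || !glyphsInFont.contains(name)) {
            return Optional.empty();                   // step 3 failed: no such glyph
        }
        return Optional.of(name);                      // step 3: glyph found by name
    }

    public static void main(String[] args) {
        Optional<String> glyph = findGlyph(0x7F,
                Map.of(0x7F, "\u2022"),                // "7F 2022" from the /ToUnicode CMap
                Map.of("\u2022", "bullet"),
                Set.of("bullet", "A"));
        System.out.println(glyph); // Optional[bullet]
    }
}
```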

 bullet character is not rendered
 --

 Key: PDFBOX-2093
 URL: https://issues.apache.org/jira/browse/PDFBOX-2093
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: missing_bullet.pdf, output_missing_bullet.png


 In this PDF is a bullet character which is not rendered.
 There is some problem with translating code to glyph.
 That character has code 127 (0x7F), but mapping for it is not found
 {code}
 14:33:17,966 DEBUG Type1Glyph2D:127 - FKOYIT+MyriadPro-Cond: glyph mapping 
 for 127 not found
 {code}
 embedded font contains definition for bullet character.
 But bullet character has code 183 in mapping table (from StandardEncoding, 
 I suppose).





[jira] [Created] (PDFBOX-2089) Negative width of character

2014-05-22 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2089:
--

 Summary: Negative width of character
 Key: PDFBOX-2089
 URL: https://issues.apache.org/jira/browse/PDFBOX-2089
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: negative_width.pdf

This PDF contains the text matrix:
-10.5679 0 0 -11.4 459.0349 19.4155 Tm

IMHO this causes a wrong calculation of the character width (and height).
The width and height calculated in PDFStreamEngine are negative numbers, because 
textMatrix.getXScale() returns a negative value.

I think it should be fixed in Matrix.getXScale() and Matrix.getYScale(): the 
returned value should be passed through Math.abs().
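The matrix above has a and d both negative with b = c = 0, which amounts to a 180° rotation combined with scaling, so reading the x scale as the signed value a yields a negative width. A minimal sketch, with hypothetical helper names rather than PDFBox's Matrix API: Math.abs(a), as suggested, works for this matrix, while Math.hypot(a, b), the length of the transformed x unit vector, also covers rotated text matrices. Whether PDFBox should expose signed or unsigned scale values is exactly the open question in this issue.

```java
public class TextMatrixScale {
    // Hypothetical helpers over the first four components (a b c d) of "a b c d e f Tm".
    static double signedXScale(double a) {
        return a; // a naive "x scale = a" reading keeps the sign
    }

    static double xScaleMagnitude(double a, double b) {
        return Math.hypot(a, b); // length of the image of the x unit vector (a, b)
    }

    static double yScaleMagnitude(double c, double d) {
        return Math.hypot(c, d); // length of the image of the y unit vector (c, d)
    }

    // Hypothetical: glyph advance (in 1/1000 em) scaled by font size and an x scale.
    static double scaledWidth(double glyphWidth1000, double fontSize, double xScale) {
        return glyphWidth1000 / 1000.0 * fontSize * xScale;
    }

    public static void main(String[] args) {
        double a = -10.5679, b = 0, c = 0, d = -11.4; // from the Tm operator in the issue
        System.out.println(scaledWidth(500, 1, signedXScale(a)));       // negative
        System.out.println(scaledWidth(500, 1, xScaleMagnitude(a, b))); // positive
        System.out.println(yScaleMagnitude(c, d));                      // positive
    }
}
```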





[jira] [Updated] (PDFBOX-2089) Negative width of character

2014-05-22 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2089:
---

Attachment: negative_width.pdf

 Negative width of character
 ---

 Key: PDFBOX-2089
 URL: https://issues.apache.org/jira/browse/PDFBOX-2089
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: negative_width.pdf


 This PDF contains text matrix:
 -10.5679 0 0 -11.4 459.0349 19.4155 Tm
 that causes IMHO wrong calculation of character width (and height).
 Width and height calculated in PDFStreamEngine are negative numbers, because 
 textMatrix.getXScale() gives negative value.
 I think it should be fixed in Matrix.getXScale() and Matrix.getYScale(). 
 Returning value should be fixed by Math.abs()





[jira] [Updated] (PDFBOX-2089) Negative width of character

2014-05-22 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2089:
---

Attachment: PDFBOX-2089_Matrix.diff

This is what I mean.
It looks quite logical, but I am not 100% sure whether it is PDF compliant 
or not.

 Negative width of character
 ---

 Key: PDFBOX-2089
 URL: https://issues.apache.org/jira/browse/PDFBOX-2089
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2089_Matrix.diff, negative_width.pdf


 This PDF contains text matrix:
 -10.5679 0 0 -11.4 459.0349 19.4155 Tm
 that causes IMHO wrong calculation of character width (and height).
 Width and height calculated in PDFStreamEngine are negative numbers, because 
 textMatrix.getXScale() gives negative value.
 I think it should be fixed in Matrix.getXScale() and Matrix.getYScale(). 
 Returning value should be fixed by Math.abs()





[jira] [Commented] (PDFBOX-2089) Negative width of character

2014-05-22 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005693#comment-14005693
 ] 

Juraj Lonc commented on PDFBOX-2089:


OK, plan B is to modify PDFStreamEngine and replace matrix.getXScale() 
with Math.abs(matrix.getXScale()).

The width and height of a character should always be positive numbers, right?

 Negative width of character
 ---

 Key: PDFBOX-2089
 URL: https://issues.apache.org/jira/browse/PDFBOX-2089
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2089_Matrix.diff, negative_width.pdf


 This PDF contains the text matrix:
 -10.5679 0 0 -11.4 459.0349 19.4155 Tm
 which, IMHO, causes a wrong calculation of the character width (and height).
 The width and height calculated in PDFStreamEngine are negative numbers, because 
 textMatrix.getXScale() returns a negative value.
 I think it should be fixed in Matrix.getXScale() and Matrix.getYScale(): 
 the returned value should be wrapped in Math.abs().




[jira] [Commented] (PDFBOX-2089) Negative width of character

2014-05-22 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005912#comment-14005912
 ] 

Juraj Lonc commented on PDFBOX-2089:


I changed my application so that everywhere I read a character width, I added the line 
width=Math.abs(width).
So my code is immune to this issue.

Anyway, I am curious whether this is a bug or not ;)

 Negative width of character
 ---

 Key: PDFBOX-2089
 URL: https://issues.apache.org/jira/browse/PDFBOX-2089
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2089_Matrix.diff, negative_width.pdf


 This PDF contains the text matrix:
 -10.5679 0 0 -11.4 459.0349 19.4155 Tm
 which, IMHO, causes a wrong calculation of the character width (and height).
 The width and height calculated in PDFStreamEngine are negative numbers, because 
 textMatrix.getXScale() returns a negative value.
 I think it should be fixed in Matrix.getXScale() and Matrix.getYScale(): 
 the returned value should be wrapped in Math.abs().




[jira] [Commented] (PDFBOX-2090) Glyph not found:3

2014-05-22 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005922#comment-14005922
 ] 

Juraj Lonc commented on PDFBOX-2090:


see
http://scripts.sil.org/cms/scripts/page.php?item_id=IWS-Chapter08#ba57949e

 Glyph not found:3
 -

 Key: PDFBOX-2090
 URL: https://issues.apache.org/jira/browse/PDFBOX-2090
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
Priority: Minor

 There are some debug messages:
 {code}
 15:30:46,574 DEBUG TTFGlyph2D:227 - GYQPBH+TimesNewRomanPSMT: Glyph not found:3
 {code}
 but glyph id #3 is reserved (according to the TTF spec), so it is OK that this 
 glyph was not found in the TTF font.




[jira] [Created] (PDFBOX-2090) Glyph not found:3

2014-05-22 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2090:
--

 Summary: Glyph not found:3
 Key: PDFBOX-2090
 URL: https://issues.apache.org/jira/browse/PDFBOX-2090
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
Priority: Minor


There are some debug messages:
{code}
15:30:46,574 DEBUG TTFGlyph2D:227 - GYQPBH+TimesNewRomanPSMT: Glyph not found:3
{code}

but glyph id #3 is reserved (according to the TTF spec), so it is OK that this glyph 
was not found in the TTF font.




[jira] [Commented] (PDFBOX-2090) Glyph not found:3

2014-05-22 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005939#comment-14005939
 ] 

Juraj Lonc commented on PDFBOX-2090:


Instead of 
{code}
if (glyphId<=3) return null;
{code}
it could be better to use
{code}
if (glyphId==3) return null;
{code}
in case you would like to use glyph id #0 for drawing missing chars.
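To make the difference concrete: TrueType fonts use glyph 0 (.notdef) to draw missing characters, so a range check that also returns null for glyph 0 suppresses that fallback, while an equality check skips only glyph 3. A minimal sketch; the class and method names are hypothetical and not taken from TTFGlyph2D:

```java
/** Hypothetical illustration; not the actual TTFGlyph2D logic. */
final class GlyphFilter {
    private GlyphFilter() {}

    /** Range check: also skips glyph 0 (.notdef), 1 and 2. */
    static boolean skipRange(int glyphId) {
        return glyphId <= 3;
    }

    /** Equality check: skips only glyph 3, so .notdef can still be drawn. */
    static boolean skipOnlyThree(int glyphId) {
        return glyphId == 3;
    }

    public static void main(String[] args) {
        for (int gid = 0; gid <= 4; gid++) {
            System.out.printf("gid %d: range=%b equality=%b%n",
                    gid, skipRange(gid), skipOnlyThree(gid));
        }
    }
}
```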

 Glyph not found:3
 -

 Key: PDFBOX-2090
 URL: https://issues.apache.org/jira/browse/PDFBOX-2090
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
Priority: Minor
 Attachments: PDFBOX-2090_TTFGlyph2D.diff, glyph_id3.pdf


 There are some debug messages:
 {code}
 15:30:46,574 DEBUG TTFGlyph2D:227 - GYQPBH+TimesNewRomanPSMT: Glyph not found:3
 {code}
 but glyph id #3 is reserved (according to the TTF spec), so it is OK that this 
 glyph was not found in the TTF font.




[jira] [Created] (PDFBOX-2091) Some characters are not rendered

2014-05-22 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2091:
--

 Summary: Some characters are not rendered
 Key: PDFBOX-2091
 URL: https://issues.apache.org/jira/browse/PDFBOX-2091
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: missing_yaccute.pdf, output.png

Some characters are not rendered (see the attached PDF).
In this case it is yacute (ý).




[jira] [Updated] (PDFBOX-2091) Some characters are not rendered (font with symbol encoding)

2014-05-22 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2091:
---

Summary: Some characters are not rendered (font with symbol encoding)  
(was: Some characters are not rendered)

 Some characters are not rendered (font with symbol encoding)
 

 Key: PDFBOX-2091
 URL: https://issues.apache.org/jira/browse/PDFBOX-2091
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2091_TTFGlyph2D.diff, missing_yaccute.pdf, 
 output.png


 Some characters are not rendered (see the attached PDF).
 In this case it is yacute (ý).




[jira] [Updated] (PDFBOX-2091) Some characters are not rendered

2014-05-22 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2091:
---

Attachment: PDFBOX-2091_TTFGlyph2D.diff

The font uses symbol encoding. In TTFGlyph2D the CMAP is parsed correctly, but 
it is then not used in getGlyphcode(int code).

I made a fix for this.
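For reference, ISO 32000-1 (section 9.6.6.4) states that when a font contains a Microsoft Symbol (3,0) cmap subtable, its character codes lie in one of the ranges 0x0000-0x00FF, 0xF000-0xF0FF, 0xF100-0xF1FF or 0xF200-0xF2FF, and each single-byte code is prefixed with the high byte of that range before the lookup. A minimal sketch that tries each range in turn, with the subtable modeled as a plain Map; the names are hypothetical and this is not the attached diff:

```java
import java.util.Map;

/** Hypothetical lookup in a (3,0) symbol subtable; not PDFBox code. */
final class SymbolCmapLookup {
    private SymbolCmapLookup() {}

    // High bytes of the code ranges allowed for a (3,0) subtable
    private static final int[] HIGH_BYTES = {0x00, 0xF0, 0xF1, 0xF2};

    /**
     * Returns the glyph id for a single-byte character code, or 0 (.notdef)
     * if no range contains it. The map stands in for a parsed (3,0) subtable.
     */
    static int glyphId(Map<Integer, Integer> symbolCmap, int code) {
        for (int high : HIGH_BYTES) {
            Integer gid = symbolCmap.get((high << 8) | (code & 0xFF));
            if (gid != null) {
                return gid;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Typical symbol font: codes stored in the 0xF000 range
        Map<Integer, Integer> cmap = Map.of(0xF0FD, 42);
        System.out.println(glyphId(cmap, 0xFD)); // 42
    }
}
```

With a symbol font whose codes live in the 0xF000 range, looking up the bare single-byte code finds nothing, which would match characters silently disappearing from the rendering.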

 Some characters are not rendered
 

 Key: PDFBOX-2091
 URL: https://issues.apache.org/jira/browse/PDFBOX-2091
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2091_TTFGlyph2D.diff, missing_yaccute.pdf, 
 output.png


 Some characters are not rendered (see the attached PDF).
 In this case it is yacute (ý).




[jira] [Commented] (PDFBOX-2091) Some characters are not rendered (font with symbol encoding)

2014-05-22 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006839#comment-14006839
 ] 

Juraj Lonc commented on PDFBOX-2091:


In addition to that, I think TTFGlyph2D should take the encoding that is set 
in the PDFont/COSObject and then use the corresponding CMAP for that encoding.
Currently TTFGlyph2D ignores the encoding that is set in the PDF.
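A simplified sketch of the subtable preference order that ISO 32000-1 (section 9.6.6.4) describes for TrueType fonts: for a symbolic font, the Microsoft Symbol (3,0) subtable, then Mac Roman (1,0); for a nonsymbolic font, the Microsoft Unicode (3,1) subtable, then (1,0). It assumes the symbolic/nonsymbolic decision has already been made from the font descriptor flags and the Encoding entry; the class and constants are hypothetical:

```java
import java.util.Set;

/** Hypothetical cmap subtable chooser; not PDFBox code. */
final class CmapChooser {
    private CmapChooser() {}

    /** Packs (platformId, encodingId) into one int for easy set lookup. */
    static int key(int platformId, int encodingId) {
        return (platformId << 16) | encodingId;
    }

    static final int WIN_SYMBOL = key(3, 0);
    static final int WIN_UNICODE = key(3, 1);
    static final int MAC_ROMAN = key(1, 0);

    /** Returns the preferred available subtable key, or -1 if none applies. */
    static int choose(boolean symbolic, Set<Integer> available) {
        int[] order = symbolic
                ? new int[] {WIN_SYMBOL, MAC_ROMAN}
                : new int[] {WIN_UNICODE, MAC_ROMAN};
        for (int candidate : order) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<Integer> tables = Set.of(WIN_SYMBOL, WIN_UNICODE);
        System.out.println(choose(true, tables) == WIN_SYMBOL);   // true
        System.out.println(choose(false, tables) == WIN_UNICODE); // true
    }
}
```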

 Some characters are not rendered (font with symbol encoding)
 

 Key: PDFBOX-2091
 URL: https://issues.apache.org/jira/browse/PDFBOX-2091
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2091_TTFGlyph2D.diff, missing_yaccute.pdf, 
 output.png


 Some characters are not rendered (see the attached PDF).
 In this case it is yacute (ý).




[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: rendered.png
Obyčajné zásielky.pdf

 Lines that exceeds clipping area are not drawn
 --

 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf, rendered.png


 The PDF contains shapes that are partly on the paper and partly outside it (the shape 
 overflows the paper borders).
 Those shapes are not rendered to the image.
 It is caused by the clipping area.
 When I replace the line in PDFDrawer.strokePath()
 {noformat}
 graphics.setClip(getGraphicsState().getCurrentClippingPath());
 {noformat}
 with
 {noformat}
 graphics.setClip(null);
 {noformat}
 then everything is rendered correctly.
 Possibly a bug in Java?




[jira] [Commented] (PDFBOX-2074) 4-bytes CMap entry causes exception

2014-05-16 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999694#comment-13999694
 ] 

Juraj Lonc commented on PDFBOX-2074:


I am curious whether Adobe Reader ignores such entries (entries are invalid) or 
processes them (entries are valid).

 4-bytes CMap entry causes exception
 ---

 Key: PDFBOX-2074
 URL: https://issues.apache.org/jira/browse/PDFBOX-2074
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf


 I have a PDF that has a CMap entry consisting of 4 bytes. It is the only entry 
 of that size; the other entries have 2 bytes.
 Adobe Reader has no problems with that, but PDFBox throws an Exception.
 I think this Exception should not be thrown. The entry should be skipped or 
 truncated to 2 bytes, and a warning should be written to the log.




[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000280#comment-14000280
 ] 

Juraj Lonc commented on PDFBOX-2081:


I know that this line completely disables clipping, and I know it is not a solution ;)
I used it only to describe the problem.

 Lines that exceeds clipping area are not drawn
 --

 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf, rendered.png


 The PDF contains shapes that are partly on the paper and partly outside it (the shape 
 overflows the paper borders).
 Those shapes are not rendered to the image.
 It is caused by the clipping area.
 When I replace the line in PDFDrawer.strokePath()
 {noformat}
 graphics.setClip(getGraphicsState().getCurrentClippingPath());
 {noformat}
 with
 {noformat}
 graphics.setClip(null);
 {noformat}
 then everything is rendered correctly.
 Possibly a bug in Java?




[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: (was: rendered.png)

 Lines that exceeds clipping area are not drawn
 --

 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf


 The PDF contains shapes that are partly on the paper and partly outside it (the shape 
 overflows the paper borders).
 Those shapes are not rendered to the image.
 It is caused by the clipping area.
 When I replace the line in PDFDrawer.strokePath()
 {noformat}
 graphics.setClip(getGraphicsState().getCurrentClippingPath());
 {noformat}
 with
 {noformat}
 graphics.setClip(null);
 {noformat}
 then everything is rendered correctly.
 Possibly a bug in Java?




[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: rendered_(with_null_clipping).png

 Lines that exceeds clipping area are not drawn
 --

 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png, 
 rendered_(with_null_clipping).png


 The PDF contains shapes that are partly on the paper and partly outside it (the shape 
 overflows the paper borders).
 Those shapes are not rendered to the image.
 It is caused by the clipping area.
 When I replace the line in PDFDrawer.strokePath()
 {noformat}
 graphics.setClip(getGraphicsState().getCurrentClippingPath());
 {noformat}
 with
 {noformat}
 graphics.setClip(null);
 {noformat}
 then everything is rendered correctly.
 Possibly a bug in Java?




[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2081:
---

Attachment: rendered_(missing_lines).png

The previously uploaded file was not the one I wanted to upload. Now I have 
attached the image that was actually rendered.

 Lines that exceeds clipping area are not drawn
 --

 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png


 The PDF contains shapes that are partly on the paper and partly outside it (the shape 
 overflows the paper borders).
 Those shapes are not rendered to the image.
 It is caused by the clipping area.
 When I replace the line in PDFDrawer.strokePath()
 {noformat}
 graphics.setClip(getGraphicsState().getCurrentClippingPath());
 {noformat}
 with
 {noformat}
 graphics.setClip(null);
 {noformat}
 then everything is rendered correctly.
 Possibly a bug in Java?




[jira] [Created] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2081:
--

 Summary: Lines that exceeds clipping area are not drawn
 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf, rendered.png

The PDF contains shapes that are partly on the paper and partly outside it (the shape 
overflows the paper borders).
Those shapes are not rendered to the image.

It is caused by the clipping area.
When I replace the line in PDFDrawer.strokePath()
{noformat}
graphics.setClip(getGraphicsState().getCurrentClippingPath());
{noformat}
with
{noformat}
graphics.setClip(null);
{noformat}
then everything is rendered correctly.

Possibly a bug in Java?
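One way to test the "bug in Java?" hypothesis in isolation: Java2D clips at the pixel level, so a stroke that crosses the clip boundary should still paint the part that lies inside the clip. The standalone check below (plain Java2D, no PDFBox; the class name is hypothetical) draws a line that overflows both the clip and the image. If the inside portion appears, Java2D clipping itself is probably not what drops the shapes, and the clipping path passed to setClip is the more likely suspect:

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/** Standalone Java2D check; no PDFBox involved. */
final class ClipCheck {
    private ClipCheck() {}

    /** Draws a line that overflows both the clip (left half) and the image. */
    static BufferedImage render() {
        BufferedImage img = new BufferedImage(100, 20, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setClip(new Rectangle2D.Double(0, 0, 50, 20));
            g.setColor(Color.BLACK);
            g.setStroke(new BasicStroke(4f));
            g.draw(new Line2D.Double(-20, 10, 120, 10));
        } finally {
            g.dispose();
        }
        return img;
    }

    /** Alpha of a pixel: 255 where the line was painted, 0 where untouched. */
    static int alphaAt(BufferedImage img, int x, int y) {
        return img.getRGB(x, y) >>> 24;
    }

    public static void main(String[] args) {
        BufferedImage img = render();
        System.out.println("inside clip:  " + alphaAt(img, 25, 10));
        System.out.println("outside clip: " + alphaAt(img, 75, 10));
    }
}
```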




[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn

2014-05-16 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000320#comment-14000320
 ] 

Juraj Lonc commented on PDFBOX-2081:


I have also tried to replace
{code}
graphics.setClip(getGraphicsState().getCurrentClippingPath());
{code}
by
{code}
Rectangle2D rc0=getGraphicsState().getCurrentClippingPath().getBounds2D();
Rectangle2D rc1=new Rectangle2D.Double(rc0.getMinX(), rc0.getMinY(), 
rc0.getWidth()+1000, rc0.getHeight());
graphics.setClip(rc1);
{code}
so I made the clipping area wider (1000 units to the right). This helped too: the lines were rendered.
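As a diagnostic, it may help to compare the stroked outline of each missing line against the clipping path before drawing, instead of widening the clip by a fixed amount. A sketch assuming the path, clip and line width are all in the same (device) coordinate space; the class and method names are hypothetical. The stroked outline is used because a purely horizontal or vertical line has a zero-area bounding box, which the java.awt.Shape containment and intersection tests can reject:

```java
import java.awt.BasicStroke;
import java.awt.Shape;
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;

/** Hypothetical diagnostic helper; not part of PDFBox. */
final class ClipDiagnostics {
    private ClipDiagnostics() {}

    enum Relation { INSIDE, PARTIAL, OUTSIDE }

    /** Classifies the bounds of the stroked path against the clipping path. */
    static Relation relate(Shape clip, Shape path, float lineWidth) {
        // Stroke first: a bare axis-aligned line has a zero-area bounding box
        Rectangle2D bounds = new BasicStroke(lineWidth).createStrokedShape(path).getBounds2D();
        if (clip.contains(bounds)) {
            return Relation.INSIDE;
        }
        if (clip.intersects(bounds)) {
            return Relation.PARTIAL;
        }
        return Relation.OUTSIDE;
    }

    public static void main(String[] args) {
        Shape clip = new Rectangle2D.Double(0, 0, 100, 100);
        System.out.println(relate(clip, new Line2D.Double(10, 50, 90, 50), 2f));   // INSIDE
        System.out.println(relate(clip, new Line2D.Double(50, 50, 200, 50), 2f));  // PARTIAL
        System.out.println(relate(clip, new Line2D.Double(150, 50, 200, 50), 2f)); // OUTSIDE
    }
}
```

If the missing lines come out as PARTIAL, only their outside portion should be clipped away; if they come out as OUTSIDE, the clipping path itself is probably wrong.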

 Lines that exceeds clipping area are not drawn
 --

 Key: PDFBOX-2081
 URL: https://issues.apache.org/jira/browse/PDFBOX-2081
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png, 
 rendered_(with_null_clipping).png


 The PDF contains shapes that are partly on the paper and partly outside it (the shape 
 overflows the paper borders).
 Those shapes are not rendered to the image.
 It is caused by the clipping area.
 When I replace the line in PDFDrawer.strokePath()
 {noformat}
 graphics.setClip(getGraphicsState().getCurrentClippingPath());
 {noformat}
 with
 {noformat}
 graphics.setClip(null);
 {noformat}
 then everything is rendered correctly.
 Possibly a bug in Java?




[jira] [Commented] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array

2014-05-15 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997810#comment-13997810
 ] 

Juraj Lonc commented on PDFBOX-2070:


I did not mean it as a replacement for this fix.
I meant it as an addition, for the case where someone loads a PDF that already has 
such invalid elements: saving it would then repair them.

But I understand that this is a completely different story, for another issue.
That was just an idea.

 Filter.decode() modifies PDF if there is a filter array
 ---

 Key: PDFBOX-2070
 URL: https://issues.apache.org/jira/browse/PDFBOX-2070
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
 Fix For: 2.0.0

 Attachments: after.pdf, before.pdf


 If there are several filters (filter array) in an image, PDFBox is inserting 
 an empty DecodeParms object here
 {code}
 params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index));
 {code}
 instead of either inserting an empty COSArray, or (better) doing nothing. Saving 
 such a PDF results in it not being displayable in Acrobat Reader.
 Test code:
 {code}
 PDDocument d = PDDocument.load("before.pdf");
 new PDFRenderer(d).renderImage(0);
 d.save("after.pdf");
 {code}
 The rendering is important because without it, the filtered objects aren't 
 decoded.




[jira] [Commented] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array

2014-05-14 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997373#comment-13997373
 ] 

Juraj Lonc commented on PDFBOX-2070:


Thanks for the fix.
The problem disappeared in my case, so this fix works for me.

In addition to that, I made a workaround: before I save the document, I remove all 
empty DecodeParms entries from images.
I don't know whether it would be a good idea to implement something similar in 
PDDocument.save(), so that these evidently wrong elements would be skipped and not 
written to the PDF file.
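A minimal sketch of such a cleanup step. Removing the entry should be safe because the PDF specification allows DecodeParms to be omitted when none of the filters has non-default parameters. To keep the example self-contained, the image dictionary is modeled as a plain Map (in PDFBox it would be a COSDictionary, and a missing parameter set a COSNull); the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical cleanup over a Map-modeled image dictionary; not PDFBox code. */
final class DecodeParmsCleaner {
    private DecodeParmsCleaner() {}

    /** True if a DecodeParms value carries no parameters at all. */
    static boolean isEmptyParms(Object value) {
        if (value == null) {
            return true; // null entry: filter uses its defaults
        }
        if (value instanceof Map) {
            return ((Map<?, ?>) value).isEmpty();
        }
        if (value instanceof List) {
            // Array form: empty only if every per-filter entry is empty
            for (Object entry : (List<?>) value) {
                if (!isEmptyParms(entry)) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }

    /** Removes a DecodeParms entry that only holds defaults; returns true if removed. */
    static boolean clean(Map<String, Object> imageDict) {
        if (imageDict.containsKey("DecodeParms") && isEmptyParms(imageDict.get("DecodeParms"))) {
            imageDict.remove("DecodeParms");
            return true;
        }
        return false;
    }
}
```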

 Filter.decode() modifies PDF if there is a filter array
 ---

 Key: PDFBOX-2070
 URL: https://issues.apache.org/jira/browse/PDFBOX-2070
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
 Fix For: 2.0.0

 Attachments: after.pdf, before.pdf


 If there are several filters (filter array) in an image, PDFBox is inserting 
 an empty DecodeParms object here
 {code}
 params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index));
 {code}
 instead of either inserting an empty COSArray, or (better) doing nothing. Saving 
 such a PDF results in it not being displayable in Acrobat Reader.
 Test code:
 {code}
 PDDocument d = PDDocument.load("before.pdf");
 new PDFRenderer(d).renderImage(0);
 d.save("after.pdf");
 {code}
 The rendering is important because without it, the filtered objects aren't 
 decoded.




[jira] [Commented] (PDFBOX-2057) Importing BufferedImage into PDPixelMap is broken in 1.8.5

2014-05-14 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992592#comment-13992592
 ] 

Juraj Lonc commented on PDFBOX-2057:


I see the fix in ImageFactory.getAlphaImage(BufferedImage image).
But wouldn't it be much easier to do it like this?

{code}
WritableRaster alphaRaster = image.getAlphaRaster();
BufferedImage bi=new BufferedImage(alphaRaster.getWidth(), 
alphaRaster.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
bi.setData(alphaRaster);
{code}
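Wrapped into a self-contained method, the suggestion can be tried quickly. getAlphaRaster() returns null for images without an alpha channel, so the sketch below adds a null check; the method name mirrors ImageFactory.getAlphaImage, but this is only an illustration, not PDFBox's implementation:

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical standalone version of the suggestion above. */
final class AlphaExtractor {
    private AlphaExtractor() {}

    /** Copies the alpha channel into a grayscale image, or returns null if there is none. */
    static BufferedImage getAlphaImage(BufferedImage image) {
        WritableRaster alphaRaster = image.getAlphaRaster();
        if (alphaRaster == null) {
            return null; // image has no alpha channel
        }
        BufferedImage alpha = new BufferedImage(alphaRaster.getWidth(),
                alphaRaster.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        // Copies the single alpha band sample by sample into the gray band
        alpha.setData(alphaRaster);
        return alpha;
    }
}
```

The gray image then holds the raw alpha samples (0 = fully transparent, 255 = opaque), which is the form a PDF soft mask (SMask) expects.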

 Importing BufferedImage into PDPixelMap is broken in 1.8.5
 --

 Key: PDFBOX-2057
 URL: https://issues.apache.org/jira/browse/PDFBOX-2057
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.5, 1.8.6
 Environment: windows vista / jdk 1.7.0_45
Reporter: Michaël Michaud
Assignee: Tilman Hausherr
  Labels: regression
 Fix For: 1.8.6, 2.0.0

 Attachments: CS-Convocation entretien signed.pdf, 
 renderTransparentImage.zip


 Try to import a BufferedImage in a PDDocument with PDPixelMap
 BufferedImage with TYPE_4BYTE_ABGR works fine with PDFBox 1.8.4 (though, the 
 pdf file contains instruction /ColorSpace /DeviceGray)
 BufferedImage with TYPE_4BYTE_ABGR produces an unreadable PDF with PDFBox 
 1.8.5 (though, the pdf file contains instruction /ColorSpace /DeviceRGB).
 Code used to demonstrate the problem is as follows (image has also been 
 colored with some Graphics instructions to demonstrate that 1.8.4 is working) 
 :
 {code}
 try {
 PDDocument doc = new PDDocument();
 PDPage page = new PDPage();
 doc.addPage(page);
 BufferedImage awtImage = new BufferedImage(100,100, 
 BufferedImage.TYPE_4BYTE_ABGR);
 PDPixelMap ximage = new PDPixelMap(doc, awtImage);
 PDPageContentStream contentStream = new PDPageContentStream(doc, 
 page);
 contentStream.drawXObject(ximage, 200, 200, 100, 100);
 contentStream.close();
 doc.save("C:\\Temp\\PDF\\test185_4babgr.pdf");
 } catch(COSVisitorException|IOException e) {
 e.printStackTrace();
 }
 {code}
 I also tried with a BufferedImage with TYPE_INT_ARGB but it throws an 
 exception with PDFBox 1.8.4 and 1.8.5 :
 {code}
 Exception in thread "main" java.lang.IllegalArgumentException: Raster 
 IntegerInterleavedRaster: width = 100 height = 100 #Bands = 1 xOff = 0 yOff = 
 0 dataOffset[0] 0 is incompatible with ColorModel ColorModel: #pixelBits = 8 
 numComponents = 1 color space = java.awt.color.ICC_ColorSpace@1dc80063 
 transparency = 1 has alpha = false isAlphaPre = false
   at java.awt.image.BufferedImage.init(BufferedImage.java:630)
   at 
 org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.createImageStream(PDPixelMap.java:107)
 {code}
 My main purpose was to use a BufferedImage with a CMYK ColorSpace, but 
 PDPixelMap seems to accept 1 component and 3 component ColorSpace only.




[jira] [Created] (PDFBOX-2077) Empty (invalid) DecodeParms is added to image

2014-05-13 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2077:
--

 Summary: Empty (invalid) DecodeParms is added to image
 Key: PDFBOX-2077
 URL: https://issues.apache.org/jira/browse/PDFBOX-2077
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc


The PDF contains an image (XObject) that has no /DecodeParms.
PDFBox adds an empty /DecodeParms to this image, which results in an invalid PDF, and 
Adobe Reader complains about it.

The problem is caused by calling PDResources.getXObjects().

It is very similar to PDFBOX-2042.




[jira] [Comment Edited] (PDFBOX-2077) Empty (invalid) DecodeParms is added to image

2014-05-13 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996241#comment-13996241
 ] 

Juraj Lonc edited comment on PDFBOX-2077 at 5/13/14 10:14 AM:
--

{noformat}
PDDocument pdDoc=PDDocument.load(f);
PDPage pdPage=(PDPage)pdDoc.getDocumentCatalog().getAllPages().get(0);

PDResources res=pdPage.findResources();
// this is the guilty line
res.getXObjects();

File fout=new File("resaved.pdf");
pdDoc.save(fout);
pdDoc.close();
{noformat}


was (Author: chupacabras):
DDocument pdDoc=PDDocument.load(f);
PDPage 
pdPage=(PDPage)pdDoc.getDocumentCatalog().getAllPages().get(0);


PDResources res=pdPage.findResources();
// this is the guilty line
res.getXObjects();

File fout=new File(resaved.pdf);
pdDoc.save(fout);


 Empty (invalid) DecodeParms is added to image
 -

 Key: PDFBOX-2077
 URL: https://issues.apache.org/jira/browse/PDFBOX-2077
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: input_image.pdf, resaved.pdf


 The PDF contains an image (XObject) that has no /DecodeParms.
 PDFBox adds an empty /DecodeParms to this image, which results in an invalid PDF, 
 and Adobe Reader complains about it.
 The problem is caused by calling PDResources.getXObjects().
 It is very similar to PDFBOX-2042.




[jira] [Created] (PDFBOX-2072) Wrong calculation of space char width in PDFStreamEngine

2014-05-12 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2072:
--

 Summary: Wrong calculation of space char width in PDFStreamEngine
 Key: PDFBOX-2072
 URL: https://issues.apache.org/jira/browse/PDFBOX-2072
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc


PDFStreamEngine calculates the width of the space character incorrectly.
The page's content stream contains this operation:
0 12 -12 0 562.3199 372.7105 Tm

and that causes PDFStreamEngine to calculate the width of " " (space) as 0.
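One plausible explanation, assuming the engine multiplies the space width by the matrix's a entry alone: 0 12 -12 0 Tm is a 90-degree rotation scaled by 12, so a is 0 and the product is 0. Measuring the length of the transformed advance vector gives a width that does not depend on rotation. A sketch using java.awt.geom.AffineTransform, whose six-argument constructor takes its entries in the same order as a PDF matrix [a b c d e f]; the class name is hypothetical and this is not the attached diff:

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Hypothetical helper; not PDFBox code. */
final class TextSpaceWidth {
    private TextSpaceWidth() {}

    /**
     * Length of a horizontal advance after applying the text matrix
     * [a b c d e f]; the translation (e, f) does not affect a length.
     */
    static double transformedWidth(double width, double a, double b, double c, double d) {
        // AffineTransform(m00, m10, m01, m11, m02, m12) == PDF order a, b, c, d, e, f
        AffineTransform tm = new AffineTransform(a, b, c, d, 0, 0);
        Point2D v = tm.deltaTransform(new Point2D.Double(width, 0), null);
        return Math.hypot(v.getX(), v.getY());
    }

    public static void main(String[] args) {
        // 0 12 -12 0 562.3199 372.7105 Tm, assuming a space width of 0.25 em
        System.out.println(transformedWidth(0.25, 0, 12, -12, 0)); // 3.0
    }
}
```

The same measure also stays positive for a mirrored matrix such as -10.5679 0 0 -11.4 Tm, where the raw a entry is negative.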




[jira] [Updated] (PDFBOX-2072) Wrong calculation of space char width in PDFStreamEngine

2014-05-12 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2072:
---

Attachment: PDFBOX-2072_PDFStreamEngine.diff

I made a fix for this.

 Wrong calculation of space char width in PDFStreamEngine
 

 Key: PDFBOX-2072
 URL: https://issues.apache.org/jira/browse/PDFBOX-2072
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2072_PDFStreamEngine.diff


 PDFStreamEngine calculates the width of the space character incorrectly.
 The page's content stream contains this operation:
 0 12 -12 0 562.3199 372.7105 Tm
 and that causes PDFStreamEngine to calculate the width of " " (space) as 0.




[jira] [Updated] (PDFBOX-2074) 4-bytes CMap entry causes exception

2014-05-12 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2074:
---

Attachment: PDFBOX-2074_CMap.diff
pdf_with_4B_cmap_entry.pdf

 4-bytes CMap entry causes exception
 ---

 Key: PDFBOX-2074
 URL: https://issues.apache.org/jira/browse/PDFBOX-2074
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf


 I have a PDF that has a CMap entry consisting of 4 bytes. It is the only entry 
 of that size; the other entries have 2 bytes.
 Adobe Reader has no problems with that, but PDFBox throws an Exception.
 I think this Exception should not be thrown. The entry should be skipped or 
 truncated to 2 bytes, and a warning should be written to the log.




[jira] [Commented] (PDFBOX-2074) 4-bytes CMap entry causes exception

2014-05-12 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995559#comment-13995559
 ] 

Juraj Lonc commented on PDFBOX-2074:


I have no idea how to properly handle entries that are longer than 2 bytes.
But I think it is better to skip them and not throw an Exception there. Just logging 
a warning or an error should be fine.

If somebody tries to render (to an image) such a PDF now, it will fail.
I suggest removing that Exception so that the PDF will be rendered. The rendered image 
will most likely be OK; maybe some characters will not be drawn.
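A minimal sketch of the "skip and log" behavior suggested here, using java.util.logging. Note that CMap codespace ranges may legitimately declare codes of up to 4 bytes, so this only illustrates lenient handling for a table that expects 1- or 2-byte codes; the class and method names are hypothetical and this is neither PDFBox's CMap class nor the attached diff:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical lenient code-to-Unicode table; not PDFBox's CMap class. */
final class LenientCMap {
    private static final Logger LOG = Logger.getLogger(LenientCMap.class.getName());

    // Key combines code length and value so that <41> and <0041> stay distinct
    private final Map<Long, String> codeToUnicode = new HashMap<>();

    private static long key(byte[] code) {
        long value = 0;
        for (byte b : code) {
            value = (value << 8) | (b & 0xFF);
        }
        return ((long) code.length << 32) | value;
    }

    /** Stores a 1- or 2-byte mapping; other lengths are skipped with a warning. */
    boolean addMapping(byte[] code, String unicode) {
        if (code.length < 1 || code.length > 2) {
            LOG.warning("Skipping CMap entry with unsupported code length " + code.length);
            return false;
        }
        codeToUnicode.put(key(code), unicode);
        return true;
    }

    /** Returns the mapped string, or null if the code is unknown. */
    String lookup(byte[] code) {
        return codeToUnicode.get(key(code));
    }
}
```

Parsing then continues past the odd entry, so at worst the characters using that code are missing from the output, which matches the expectation above.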

 4-bytes CMap entry causes exception
 ---

 Key: PDFBOX-2074
 URL: https://issues.apache.org/jira/browse/PDFBOX-2074
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf


 I have a PDF that has a CMap entry consisting of 4 bytes. It is the only entry 
 of that size; the other entries have 2 bytes.
 Adobe Reader has no problems with that, but PDFBox throws an Exception.
 I think this Exception should not be thrown. The entry should be skipped or 
 truncated to 2 bytes, and a warning should be written to the log.




[jira] [Created] (PDFBOX-2074) 4-bytes CMap entry causes exception

2014-05-12 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2074:
--

 Summary: 4-bytes CMap entry causes exception
 Key: PDFBOX-2074
 URL: https://issues.apache.org/jira/browse/PDFBOX-2074
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc


I have a PDF that has a CMap entry consisting of 4 bytes. It is the only entry of 
that size; the other entries have 2 bytes.

Adobe Reader has no problems with that, but PDFBox throws an Exception.

I think this Exception should not be thrown. The entry should be skipped or truncated 
to 2 bytes, and a warning should be written to the log.




[jira] [Updated] (PDFBOX-2075) Texts are not properly positioned/sized

2014-05-12 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2075:
---

Attachment: output.png
ozn_dmv_1_2008.pdf

 Texts are not properly positioned/sized
 ---

 Key: PDFBOX-2075
 URL: https://issues.apache.org/jira/browse/PDFBOX-2075
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: output.png, ozn_dmv_1_2008.pdf


 The text in this PDF is displayed strangely.
 It seems that the first half of each text is a little bit wider, which causes the texts 
 to overlap in several places.
 I was not able to figure out what caused it.




[jira] [Created] (PDFBOX-2075) Texts are not properly positioned/sized

2014-05-12 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2075:
--

 Summary: Texts are not properly positioned/sized
 Key: PDFBOX-2075
 URL: https://issues.apache.org/jira/browse/PDFBOX-2075
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: output.png, ozn_dmv_1_2008.pdf

The text in this PDF is displayed strangely.
It seems that the first half of each text is a little bit wider, which causes the texts to 
overlap in several places.

I was not able to figure out what caused it.




[jira] [Commented] (PDFBOX-2075) Texts are not properly positioned/sized

2014-05-12 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995696#comment-13995696
 ] 

Juraj Lonc commented on PDFBOX-2075:


Or something is wrong with the positioning, so the right parts of those lines 
are not moved far enough to the right and thus overlap with the left parts.

 Texts are not properly positioned/sized
 ---

 Key: PDFBOX-2075
 URL: https://issues.apache.org/jira/browse/PDFBOX-2075
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: output.png, ozn_dmv_1_2008.pdf


 The text in this PDF is displayed strangely.
 It seems that the first half of each text is a little bit wider, which causes the texts 
 to overlap in several places.
 I was not able to figure out what caused it.




[jira] [Closed] (PDFBOX-2075) Texts are not properly positioned/sized

2014-05-12 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc closed PDFBOX-2075.
--

Resolution: Not a Problem

I had a source code inconsistency.
My bad.

 Texts are not properly positioned/sized
 ---

 Key: PDFBOX-2075
 URL: https://issues.apache.org/jira/browse/PDFBOX-2075
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: output.png, ozn_dmv_1_2008.pdf


 Text in this PDF is displayed strangely.
 It seems that the first half of the text is slightly wider, which causes the text 
 to overlap in several places.
 I was not able to figure out what caused it.





[jira] [Comment Edited] (PDFBOX-2074) 4-bytes CMap entry causes exception

2014-05-12 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995559#comment-13995559
 ] 

Juraj Lonc edited comment on PDFBOX-2074 at 5/12/14 7:57 PM:
-

I have no idea how to properly handle entries that are longer than 2 bytes.
But I think it is better to skip them and not throw an exception there. Just 
logging a warning or error should be fine.

If somebody tries to render (to an image) such a PDF now, it will fail.
I suggest removing that exception so the PDF will be rendered. The rendered image will 
most likely be OK. Maybe some characters will not be drawn.
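
A minimal sketch of the behaviour suggested above, assuming a simplified table that maps raw code bytes to Unicode text. `LenientCMap` and its methods are hypothetical and are not PDFBox's CMap API; the sketch does not try to decode codes longer than 2 bytes, it only skips them with a warning instead of failing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Hypothetical, simplified stand-in for a CMap code-to-Unicode table (not PDFBox's CMap class).
 * Codes longer than 2 bytes are logged and skipped instead of aborting the whole parse.
 */
final class LenientCMap {
    private static final Logger LOG = Logger.getLogger(LenientCMap.class.getName());
    private static final int MAX_CODE_LENGTH = 2;

    private final Map<Integer, String> mappings = new HashMap<>();

    void addMapping(byte[] code, String unicode) {
        if (code.length == 0 || code.length > MAX_CODE_LENGTH) {
            LOG.warning("Skipping CMap entry with unsupported code length: " + code.length);
            return;
        }
        mappings.put(key(code), unicode);
    }

    /** Returns the mapped text, or null for unknown or unsupported codes. */
    String lookup(byte[] code) {
        if (code.length == 0 || code.length > MAX_CODE_LENGTH) {
            return null;
        }
        return mappings.get(key(code));
    }

    int size() {
        return mappings.size();
    }

    private static int key(byte[] code) {
        int value = 0;
        for (byte b : code) {
            value = (value << 8) | (b & 0xFF); // big-endian, as codes are written in the CMap
        }
        // Include the length so that <41> and <0041> remain different codes.
        return (code.length << 16) | value;
    }
}
```

With this approach a single oversized entry only costs that one mapping; the rest of the CMap, and therefore rendering, still works.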


was (Author: chupacabras):
I have no idea how to properly handle entries that are longer than 2 bytes.
But I think it is better skip them and not throw Exception there. Just logging 
warning or error should be fine.

If somebody tries to render (to image) such PDF now it will fail.
I suggest to remove that Exception so PDF will be rendered. Rendered image will 
be most likely ok. Maybe some char will not be drawn.

 4-bytes CMap entry causes exception
 ---

 Key: PDFBOX-2074
 URL: https://issues.apache.org/jira/browse/PDFBOX-2074
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf


 I have a PDF that has a CMap entry consisting of 4 bytes. It is just one entry 
 of that size; the other entries have 2 bytes.
 Adobe Reader has no problems with that, but PDFBox throws an exception.
 I think this exception should not be thrown. The entry should be skipped or 
 truncated to 2 bytes, and a warning should be written to the log.





[jira] [Commented] (PDFBOX-2067) Error creating JPEG image with SMask

2014-05-11 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992584#comment-13992584
 ] 

Juraj Lonc commented on PDFBOX-2067:


My problem was that I was adding some images and then reading XObjects from 
the page's resources. This gave me an exception:

Exception in thread "main" java.lang.ClassCastException: java.awt.image.DataBufferInt cannot be cast to java.awt.image.DataBufferByte
    at org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:124)
    at org.apache.pdfbox.filter.Filter.decode(Filter.java:58)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:337)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:278)
    at org.apache.pdfbox.cos.COSStream.getDecodeResult(COSStream.java:235)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:94)
    at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:65)
    at org.apache.pdfbox.pdmodel.PDResources.getXObjects(PDResources.java:247)

It took me half a day to track the cause down to that single line ;)
But I like brain teasers ;)
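
The trace points at a cast of the raster's `DataBuffer` to `DataBufferByte`, which fails for images backed by a `DataBufferInt` (for example `TYPE_INT_RGB` or `TYPE_INT_ARGB`). A defensive pattern, sketched here with plain AWT and a hypothetical helper name rather than the actual DCTFilter or JPEGFactory code, is to check the buffer type and redraw anything that is not already `TYPE_3BYTE_BGR` into an image that is guaranteed to be byte-backed:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;

final class RasterBytes {
    /** Returns packed B,G,R bytes for the image, converting to TYPE_3BYTE_BGR when needed. */
    static byte[] bgrBytes(BufferedImage image) {
        BufferedImage bgr = image;
        DataBuffer buffer = image.getRaster().getDataBuffer();
        // Only reuse the existing buffer when it really holds bytes in the expected layout;
        // casting a DataBufferInt to DataBufferByte is what throws ClassCastException.
        if (!(buffer instanceof DataBufferByte) || image.getType() != BufferedImage.TYPE_3BYTE_BGR) {
            bgr = new BufferedImage(image.getWidth(), image.getHeight(), BufferedImage.TYPE_3BYTE_BGR);
            Graphics2D g = bgr.createGraphics();
            try {
                g.drawImage(image, 0, 0, null);
            } finally {
                g.dispose();
            }
        }
        return ((DataBufferByte) bgr.getRaster().getDataBuffer()).getData();
    }
}
```

Redrawing onto a fresh `TYPE_3BYTE_BGR` image drops the alpha channel (fully transparent pixels end up black), so an SMask would still have to be extracted separately.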

 Error creating JPEG image with SMask
 

 Key: PDFBOX-2067
 URL: https://issues.apache.org/jira/browse/PDFBOX-2067
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Attachments: PDFBOX-2067_JPEGFactory.diff


 JPEGFactory.createFromImage() has problems with images with transparency 
 (alpha data).





[jira] [Commented] (PDFBOX-2057) Importing BufferedImage into PDPixelMap is broken in 1.8.5

2014-05-10 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992976#comment-13992976
 ] 

Juraj Lonc commented on PDFBOX-2057:


That was just a suggestion ;) I did not do extensive testing.

 Importing BufferedImage into PDPixelMap is broken in 1.8.5
 --

 Key: PDFBOX-2057
 URL: https://issues.apache.org/jira/browse/PDFBOX-2057
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.5, 1.8.6
 Environment: windows vista / jdk 1.7.0_45
Reporter: Michaël Michaud
Assignee: Tilman Hausherr
  Labels: regression
 Fix For: 1.8.6, 2.0.0

 Attachments: CS-Convocation entretien signed.pdf, CS-Convocation 
 entretien-IText.pdf, CS-Convocation entretien-PDFBox-with-workarround.pdf, 
 CS-Convocation entretien-PDFBox.pdf, ImageFilterOp.java, 
 differentBufferedImages.pdf, renderTransparentImage.zip


 Try to import a BufferedImage in a PDDocument with PDPixelMap
 BufferedImage with TYPE_4BYTE_ABGR works fine with PDFBox 1.8.4 (though, the 
 pdf file contains instruction /ColorSpace /DeviceGray)
 BufferedImage with TYPE_4BYTE_ABGR produces an unreadable PDF with PDFBox 
 1.8.5 (though, the pdf file contains instruction /ColorSpace /DeviceRGB).
 Code used to demonstrate the problem is as follows (image has also been 
 colored with some Graphics instructions to demonstrate that 1.8.4 is working) 
 :
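 {code}
 import java.awt.image.BufferedImage;
 import java.io.IOException;
 import org.apache.pdfbox.exceptions.COSVisitorException;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.PDPage;
 import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
 import org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap;

 public class PixelMapTest {
     public static void main(String[] args) {
         try {
             PDDocument doc = new PDDocument();
             PDPage page = new PDPage();
             doc.addPage(page);
             // 4-byte ABGR image, i.e. with an alpha channel
             BufferedImage awtImage = new BufferedImage(100, 100, BufferedImage.TYPE_4BYTE_ABGR);
             PDPixelMap ximage = new PDPixelMap(doc, awtImage);
             PDPageContentStream contentStream = new PDPageContentStream(doc, page);
             contentStream.drawXObject(ximage, 200, 200, 100, 100);
             contentStream.close();
             doc.save("C:\\Temp\\PDF\\test185_4babgr.pdf");
             doc.close();
         } catch (COSVisitorException | IOException e) {
             e.printStackTrace();
         }
     }
 }
 {code}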
 I also tried with a BufferedImage with TYPE_INT_ARGB but it throws an 
 exception with PDFBox 1.8.4 and 1.8.5 :
 {code}
 Exception in thread "main" java.lang.IllegalArgumentException: Raster IntegerInterleavedRaster: width = 100 height = 100 #Bands = 1 xOff = 0 yOff = 0 dataOffset[0] 0 is incompatible with ColorModel ColorModel: #pixelBits = 8 numComponents = 1 color space = java.awt.color.ICC_ColorSpace@1dc80063 transparency = 1 has alpha = false isAlphaPre = false
   at java.awt.image.BufferedImage.<init>(BufferedImage.java:630)
   at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.createImageStream(PDPixelMap.java:107)
 {code}
 My main purpose was to use a BufferedImage with a CMYK ColorSpace, but 
 PDPixelMap seems to accept 1 component and 3 component ColorSpace only.
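
One possible workaround, not verified against 1.8.x: since PDPixelMap appears to accept only 1- and 3-component color spaces, flatten the alpha channel onto an opaque background first so that a plain `TYPE_INT_RGB` image is passed in. The helper below is a hypothetical sketch using only AWT; it discards transparency and does not address the CMYK case.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class Flatten {
    /** Composites an image (possibly with alpha) onto an opaque background, giving a 3-component RGB image. */
    static BufferedImage toOpaqueRgb(BufferedImage source, Color background) {
        BufferedImage rgb = new BufferedImage(source.getWidth(), source.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null); // default SrcOver: alpha is blended into the background
        } finally {
            g.dispose();
        }
        return rgb;
    }
}
```

The result could then be passed as `new PDPixelMap(doc, Flatten.toOpaqueRgb(awtImage, Color.WHITE))`, in the same way as in the reporter's code.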





[jira] [Commented] (PDFBOX-2059) Characters are not positioned properly (due to wrong width/height of chars)

2014-05-06 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990395#comment-13990395
 ] 

Juraj Lonc commented on PDFBOX-2059:


I made 2 fixes for this. Both are for PDTrueTypeFont. I don't know whether they are 
useful or whether you already have other plans for this bug.

1. Added getFontWidth(), where I calculate widths from the TTF font. Right now you 
are relying only on the widths defined within the PDF.
2. Modified getExternalFontFile2() so that it looks for system fonts too. Right 
now you are using only the fonts defined in PDFBox_External_Fonts.properties.
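
For the first point, the unit conversion involved is standard: TrueType advance widths are stored in font units (in the `hmtx` table), while PDF `/Widths` are in glyph space, 1/1000 of text space, so the value has to be scaled by 1000 / unitsPerEm (from the `head` table). A minimal sketch, assuming both numbers have already been read from the font; the class name is hypothetical and this is not the code in the attached diff:

```java
final class TrueTypeWidths {
    /**
     * Converts a TrueType advance width (font units, from 'hmtx') into PDF glyph-space
     * units (1/1000 of text space), the unit used by a font's /Widths array.
     */
    static float toPdfWidth(int advanceWidth, int unitsPerEm) {
        if (unitsPerEm <= 0) {
            throw new IllegalArgumentException("unitsPerEm must be positive: " + unitsPerEm);
        }
        return advanceWidth * 1000f / unitsPerEm;
    }
}
```

For example, an advance of 1024 units in a font with 2048 units per em gives a width of 500.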

 Characters are not positioned properly (due to wrong width/height of chars)
 ---

 Key: PDFBOX-2059
 URL: https://issues.apache.org/jira/browse/PDFBOX-2059
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
Assignee: Andreas Lehmkühler
 Attachments: DPH 032014.pdf


 Characters in this PDF are not positioned properly.
 All characters are rendered at position x=0.0.
 The problem is in PDFont.getFontWidth(): it returns 0.0 for every char.
 The same applies to PDFont.getFontHeight().





[jira] [Updated] (PDFBOX-62) Incorrect (zero) character widths returned in some docs

2014-05-06 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-62:
-

Attachment: PDFBOX-2059_PDTrueTypeFont.diff

Just let me know if it is useful somehow ;)

 Incorrect (zero) character widths returned in some docs
 ---

 Key: PDFBOX-62
 URL: https://issues.apache.org/jira/browse/PDFBOX-62
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering, Text extraction
Assignee: Andreas Lehmkühler
 Attachments: 5542.pdf, PDFBOX-2059_PDTrueTypeFont.diff, 
 PDTrueTypeFont.diff, pdfbox-2006-zerowidth.pdf-1.png, 
 pdfbox-62-zerowidth.pdf-1.png


 [imported from SourceForge]
 http://sourceforge.net/tracker/index.php?group_id=78314atid=552832aid=1216674
 Originally submitted by tamirhassan on 2005-06-07 13:42.
 For certain PDF documents (such as the one attached) 
 the character/string widths (as obtained e.g. by the 
 PDFont.getStringWidth method) are not returned 
 correctly, i.e. they appear to be correct for punctuation 
 characters but are zero for alphanumeric characters.  
 It seems as if these alphanumeric characters are NOT 
 within PDFont.firstChar and PDFont.lastChar in the 
 Type 1 font.  The method therefore attempts to obtain 
 the font widths from the AFM (font metric) file, but fails 
 (silently) with a 'resource is null' logline message.
 (Note that this problem doesn't seem to occur with Type 
 1 fonts in other documents.)
 A more detailed discussion regarding this issue can be 
 found in this link:
 http://sourceforge.net/forum/forum.php?
 thread_id=1260349forum_id=267205
 Thanks in advance for any help that can be obtained,
 Tam
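
The lookup described above can be sketched as follows, assuming a simple (single-byte) font. Codes inside the FirstChar..LastChar range come from `/Widths`; everything else falls back to external metrics and finally to the FontDescriptor's `/MissingWidth`, which defaults to 0 in the PDF specification. That default is one way zero widths can appear silently. The class name and the `fallback` map (standing in for AFM data) are hypothetical, and the fallback order is an illustrative choice:

```java
import java.util.Collections;
import java.util.Map;

final class SimpleFontWidths {
    private final int firstChar;
    private final float[] widths;               // /Widths; implies LastChar = FirstChar + length - 1
    private final float missingWidth;           // /MissingWidth from the FontDescriptor (PDF default: 0)
    private final Map<Integer, Float> fallback; // hypothetical external metrics, e.g. parsed from an AFM file

    SimpleFontWidths(int firstChar, float[] widths, float missingWidth, Map<Integer, Float> fallback) {
        this.firstChar = firstChar;
        this.widths = widths.clone();
        this.missingWidth = missingWidth;
        this.fallback = fallback == null ? Collections.<Integer, Float>emptyMap() : fallback;
    }

    /** Width of a character code in glyph space (1/1000 of text space). */
    float widthOf(int code) {
        int index = code - firstChar;
        if (index >= 0 && index < widths.length) {
            return widths[index];
        }
        Float external = fallback.get(code);
        return external != null ? external : missingWidth;
    }
}
```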





[jira] [Commented] (PDFBOX-62) Incorrect (zero) character widths returned in some docs

2014-05-06 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990540#comment-13990540
 ] 

Juraj Lonc commented on PDFBOX-62:
--

I made 2 quick fixes for this (actually they are for PDFBOX-2059). Both are for 
PDTrueTypeFont. I don't know whether they are useful or whether you already have 
other plans for this bug.

1. Added getFontWidth(), where I calculate widths from the TTF font. Right now you 
are relying only on the widths defined within the PDF.
2. Modified getExternalFontFile2() so that it looks for system fonts too. Right 
now you are using only the fonts defined in PDFBox_External_Fonts.properties.

It works, but take it just as inspiration. It should be moved to PDFont (I 
guess) and made more robust.

 Incorrect (zero) character widths returned in some docs
 ---

 Key: PDFBOX-62
 URL: https://issues.apache.org/jira/browse/PDFBOX-62
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering, Text extraction
Assignee: Andreas Lehmkühler
 Attachments: 5542.pdf, PDFBOX-2059_PDTrueTypeFont.diff, 
 PDTrueTypeFont.diff, pdfbox-2006-zerowidth.pdf-1.png, 
 pdfbox-62-zerowidth.pdf-1.png


 [imported from SourceForge]
 http://sourceforge.net/tracker/index.php?group_id=78314atid=552832aid=1216674
 Originally submitted by tamirhassan on 2005-06-07 13:42.
 For certain PDF documents (such as the one attached) 
 the character/string widths (as obtained e.g. by the 
 PDFont.getStringWidth method) are not returned 
 correctly, i.e. they appear to be correct for punctuation 
 characters but are zero for alphanumeric characters.  
 It seems as if these alphanumeric characters are NOT 
 within PDFont.firstChar and PDFont.lastChar in the 
 Type 1 font.  The method therefore attempts to obtain 
 the font widths from the AFM (font metric) file, but fails 
 (silently) with a 'resource is null' logline message.
 (Note that this problem doesn't seem to occur with Type 
 1 fonts in other documents.)
 A more detailed discussion regarding this issue can be 
 found in this link:
 http://sourceforge.net/forum/forum.php?
 thread_id=1260349forum_id=267205
 Thanks in advance for any help that can be obtained,
 Tam





[jira] [Commented] (PDFBOX-62) Incorrect (zero) character widths returned in some docs

2014-05-06 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990699#comment-13990699
 ] 

Juraj Lonc commented on PDFBOX-62:
--

Yes, I am aware of that.
However, I think it is better than nothing. It works on the current JDK.

It would be possible to replace the sun.* classes it uses with a custom implementation 
of system font lookup. That would be quite easy.
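
A rough sketch of such a custom lookup without sun.* classes, using only `java.nio.file`: walk a list of font directories (on common systems, for example, `C:\Windows\Fonts`, `/usr/share/fonts` or `/Library/Fonts`) and index TrueType/OpenType files. The class name is hypothetical, and keying by file name is a simplification; a robust version would read each font's `name` table to match PDF BaseFont names such as `ArialMT`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

final class SystemFontIndex {
    /** Indexes .ttf/.otf/.ttc files under the given directories by lower-case file name without extension. */
    static Map<String, Path> index(List<Path> fontDirs) throws IOException {
        Map<String, Path> result = new HashMap<>();
        for (Path dir : fontDirs) {
            if (!Files.isDirectory(dir)) {
                continue; // e.g. a Windows font directory when running on Linux
            }
            try (Stream<Path> files = Files.walk(dir)) {
                files.filter(Files::isRegularFile).forEach(file -> {
                    String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
                    int dot = name.lastIndexOf('.');
                    if (dot > 0) {
                        String ext = name.substring(dot + 1);
                        if (ext.equals("ttf") || ext.equals("otf") || ext.equals("ttc")) {
                            result.putIfAbsent(name.substring(0, dot), file); // first directory wins
                        }
                    }
                });
            }
        }
        return result;
    }
}
```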

 Incorrect (zero) character widths returned in some docs
 ---

 Key: PDFBOX-62
 URL: https://issues.apache.org/jira/browse/PDFBOX-62
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering, Text extraction
Assignee: Andreas Lehmkühler
 Attachments: 5542.pdf, PDFBOX-2059_PDTrueTypeFont.diff, 
 PDTrueTypeFont.diff, pdfbox-2006-zerowidth.pdf-1.png, 
 pdfbox-62-zerowidth.pdf-1.png


 [imported from SourceForge]
 http://sourceforge.net/tracker/index.php?group_id=78314atid=552832aid=1216674
 Originally submitted by tamirhassan on 2005-06-07 13:42.
 For certain PDF documents (such as the one attached) 
 the character/string widths (as obtained e.g. by the 
 PDFont.getStringWidth method) are not returned 
 correctly, i.e. they appear to be correct for punctuation 
 characters but are zero for alphanumeric characters.  
 It seems as if these alphanumeric characters are NOT 
 within PDFont.firstChar and PDFont.lastChar in the 
 Type 1 font.  The method therefore attempts to obtain 
 the font widths from the AFM (font metric) file, but fails 
 (silently) with a 'resource is null' logline message.
 (Note that this problem doesn't seem to occur with Type 
 1 fonts in other documents.)
 A more detailed discussion regarding this issue can be 
 found in this link:
 http://sourceforge.net/forum/forum.php?
 thread_id=1260349forum_id=267205
 Thanks in advance for any help that can be obtained,
 Tam





[jira] [Created] (PDFBOX-2059) Characters are not positioned properly (due to wrong width/height of chars)

2014-05-05 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2059:
--

 Summary: Characters are not positioned properly (due to wrong 
width/height of chars)
 Key: PDFBOX-2059
 URL: https://issues.apache.org/jira/browse/PDFBOX-2059
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: DPH 032014.pdf

Characters in this PDF are not positioned properly.
All characters are rendered at position x=0.0.
The problem is in PDFont.getFontWidth(): it returns 0.0 for every char.
The same applies to PDFont.getFontHeight().
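
For context on why a zero width leaves every glyph at the same x: in the PDF text-positioning rule (ISO 32000-1, 9.4.4), each shown glyph advances the text position horizontally by tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, where w0 is the glyph width in text space (the /Widths value divided by 1000). A small sketch of that rule with hypothetical names, working in unscaled text space (the text matrix and CTM are not applied):

```java
final class TextAdvance {
    /**
     * Horizontal displacement after one glyph:
     * tx = ((w0 - tj / 1000) * fontSize + charSpacing + wordSpacing) * horizontalScale.
     * w0 is in text space; wordSpacing applies only to the single-byte code 32.
     */
    static double advance(double w0, double tj, double fontSize, double charSpacing,
                          double wordSpacing, double horizontalScale) {
        return ((w0 - tj / 1000.0) * fontSize + charSpacing + wordSpacing) * horizontalScale;
    }

    /** x positions of consecutive glyphs (widths in 1/1000 units), no TJ adjustments or spacing. */
    static double[] positions(double startX, double[] glyphWidths, double fontSize) {
        double[] xs = new double[glyphWidths.length];
        double x = startX;
        for (int i = 0; i < glyphWidths.length; i++) {
            xs[i] = x;
            x += advance(glyphWidths[i] / 1000.0, 0, fontSize, 0, 0, 1.0); // Th = 100% -> 1.0
        }
        return xs;
    }
}
```

With widths of 500 at a font size of 10, glyphs land at 0, 5 and 10; with widths of 0 they all stay at 0, matching the symptom described here.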





[jira] [Updated] (PDFBOX-2059) Characters are not positioned properly (due to wrong width/height of chars)

2014-05-05 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2059:
---

Attachment: DPH 032014.pdf

 Characters are not positioned properly (due to wrong width/height of chars)
 ---

 Key: PDFBOX-2059
 URL: https://issues.apache.org/jira/browse/PDFBOX-2059
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: DPH 032014.pdf


 Characters in this PDF are not positioned properly.
 All characters are rendered at position x=0.0.
 The problem is in PDFont.getFontWidth(): it returns 0.0 for every char.
 The same applies to PDFont.getFontHeight().





[jira] [Updated] (PDFBOX-2042) ColorSpace without Range

2014-04-24 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2042:
---

Attachment: ModifyTest.java

Here is the sample code.
Actually, I do not need to modify the content of the page. The problem is caused just by 
calling pdResources.getColorSpaces() and then saving the document.

 ColorSpace without Range
 

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have a PDF document in which I am modifying the PDPage content stream.
 The saved document is invalid (Adobe Reader complains about it).
 I have narrowed it down to the ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from the PDF, Adobe Reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array

2014-04-24 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980165#comment-13980165
 ] 

Juraj Lonc commented on PDFBOX-2042:


Thanks for the fix ;)

 ColorSpace with empty Range array
 -

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.8.4, 1.8.5, 2.0.0
Reporter: Juraj Lonc
Assignee: Tilman Hausherr
 Fix For: 1.8.5, 2.0.0

 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf


 I have a PDF document in which I am modifying the PDPage content stream.
 The saved document is invalid (Adobe Reader complains about it).
 I have narrowed it down to the ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from the PDF, Adobe Reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Created] (PDFBOX-2042) ColorSpace without Range

2014-04-23 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-2042:
--

 Summary: ColorSpace without Range
 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc


I have a PDF document in which I am modifying the PDPage content stream.
The saved document is invalid (Adobe Reader complains about it).

I have narrowed it down to the ColorSpace. 

The original document has this color space:
/ColorSpace <<
/Cs6 [/ICCBased <<
/Alternate /DeviceRGB
/Filter /FlateDecode
/Length 2597
/N 3
>>]
>>

The modified document has this color space:
/ColorSpace <<
/Cs6 [/ICCBased <<
/Alternate /DeviceRGB
/Filter /FlateDecode
/Length 2597
/N 3
/Range []
>>]
>>

When I manually remove /Range [] from the PDF, Adobe Reader opens it without 
an error.

Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere.
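
A sketch of a read-only accessor that avoids this, assuming a plain `Map` as a stand-in for the COS dictionary (it is not PDFBox's PDICCBased code, and the class name is hypothetical): when /Range is missing or empty, return the specification's default of [0 1] per component (N pairs) without writing anything back, so saving the document cannot introduce an empty /Range [].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IccRange {
    /**
     * Returns the /Range values of an ICCBased stream dictionary, defaulting to
     * [0 1 0 1 ...] (one pair per component, N) when the key is absent or empty.
     * The dictionary is only read, never modified.
     */
    static List<Float> range(Map<String, Object> iccDict) {
        Object stored = iccDict.get("Range");
        if (stored instanceof List<?> && !((List<?>) stored).isEmpty()) {
            List<Float> values = new ArrayList<>();
            for (Object o : (List<?>) stored) {
                values.add(((Number) o).floatValue());
            }
            return values;
        }
        int n = ((Number) iccDict.get("N")).intValue();
        List<Float> defaults = new ArrayList<>(2 * n);
        for (int i = 0; i < n; i++) {
            defaults.add(0f);
            defaults.add(1f);
        }
        return defaults; // note: nothing is put back into iccDict
    }
}
```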





[jira] [Updated] (PDFBOX-2042) ColorSpace without Range

2014-04-23 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2042:
---

Attachment: pdfbox18.pdf

Original (working) file.

 ColorSpace without Range
 

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: pdfbox18.pdf


 I have a PDF document in which I am modifying the PDPage content stream.
 The saved document is invalid (Adobe Reader complains about it).
 I have narrowed it down to the ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from the PDF, Adobe Reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Updated] (PDFBOX-2042) ColorSpace without Range

2014-04-23 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-2042:
---

Attachment: pdfbox20.pdf

Modified file in pdfbox 2.0.0 (error in Adobe Reader)

 ColorSpace without Range
 

 Key: PDFBOX-2042
 URL: https://issues.apache.org/jira/browse/PDFBOX-2042
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.0
Reporter: Juraj Lonc
 Attachments: pdfbox18.pdf, pdfbox20.pdf


 I have a PDF document in which I am modifying the PDPage content stream.
 The saved document is invalid (Adobe Reader complains about it).
 I have narrowed it down to the ColorSpace. 
 Original document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 ]
 Modified document has colorspace:
 /ColorSpace 
 /Cs6 [/ICCBased 
 /Alternate /DeviceRGB
 /Filter /FlateDecode
 /Length 2597
 /N 3
 /Range []
 ]
 When I manually remove /Range [] from the PDF, Adobe Reader opens it 
 without an error.
 Obviously that range is added by calling PDICCBased.getRangeArray(0) 
 somewhere.





[jira] [Created] (PDFBOX-1547) TextPosition.getX() and getY() do not work properly with CropBox

2013-03-22 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-1547:
--

 Summary: TextPosition.getX() and getY() do not work properly with 
CropBox
 Key: PDFBOX-1547
 URL: https://issues.apache.org/jira/browse/PDFBOX-1547
 Project: PDFBox
  Issue Type: Bug
Reporter: Juraj Lonc


TextPosition.getX() and getY() are supposed to calculate the position relative to the 
upper left corner of the page.
When the PDF contains a CropBox, these functions return incorrect values because the 
CropBox is ignored.
Text is relative to CropBox coordinates.

Does "page" in the method description mean the MediaBox or the CropBox?
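
A minimal sketch of a CropBox-aware conversion, assuming the CropBox is what "page" should mean and ignoring page rotation. PDF user space has its origin at the bottom left with y growing upward, so a position relative to the top left needs the box's own lower-left x and upper y rather than only the page width and height. The `CropBoxCoords` names are hypothetical:

```java
final class CropBoxCoords {
    /** Rectangle in PDF user space given by its lower-left and upper-right corners. */
    static final class Box {
        final float llx, lly, urx, ury;

        Box(float llx, float lly, float urx, float ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }
    }

    /**
     * Converts a user-space point (origin bottom left, y up) into coordinates measured
     * from the box's upper left corner (y down). Page /Rotate is ignored in this sketch.
     */
    static float[] fromTopLeft(float x, float y, Box box) {
        return new float[] {x - box.llx, box.ury - y};
    }
}
```

With a MediaBox of [0 0 612 792] and a CropBox of [50 100 550 700], the point (60, 690) is (10, 10) from the CropBox's top left corner, whereas using only the MediaBox gives (60, 102).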


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PDFBOX-1547) TextPosition.getX() and getY() do not work properly with CropBox

2013-03-22 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1547:
---

Attachment: redig_test_crop3.pdf

 TextPosition.getX() and getY() do not work properly with CropBox
 

 Key: PDFBOX-1547
 URL: https://issues.apache.org/jira/browse/PDFBOX-1547
 Project: PDFBox
  Issue Type: Bug
Reporter: Juraj Lonc
 Attachments: redig_test_crop3.pdf


 TextPosition.getX() and getY() are supposed to calculate the position relative to 
 the upper left corner of the page.
 When the PDF contains a CropBox, these functions return incorrect values because 
 the CropBox is ignored.
 Text is relative to CropBox coordinates.
 Does "page" in the method description mean the MediaBox or the CropBox?



[jira] [Updated] (PDFBOX-1547) TextPosition.getX() and getY() do not work properly with CropBox

2013-03-22 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1547:
---

Description: 
TextPosition.getX() and getY() are supposed to calculate the position relative to the 
upper left corner of the page.
When the PDF contains a CropBox, these functions return incorrect values because the 
CropBox is ignored.
Text is relative to CropBox coordinates, but the calculations use only 
pageWidth and pageHeight, which is wrong.

Does "page" in the method description mean the MediaBox or the CropBox?


  was:
TextPosition.getX() and getY() are supposed to calculate position relative to 
upper left corner of page.
When PDF contains CropBox then these functions return incorrect values. CropBox 
is ignored.
Text is relative to CropBox coordinates.

page in function description means MediaBox or CropBox?



 TextPosition.getX() and getY() do not work properly with CropBox
 

 Key: PDFBOX-1547
 URL: https://issues.apache.org/jira/browse/PDFBOX-1547
 Project: PDFBox
  Issue Type: Bug
Reporter: Juraj Lonc
 Attachments: redig_test_crop3.pdf


 TextPosition.getX() and getY() are supposed to calculate the position relative to 
 the upper left corner of the page.
 When the PDF contains a CropBox, these functions return incorrect values because 
 the CropBox is ignored.
 Text is relative to CropBox coordinates, but the calculations use only 
 pageWidth and pageHeight, which is wrong.
 Does "page" in the method description mean the MediaBox or the CropBox?



[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)

2013-03-21 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1538:
---

Attachment: PDFBOX-1538_PageDrawer.diff
redig_test_textAdded_annot.pdf

I made a fix for this in PageDrawer.

I was not able to find any information about this in the PDF reference,
so I don't know whether the FreeText annotation subtype has to be handled 
differently from other annotation subtypes.
But my fix works for the attached sample PDF file.
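
For reference, the PDF specification does describe how an annotation's appearance stream is placed on the page (ISO 32000-1, 12.5.5 "Appearance Streams"): the form's /BBox is transformed by its /Matrix, and the bounding box of the result is mapped onto the annotation's /Rect by a scale-and-translate matrix A; the form is then drawn with Matrix followed by A. Whether the attached PageDrawer fix follows this is not clear from this thread. The sketch below only illustrates that algorithm with `java.awt.geom` and a hypothetical class name; note that `Rectangle2D` takes x, y, width, height, whereas PDF rectangles are [llx lly urx ury].

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

final class AppearancePlacement {
    /**
     * Maps an appearance stream's form space onto the annotation's /Rect:
     * 1. transform /BBox by the form's /Matrix and take the bounding box of the result;
     * 2. build A, which scales and translates that box onto /Rect;
     * 3. the combined transform applies Matrix first, then A.
     * Assumes the transformed box has non-zero width and height.
     */
    static AffineTransform formToPage(Rectangle2D bbox, AffineTransform matrix, Rectangle2D rect) {
        Rectangle2D transformed = matrix.createTransformedShape(bbox).getBounds2D();
        AffineTransform a = new AffineTransform();
        a.translate(rect.getX(), rect.getY());
        a.scale(rect.getWidth() / transformed.getWidth(), rect.getHeight() / transformed.getHeight());
        a.translate(-transformed.getX(), -transformed.getY());
        AffineTransform result = new AffineTransform(a);
        result.concatenate(matrix); // result(p) = A(Matrix(p))
        return result;
    }
}
```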

 Content of annotation not visible in image (converted from pdf)
 ---

 Key: PDFBOX-1538
 URL: https://issues.apache.org/jira/browse/PDFBOX-1538
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: output.png, PDFBOX-1538_PageDrawer.diff, 
 redig_test_textAdded_annot.pdf, redig_test_textAdded.pdf


 pdPage.convertToImage() converts the PDF to an image, but the content of the 
 annotation is missing.



[jira] [Commented] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

2013-03-21 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609022#comment-13609022
 ] 

Juraj Lonc commented on PDFBOX-1545:


This iteration is not supposed to give you whole words.
This iteration gives you tokens exactly as they are stored in the PDF. 
Every single letter could be stored separately.
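
To match a word that is split across several strings, the string operands of a TJ array can be joined before searching. A minimal sketch, assuming the array has been modelled as a list of already-decoded `String`s and kerning `Number`s (the class name is hypothetical). Raw PDF strings are in the font's encoding, so with fonts using custom or Identity encodings the bytes may not be readable text at all, which is another possible reason for the empty-looking strings in this report.

```java
import java.util.List;

final class TjText {
    /**
     * Concatenates the string operands of a TJ array, skipping the numeric position
     * adjustments, so a word split across strings, e.g. [(Hel) -20 (lo)], matches as a whole.
     */
    static String join(List<?> tjArray) {
        StringBuilder sb = new StringBuilder();
        for (Object element : tjArray) {
            if (element instanceof String) {
                sb.append((String) element);
            }
            // Numbers are kerning adjustments; a large negative value may visually act as
            // a space, which this simple sketch does not try to detect.
        }
        return sb.toString();
    }
}
```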

 ReplaceString fails to replace text, however RemoveText or TextExtraction 
 works fine
 

 Key: PDFBOX-1545
 URL: https://issues.apache.org/jira/browse/PDFBOX-1545
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.7.1
 Environment: ubuntu 32bit, Java 6
Reporter: MartinV
  Labels: patch
   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings 
 in this PDF:
 https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
 (anyone with the link can view and download it...)
 As I found during iteration over the Tj and TJ operations:
  COSString previous = (COSString)tokens.get( j-1 );
  String string = previous.getString();
 Those strings are just empty or have a length of 2 (only some whitespace)... 
 I would expect to get some separate groups of words from my PDF.
 I tried this on version 1.7.1, then downloaded the latest code from SVN 
 (today), and both versions had the same behaviour. Is my PDF special in any way, 
 or which objects should be explored next? I tried two other PDFs downloaded 
 from Google Drive and both had the same issue (maybe Google formats PDFs in a 
 special way?).
 I am surprised that RemoveText works fine on this PDF and that text extraction 
 also gives me a good result, so there must be a way... Thank you
 PS: I don't mind fixing the bug on my own, but I do not have any significant 
 knowledge of internal PDF structure. Hints welcome.



[jira] [Comment Edited] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

2013-03-21 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13609022#comment-13609022
 ] 

Juraj Lonc edited comment on PDFBOX-1545 at 3/21/13 3:13 PM:
-

This iteration is not supposed to give you whole words.
This iteration gives you tokens exactly as they are stored in the PDF. 
Every single letter could be stored separately.

  was (Author: chupacabras):
This iteration is not supposed to give you whole words.
This iteration gives you tokes exactly in the same way they are stored in PDF. 
Every single letter could be stored separately.
  
 ReplaceString fails to replace text, however RemoveText or TextExtraction 
 works fine
 

 Key: PDFBOX-1545
 URL: https://issues.apache.org/jira/browse/PDFBOX-1545
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 1.7.1
 Environment: ubuntu 32bit, Java 6
Reporter: MartinV
  Labels: patch
   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings 
 in this PDF:
 https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
 (anyone with the link can view and download it...)
 As I found during iteration over the Tj and TJ operations:
  COSString previous = (COSString)tokens.get( j-1 );
  String string = previous.getString();
 Those strings are just empty or have a length of 2 (only some whitespace)... 
 I would expect to get some separate groups of words from my PDF.
 I tried this on version 1.7.1, then downloaded the latest code from SVN 
 (today), and both versions had the same behaviour. Is my PDF special in any way, 
 or which objects should be explored next? I tried two other PDFs downloaded 
 from Google Drive and both had the same issue (maybe Google formats PDFs in a 
 special way?).
 I am surprised that RemoveText works fine on this PDF and that text extraction 
 also gives me a good result, so there must be a way... Thank you
 PS: I don't mind fixing the bug on my own, but I do not have any significant 
 knowledge of internal PDF structure. Hints welcome.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)

2013-03-21 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1538:
---

Attachment: PDFBOX-1538_PageDrawer.diff

 Content of annotation not visible in image (converted from pdf)
 ---

 Key: PDFBOX-1538
 URL: https://issues.apache.org/jira/browse/PDFBOX-1538
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: output.png, PDFBOX-1538_PageDrawer.diff, 
 redig_test_textAdded_annot.pdf


 pdPage.convertToImage converts the PDF to an image, but the content of the 
 annotation is missing.



[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)

2013-03-21 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1538:
---

Attachment: (was: PDFBOX-1538_PageDrawer.diff)

 Content of annotation not visible in image (converted from pdf)
 ---

 Key: PDFBOX-1538
 URL: https://issues.apache.org/jira/browse/PDFBOX-1538
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: output.png, PDFBOX-1538_PageDrawer.diff, 
 redig_test_textAdded_annot.pdf


 pdPage.convertToImage converts the PDF to an image, but the content of the 
 annotation is missing.



[jira] [Commented] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)

2013-03-13 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13601283#comment-13601283
 ] 

Juraj Lonc commented on PDFBOX-1538:


It seems that the problem is in the transformations.
The problem is not with the font or color.
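The comment does not say which transformation goes wrong. One candidate worth checking is how an annotation's appearance stream is mapped onto the page. The PDF specification (ISO 32000-1, 12.5.5) describes three steps: transform the form's BBox by its Matrix and take the upright bounding box; compute a matrix A that scales and translates that box onto the annotation's Rect; then apply Matrix first and A second. The sketch below expresses those steps with java.awt.geom. It is an illustration of the specification's algorithm under the assumption that this is the relevant transformation, not the contents of the attached PageDrawer diff:

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

class AppearanceTransformSketch {
    /**
     * Maps an appearance stream's form space onto the annotation's Rect:
     * (a) transform BBox by the form Matrix and take the upright bounding box,
     * (b) compute A that scales and translates that box onto Rect,
     * (c) combine so that Matrix is applied first and A second.
     */
    static AffineTransform appearanceToPage(Rectangle2D bbox, AffineTransform matrix,
                                            Rectangle2D rect) {
        Rectangle2D box = matrix.createTransformedShape(bbox).getBounds2D();
        AffineTransform a = new AffineTransform();
        a.translate(rect.getX(), rect.getY());                 // 3rd: move to Rect's corner
        a.scale(rect.getWidth() / box.getWidth(),
                rect.getHeight() / box.getHeight());           // 2nd: fit Rect's size
        a.translate(-box.getX(), -box.getY());                 // 1st: move box to origin
        a.concatenate(matrix); // points go through Matrix before the steps above
        return a;
    }
}
```

With an identity Matrix, a BBox of 100 x 50 at the origin and a Rect of 50 x 25 at (200, 300), the BBox corner (100, 50) lands on (250, 325). If a renderer skips the Matrix or the fitting step, the annotation content can end up outside the visible Rect.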

 Content of annotation not visible in image (converted from pdf)
 ---

 Key: PDFBOX-1538
 URL: https://issues.apache.org/jira/browse/PDFBOX-1538
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: output.png, redig_test_textAdded.pdf


 pdPage.convertToImage converts the PDF to an image, but the content of the 
 annotation is missing.



[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)

2013-03-11 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1538:
---

Attachment: output.png
redig_test_textAdded.pdf

 Content of annotation not visible in image (converted from pdf)
 ---

 Key: PDFBOX-1538
 URL: https://issues.apache.org/jira/browse/PDFBOX-1538
 Project: PDFBox
  Issue Type: Bug
Reporter: Juraj Lonc
 Attachments: output.png, redig_test_textAdded.pdf


 pdPage.convertToImage converts the PDF to an image, but the content of the 
 annotation is missing.



[jira] [Created] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)

2013-03-11 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-1538:
--

 Summary: Content of annotation not visible in image (converted 
from pdf)
 Key: PDFBOX-1538
 URL: https://issues.apache.org/jira/browse/PDFBOX-1538
 Project: PDFBox
  Issue Type: Bug
Reporter: Juraj Lonc
 Attachments: output.png, redig_test_textAdded.pdf

pdPage.convertToImage converts the PDF to an image, but the content of the annotation is missing.



[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)

2013-03-11 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1538:
---

Affects Version/s: 1.7.1

 Content of annotation not visible in image (converted from pdf)
 ---

 Key: PDFBOX-1538
 URL: https://issues.apache.org/jira/browse/PDFBOX-1538
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: output.png, redig_test_textAdded.pdf


 pdPage.convertToImage converts the PDF to an image, but the content of the 
 annotation is missing.



[jira] [Created] (PDFBOX-1503) double logging of exceptions

2013-01-30 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-1503:
--

 Summary: double logging of exceptions
 Key: PDFBOX-1503
 URL: https://issues.apache.org/jira/browse/PDFBOX-1503
 Project: PDFBox
  Issue Type: Improvement
Reporter: Juraj Lonc


I made a web application that uses the pdfbox library and its functionality.
This web application is deployed on JBoss (log4j is used for logging).

If an exception occurs in pdfbox, it is printed twice, which is not good. It 
makes a mess, and it is hard to use SMTPAppender or any other appender that 
processes log events.

The problem is that you use this construction everywhere:

try {
...
} catch( Exception e ) {
 e.printStackTrace();
 LOG.error(e, e);
}

I think it would be nice to have it like this:

try {
...
} catch( Exception e ) {
 LOG.error(e, e);
}
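The proposed pattern logs the exception once, attaching the Throwable to the log event, and drops printStackTrace(), which writes a second copy to System.err (application servers such as JBoss typically route System.err into the same log). The reporter's code uses Commons Logging (`LOG.error(e, e)`). To keep this sketch dependency-free it uses java.util.logging instead, and `parseSomething` is a hypothetical stand-in for a PDFBox call that fails:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogOnceSketch {
    static final Logger LOG = Logger.getLogger("pdfbox.sketch");

    /** Collects records so the number of logged events can be counted. */
    static final class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Hypothetical stand-in for a library call that fails. */
    static void parseSomething() throws IOException {
        throw new IOException("damaged stream");
    }

    static void process() {
        try {
            parseSomething();
        } catch (IOException e) {
            // One log event with the stack trace attached; no e.printStackTrace().
            LOG.log(Level.SEVERE, e.getMessage(), e);
        }
    }
}
```

A handler that forwards events (the role SMTPAppender plays with log4j) then receives exactly one record, and the stack trace is still available through `getThrown()`.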




[jira] [Updated] (PDFBOX-1503) double logging of exceptions

2013-01-30 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1503:
---

Description: 
I made a web application that uses the pdfbox library and its functionality.
This web application is deployed on JBoss (log4j is used for logging).

If an exception occurs in pdfbox, it is printed twice. That is not good. It 
makes a mess in the log, and it is hard to use SMTPAppender or any other 
appender that processes log events.

The problem is that you use this construction everywhere:

try {
...
} catch( Exception e ) {
 e.printStackTrace();
 LOG.error(e, e);
}

I think it would be nice to have it like this:

try {
...
} catch( Exception e ) {
 LOG.error(e, e);
}


  was:
I made web application which uses pdfbox library and its funcionality.
This web application is deployed on jboss. (log4j is used for logging)

If some exception occurs in pdfbox then exception is printed twice that is not 
good. It makes mess and it is hard to use SMTPAppender or other appender that 
processes log events.

Problem is that you are using everywhere construction:

try {
...
} catch( Exception e ) {
 e.printStackTrace();
 LOG.error(e, e);
}

I think it would be nice to have it like this:

try {
...
} catch( Exception e ) {
 LOG.error(e, e);
}



 double logging of exceptions
 

 Key: PDFBOX-1503
 URL: https://issues.apache.org/jira/browse/PDFBOX-1503
 Project: PDFBox
  Issue Type: Improvement
Reporter: Juraj Lonc

 I made a web application that uses the pdfbox library and its functionality.
 This web application is deployed on JBoss (log4j is used for logging).
 If an exception occurs in pdfbox, it is printed twice. That is not good. It 
 makes a mess in the log, and it is hard to use SMTPAppender or any other 
 appender that processes log events.
 The problem is that you use this construction everywhere:
 try {
 ...
 } catch( Exception e ) {
  e.printStackTrace();
  LOG.error(e, e);
 }
 I think it would be nice to have it like this:
 try {
 ...
 } catch( Exception e ) {
  LOG.error(e, e);
 }



[jira] [Commented] (PDFBOX-1473) Incorrect handling of OpenType fonts

2012-12-22 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538870#comment-13538870
 ] 

Juraj Lonc commented on PDFBOX-1473:


Yes. But you have to remember that any sequence of operators and operands that 
comes to CharStringConverter must already be inflated (subrs replaced). If 
there is any subr command left in that sequence, an exception should be thrown.
You are right: the handling of subr commands in CharStringConverter is obsolete 
and not necessary.

If you remove the handling of subr commands, then CharStringConverter does not 
need fontGlobalSubrIndex/fontLocalSubrIndex.

Type1CharStringParser should be fixed the same way.
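The contract described above ("inflate first, then convert, and reject leftover subr calls") can be sketched at token level. This is an illustration, not FontBox code. Tokens are modelled as Integer operands and String operator names (an assumption), and the subroutine number bias (107, 1131 or 32768 depending on the subroutine count) follows the Type 2 Charstring specification:

```java
import java.util.ArrayList;
import java.util.List;

class InflateSketch {
    /** Subroutine number bias from the Type 2 Charstring specification. */
    static int bias(int subrCount) {
        if (subrCount < 1240) return 107;
        if (subrCount < 33900) return 1131;
        return 32768;
    }

    /** Replaces every callsubr/callgsubr with the body of the subroutine it calls. */
    static List<Object> inflate(List<Object> charString, List<List<Object>> localSubrs,
                                List<List<Object>> globalSubrs) {
        List<Object> out = new ArrayList<>();
        inflateInto(charString, localSubrs, globalSubrs, out);
        return out;
    }

    private static void inflateInto(List<Object> tokens, List<List<Object>> localSubrs,
                                    List<List<Object>> globalSubrs, List<Object> out) {
        for (Object token : tokens) {
            if ("callsubr".equals(token) || "callgsubr".equals(token)) {
                List<List<Object>> subrs = "callsubr".equals(token) ? localSubrs : globalSubrs;
                // The subroutine number is the operand pushed just before the call.
                int index = (Integer) out.remove(out.size() - 1) + bias(subrs.size());
                inflateInto(subrs.get(index), localSubrs, globalSubrs, out);
            } else if ("return".equals(token)) {
                return; // end of this subroutine; continue with the caller
            } else {
                out.add(token);
            }
        }
    }

    /** Converter-side check: the input must not contain subroutine calls any more. */
    static void requireInflated(List<Object> tokens) {
        for (Object token : tokens) {
            if ("callsubr".equals(token) || "callgsubr".equals(token)) {
                throw new IllegalArgumentException("charstring is not inflated: found " + token);
            }
        }
    }
}
```

With this split, the converter needs no access to the global or local subroutine indexes at all; only the inflating step does.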

 Incorrect handling of OpenType fonts
 

 Key: PDFBOX-1473
 URL: https://issues.apache.org/jira/browse/PDFBOX-1473
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Assignee: Andreas Lehmkühler
 Attachments: CFFParser.patch, parsingfix_CFFFont.patch, 
 parsingfix_Type2CharStringParser.patch, PDType1CFont.patch, 
 redig_test_textAdded.pdf


 There is an embedded font in this PDF that pdfbox/fontbox does not handle 
 properly.
 This OpenType font contains CFF data.



[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts

2012-12-20 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1473:
---

Attachment: parsingfix_Type2CharStringParser.patch
parsingfix_CFFFont.patch

I made a fix in the parser.
The callsubr and callgsubr commands are processed directly in the parser, so 
the parser works with the correct count of stems.

The parser works OK with this modification, and the CharString is decoded 
correctly.

But I found another bug. OpenType fonts may have a CMap definition (like the one 
in the attached PDF), and this CMap is not handled/processed. The code expects 
that a Type1 font cannot have a CMap. As a result, no character is 
printed/drawn even though the font is (now) parsed correctly.

 Incorrect handling of OpenType fonts
 

 Key: PDFBOX-1473
 URL: https://issues.apache.org/jira/browse/PDFBOX-1473
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Assignee: Andreas Lehmkühler
 Attachments: CFFParser.patch, parsingfix_CFFFont.patch, 
 parsingfix_Type2CharStringParser.patch, PDType1CFont.patch, 
 redig_test_textAdded.pdf


 There is an embedded font in this PDF that pdfbox/fontbox does not handle 
 properly.
 This OpenType font contains CFF data.



[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts

2012-12-19 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1473:
---

Attachment: redig_test_textAdded.pdf

 Incorrect handling of OpenType fonts
 

 Key: PDFBOX-1473
 URL: https://issues.apache.org/jira/browse/PDFBOX-1473
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: redig_test_textAdded.pdf


 There is an embedded font in this PDF that pdfbox/fontbox does not handle 
 properly.
 This OpenType font contains CFF data.



[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts

2012-12-19 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1473:
---

Attachment: CFFParser.patch

I made a fix in CFFParser so that it can properly read CFF data from an OpenType font.
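When a PDF embeds a font as FontFile3 with /Subtype /OpenType, the CFF data is not at offset 0 but inside the 'CFF ' table of the sfnt container. A minimal sketch of locating that table, based on the OpenType table directory layout (a 12-byte header with numTables at offset 4, followed by 16-byte records of tag, checksum, offset and length). This is an illustration, not the attached CFFParser.patch:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class OtfCffTableSketch {
    /** Returns the raw bytes of the 'CFF ' table of an OpenType font, or null if absent. */
    static byte[] findCffTable(byte[] font) {
        ByteBuffer buf = ByteBuffer.wrap(font);      // big-endian by default, as sfnt requires
        int numTables = buf.getShort(4) & 0xFFFF;    // follows the 4-byte sfntVersion ('OTTO')
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i;                // 12-byte header, then 16-byte records
            String tag = new String(font, record, 4, StandardCharsets.US_ASCII);
            if ("CFF ".equals(tag)) {
                int offset = buf.getInt(record + 8); // record: tag, checksum, offset, length
                int length = buf.getInt(record + 12);
                return Arrays.copyOfRange(font, offset, offset + length);
            }
        }
        return null;
    }
}
```

The returned bytes start with the CFF header and can then be handed to a CFF parser unchanged.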

 Incorrect handling of OpenType fonts
 

 Key: PDFBOX-1473
 URL: https://issues.apache.org/jira/browse/PDFBOX-1473
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: CFFParser.patch, redig_test_textAdded.pdf


 There is an embedded font in this PDF that pdfbox/fontbox does not handle 
 properly.
 This OpenType font contains CFF data.



[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts

2012-12-19 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1473:
---

Attachment: PDType1CFont.patch

I made an enhancement to PDType1CFont so that the proper font is selected. An 
OpenType font could contain multiple fonts.

 Incorrect handling of OpenType fonts
 

 Key: PDFBOX-1473
 URL: https://issues.apache.org/jira/browse/PDFBOX-1473
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: CFFParser.patch, PDType1CFont.patch, 
 redig_test_textAdded.pdf


 There is an embedded font in this PDF that pdfbox/fontbox does not handle 
 properly.
 This OpenType font contains CFF data.



[jira] [Commented] (PDFBOX-1473) Incorrect handling of OpenType fonts

2012-12-19 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535850#comment-13535850
 ] 

Juraj Lonc commented on PDFBOX-1473:


Partly, yes.

I detected 3 problems. I provided patch files for 2 of them.
The last one is the same as the already mentioned PDFBOX-969.

 Incorrect handling of OpenType fonts
 

 Key: PDFBOX-1473
 URL: https://issues.apache.org/jira/browse/PDFBOX-1473
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: CFFParser.patch, PDType1CFont.patch, 
 redig_test_textAdded.pdf


 There is an embedded font in this PDF that pdfbox/fontbox does not handle 
 properly.
 This OpenType font contains CFF data.



[jira] [Commented] (PDFBOX-1473) Incorrect handling of OpenType fonts

2012-12-19 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535928#comment-13535928
 ] 

Juraj Lonc commented on PDFBOX-1473:


I think I found where the problem is.

It seems that the hintmask command is parsed incorrectly. This results in an 
incorrect count of hints, which causes an incorrect number of following bytes 
to be read (as part of the hintmask operator). This naturally leads to incorrect 
parsing of the data following the hintmask.

Commands should be parsed inline, so callsubr and callgsubr should be expanded 
onto the stack before the following data is parsed.
I am not sure if I explained it understandably ;)
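The mechanism can be made concrete. In a Type 2 charstring, hintmask and cntrmask are followed by ceil(stems / 8) mask bytes. Stems are counted from the operand pairs of hstem, vstem, hstemhm and vstemhm, plus any pairs still on the stack at the mask (an implicit vstem). If stem hints sit in a subroutine that has not yet been executed, the count is too low and the wrong number of bytes is skipped. A minimal byte-level sketch that expands subroutines inline (not FontBox's Type2CharStringParser; it only tracks what is needed for stem counting):

```java
import java.util.ArrayList;
import java.util.List;

class HintMaskSketch {
    private final List<byte[]> localSubrs;
    private final List<byte[]> globalSubrs;
    private final List<Integer> stack = new ArrayList<>(); // shared with subroutines, as in Type 2
    private int stemCount = 0;
    /** Number of mask bytes consumed by each hintmask/cntrmask, in order. */
    final List<Integer> maskLengths = new ArrayList<>();

    HintMaskSketch(List<byte[]> localSubrs, List<byte[]> globalSubrs) {
        this.localSubrs = localSubrs;
        this.globalSubrs = globalSubrs;
    }

    /** Subroutine number bias from the Type 2 Charstring specification. */
    static int bias(int subrCount) {
        if (subrCount < 1240) return 107;
        if (subrCount < 33900) return 1131;
        return 32768;
    }

    /** Runs one charstring; returns true once endchar has been reached. */
    boolean run(byte[] cs) {
        int i = 0;
        while (i < cs.length) {
            int b0 = cs[i++] & 0xFF;
            if (b0 >= 32 && b0 <= 246) {
                stack.add(b0 - 139);
            } else if (b0 >= 247 && b0 <= 250) {
                stack.add((b0 - 247) * 256 + (cs[i++] & 0xFF) + 108);
            } else if (b0 >= 251 && b0 <= 254) {
                stack.add(-(b0 - 251) * 256 - (cs[i++] & 0xFF) - 108);
            } else if (b0 == 28 || b0 == 255) {
                // 16-bit integer, or 16.16 fixed (only the high 16 bits are kept here)
                stack.add((int) (short) (((cs[i] & 0xFF) << 8) | (cs[i + 1] & 0xFF)));
                i += (b0 == 28) ? 2 : 4;
            } else {
                switch (b0) {
                    case 1: case 3: case 18: case 23: // hstem, vstem, hstemhm, vstemhm
                        stemCount += stack.size() / 2; // an odd leading operand is the width
                        stack.clear();
                        break;
                    case 19: case 20: // hintmask, cntrmask
                        stemCount += stack.size() / 2; // pending pairs are an implicit vstem
                        stack.clear();
                        int maskBytes = (stemCount + 7) / 8;
                        maskLengths.add(maskBytes);
                        i += maskBytes;
                        break;
                    case 10: case 29: { // callsubr, callgsubr: expand inline
                        List<byte[]> subrs = (b0 == 10) ? localSubrs : globalSubrs;
                        int index = stack.remove(stack.size() - 1) + bias(subrs.size());
                        if (run(subrs.get(index))) return true;
                        break;
                    }
                    case 11: return false; // return: resume the caller
                    case 14: return true;  // endchar
                    case 12: i++; stack.clear(); break; // escape: skip the second operator byte
                    default: stack.clear(); // other operators consume their operands
                }
            }
        }
        return false;
    }
}
```

If local subroutine 0 (called with operand -107 because of the bias of 107) declares eight horizontal stems and the glyph adds one vstem, the hintmask sees nine stems and skips two mask bytes. A parser that did not expand the call first would count one stem, consume only one byte, and misread the second mask byte as an operand.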

 Incorrect handling of OpenType fonts
 

 Key: PDFBOX-1473
 URL: https://issues.apache.org/jira/browse/PDFBOX-1473
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Assignee: Andreas Lehmkühler
 Attachments: CFFParser.patch, PDType1CFont.patch, 
 redig_test_textAdded.pdf


 There is an embedded font in this PDF that pdfbox/fontbox does not handle 
 properly.
 This OpenType font contains CFF data.



[jira] [Created] (PDFBOX-1468) Decrypting unencrypted strings

2012-12-13 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-1468:
--

 Summary: Decrypting unencrypted strings
 Key: PDFBOX-1468
 URL: https://issues.apache.org/jira/browse/PDFBOX-1468
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - 
Drabikova.pdf

I have received an encrypted PDF that contains several string objects, but not 
all of them are encrypted.
I am not sure whether or not this is compliant with the PDF reference.

But I have created a fix so that pdfbox can handle this.
If a string contains only chars between 32 and 127, then decryption is not 
necessary (I know this is not true in 100% of cases, but I think it is acceptable).
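The rule fits in a small predicate. The second method quantifies "not true in 100% of cases" under the assumption that ciphertext bytes are roughly uniformly distributed: 96 of the 256 byte values fall in 32..127, so a random n-byte ciphertext passes the check with probability (96/256)^n = 0.375^n. The class and method names are hypothetical; this is not the attached PDFBOX-1468.diff:

```java
class PlainStringHeuristic {
    /** The reporter's rule: a string whose bytes all lie in 32..127 is treated as not encrypted. */
    static boolean looksUnencrypted(byte[] bytes) {
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v < 32 || v > 127) {
                return false;
            }
        }
        return true;
    }

    /**
     * Chance that n bytes of ciphertext pass the check anyway, assuming the
     * ciphertext bytes are uniformly distributed: (96/256)^n.
     */
    static double falsePositiveChance(int n) {
        return Math.pow(96.0 / 256.0, n);
    }
}
```

That is 37.5% for a one-byte string and about 0.74% for five bytes, so short encrypted strings are the risky case. An empty string always passes.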



[jira] [Updated] (PDFBOX-1468) Decrypting unencrypted strings

2012-12-13 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1468:
---

Attachment: Protokol o kontrole originality - Drabikova.pdf
PDFBOX-1468.diff

 Decrypting unencrypted strings
 --

 Key: PDFBOX-1468
 URL: https://issues.apache.org/jira/browse/PDFBOX-1468
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - 
 Drabikova.pdf


 I have received an encrypted PDF that contains several string objects, but 
 not all of them are encrypted.
 I am not sure whether or not this is compliant with the PDF reference.
 But I have created a fix so that pdfbox can handle this.
 If a string contains only chars between 32 and 127, then decryption is not 
 necessary (I know this is not true in 100% of cases, but I think it is acceptable).



[jira] [Commented] (PDFBOX-1468) Decrypting unencrypted strings

2012-12-13 Thread Juraj Lonc (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531041#comment-13531041
 ] 

Juraj Lonc commented on PDFBOX-1468:


That fix was done in
pdfbox-1.7.1\org\apache\pdfbox\pdmodel\encryption\SecurityHandler.java

 Decrypting unencrypted strings
 --

 Key: PDFBOX-1468
 URL: https://issues.apache.org/jira/browse/PDFBOX-1468
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - 
 Drabikova.pdf


 I have received an encrypted PDF that contains several string objects, but 
 not all of them are encrypted.
 I am not sure whether or not this is compliant with the PDF reference.
 But I have created a fix so that pdfbox can handle this.
 If a string contains only chars between 32 and 127, then decryption is not 
 necessary (I know this is not true in 100% of cases, but I think it is acceptable).



[jira] [Updated] (PDFBOX-1468) Decrypting unencrypted strings

2012-12-13 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1468:
---

Description: 
I have received an encrypted PDF that contains several string objects, but not 
all of them are encrypted.
I am not sure whether or not this is compliant with the PDF reference.

But I have created a fix so that pdfbox can handle this.
If a string contains only chars between 32 and 127, then decryption is not 
necessary (I know this is not true in 100% of cases, but I think it is acceptable).

Some string are encrypted:
/CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
/ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
/Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265)

Some are not:
/Registry(Adobe)
/Ordering(Identity)

  was:
I have received encrypted PDF which contains several string objects but not all 
of them are encrypted.
I am not sure whether it is or it is not compliant with pdf reference.

But I have created fix so pdfbox can handle this.
If string contains only chars between 32-127 then decryption is not necessary 
(I know, this is not true in 100% of cases but I think it is swallowable)


 Decrypting unencrypted strings
 --

 Key: PDFBOX-1468
 URL: https://issues.apache.org/jira/browse/PDFBOX-1468
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - 
 Drabikova.pdf


 I have received an encrypted PDF that contains several string objects, but 
 not all of them are encrypted.
 I am not sure whether or not this is compliant with the PDF reference.
 But I have created a fix so that pdfbox can handle this.
 If a string contains only chars between 32 and 127, then decryption is not 
 necessary (I know this is not true in 100% of cases, but I think it is acceptable).
 Some string are encrypted:
 /CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
 /ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
 /Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265)
 Some are not:
 /Registry(Adobe)
 /Ordering(Identity)



[jira] [Updated] (PDFBOX-1468) Decrypting unencrypted strings

2012-12-13 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1468:
---

Description: 
I have received an encrypted PDF that contains several string objects, but not 
all of them are encrypted.
I am not sure whether or not this is compliant with the PDF reference.

But I have created a fix so that pdfbox can handle this.
If a string contains only chars between 32 and 127, then decryption is not 
necessary (I know this is not true in 100% of cases, but I think it is acceptable).

Some strings are encrypted:
/CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
/ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
/Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265)

Some are not:
/Registry(Adobe)
/Ordering(Identity)

  was:
I have received encrypted PDF which contains several string objects but not all 
of them are encrypted.
I am not sure whether it is or it is not compliant with pdf reference.

But I have created fix so pdfbox can handle this.
If string contains only chars between 32-127 then decryption is not necessary 
(I know, this is not true in 100% of cases but I think it is swallowable)

Some string are encrypted:
/CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
/ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
/Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265)

Some are not:
/Registry(Adobe)
/Ordering(Identity)


 Decrypting unencrypted strings
 --

 Key: PDFBOX-1468
 URL: https://issues.apache.org/jira/browse/PDFBOX-1468
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 1.7.1
Reporter: Juraj Lonc
 Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - 
 Drabikova.pdf


 I have received an encrypted PDF that contains several string objects, but 
 not all of them are encrypted.
 I am not sure whether or not this is compliant with the PDF reference.
 But I have created a fix so that pdfbox can handle this.
 If a string contains only chars between 32 and 127, then decryption is not 
 necessary (I know this is not true in 100% of cases, but I think it is acceptable).
 Some strings are encrypted:
 /CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
 /ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357)
 /Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265)
 Some are not:
 /Registry(Adobe)
 /Ordering(Identity)
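To check the example strings above byte by byte, their escapes first have to be decoded. Below is a minimal sketch of PDF literal-string escape decoding. It handles the escapes listed in the PDF specification: \n, \r, \t, \b, \f, \(, \), \\, one to three octal digits (\ddd), and backslash followed by end-of-line as a line continuation. It assumes every unescaped character fits in one byte (Latin-1), and the class name is hypothetical:

```java
import java.io.ByteArrayOutputStream;

class LiteralStringSketch {
    /** Decodes the body of a PDF literal string (the text between the outer parentheses). */
    static byte[] decode(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.write(c); // assumes Latin-1: only the low byte is written
                continue;
            }
            if (++i >= body.length()) {
                break; // this sketch ignores a lone trailing backslash
            }
            char n = body.charAt(i);
            switch (n) {
                case 'n': out.write('\n'); break;
                case 'r': out.write('\r'); break;
                case 't': out.write('\t'); break;
                case 'b': out.write('\b'); break;
                case 'f': out.write('\f'); break;
                case '\r': // backslash + end-of-line is a line continuation: write nothing
                    if (i + 1 < body.length() && body.charAt(i + 1) == '\n') i++;
                    break;
                case '\n':
                    break;
                default:
                    if (n >= '0' && n <= '7') {
                        // \ddd: one to three octal digits
                        int value = 0;
                        int digits = 0;
                        while (digits < 3 && i < body.length()
                                && body.charAt(i) >= '0' && body.charAt(i) <= '7') {
                            value = value * 8 + (body.charAt(i) - '0');
                            i++;
                            digits++;
                        }
                        i--; // the for loop advances past the last digit
                        out.write(value & 0xFF);
                    } else {
                        out.write(n); // \( \) \\ and unknown escapes keep the character itself
                    }
            }
        }
        return out.toByteArray();
    }
}
```

Decoding the /CreationDate body yields 17 bytes, and the first one is 0222 (146), outside the 32..127 range. /Registry(Adobe) decodes to plain ASCII, which matches the reporter's observation.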



[jira] [Created] (PDFBOX-1408) Width of space character is calculated wrong

2012-09-10 Thread Juraj Lonc (JIRA)
Juraj Lonc created PDFBOX-1408:
--

 Summary: Width of space character is calculated wrong
 Key: PDFBOX-1408
 URL: https://issues.apache.org/jira/browse/PDFBOX-1408
 Project: PDFBox
  Issue Type: Bug
Reporter: Juraj Lonc


PDFStreamEngine calculates the width of a space (line 357):

spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 )*glyphSpaceToTextSpaceFactor);

In some cases the result is 0.
The problem is that getFontWidth requires the font's own character code for 
the space.

If there is a ToUnicode mapping for that font, then it is necessary to look up 
the code in the CMap and NOT to use 0x20 (space) as the source code does.
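The reverse lookup the report asks for is sketched below. Plain maps stand in for the font's ToUnicode CMap and width table; the class and method names are hypothetical, not the PDFBox API. Widths are in glyph space (1/1000 of text space for all font types except Type 3, whose FontMatrix defines the factor):

```java
import java.util.Map;

class SpaceWidthSketch {
    /** Fallback when no code maps to U+0020: the hard-coded 0x20 the report criticises. */
    static final int DEFAULT_SPACE_CODE = 0x20;

    /** Reverse-looks-up the character code whose ToUnicode value is a single space. */
    static int findSpaceCode(Map<Integer, String> toUnicode) {
        for (Map.Entry<Integer, String> e : toUnicode.entrySet()) {
            if (" ".equals(e.getValue())) {
                return e.getKey();
            }
        }
        return DEFAULT_SPACE_CODE;
    }

    /** Space width in text space: glyph width of the space code times the factor. */
    static float spaceWidthText(Map<Integer, String> toUnicode, Map<Integer, Float> widths,
                                float glyphSpaceToTextSpaceFactor) {
        int code = findSpaceCode(toUnicode);
        return widths.getOrDefault(code, 0f) * glyphSpaceToTextSpaceFactor;
    }
}
```

For a font where code 3 maps to " " with a width of 250, this gives 0.25 instead of looking up code 0x20, which such a font may not define at all. If several codes map to U+0020, the map's iteration order decides which one is used; a real implementation might prefer a code with a nonzero width.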



[jira] [Updated] (PDFBOX-1408) Width of space character is calculated wrong

2012-09-10 Thread Juraj Lonc (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juraj Lonc updated PDFBOX-1408:
---

Description: 
PDFStreamEngine calculates the width of a space (line 357):

spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 )*glyphSpaceToTextSpaceFactor);

In some cases the result is 0.
The problem is that getFontWidth requires the font's own character code for 
the space.

If there is a ToUnicode mapping for that font, then it is necessary to look up 
the code in the CMap and NOT to use 0x20 (space) as the source code does.

  was:
PDFStreamEngine calculates width of space (line 357):

spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 
)*glyphSpaceToTextSpaceFactor);

In some cases it the result is 0.
Problem is that getFontWidth requires code number of  .

If there is ToUnicode mapping for that font that it is necessary to lookup 
CMap for code number and NOT to use 0x20 (space) as it is in souce code.


 Width of space character is calculated wrong
 

 Key: PDFBOX-1408
 URL: https://issues.apache.org/jira/browse/PDFBOX-1408
 Project: PDFBox
  Issue Type: Bug
Reporter: Juraj Lonc

 PDFStreamEngine calculates the width of a space (line 357):
 spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 )*glyphSpaceToTextSpaceFactor);
 In some cases the result is 0.
 The problem is that getFontWidth requires the font's own character code for 
 the space.
 If there is a ToUnicode mapping for that font, then it is necessary to look up 
 the code in the CMap and NOT to use 0x20 (space) as the source code does.


