[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266689#comment-15266689 ] Juraj Lonc commented on PDFBOX-3334: Yes, I read data from TextPosition. I mentioned "extended PDFTextStripper". There I read each character and TextPosition and Glyph. But no object "TextPosition" remains in memory after GC. So this problem is not related to TextPosition. See attached screenshot. > TrueType fonts memory leak > -- > > Key: PDFBOX-3334 > URL: https://issues.apache.org/jira/browse/PDFBOX-3334 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.1 >Reporter: Juraj Lonc > Attachments: .pdfbox.cache, screenshot-1.png, screenshot-2.png, > skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf > > > I open this PDF document, read all pages and render to images, close document. > After running GC there are still TrueTypeFont objects in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-3334: Attachment: screenshot-2.png
[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-3334: Attachment: .pdfbox.cache
[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266216#comment-15266216 ] Juraj Lonc commented on PDFBOX-3334: I have double-checked it. None of my classes (related to PDFBox) remains in memory, so my classes/objects do not keep references to those PDFBox/TTF objects. I gave an incomplete description, sorry for that. I don't just open and render that file; I also use an extended PDFTextStripper. My theory (according to data from VisualVM) is this: TrueTypeFont contains a HashMap "tables". This HashMap contains TTFTable objects. These objects hold a reference back to TrueTypeFont. This is the loop that prevents the GC from disposing of those objects: they all reference each other.
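One way to test the cycle theory above in isolation is sketched below. It is only an illustration: `FontLike` and `TableLike` are hypothetical stand-ins, not the FontBox `TrueTypeFont` and `TTFTable` classes. JVM collectors trace reachability from GC roots, so a parent/child cycle that nothing else references is normally reclaimed; if real TrueTypeFont instances survive a GC, a profiler's "path to GC root" view is likely to point at whatever still holds them.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for TrueTypeFont / TTFTable; NOT the FontBox classes.
final class CycleCheck {
    static final class FontLike {
        final Map<String, TableLike> tables = new HashMap<>();
    }

    static final class TableLike {
        final FontLike font; // back-reference that closes the cycle

        TableLike(FontLike font) {
            this.font = font;
        }
    }

    // Builds font -> tables -> table -> font and returns only a weak handle,
    // so nothing outside the cycle references it once this method returns.
    static WeakReference<FontLike> buildUnreachableCycle() {
        FontLike font = new FontLike();
        font.tables.put("glyf", new TableLike(font));
        return new WeakReference<>(font);
    }

    // System.gc() is only a hint to the JVM, so retry a few times.
    // Returns true once the weakly referenced object has been reclaimed.
    static boolean collected(WeakReference<?> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
        }
        return ref.get() == null;
    }
}
```

Calling `CycleCheck.collected(CycleCheck.buildUnreachableCycle())` on a typical JVM is expected to report that the cycle was reclaimed, although explicit GC requests can be disabled, so the result is not guaranteed.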
[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266204#comment-15266204 ] Juraj Lonc commented on PDFBOX-3334: Well, I am going to track it down again to check if the memory leak is somewhere in my code.
[jira] [Commented] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266188#comment-15266188 ] Juraj Lonc commented on PDFBOX-3334: Nope. When I open the document multiple times the count of objects remains the same.
[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-3334: Attachment: screenshot-1.png
[jira] [Updated] (PDFBOX-3334) TrueType fonts memory leak
[ https://issues.apache.org/jira/browse/PDFBOX-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-3334: Attachment: skusenosti-z-implementacie-a-prevadzky-systemu_roman-pavco.pdf
[jira] [Created] (PDFBOX-3334) TrueType fonts memory leak
Juraj Lonc created PDFBOX-3334: -- Summary: TrueType fonts memory leak Key: PDFBOX-3334 URL: https://issues.apache.org/jira/browse/PDFBOX-3334 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 2.0.1 Reporter: Juraj Lonc I open this PDF document, read all pages, render them to images, and close the document. After running GC there are still TrueTypeFont objects in memory.
[jira] [Created] (PDFBOX-2721) Invalid ToUnicode CMap in font
Juraj Lonc created PDFBOX-2721: -- Summary: Invalid ToUnicode CMap in font Key: PDFBOX-2721 URL: https://issues.apache.org/jira/browse/PDFBOX-2721 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 2.0.0 Reporter: Juraj Lonc The attached PDF file works fine in Adobe Reader, but PDFBox logs warnings: 2015-03-20 15:48:57,573 WARN [org.apache.pdfbox.pdmodel.font.PDFont] (http-0.0.0.0-8080-7) Invalid ToUnicode CMap in font HPDFAA+Thoth-Unicode It seems that you require beginbfchar or beginbfrange in the CMap. But should they be required? The CMap definition contains beginnotdefrange, and this is ignored by PDFBox. The PDF Reference says: beginnotdefchar, endnotdefchar, beginnotdefrange, and endnotdefrange define notdef mappings from character codes to CIDs. As described in the section “Handling Undefined Characters” on page 355, a notdef mapping is used if the normal mapping produces a CID for which no glyph is present in the associated CIDFont.
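To make the distinction in this report concrete, here is a minimal sketch that classifies the text of a CMap by the operators it contains. It is a plain text scan written for illustration, not PDFBox's CMap parser; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative text scan only; this is NOT how PDFBox parses CMaps.
final class CMapOperators {
    // Operators that carry code -> Unicode mappings in a ToUnicode CMap.
    private static final Pattern BF =
            Pattern.compile("\\bbegin(bfchar|bfrange)\\b");
    // Operators that only define notdef mappings.
    private static final Pattern NOTDEF =
            Pattern.compile("\\bbegin(notdefchar|notdefrange)\\b");

    // True if the CMap text contains at least one bfchar/bfrange section.
    static boolean hasUnicodeMappings(String cmapText) {
        return BF.matcher(cmapText).find();
    }

    // True if the CMap text has notdef sections but no bfchar/bfrange section,
    // which is the situation described in the report above.
    static boolean hasOnlyNotdefMappings(String cmapText) {
        return !hasUnicodeMappings(cmapText) && NOTDEF.matcher(cmapText).find();
    }
}
```

A caller could use such a check to decide whether a missing bfchar/bfrange section deserves a warning or is simply a CMap that only declares notdef ranges.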
[jira] [Updated] (PDFBOX-2721) Invalid ToUnicode CMap in font
[ https://issues.apache.org/jira/browse/PDFBOX-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2721: Attachment: cmap_beginnotdefrange.pdf
[jira] [Created] (PDFBOX-2110) Font not found: CourierNew
Juraj Lonc created PDFBOX-2110: -- Summary: Font not found: CourierNew Key: PDFBOX-2110 URL: https://issues.apache.org/jira/browse/PDFBOX-2110 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 2.0.0 Reporter: Juraj Lonc PDF uses non-embedded font CourierNew. OS contains font: {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier New:style=Regular,Normal,obyčejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code} FontManager is not able to find it and warns: {code}WARN [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font not found: CourierNew{code} It seems that the problem is in that space in fotn name CourierNew vs. Courier New -- This message was sent by Atlassian JIRA (v6.2#6252)
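The suspected cause (a space in the installed family name but not in the requested name) can be illustrated with a small lookup that compares names after stripping whitespace. This is a sketch under that assumption only; `FontNameIndex` is a hypothetical helper, not the FontBox `FontManager`, and not the contents of the attached diff.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; NOT the FontBox FontManager.
final class FontNameIndex {
    private final Map<String, String> pathsByKey = new HashMap<>();

    // Lookup key: whitespace removed, lower-cased, so that
    // "Courier New" and "CourierNew" produce the same key.
    static String normalize(String name) {
        return name.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // Registers an installed font under its family name.
    void register(String familyName, String path) {
        pathsByKey.put(normalize(familyName), path);
    }

    // Returns the font file path for the requested name, or null if unknown.
    String find(String requestedName) {
        return pathsByKey.get(normalize(requestedName));
    }
}
```

With the font from the report registered as "Courier New", a request for "CourierNew" resolves to the same file instead of falling through to a "Font not found" warning.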
[jira] [Updated] (PDFBOX-2110) Font not found: CourierNew
[ https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2110: Description: PDF uses non-embedded font CourierNew. OS contains font: {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier New:style=Regular,Normal,obyčejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code} FontManager is not able to find it and warns: {code}WARN [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font not found: CourierNew{code} It seems that the problem is in that space in font name CourierNew vs. Courier New was: PDF uses non-embedded font CourierNew. OS contains font: {code}/usr/share/fonts/truetype/msttcorefonts/Courier_New.ttf: Courier New:style=Regular,Normal,obyčejné,Standard,,Normaali,Normál,Normale,Standaard,Normal{code} FontManager is not able to find it and warns: {code}WARN [org.apache.fontbox.util.FontManager] (http-0.0.0.0-80-6) Font not found: CourierNew{code} It seems that the problem is in that space in fotn name CourierNew vs. Courier New
[jira] [Updated] (PDFBOX-2110) Font not found: CourierNew
[ https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2110: Attachment: testpdf_monospace_DPH_032014.pdf
[jira] [Commented] (PDFBOX-2110) Font not found: CourierNew
[ https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016890#comment-14016890 ] Juraj Lonc commented on PDFBOX-2110: I tested on CentOS and Ubuntu.
[jira] [Commented] (PDFBOX-2110) Font not found: CourierNew
[ https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016904#comment-14016904 ] Juraj Lonc commented on PDFBOX-2110: I am using PDFBox with my own modifications; that PDF is rendered correctly on Windows.
[jira] [Comment Edited] (PDFBOX-2110) Font not found: CourierNew
[ https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016912#comment-14016912 ] Juraj Lonc edited comment on PDFBOX-2110 at 6/3/14 5:47 PM: I don't think your hardcoded mapping solves the problem. I am not using Liberation fonts (they are not installed either). Courier New is installed on that system, so it should be used, right? In addition to that, Liberation Mono is visually very different from Courier New. was (Author: chupacabras): I don't think your hardcoded mapping solves problem. I am not using Liberation fonts (they are not installed either). Courier New is installed on that system, so it should be used, right?
[jira] [Updated] (PDFBOX-2110) Font not found: CourierNew
[ https://issues.apache.org/jira/browse/PDFBOX-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2110: Attachment: PDFBOX-2110_FontManager.diff Look at this fix.
[jira] [Commented] (PDFBOX-1713) [PATCH] Bullet character not rendered
[ https://issues.apache.org/jira/browse/PDFBOX-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008683#comment-14008683 ] Juraj Lonc commented on PDFBOX-1713: Is this fix considered to be permanent or temporary? I consider it ugly :( That bullet character is properly defined in the /ToUnicode mapping, and this mapping is ignored by PDFBox, IMHO. I tried to explain the proper way of handling this situation in PDFBOX-2093. Replacing all unknown characters with a bullet is not a good idea, as there could be any Unicode character in that /ToUnicode mapping. [PATCH] Bullet character not rendered - Key: PDFBOX-1713 URL: https://issues.apache.org/jira/browse/PDFBOX-1713 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 2.0.0 Reporter: Vincent Hennebert Assignee: Andreas Lehmkühler Fix For: 1.8.6, 2.0.0 Attachments: bullet.patch, bullet.pdf See attached file. In WinAnsiEncoding, any unused code greater than 040 maps to the bullet character. The attached patch takes that into account to render characters that don't use the standard encoding for bullet.
[jira] [Commented] (PDFBOX-1713) [PATCH] Bullet character not rendered
[ https://issues.apache.org/jira/browse/PDFBOX-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008814#comment-14008814 ] Juraj Lonc commented on PDFBOX-1713: I had to verify that ;) You are absolutely right. I modified values in /ToUnicode. Any change had no impact on displaying those chars. Changes affected only copied text from Adobe Reader to some text editor (Word).
[jira] [Created] (PDFBOX-2093) bullet character is not rendered
Juraj Lonc created PDFBOX-2093: -- Summary: bullet character is not rendered Key: PDFBOX-2093 URL: https://issues.apache.org/jira/browse/PDFBOX-2093 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: missing_bullet.pdf, output_missing_bullet.png This PDF contains a bullet character that is not rendered. There is some problem with translating the code to a glyph. That character has code 127 (0x7F), but no mapping is found for it: {code} 14:33:17,966 DEBUG Type1Glyph2D:127 - FKOYIT+MyriadPro-Cond: glyph mapping for 127 not found {code} The embedded font contains a definition for the bullet character, but the bullet character has code 183 in the mapping table (from StandardEncoding, I suppose).
[jira] [Updated] (PDFBOX-2093) bullet character is not rendered
[ https://issues.apache.org/jira/browse/PDFBOX-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2093: Attachment: output_missing_bullet.png missing_bullet.pdf
[jira] [Commented] (PDFBOX-2093) bullet character is not rendered
[ https://issues.apache.org/jira/browse/PDFBOX-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007111#comment-14007111 ] Juraj Lonc commented on PDFBOX-2093: Sample PDF contains only one line of text. Original PDF contains more lines with that bullet (regular list). I was playing around with it but was not able to figure out how to properly translate that character :(
[jira] [Commented] (PDFBOX-2093) bullet character is not rendered
[ https://issues.apache.org/jira/browse/PDFBOX-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007131#comment-14007131 ] Juraj Lonc commented on PDFBOX-2093: There is a Unicode mapping for that character: 7F 2022 (http://www.charbase.com/2022-unicode-bullet). So, shouldn't it work like this? 1. Translate the code to a Unicode value (via the /ToUnicode mapping). 2. Translate the Unicode value to a character name. 3. Find the glyph with that name. I think /ToUnicode is currently ignored in this case.
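The three proposed steps can be sketched as a chain of lookups. This only illustrates the suggestion in the comment; it is not confirmed PDFBox behaviour, and the follow-up comment on PDFBOX-1713 reports that editing /ToUnicode had no effect on how Adobe Reader displayed the characters. The class, method and map names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Illustration of the proposed lookup order; NOT PDFBox's rendering code.
final class GlyphLookupSketch {
    // Returns the glyph name to draw for a character code, or null if any
    // step of the chain fails.
    static String resolve(int code,
                          Map<Integer, Integer> toUnicode,
                          Map<Integer, String> unicodeToName,
                          Set<String> fontGlyphNames) {
        // Step 1: character code -> Unicode value (from the /ToUnicode mapping).
        Integer unicode = toUnicode.get(code);
        if (unicode == null) {
            return null;
        }
        // Step 2: Unicode value -> character (glyph) name.
        String name = unicodeToName.get(unicode);
        // Step 3: use the name only if the font actually defines that glyph.
        if (name == null || !fontGlyphNames.contains(name)) {
            return null;
        }
        return name;
    }
}
```

With the values from the report (code 0x7F mapped to U+2022, and U+2022 named "bullet"), the chain yields "bullet" whenever the embedded font defines a glyph with that name.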
[jira] [Created] (PDFBOX-2089) Negative width of character
Juraj Lonc created PDFBOX-2089: -- Summary: Negative width of character Key: PDFBOX-2089 URL: https://issues.apache.org/jira/browse/PDFBOX-2089 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: negative_width.pdf This PDF contains the text matrix -10.5679 0 0 -11.4 459.0349 19.4155 Tm, which IMHO causes a wrong calculation of character width (and height). The width and height calculated in PDFStreamEngine are negative numbers, because textMatrix.getXScale() returns a negative value. I think it should be fixed in Matrix.getXScale() and Matrix.getYScale(): the returned value should be wrapped in Math.abs().
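A small numeric sketch of the effect described above, assuming (for illustration only) that the x scale of an unrotated text matrix is simply its first element and that a character width is the glyph advance multiplied by that scale. `TextScaleSketch` is a hypothetical helper, not the PDFBox `Matrix` or `PDFStreamEngine` code.

```java
// Hypothetical helper for illustration; NOT the PDFBox Matrix class.
final class TextScaleSketch {
    // Width as a plain product: the sign follows the sign of the scale,
    // so a mirrored text matrix produces a negative width.
    static double signedWidth(double advance, double xScale) {
        return advance * xScale;
    }

    // Width as a magnitude, which is what the report proposes.
    static double displayWidth(double advance, double xScale) {
        return Math.abs(advance * xScale);
    }
}
```

For the matrix in the report the x scale is -10.5679, so a glyph with an advance of 0.5 text units gets a signed width of about -5.28, while the magnitude is about 5.28.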
[jira] [Updated] (PDFBOX-2089) Negative width of character
[ https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2089: Attachment: negative_width.pdf
[jira] [Updated] (PDFBOX-2089) Negative width of character
[ https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2089: Attachment: PDFBOX-2089_Matrix.diff This is what I mean. It looks quite logical, but I am not 100% sure whether it is PDF compliant or not.
[jira] [Commented] (PDFBOX-2089) Negative width of character
[ https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005693#comment-14005693 ] Juraj Lonc commented on PDFBOX-2089: OK, plan B is to modify PDFStreamEngine and replace matrix.getXScale() with Math.abs(matrix.getXScale()). The width and height of a character should always be positive numbers, right?
[jira] [Commented] (PDFBOX-2089) Negative width of character
[ https://issues.apache.org/jira/browse/PDFBOX-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005912#comment-14005912 ] Juraj Lonc commented on PDFBOX-2089: I changed my application so that everywhere I read a character width I added the line width=Math.abs(width). So my code is immune to this issue. Anyway, I am curious whether this is a bug or not ;)
[jira] [Commented] (PDFBOX-2090) Glyph not found:3
[ https://issues.apache.org/jira/browse/PDFBOX-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005922#comment-14005922 ] Juraj Lonc commented on PDFBOX-2090: see http://scripts.sil.org/cms/scripts/page.php?item_id=IWS-Chapter08#ba57949e
[jira] [Created] (PDFBOX-2090) Glyph not found:3
Juraj Lonc created PDFBOX-2090: -- Summary: Glyph not found:3 Key: PDFBOX-2090 URL: https://issues.apache.org/jira/browse/PDFBOX-2090 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Priority: Minor There are some debug messages: {code} 15:30:46,574 DEBUG TTFGlyph2D:227 - GYQPBH+TimesNewRomanPSMT: Glyph not found:3 {code} but glyph id #3 is reserved (according to TTF spec) so it is OK that this glyph was not found in TTF font. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2090) Glyph not found:3
[ https://issues.apache.org/jira/browse/PDFBOX-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005939#comment-14005939 ] Juraj Lonc commented on PDFBOX-2090: Instead of {code} if (glyphId<=3) return null; {code} it could be better to use {code} if (glyphId==3) return null; {code} in case you would like to use glyph id #0 for drawing missing chars. Glyph not found:3 - Key: PDFBOX-2090 URL: https://issues.apache.org/jira/browse/PDFBOX-2090 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Priority: Minor Attachments: PDFBOX-2090_TTFGlyph2D.diff, glyph_id3.pdf There are some debug messages: {code} 15:30:46,574 DEBUG TTFGlyph2D:227 - GYQPBH+TimesNewRomanPSMT: Glyph not found:3 {code} but glyph id #3 is reserved (according to TTF spec) so it is OK that this glyph was not found in TTF font. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2091) Some characters are not rendered
Juraj Lonc created PDFBOX-2091: -- Summary: Some characters are not rendered Key: PDFBOX-2091 URL: https://issues.apache.org/jira/browse/PDFBOX-2091 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: missing_yaccute.pdf, output.png Some characters are not rendered (see attached PDF). In this case it is yaccute. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2091) Some characters are not rendered (font with symbol encoding)
[ https://issues.apache.org/jira/browse/PDFBOX-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2091: --- Summary: Some characters are not rendered (font with symbol encoding) (was: Some characters are not rendered) Some characters are not rendered (font with symbol encoding) Key: PDFBOX-2091 URL: https://issues.apache.org/jira/browse/PDFBOX-2091 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2091_TTFGlyph2D.diff, missing_yaccute.pdf, output.png Some characters are not rendered (see attached PDF). In this case it is yaccute. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2091) Some characters are not rendered
[ https://issues.apache.org/jira/browse/PDFBOX-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2091: --- Attachment: PDFBOX-2091_TTFGlyph2D.diff Font uses symbol encoding. In TTFGlyph2D is CMAP parsed correctly, but then it is not used in getGlyphcode(int code). I made fix for this. Some characters are not rendered Key: PDFBOX-2091 URL: https://issues.apache.org/jira/browse/PDFBOX-2091 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2091_TTFGlyph2D.diff, missing_yaccute.pdf, output.png Some characters are not rendered (see attached PDF). In this case it is yaccute. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2091) Some characters are not rendered (font with symbol encoding)
[ https://issues.apache.org/jira/browse/PDFBOX-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006839#comment-14006839 ] Juraj Lonc commented on PDFBOX-2091: In addition to that, I think TTFGlyph2D should take encoding value that is set in PDFont/COSObject and then use particular CMAP according to this encoding. TTFGlyph2D currently doesn't care about encoding that is set in PDF. Some characters are not rendered (font with symbol encoding) Key: PDFBOX-2091 URL: https://issues.apache.org/jira/browse/PDFBOX-2091 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2091_TTFGlyph2D.diff, missing_yaccute.pdf, output.png Some characters are not rendered (see attached PDF). In this case it is yaccute. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: rendered.png Obyčajné zásielky.pdf Lines that exceeds clipping area are not drawn -- Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf, rendered.png PDF contains shapes that are partly on the paper and partly outside (shape overflows paper borders). Those shapes are not rendered to image. It is caused by clipping area. When I replace line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} to {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2074) 4-bytes CMap entry causes exception
[ https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999694#comment-13999694 ] Juraj Lonc commented on PDFBOX-2074: I am curious whether Adobe Reader ignores such entries (entries are invalid) or processes them (entries are valid). 4-bytes CMap entry causes exception --- Key: PDFBOX-2074 URL: https://issues.apache.org/jira/browse/PDFBOX-2074 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf I have a PDF that has a CMap entry consisting of 4 bytes. It is just one entry with that size; the other entries have 2 bytes. Adobe Reader has no problems with that, but PDFBox throws an Exception. I think this Exception should not be thrown. The entry should be skipped or truncated to 2 bytes, and a warning written to the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000280#comment-14000280 ] Juraj Lonc commented on PDFBOX-2081: I know that line completely disables clipping and I know it is not a solution ;) I have used it just for description of the problem. Lines that exceeds clipping area are not drawn -- Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf, rendered.png PDF contains shapes that are partly on the paper and partly outside (shape overflows paper borders). Those shapes are not rendered to image. It is caused by clipping area. When I replace line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} to {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: (was: rendered.png) Lines that exceeds clipping area are not drawn -- Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf PDF contains shapes that are partly on the paper and partly outside (shape overflows paper borders). Those shapes are not rendered to image. It is caused by clipping area. When I replace line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} to {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: rendered_(with_null_clipping).png Lines that exceeds clipping area are not drawn -- Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png, rendered_(with_null_clipping).png PDF contains shapes that are partly on the paper and partly outside (shape overflows paper borders). Those shapes are not rendered to image. It is caused by clipping area. When I replace line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} to {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2081: --- Attachment: rendered_(missing_lines).png Previously uploaded file was not the one I wanted to upload. Now I have attached image that was actually rendered Lines that exceeds clipping area are not drawn -- Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png PDF contains shapes that are partly on the paper and partly outside (shape overflows paper borders). Those shapes are not rendered to image. It is caused by clipping area. When I replace line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} to {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
Juraj Lonc created PDFBOX-2081: -- Summary: Lines that exceeds clipping area are not drawn Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf, rendered.png PDF contains shapes that are partly on the paper and partly outside (shape overflows paper borders). Those shapes are not rendered to image. It is caused by clipping area. When I replace line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} to {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2081) Lines that exceeds clipping area are not drawn
[ https://issues.apache.org/jira/browse/PDFBOX-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000320#comment-14000320 ] Juraj Lonc commented on PDFBOX-2081: I have also tried to replace {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} by {code} Rectangle2D rc0=getGraphicsState().getCurrentClippingPath().getBounds2D(); Rectangle2D rc1=new Rectangle2D.Double(rc0.getMinX(), rc0.getMinY(), rc0.getWidth()+1000, rc0.getHeight()); graphics.setClip(rc1); {code} so I made clipping area wider. This helped too - lines were rendered. Lines that exceeds clipping area are not drawn -- Key: PDFBOX-2081 URL: https://issues.apache.org/jira/browse/PDFBOX-2081 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: Obyčajné zásielky.pdf, rendered_(missing_lines).png, rendered_(with_null_clipping).png PDF contains shapes that are partly on the paper and partly outside (shape overflows paper borders). Those shapes are not rendered to image. It is caused by clipping area. When I replace line in PDFDrawer.strokePath() {noformat} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {noformat} to {noformat} graphics.setClip(null); {noformat} then everything is rendered correctly. Possibly bug in Java? -- This message was sent by Atlassian JIRA (v6.2#6252)
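To help decide whether the behaviour in PDFBOX-2081 is "possibly a bug in Java", it may be useful to check plain Java2D in isolation. The sketch below does not use PDFBox and does not reproduce the reported PDF; it only draws a line that starts and ends outside a rectangular clip and checks which pixels were painted. The class name, image size and coordinates are arbitrary assumptions.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/** Standalone Java2D check (hypothetical, not PDFBox code): a line that overflows the clip. */
class ClipDemo {

    /** Draws a horizontal line from x=-50 to x=150 at y=50 on a 100x100 image using the given clip. */
    static BufferedImage render(Shape clip) {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setClip(clip);                      // null means "no clipping"
            g.setColor(Color.WHITE);
            g.setStroke(new BasicStroke(3f));
            g.draw(new Line2D.Double(-50, 50, 150, 50));
        } finally {
            g.dispose();
        }
        return img;
    }

    /** True if the pixel differs from the black background. */
    static boolean isPainted(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) & 0xFFFFFF) != 0;
    }

    public static void main(String[] args) {
        BufferedImage clipped = render(new Rectangle2D.Double(10, 0, 80, 100));
        BufferedImage unclipped = render(null);
        System.out.println("inside clip painted:  " + isPainted(clipped, 50, 50));
        System.out.println("outside clip painted: " + isPainted(clipped, 5, 50));
        System.out.println("no clip, x=5 painted: " + isPainted(unclipped, 5, 50));
    }
}
```

If the part of the line inside the clip is painted here, the missing lines in the rendered PDF are more likely related to the shape of the clipping path that is passed to `setClip()` than to Java2D clipping as such; that is only a hint, not a diagnosis.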
[jira] [Commented] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array
[ https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997810#comment-13997810 ] Juraj Lonc commented on PDFBOX-2070: I did not mean it as a replacement for this fix; I meant it as an addition, for the case where someone loads a PDF that already contains such wrong elements, so that saving would heal it. But I understand that it is a completely different story for another issue. That was just an idea. Filter.decode() modifies PDF if there is a filter array --- Key: PDFBOX-2070 URL: https://issues.apache.org/jira/browse/PDFBOX-2070 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Tilman Hausherr Fix For: 2.0.0 Attachments: after.pdf, before.pdf If there are several filters (filter array) in an image, PDFBox is inserting an empty DecodeParms object here {code} params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index)); {code} instead of either inserting an empty COSArray, or (better) doing nothing. Saving such a PDF results in it not being displayable in the Acrobat Reader. Test code: {code} PDDocument d = PDDocument.load("before.pdf"); new PDFRenderer(d).renderImage(0); d.save("after.pdf"); {code} The rendering is important because without it, the filtered objects aren't decoded. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2070) Filter.decode() modifies PDF if there is a filter array
[ https://issues.apache.org/jira/browse/PDFBOX-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997373#comment-13997373 ] Juraj Lonc commented on PDFBOX-2070: Thanks for the fix. The problem disappeared in my case, so this fix works for me. In addition, I made a workaround: before I save the document, I remove all empty DecodeParms from images. I don't know whether it is a good idea to implement something similar in PDDocument.save(), so that these evidently wrong elements would be skipped and not written to the PDF file. Filter.decode() modifies PDF if there is a filter array --- Key: PDFBOX-2070 URL: https://issues.apache.org/jira/browse/PDFBOX-2070 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Tilman Hausherr Fix For: 2.0.0 Attachments: after.pdf, before.pdf If there are several filters (filter array) in an image, PDFBox is inserting an empty DecodeParms object here {code} params.setItem(COSName.DECODE_PARMS, getDecodeParams(params, index)); {code} instead of either inserting an empty COSArray, or (better) doing nothing. Saving such a PDF results in it not being displayable in the Acrobat Reader. Test code: {code} PDDocument d = PDDocument.load("before.pdf"); new PDFRenderer(d).renderImage(0); d.save("after.pdf"); {code} The rendering is important because without it, the filtered objects aren't decoded. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2057) Importing BufferedImage into PDPixelMap is broken in 1.8.5
[ https://issues.apache.org/jira/browse/PDFBOX-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992592#comment-13992592 ] Juraj Lonc commented on PDFBOX-2057: I see the fix in ImageFactory.getAlphaImage(BufferedImage image). But isn't it much easier to do it like this? WritableRaster alphaRaster = image.getAlphaRaster(); BufferedImage bi=new BufferedImage(alphaRaster.getWidth(), alphaRaster.getHeight(), BufferedImage.TYPE_BYTE_GRAY); bi.setData(alphaRaster); Importing BufferedImage into PDPixelMap is broken in 1.8.5 -- Key: PDFBOX-2057 URL: https://issues.apache.org/jira/browse/PDFBOX-2057 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 1.8.5, 1.8.6 Environment: windows vista / jdk 1.7.0_45 Reporter: Michaël Michaud Assignee: Tilman Hausherr Labels: regression Fix For: 1.8.6, 2.0.0 Attachments: CS-Convocation entretien signed.pdf, renderTransparentImage.zip Try to import a BufferedImage in a PDDocument with PDPixelMap BufferedImage with TYPE_4BYTE_ABGR works fine with PDFBox 1.8.4 (though, the pdf file contains instruction /ColorSpace /DeviceGray) BufferedImage with TYPE_4BYTE_ABGR produces an unreadable PDF with PDFBox 1.8.5 (though, the pdf file contains instruction /ColorSpace /DeviceRGB). 
Code used to demonstrate the problem is as follows (image has also been colored with some Graphics instructions to demonstrate that 1.8.4 is working): {code} try { PDDocument doc = new PDDocument(); PDPage page = new PDPage(); doc.addPage(page); BufferedImage awtImage = new BufferedImage(100,100, BufferedImage.TYPE_4BYTE_ABGR); PDPixelMap ximage = new PDPixelMap(doc, awtImage); PDPageContentStream contentStream = new PDPageContentStream(doc, page); contentStream.drawXObject(ximage, 200, 200, 100, 100); contentStream.close(); doc.save("C:\\Temp\\PDF\\test185_4babgr.pdf"); } catch(COSVisitorException|IOException e) { e.printStackTrace(); } {code} I also tried with a BufferedImage with TYPE_INT_ARGB but it throws an exception with PDFBox 1.8.4 and 1.8.5: {code} Exception in thread "main" java.lang.IllegalArgumentException: Raster IntegerInterleavedRaster: width = 100 height = 100 #Bands = 1 xOff = 0 yOff = 0 dataOffset[0] 0 is incompatible with ColorModel ColorModel: #pixelBits = 8 numComponents = 1 color space = java.awt.color.ICC_ColorSpace@1dc80063 transparency = 1 has alpha = false isAlphaPre = false at java.awt.image.BufferedImage.<init>(BufferedImage.java:630) at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.createImageStream(PDPixelMap.java:107) {code} My main purpose was to use a BufferedImage with a CMYK ColorSpace, but PDPixelMap seems to accept 1 component and 3 component ColorSpace only. -- This message was sent by Atlassian JIRA (v6.2#6252)
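The suggestion in the comment on PDFBOX-2057 above (copying the alpha channel with `getAlphaRaster()` and `setData()`) can be tried with the JDK alone. The sketch below wraps those three statements in a hypothetical helper and checks one pixel. It is not the code in `ImageFactory.getAlphaImage()`, and, as the commenter notes for the original idea, it has not been tested across many image types.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

/** Hypothetical helper (not PDFBox code): copies the alpha channel into a gray image. */
class AlphaExtractDemo {

    /** Returns an 8-bit gray image holding the alpha values, or null if the image has no alpha. */
    static BufferedImage extractAlpha(BufferedImage image) {
        WritableRaster alphaRaster = image.getAlphaRaster();
        if (alphaRaster == null) {
            return null; // e.g. TYPE_INT_RGB carries no alpha channel
        }
        BufferedImage gray = new BufferedImage(alphaRaster.getWidth(),
                alphaRaster.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        gray.setData(alphaRaster); // one-band alpha raster into one-band gray raster
        return gray;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(4, 4, BufferedImage.TYPE_4BYTE_ABGR);
        src.setRGB(1, 1, 0x80FF0000); // half-transparent red at (1,1)
        BufferedImage alpha = extractAlpha(src);
        System.out.println("alpha at (1,1): " + alpha.getRaster().getSample(1, 1, 0));
        System.out.println("alpha at (0,0): " + alpha.getRaster().getSample(0, 0, 0));
    }
}
```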
[jira] [Created] (PDFBOX-2077) Empty (invalid) DecodeParms is added to image
Juraj Lonc created PDFBOX-2077: -- Summary: Empty (invalid) DecodeParms is added to image Key: PDFBOX-2077 URL: https://issues.apache.org/jira/browse/PDFBOX-2077 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc The PDF contains an image (xobject) that has no /DecodeParms. PDFBox adds an empty /DecodeParms to this image, which results in an invalid PDF, and Adobe Reader complains about it. The problem is caused by calling PDResources.getXObjects(). It is very similar to PDFBOX-2042 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2077) Empty (invalid) DecodeParms is added to image
[ https://issues.apache.org/jira/browse/PDFBOX-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996241#comment-13996241 ] Juraj Lonc edited comment on PDFBOX-2077 at 5/13/14 10:14 AM: -- {noformat} PDDocument pdDoc=PDDocument.load(f); PDPage pdPage=(PDPage)pdDoc.getDocumentCatalog().getAllPages().get(0); PDResources res=pdPage.findResources(); // this is the guilty line res.getXObjects(); File fout=new File("resaved.pdf"); pdDoc.save(fout); {noformat} was (Author: chupacabras): DDocument pdDoc=PDDocument.load(f); PDPage pdPage=(PDPage)pdDoc.getDocumentCatalog().getAllPages().get(0); PDResources res=pdPage.findResources(); // this is the guilty line res.getXObjects(); File fout=new File("resaved.pdf"); pdDoc.save(fout); Empty (invalid) DecodeParms is added to image - Key: PDFBOX-2077 URL: https://issues.apache.org/jira/browse/PDFBOX-2077 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: input_image.pdf, resaved.pdf The PDF contains an image (xobject) that has no /DecodeParms. PDFBox adds an empty /DecodeParms to this image, which results in an invalid PDF, and Adobe Reader complains about it. The problem is caused by calling PDResources.getXObjects(). It is very similar to PDFBOX-2042 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2072) Wrong calculation of space char width in PDFStreamEngine
Juraj Lonc created PDFBOX-2072: -- Summary: Wrong calculation of space char width in PDFStreamEngine Key: PDFBOX-2072 URL: https://issues.apache.org/jira/browse/PDFBOX-2072 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc PDFStreamEngine calculates the width of the space character wrongly. The page's content stream contains this operation: 0 12 -12 0 562.3199 372.7105 Tm and that causes PDFStreamEngine to calculate the width of the space character as 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2072) Wrong calculation of space char width in PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2072: --- Attachment: PDFBOX-2072_PDFStreamEngine.diff I made a fix for this. Wrong calculation of space char width in PDFStreamEngine Key: PDFBOX-2072 URL: https://issues.apache.org/jira/browse/PDFBOX-2072 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2072_PDFStreamEngine.diff PDFStreamEngine calculates the width of the space character wrongly. The page's content stream contains this operation: 0 12 -12 0 562.3199 372.7105 Tm and that causes PDFStreamEngine to calculate the width of the space character as 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
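One way to see why the matrix in PDFBOX-2072 can lead to a zero width: in `0 12 -12 0 562.3199 372.7105 Tm` the text is rotated by 90 degrees, so the `a` entry of `[a b c d e f]` is 0, and any width multiplied by `a` alone collapses to 0. A rotation-independent horizontal scale is the length of the transformed unit vector, sqrt(a² + b²). The sketch below only illustrates that arithmetic; it is not the content of the attached diff, and the class and method names are hypothetical.

```java
/** Hypothetical helper (not PDFBox API): rotation-independent scale of a matrix [a b c d e f]. */
class RotatedScaleDemo {

    /** Length of the image of the unit x vector: sqrt(a^2 + b^2). */
    static double scalingFactorX(double a, double b) {
        return Math.hypot(a, b);
    }

    /** Length of the image of the unit y vector: sqrt(c^2 + d^2). */
    static double scalingFactorY(double c, double d) {
        return Math.hypot(c, d);
    }

    public static void main(String[] args) {
        // Rotated text matrix from the report: 0 12 -12 0 562.3199 372.7105 Tm
        double a = 0, b = 12, c = -12, d = 0;
        System.out.println("'a' entry alone:  " + a);
        System.out.println("scaling factor x: " + scalingFactorX(a, b));
        System.out.println("scaling factor y: " + scalingFactorY(c, d));
    }
}
```

For an unrotated matrix with negative entries, such as the one in PDFBOX-2089, the same formula yields the absolute value of `a`, so both reports could in principle be covered by one calculation.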
[jira] [Updated] (PDFBOX-2074) 4-bytes CMap entry causes exception
[ https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2074: --- Attachment: PDFBOX-2074_CMap.diff pdf_with_4B_cmap_entry.pdf 4-bytes CMap entry causes exception --- Key: PDFBOX-2074 URL: https://issues.apache.org/jira/browse/PDFBOX-2074 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf I have a PDF that has a CMap entry consisting of 4 bytes. It is just one entry with that size; the other entries have 2 bytes. Adobe Reader has no problems with that, but PDFBox throws an Exception. I think this Exception should not be thrown. The entry should be skipped or truncated to 2 bytes, and a warning written to the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2074) 4-bytes CMap entry causes exception
[ https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995559#comment-13995559 ] Juraj Lonc commented on PDFBOX-2074: I have no idea how to properly handle entries that are longer than 2 bytes. But I think it is better skip them and not throw Exception there. Just logging warning or error should be fine. If somebody tries to render (to image) such PDF now it will fail. I suggest to remove that Exception so PDF will be rendered. Rendered image will be most likely ok. Maybe some char will not be drawn. 4-bytes CMap entry causes exception --- Key: PDFBOX-2074 URL: https://issues.apache.org/jira/browse/PDFBOX-2074 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf I have a PDF that has a CMap entry consisting of 4 bytes. It is just one entry with that size; the other entries have 2 bytes. Adobe Reader has no problems with that, but PDFBox throws an Exception. I think this Exception should not be thrown. The entry should be skipped or truncated to 2 bytes, and a warning written to the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
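The "skip and log instead of throwing" behaviour proposed for PDFBOX-2074 could look roughly like the sketch below. It does not use the PDFBox CMap classes and is not the attached diff; the class, the method, and the idea of representing each entry's source code as a `byte[]` are assumptions made only to show the control flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical sketch (not PDFBox code): tolerate over-long CMap codes instead of failing. */
class LenientCodeEntries {

    private static final Logger LOG = Logger.getLogger(LenientCodeEntries.class.getName());

    /** Longest code length this sketch pretends to support. */
    static final int MAX_CODE_LENGTH = 2;

    /** Returns the entries that can be handled; over-long ones are logged and skipped. */
    static List<byte[]> keepSupported(List<byte[]> codes) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] code : codes) {
            if (code.length > MAX_CODE_LENGTH) {
                // Previously this would have been the place where an exception is thrown.
                LOG.warning("Skipping CMap entry with " + code.length + "-byte code");
                continue;
            }
            kept.add(code);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<byte[]> codes = new ArrayList<>();
        codes.add(new byte[] {0x00, 0x41});
        codes.add(new byte[] {0x00, 0x00, 0x00, 0x42}); // the single 4-byte entry
        codes.add(new byte[] {0x00, 0x43});
        System.out.println("entries kept: " + keepSupported(codes).size());
    }
}
```

The trade-off is the one named in the comment: rendering continues, but a character mapped only through the skipped entry may not be drawn.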
[jira] [Created] (PDFBOX-2074) 4-bytes CMap entry causes exception
Juraj Lonc created PDFBOX-2074: -- Summary: 4-bytes CMap entry causes exception Key: PDFBOX-2074 URL: https://issues.apache.org/jira/browse/PDFBOX-2074 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc I have a PDF that has a CMap entry consisting of 4 bytes. It is just one entry with that size; the other entries have 2 bytes. Adobe Reader has no problems with that, but PDFBox throws an Exception. I think this Exception should not be thrown. The entry should be skipped or truncated to 2 bytes, and a warning written to the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2075) Texts are not properly positioned/sized
[ https://issues.apache.org/jira/browse/PDFBOX-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2075: --- Attachment: output.png ozn_dmv_1_2008.pdf Texts are not properly positioned/sized --- Key: PDFBOX-2075 URL: https://issues.apache.org/jira/browse/PDFBOX-2075 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: output.png, ozn_dmv_1_2008.pdf Texts in this PDF are displayed somehow strange. It seems that first half of texts are little bit wider so that causes texts to overlap on several places. I was not able to figure out what caused it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2075) Texts are not properly positioned/sized
Juraj Lonc created PDFBOX-2075: -- Summary: Texts are not properly positioned/sized Key: PDFBOX-2075 URL: https://issues.apache.org/jira/browse/PDFBOX-2075 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: output.png, ozn_dmv_1_2008.pdf Texts in this PDF are displayed somehow strange. It seems that first half of texts are little bit wider so that causes texts to overlap on several places. I was not able to figure out what caused it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2075) Texts are not properly positioned/sized
[ https://issues.apache.org/jira/browse/PDFBOX-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995696#comment-13995696 ] Juraj Lonc commented on PDFBOX-2075: Or it is something wrong with positioning, so the right parts of those lines are not moved enough to the right side and thus overlapping with the left parts Texts are not properly positioned/sized --- Key: PDFBOX-2075 URL: https://issues.apache.org/jira/browse/PDFBOX-2075 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: output.png, ozn_dmv_1_2008.pdf Texts in this PDF are displayed somehow strange. It seems that first half of texts are little bit wider so that causes texts to overlap on several places. I was not able to figure out what caused it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (PDFBOX-2075) Texts are not properly positioned/sized
[ https://issues.apache.org/jira/browse/PDFBOX-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc closed PDFBOX-2075. -- Resolution: Not a Problem I had source code inconsistency. My bad. Texts are not properly positioned/sized --- Key: PDFBOX-2075 URL: https://issues.apache.org/jira/browse/PDFBOX-2075 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: output.png, ozn_dmv_1_2008.pdf Texts in this PDF are displayed somehow strange. It seems that first half of texts are little bit wider so that causes texts to overlap on several places. I was not able to figure out what caused it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2074) 4-bytes CMap entry causes exception
[ https://issues.apache.org/jira/browse/PDFBOX-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995559#comment-13995559 ] Juraj Lonc edited comment on PDFBOX-2074 at 5/12/14 7:57 PM: - I have no idea how to properly handle entries that are longer than 2 bytes. But I think it is better to skip them and not throw Exception there. Just logging of warning or error should be fine. If somebody tries to render (to image) such PDF now it will fail. I suggest to remove that Exception so PDF will be rendered. Rendered image will be most likely ok. Maybe some char will not be drawn. was (Author: chupacabras): I have no idea how to properly handle entries that are longer than 2 bytes. But I think it is better skip them and not throw Exception there. Just logging warning or error should be fine. If somebody tries to render (to image) such PDF now it will fail. I suggest to remove that Exception so PDF will be rendered. Rendered image will be most likely ok. Maybe some char will not be drawn. 4-bytes CMap entry causes exception --- Key: PDFBOX-2074 URL: https://issues.apache.org/jira/browse/PDFBOX-2074 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: PDFBOX-2074_CMap.diff, pdf_with_4B_cmap_entry.pdf I have a PDF that has a CMap entry consisting of 4 bytes. It is just one entry with that size; the other entries have 2 bytes. Adobe Reader has no problems with that, but PDFBox throws an Exception. I think this Exception should not be thrown. The entry should be skipped or truncated to 2 bytes, and a warning written to the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2067) Error creating JPEG image with SMask
[ https://issues.apache.org/jira/browse/PDFBOX-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992584#comment-13992584 ] Juraj Lonc commented on PDFBOX-2067: My problem was that I was adding some images and then reading xobjects from the page's resources. This gave me an exception: Exception in thread "main" java.lang.ClassCastException: java.awt.image.DataBufferInt cannot be cast to java.awt.image.DataBufferByte at org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:124) at org.apache.pdfbox.filter.Filter.decode(Filter.java:58) at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:337) at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:278) at org.apache.pdfbox.cos.COSStream.getDecodeResult(COSStream.java:235) at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:94) at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:65) at org.apache.pdfbox.pdmodel.PDResources.getXObjects(PDResources.java:247) It took me half a day to track the cause down to that single line ;) But I like brain teasers ;) Error creating JPEG image with SMask Key: PDFBOX-2067 URL: https://issues.apache.org/jira/browse/PDFBOX-2067 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 2.0.0 Reporter: Juraj Lonc Assignee: Tilman Hausherr Attachments: PDFBOX-2067_JPEGFactory.diff JPEGFactory.createFromImage() has problems with images with transparency (alpha data). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2057) Importing BufferedImage into PDPixelMap is broken in 1.8.5
[ https://issues.apache.org/jira/browse/PDFBOX-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992976#comment-13992976 ] Juraj Lonc commented on PDFBOX-2057: That was just a suggestion ;) I did not make wide testing. Importing BufferedImage into PDPixelMap is broken in 1.8.5 -- Key: PDFBOX-2057 URL: https://issues.apache.org/jira/browse/PDFBOX-2057 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 1.8.5, 1.8.6 Environment: windows vista / jdk 1.7.0_45 Reporter: Michaël Michaud Assignee: Tilman Hausherr Labels: regression Fix For: 1.8.6, 2.0.0 Attachments: CS-Convocation entretien signed.pdf, CS-Convocation entretien-IText.pdf, CS-Convocation entretien-PDFBox-with-workarround.pdf, CS-Convocation entretien-PDFBox.pdf, ImageFilterOp.java, differentBufferedImages.pdf, renderTransparentImage.zip Try to import a BufferedImage in a PDDocument with PDPixelMap BufferedImage with TYPE_4BYTE_ABGR works fine with PDFBox 1.8.4 (though, the pdf file contains instruction /ColorSpace /DeviceGray) BufferedImage with TYPE_4BYTE_ABGR produces an unreadable PDF with PDFBox 1.8.5 (though, the pdf file contains instruction /ColorSpace /DeviceRGB). 
Code used to demonstrate the problem is as follows (the image has also been colored with some Graphics instructions to demonstrate that 1.8.4 is working):
{code}
try {
    PDDocument doc = new PDDocument();
    PDPage page = new PDPage();
    doc.addPage(page);
    BufferedImage awtImage = new BufferedImage(100, 100, BufferedImage.TYPE_4BYTE_ABGR);
    PDPixelMap ximage = new PDPixelMap(doc, awtImage);
    PDPageContentStream contentStream = new PDPageContentStream(doc, page);
    contentStream.drawXObject(ximage, 200, 200, 100, 100);
    contentStream.close();
    doc.save("C:\\Temp\\PDF\\test185_4babgr.pdf");
} catch (COSVisitorException | IOException e) {
    e.printStackTrace();
}
{code}
I also tried with a BufferedImage with TYPE_INT_ARGB, but it throws an exception with PDFBox 1.8.4 and 1.8.5:
{code}
Exception in thread "main" java.lang.IllegalArgumentException: Raster IntegerInterleavedRaster: width = 100 height = 100 #Bands = 1 xOff = 0 yOff = 0 dataOffset[0] 0 is incompatible with ColorModel ColorModel: #pixelBits = 8 numComponents = 1 color space = java.awt.color.ICC_ColorSpace@1dc80063 transparency = 1 has alpha = false isAlphaPre = false
    at java.awt.image.BufferedImage.<init>(BufferedImage.java:630)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.createImageStream(PDPixelMap.java:107)
{code}
My main purpose was to use a BufferedImage with a CMYK ColorSpace, but PDPixelMap seems to accept 1-component and 3-component ColorSpaces only.
-- This message was sent by Atlassian JIRA (v6.2#6252)
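Since the report says PDPixelMap appears to accept only 1-component and 3-component color spaces, one possible workaround is to redraw the image into a 3-component, byte-backed type before importing it. The sketch below uses only JDK classes; the class name ByteBackedCopy and the helper toThreeByteBgr() are hypothetical. The thread does not confirm that this avoids the 1.8.5 regression, and the alpha channel is lost in the copy.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class ByteBackedCopy {
    /** Redraws any image into a TYPE_3BYTE_BGR image; transparency is discarded. */
    static BufferedImage toThreeByteBgr(BufferedImage source) {
        BufferedImage copy = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_3BYTE_BGR);
        Graphics2D g = copy.createGraphics();
        try {
            // Transparent source pixels are composited over the copy's black background.
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return copy;
    }

    public static void main(String[] args) {
        BufferedImage argb = new BufferedImage(100, 100, BufferedImage.TYPE_INT_ARGB);
        BufferedImage bgr = toThreeByteBgr(argb);
        System.out.println(bgr.getType() == BufferedImage.TYPE_3BYTE_BGR);
    }
}
```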
[jira] [Commented] (PDFBOX-2059) Characters are not positioned properly (due to wrong widthheight of chars)
[ https://issues.apache.org/jira/browse/PDFBOX-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990395#comment-13990395 ]
Juraj Lonc commented on PDFBOX-2059:
I made 2 fixes for this. Both are for PDTrueTypeFont. I don't know whether they are useful or whether you already have other plans for this bug.
1. Added getFontWidth(), where I calculate widths from the TTF font. Right now you are relying only on the widths defined within the PDF.
2. Modified getExternalFontFile2() so that it looks for system fonts too. Right now you are using only the fonts defined in PDFBox_External_Fonts.properties.
Characters are not positioned properly (due to wrong widthheight of chars)
Key: PDFBOX-2059 URL: https://issues.apache.org/jira/browse/PDFBOX-2059 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Assignee: Andreas Lehmkühler Attachments: DPH 032014.pdf
Characters in this PDF are not positioned properly. All characters are rendered at position x=0.0. The problem is in PDFont.getFontWidth(): it returns 0.0 for every char. The same applies to PDFont.getFontHeight().
-- This message was sent by Atlassian JIRA (v6.2#6252)
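The diff itself is not reproduced in this message, so the following is only a sketch of what "calculating widths from the TTF font" usually amounts to. A TrueType font stores advance widths in font units (the hmtx table) and the units-per-em value in the head table, while PDF widths are expressed in thousandths of an em. The class and method names here are hypothetical.

```java
final class GlyphWidthSketch {
    /**
     * Converts a TrueType advance width (font units) into PDF glyph space,
     * where 1000 units correspond to one em.
     */
    static float toGlyphSpace(int advanceWidth, int unitsPerEm) {
        if (unitsPerEm <= 0) {
            throw new IllegalArgumentException("unitsPerEm must be positive");
        }
        return advanceWidth * 1000f / unitsPerEm;
    }

    public static void main(String[] args) {
        // A glyph that is half an em wide in a 2048-unit font.
        System.out.println(toGlyphSpace(1024, 2048));
    }
}
```

A fallback like this only matters when the /Widths array in the PDF is missing or does not cover the character, which is the situation the issue describes.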
[jira] [Updated] (PDFBOX-62) Incorrect (zero) character widths returned in some docs
[ https://issues.apache.org/jira/browse/PDFBOX-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-62: - Attachment: PDFBOX-2059_PDTrueTypeFont.diff Just let me know if it is useful somehow ;) Incorrect (zero) character widths returned in some docs --- Key: PDFBOX-62 URL: https://issues.apache.org/jira/browse/PDFBOX-62 Project: PDFBox Issue Type: Bug Components: Rendering, Text extraction Assignee: Andreas Lehmkühler Attachments: 5542.pdf, PDFBOX-2059_PDTrueTypeFont.diff, PDTrueTypeFont.diff, pdfbox-2006-zerowidth.pdf-1.png, pdfbox-62-zerowidth.pdf-1.png [imported from SourceForge] http://sourceforge.net/tracker/index.php?group_id=78314atid=552832aid=1216674 Originally submitted by tamirhassan on 2005-06-07 13:42. For certain PDF documents (such as the one attached) the character/string widths (as obtained e.g. by the PDFont.getStringWidth method) are not returned correctly, i.e. they appear to be correct for punctuation characters but are zero for alphanumeric characters. It seems as if these alphanumeric characters are NOT within PDFont.firstChar and PDFont.lastChar in the Type 1 font. The method therefore attempts to obtain the font widths from the AFM (font metric) file, but fails (silently) with a 'resource is null' logline message. (Note that this problem doesn't seem to occur with Type 1 fonts in other documents.) A more detailed discussion regarding this issue can be found in this link: http://sourceforge.net/forum/forum.php? thread_id=1260349forum_id=267205 Thanks in advance for any help that can be obtained, Tam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-62) Incorrect (zero) character widths returned in some docs
[ https://issues.apache.org/jira/browse/PDFBOX-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990540#comment-13990540 ]
Juraj Lonc commented on PDFBOX-62:
I made 2 quick fixes for this (actually they are for PDFBOX-2059). Both are for PDTrueTypeFont. I don't know whether they are useful or whether you already have other plans for this bug.
1. Added getFontWidth(), where I calculate widths from the TTF font. Right now you are relying only on the widths defined within the PDF.
2. Modified getExternalFontFile2() so that it looks for system fonts too. Right now you are using only the fonts defined in PDFBox_External_Fonts.properties.
It works, but take it just as inspiration. It should be moved to PDFont (I guess) and made more robust.
Incorrect (zero) character widths returned in some docs
Key: PDFBOX-62 URL: https://issues.apache.org/jira/browse/PDFBOX-62 Project: PDFBox Issue Type: Bug Components: Rendering, Text extraction Assignee: Andreas Lehmkühler Attachments: 5542.pdf, PDFBOX-2059_PDTrueTypeFont.diff, PDTrueTypeFont.diff, pdfbox-2006-zerowidth.pdf-1.png, pdfbox-62-zerowidth.pdf-1.png
[imported from SourceForge] http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1216674 Originally submitted by tamirhassan on 2005-06-07 13:42.
For certain PDF documents (such as the one attached) the character/string widths (as obtained e.g. by the PDFont.getStringWidth method) are not returned correctly, i.e. they appear to be correct for punctuation characters but are zero for alphanumeric characters. It seems as if these alphanumeric characters are NOT within PDFont.firstChar and PDFont.lastChar in the Type 1 font. The method therefore attempts to obtain the font widths from the AFM (font metric) file, but fails (silently) with a 'resource is null' logline message. (Note that this problem doesn't seem to occur with Type 1 fonts in other documents.)
A more detailed discussion regarding this issue can be found in this link: http://sourceforge.net/forum/forum.php? thread_id=1260349forum_id=267205 Thanks in advance for any help that can be obtained, Tam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-62) Incorrect (zero) character widths returned in some docs
[ https://issues.apache.org/jira/browse/PDFBOX-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990699#comment-13990699 ] Juraj Lonc commented on PDFBOX-62: -- Yes, I am aware of that. However I think it is better than nothing. It works on current JDK. It is possible to replace those used sun.* classes by own implementation of system font look-up. That would be quite easy. Incorrect (zero) character widths returned in some docs --- Key: PDFBOX-62 URL: https://issues.apache.org/jira/browse/PDFBOX-62 Project: PDFBox Issue Type: Bug Components: Rendering, Text extraction Assignee: Andreas Lehmkühler Attachments: 5542.pdf, PDFBOX-2059_PDTrueTypeFont.diff, PDTrueTypeFont.diff, pdfbox-2006-zerowidth.pdf-1.png, pdfbox-62-zerowidth.pdf-1.png [imported from SourceForge] http://sourceforge.net/tracker/index.php?group_id=78314atid=552832aid=1216674 Originally submitted by tamirhassan on 2005-06-07 13:42. For certain PDF documents (such as the one attached) the character/string widths (as obtained e.g. by the PDFont.getStringWidth method) are not returned correctly, i.e. they appear to be correct for punctuation characters but are zero for alphanumeric characters. It seems as if these alphanumeric characters are NOT within PDFont.firstChar and PDFont.lastChar in the Type 1 font. The method therefore attempts to obtain the font widths from the AFM (font metric) file, but fails (silently) with a 'resource is null' logline message. (Note that this problem doesn't seem to occur with Type 1 fonts in other documents.) A more detailed discussion regarding this issue can be found in this link: http://sourceforge.net/forum/forum.php? thread_id=1260349forum_id=267205 Thanks in advance for any help that can be obtained, Tam -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2059) Characters are not positioned properly (due to wrong widthheight of chars)
Juraj Lonc created PDFBOX-2059: -- Summary: Characters are not positioned properly (due to wrong widthheight of chars) Key: PDFBOX-2059 URL: https://issues.apache.org/jira/browse/PDFBOX-2059 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: DPH 032014.pdf Characters in this PDF are not positioned properly. All characters are rendered at position x=0.0 Problem is in PDFont.getFontWidth(). it returns 0.0 for every char. The same applies for PDFont.getFontHeight() -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2059) Characters are not positioned properly (due to wrong widthheight of chars)
[ https://issues.apache.org/jira/browse/PDFBOX-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2059: --- Attachment: DPH 032014.pdf Characters are not positioned properly (due to wrong widthheight of chars) --- Key: PDFBOX-2059 URL: https://issues.apache.org/jira/browse/PDFBOX-2059 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: DPH 032014.pdf Characters in this PDF are not positioned properly. All characters are rendered at position x=0.0 Problem is in PDFont.getFontWidth(). it returns 0.0 for every char. The same applies for PDFont.getFontHeight() -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2042) ColorSpace without Range
[ https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2042: --- Attachment: ModifyTest.java Here is the sample code. Actually I do not need to modify content of page. Problem is caused just by calling pdResources.getColorSpaces(); and then saving document. ColorSpace without Range Key: PDFBOX-2042 URL: https://issues.apache.org/jira/browse/PDFBOX-2042 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf I have PDF document where I am modifying PDPage content stream. Saved document is invalid (Adobe reader complains about it). I have narrowed it down to ColorSpace. Original document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 ] Modified document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 /Range [] ] When I manually remove /Range [] from PDF then Adobe reader opens it without an error. Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
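The description suggests that merely reading the range caused an empty /Range [] to be stored and later saved. The actual PDFBox fix is not shown in this thread, so the following is just a sketch of the read-only alternative: when the entry is missing or malformed, return the default (0.0 to 1.0 for each of the N components of an ICCBased color space) without writing anything back. Plain float arrays stand in for the COS objects, and all names are hypothetical.

```java
import java.util.Arrays;

final class IccRangeSketch {
    /**
     * Returns a copy of the stored range if it has two entries per component,
     * otherwise the default [0 1] pair per component. Nothing is stored back.
     */
    static float[] effectiveRange(float[] storedRange, int numberOfComponents) {
        if (storedRange != null && storedRange.length == 2 * numberOfComponents) {
            return storedRange.clone();
        }
        float[] defaults = new float[2 * numberOfComponents];
        for (int i = 0; i < numberOfComponents; i++) {
            defaults[2 * i] = 0f;      // minimum of component i
            defaults[2 * i + 1] = 1f;  // maximum of component i
        }
        return defaults;
    }

    public static void main(String[] args) {
        // A missing entry on a 3-component (/N 3) profile falls back to the defaults.
        System.out.println(Arrays.toString(effectiveRange(null, 3)));
    }
}
```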
[jira] [Commented] (PDFBOX-2042) ColorSpace with empty Range array
[ https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980165#comment-13980165 ] Juraj Lonc commented on PDFBOX-2042: Thanks for fix ;) ColorSpace with empty Range array - Key: PDFBOX-2042 URL: https://issues.apache.org/jira/browse/PDFBOX-2042 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 1.8.4, 1.8.5, 2.0.0 Reporter: Juraj Lonc Assignee: Tilman Hausherr Fix For: 1.8.5, 2.0.0 Attachments: ModifyTest.java, pdfbox18.pdf, pdfbox20.pdf I have PDF document where I am modifying PDPage content stream. Saved document is invalid (Adobe reader complains about it). I have narrowed it down to ColorSpace. Original document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 ] Modified document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 /Range [] ] When I manually remove /Range [] from PDF then Adobe reader opens it without an error. Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2042) ColorSpace without Range
Juraj Lonc created PDFBOX-2042: -- Summary: ColorSpace without Range Key: PDFBOX-2042 URL: https://issues.apache.org/jira/browse/PDFBOX-2042 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 2.0.0 Reporter: Juraj Lonc I have PDF document where I am modifying PDPage content stream. Saved document is invalid (Adobe reader complains about it). I have narrowed it down to ColorSpace. Original document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 ] Modified document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 /Range [] ] When I manually remove /Range [] from PDF then Adobe reader opens it without an error. Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2042) ColorSpace without Range
[ https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2042: --- Attachment: pdfbox18.pdf Original (working) file. ColorSpace without Range Key: PDFBOX-2042 URL: https://issues.apache.org/jira/browse/PDFBOX-2042 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: pdfbox18.pdf I have PDF document where I am modifying PDPage content stream. Saved document is invalid (Adobe reader complains about it). I have narrowed it down to ColorSpace. Original document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 ] Modified document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 /Range [] ] When I manually remove /Range [] from PDF then Adobe reader opens it without an error. Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2042) ColorSpace without Range
[ https://issues.apache.org/jira/browse/PDFBOX-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-2042: --- Attachment: pdfbox20.pdf Modified file in pdfbox 2.0.0 (error in Adobe Reader) ColorSpace without Range Key: PDFBOX-2042 URL: https://issues.apache.org/jira/browse/PDFBOX-2042 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 2.0.0 Reporter: Juraj Lonc Attachments: pdfbox18.pdf, pdfbox20.pdf I have PDF document where I am modifying PDPage content stream. Saved document is invalid (Adobe reader complains about it). I have narrowed it down to ColorSpace. Original document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 ] Modified document has colorspace: /ColorSpace /Cs6 [/ICCBased /Alternate /DeviceRGB /Filter /FlateDecode /Length 2597 /N 3 /Range [] ] When I manually remove /Range [] from PDF then Adobe reader opens it without an error. Obviously that range is added by calling PDICCBased.getRangeArray(0) somewhere. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-1547) TextPosition.getX() and getY() do not work properly with CropBox
Juraj Lonc created PDFBOX-1547:
Summary: TextPosition.getX() and getY() do not work properly with CropBox Key: PDFBOX-1547 URL: https://issues.apache.org/jira/browse/PDFBOX-1547 Project: PDFBox Issue Type: Bug Reporter: Juraj Lonc
TextPosition.getX() and getY() are supposed to calculate the position relative to the upper left corner of the page. When a PDF contains a CropBox, these functions return incorrect values. The CropBox is ignored. Text is relative to CropBox coordinates. Does "page" in the function description mean MediaBox or CropBox?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1547) TextPosition.getX() and getY() do not work properly with CropBox
[ https://issues.apache.org/jira/browse/PDFBOX-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Juraj Lonc updated PDFBOX-1547:
Attachment: redig_test_crop3.pdf
TextPosition.getX() and getY() do not work properly with CropBox
Key: PDFBOX-1547 URL: https://issues.apache.org/jira/browse/PDFBOX-1547 Project: PDFBox Issue Type: Bug Reporter: Juraj Lonc Attachments: redig_test_crop3.pdf
TextPosition.getX() and getY() are supposed to calculate the position relative to the upper left corner of the page. When a PDF contains a CropBox, these functions return incorrect values. The CropBox is ignored. Text is relative to CropBox coordinates. Does "page" in the function description mean MediaBox or CropBox?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1547) TextPosition.getX() and getY() do not work properly with CropBox
[ https://issues.apache.org/jira/browse/PDFBOX-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Juraj Lonc updated PDFBOX-1547:
Description: TextPosition.getX() and getY() are supposed to calculate the position relative to the upper left corner of the page. When a PDF contains a CropBox, these functions return incorrect values. The CropBox is ignored. Text is relative to CropBox coordinates, but calculations are made only with pageWidth and pageHeight, and that is wrong. Does "page" in the function description mean MediaBox or CropBox?
was: TextPosition.getX() and getY() are supposed to calculate the position relative to the upper left corner of the page. When a PDF contains a CropBox, these functions return incorrect values. The CropBox is ignored. Text is relative to CropBox coordinates. Does "page" in the function description mean MediaBox or CropBox?
TextPosition.getX() and getY() do not work properly with CropBox
Key: PDFBOX-1547 URL: https://issues.apache.org/jira/browse/PDFBOX-1547 Project: PDFBox Issue Type: Bug Reporter: Juraj Lonc Attachments: redig_test_crop3.pdf
TextPosition.getX() and getY() are supposed to calculate the position relative to the upper left corner of the page. When a PDF contains a CropBox, these functions return incorrect values. The CropBox is ignored. Text is relative to CropBox coordinates, but calculations are made only with pageWidth and pageHeight, and that is wrong. Does "page" in the function description mean MediaBox or CropBox?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
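To make the complaint concrete, here is a small sketch of the arithmetic, under two assumptions the report leaves open: that "page" means the CropBox, and that the page is not rotated. PDF user space has its origin at the lower left with y pointing up, so a position relative to the upper left corner of the crop box needs the box's lower-left x and upper-right y, not just a width and height. The names are hypothetical and no PDFBox classes are used.

```java
final class CropBoxCoordinates {
    /**
     * Maps a user-space point to coordinates measured from the upper left
     * corner of the crop box, with y growing downwards.
     */
    static float[] toTopLeftOrigin(float x, float y,
                                   float cropLowerLeftX, float cropUpperRightY) {
        return new float[] { x - cropLowerLeftX, cropUpperRightY - y };
    }

    public static void main(String[] args) {
        // Crop box [50 100 450 700]; a glyph at user-space (60, 690).
        float[] p = toTopLeftOrigin(60f, 690f, 50f, 700f);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```

Using only pageHeight - y, as the description says the current code does, gives the same result only when the crop box starts at the origin.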
[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)
[ https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Juraj Lonc updated PDFBOX-1538:
Attachment: PDFBOX-1538_PageDrawer.diff redig_test_textAdded_annot.pdf
I made a fix for this in PageDrawer. I was not able to find any information about this in the PDF reference, so I don't know whether the FreeText annotation subtype has to be handled differently from other annotation subtypes. But my fix works for the attached sample PDF file.
Content of annotation not visible in image (converted from pdf)
Key: PDFBOX-1538 URL: https://issues.apache.org/jira/browse/PDFBOX-1538 Project: PDFBox Issue Type: Bug Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: output.png, PDFBOX-1538_PageDrawer.diff, redig_test_textAdded_annot.pdf, redig_test_textAdded.pdf
pdPage.convertToImage converts the PDF to an image, but the content of the annotation is missing.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
[ https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609022#comment-13609022 ]
Juraj Lonc commented on PDFBOX-1545:
This iteration is not supposed to give you whole words. It gives you tokens exactly as they are stored in the PDF. Every single letter could be stored separately.
ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
Key: PDFBOX-1545 URL: https://issues.apache.org/jira/browse/PDFBOX-1545 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 1.7.1 Environment: ubuntu 32bit, Java 6 Reporter: MartinV Labels: patch Original Estimate: 24h Remaining Estimate: 24h
org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in this PDF: https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing (anyone with the link can view and download it...)
This is what I found while iterating over the Tj and TJ operations:
COSString previous = (COSString)tokens.get( j-1 );
String string = previous.getString();
Those strings are either empty or of length 2 (whitespace only)... I would expect to get some separated groups of words from my PDF. I tried this on version 1.7.1 and then downloaded the latest code from SVN (today), and both versions had the same behaviour. Is my PDF special in any way, or which objects should be explored next? I tried another two PDFs downloaded from Google Drive and both had the same issue (maybe Google formats PDFs in a special way?). I am surprised that RemoveText works fine on this PDF and that text extraction also gives me a good result, so there must be a way...
Thank you
PS: I don't mind fixing the bug on my own, but I do not have any significant knowledge of the internal PDF structure. Hints welcome.
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
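To illustrate the comment above: a show-text array may carry a word as several string fragments with positioning numbers in between, so a search on individual tokens will not see the whole word. The sketch below joins the fragments before searching. Plain Java objects stand in for COSString and COSNumber, the names are hypothetical, and font encoding (which decides what the bytes actually mean) is ignored entirely.

```java
import java.util.Arrays;
import java.util.List;

final class ShowTextFragments {
    /** Concatenates the string operands of one show-text array, skipping numbers. */
    static String joinFragments(List<?> operands) {
        StringBuilder text = new StringBuilder();
        for (Object operand : operands) {
            if (operand instanceof String) {
                text.append((String) operand);
            }
            // Numeric operands are positioning adjustments and carry no text.
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Roughly what [(H) -20 (el) 15 (lo)] TJ looks like after tokenizing.
        List<Object> operands = Arrays.<Object>asList("H", -20, "el", 15, "lo");
        System.out.println(joinFragments(operands));
    }
}
```

Replacing text found this way still means deciding which fragments to rewrite, which is why a token-by-token replace cannot handle such documents.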
[jira] [Comment Edited] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
[ https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609022#comment-13609022 ]
Juraj Lonc edited comment on PDFBOX-1545 at 3/21/13 3:13 PM:
This iteration is not supposed to give you whole words. This iteration gives you tokens exactly in the same way they are stored in PDF. Every single letter could be stored separately.
was (Author: chupacabras): This iteration is not supposed to give you whole words. This iteration gives you tokes exactly in the same way they are stored in PDF. Every single letter could be stored separately.
ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
Key: PDFBOX-1545 URL: https://issues.apache.org/jira/browse/PDFBOX-1545 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 1.7.1 Environment: ubuntu 32bit, Java 6 Reporter: MartinV Labels: patch Original Estimate: 24h Remaining Estimate: 24h
org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in this PDF: https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing (anyone with the link can view and download it...)
This is what I found while iterating over the Tj and TJ operations:
COSString previous = (COSString)tokens.get( j-1 );
String string = previous.getString();
Those strings are either empty or of length 2 (whitespace only)... I would expect to get some separated groups of words from my PDF. I tried this on version 1.7.1 and then downloaded the latest code from SVN (today), and both versions had the same behaviour. Is my PDF special in any way, or which objects should be explored next? I tried another two PDFs downloaded from Google Drive and both had the same issue (maybe Google formats PDFs in a special way?). I am surprised that RemoveText works fine on this PDF and that text extraction also gives me a good result, so there must be a way...
Thank you
PS: I don't mind fixing the bug on my own, but I do not have any significant knowledge of the internal PDF structure. Hints welcome.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)
[ https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1538: --- Attachment: PDFBOX-1538_PageDrawer.diff Content of annotation not visible in image (converted from pdf) --- Key: PDFBOX-1538 URL: https://issues.apache.org/jira/browse/PDFBOX-1538 Project: PDFBox Issue Type: Bug Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: output.png, PDFBOX-1538_PageDrawer.diff, redig_test_textAdded_annot.pdf pdPage.convertToImage converts pdf to image but content of annotation is missing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)
[ https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1538: --- Attachment: (was: PDFBOX-1538_PageDrawer.diff) Content of annotation not visible in image (converted from pdf) --- Key: PDFBOX-1538 URL: https://issues.apache.org/jira/browse/PDFBOX-1538 Project: PDFBox Issue Type: Bug Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: output.png, PDFBOX-1538_PageDrawer.diff, redig_test_textAdded_annot.pdf pdPage.convertToImage converts pdf to image but content of annotation is missing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)
[ https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13601283#comment-13601283 ] Juraj Lonc commented on PDFBOX-1538: It seems that problem is in transformations. Problem is not with font or color. Content of annotation not visible in image (converted from pdf) --- Key: PDFBOX-1538 URL: https://issues.apache.org/jira/browse/PDFBOX-1538 Project: PDFBox Issue Type: Bug Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: output.png, redig_test_textAdded.pdf pdPage.convertToImage converts pdf to image but content of annotation is missing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)
[ https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1538: --- Attachment: output.png redig_test_textAdded.pdf Content of annotation not visible in image (converted from pdf) --- Key: PDFBOX-1538 URL: https://issues.apache.org/jira/browse/PDFBOX-1538 Project: PDFBox Issue Type: Bug Reporter: Juraj Lonc Attachments: output.png, redig_test_textAdded.pdf pdPage.convertToImage converts pdf to image but content of annotation is missing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)
Juraj Lonc created PDFBOX-1538: -- Summary: Content of annotation not visible in image (converted from pdf) Key: PDFBOX-1538 URL: https://issues.apache.org/jira/browse/PDFBOX-1538 Project: PDFBox Issue Type: Bug Reporter: Juraj Lonc Attachments: output.png, redig_test_textAdded.pdf pdPage.convertToImage converts pdf to image but content of annotation is missing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1538) Content of annotation not visible in image (converted from pdf)
[ https://issues.apache.org/jira/browse/PDFBOX-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1538: --- Affects Version/s: 1.7.1 Content of annotation not visible in image (converted from pdf) --- Key: PDFBOX-1538 URL: https://issues.apache.org/jira/browse/PDFBOX-1538 Project: PDFBox Issue Type: Bug Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: output.png, redig_test_textAdded.pdf pdPage.convertToImage converts pdf to image but content of annotation is missing -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PDFBOX-1503) double logging of exceptions
Juraj Lonc created PDFBOX-1503:
Summary: double logging of exceptions Key: PDFBOX-1503 URL: https://issues.apache.org/jira/browse/PDFBOX-1503 Project: PDFBox Issue Type: Improvement Reporter: Juraj Lonc
I made a web application which uses the pdfbox library and its functionality. This web application is deployed on JBoss (log4j is used for logging). If an exception occurs in pdfbox, the exception is printed twice, which is not good. It makes a mess and makes it hard to use SMTPAppender or any other appender that processes log events. The problem is that you use this construction everywhere:
try {
    ...
} catch( Exception e ) {
    e.printStackTrace();
    LOG.error(e, e);
}
I think it would be nice to have it like this:
try {
    ...
} catch( Exception e ) {
    LOG.error(e, e);
}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
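A self-contained sketch of the proposed pattern follows. It uses java.util.logging so that it runs without extra libraries; the LOG object in the report is presumably a commons-logging or log4j logger, and the class and method here are invented for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class SingleLogSketch {
    private static final Logger LOG = Logger.getLogger(SingleLogSketch.class.getName());

    /** Parses a number and reports a failure exactly once, through the logger. */
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // No e.printStackTrace(): the logger already records the stack trace,
            // and appenders/handlers then see a single event per failure.
            LOG.log(Level.SEVERE, "Could not parse '" + text + "'", e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("12", 0));
        System.out.println(parseOrDefault("twelve", -1));
    }
}
```

With this shape, output that bypasses the logging configuration (stderr from printStackTrace) disappears, and whatever handler is attached decides where the single event goes.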
[jira] [Updated] (PDFBOX-1503) double logging of exceptions
[ https://issues.apache.org/jira/browse/PDFBOX-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juraj Lonc updated PDFBOX-1503:
---
Description:
I made a web application that uses the pdfbox library and its functionality. This web application is deployed on JBoss (log4j is used for logging).
If an exception occurs in pdfbox, the exception is printed twice. That is not good: it makes a mess in the log, and it is hard to use SMTPAppender or any other appender that processes log events.
The problem is that this construction is used everywhere:

    try {
        ...
    } catch( Exception e ) {
        e.printStackTrace();
        LOG.error(e, e);
    }

I think it would be nice to have it like this:

    try {
        ...
    } catch( Exception e ) {
        LOG.error(e, e);
    }

was:
I made web application which uses pdfbox library and its funcionality. This web application is deployed on jboss. (log4j is used for logging) If some exception occurs in pdfbox then exception is printed twice that is not good. It makes mess and it is hard to use SMTPAppender or other appender that processes log events. Problem is that you are using everywhere construction: try { ... } catch( Exception e ) { e.printStackTrace(); LOG.error(e, e); } I think it would be nice to have it like this: try { ... } catch( Exception e ) { LOG.error(e, e); }

double logging of exceptions
Key: PDFBOX-1503
URL: https://issues.apache.org/jira/browse/PDFBOX-1503
Project: PDFBox
Issue Type: Improvement
Reporter: Juraj Lonc

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
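The effect of the proposed change can be sketched in a few lines. This is only an illustration, not pdfbox code: it uses java.util.logging instead of log4j so that it runs without extra libraries, and `SingleLogSketch`, `CollectingHandler`, and `failingOperation` are made-up names. A failure handled this way should reach each appender (here: handler) exactly once, and nothing is written to stderr behind the logger's back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class SingleLogSketch {
    private static final Logger LOG = Logger.getLogger(SingleLogSketch.class.getName());

    /** Collects log records in memory so the number of events can be checked. */
    static final class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Hypothetical operation that always fails; stands in for a library call. */
    static void failingOperation() throws Exception {
        throw new Exception("simulated failure");
    }

    /** Catches the exception and reports it once, through the logger only. */
    static void runOnce() {
        try {
            failingOperation();
        } catch (Exception e) {
            // No e.printStackTrace() here: the log record already carries the throwable.
            LOG.log(Level.SEVERE, e.getMessage(), e);
        }
    }

    /** Returns how many log events a single failure produces. */
    static int countEvents() {
        CollectingHandler handler = new CollectingHandler();
        LOG.setUseParentHandlers(false); // keep the demo quiet on the console
        LOG.addHandler(handler);
        try {
            runOnce();
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.records.size();
    }

    public static void main(String[] args) {
        System.out.println("log events: " + countEvents());
    }
}
```

With an additional `e.printStackTrace()` in the catch block, the handler would still see one event, but the same stack trace would also land on stderr, which is the duplication described in the report.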
[jira] [Commented] (PDFBOX-1473) Incorrect handling of OpenType fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538870#comment-13538870 ]

Juraj Lonc commented on PDFBOX-1473:

Yes. But you have to remember that any sequence of operators and operands that reaches CharStringConverter must already be inflated (subrs replaced). If there is any subr command in that sequence, an exception should be thrown.
You are right, the handling of subr commands in CharStringConverter is obsolete and unnecessary. If you remove it, then CharStringConverter does not need fontGlobalSubrIndex/fontLocalSubrIndex.
Type1CharStringParser should be fixed the same way.

Incorrect handling of OpenType fonts
Key: PDFBOX-1473
URL: https://issues.apache.org/jira/browse/PDFBOX-1473
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Assignee: Andreas Lehmkühler
Attachments: CFFParser.patch, parsingfix_CFFFont.patch, parsingfix_Type2CharStringParser.patch, PDType1CFont.patch, redig_test_textAdded.pdf

There is an embedded font in this PDF which pdfbox/fontbox does not handle properly. This OpenType font contains CFF data.
[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juraj Lonc updated PDFBOX-1473:
---
Attachment: parsingfix_Type2CharStringParser.patch, parsingfix_CFFFont.patch

I made a fix in the parser. The callsubr and callgsubr commands are now processed directly in the parser, so the parser works with the correct stem count. The parser works OK with this modification and the CharString is decoded correctly.
But I found another bug. OpenType fonts may have a CMAP definition (like the one in the attached PDF), and this CMAP is not handled/processed. The code assumes that a Type1 font cannot have a CMAP. As a result, no character is printed/drawn even though the font is (now) parsed correctly.

Incorrect handling of OpenType fonts
Key: PDFBOX-1473
URL: https://issues.apache.org/jira/browse/PDFBOX-1473
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Assignee: Andreas Lehmkühler
Attachments: CFFParser.patch, parsingfix_CFFFont.patch, parsingfix_Type2CharStringParser.patch, PDType1CFont.patch, redig_test_textAdded.pdf

There is an embedded font in this PDF which pdfbox/fontbox does not handle properly. This OpenType font contains CFF data.
[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1473: --- Attachment: redig_test_textAdded.pdf Incorrect handling of OpenType fonts Key: PDFBOX-1473 URL: https://issues.apache.org/jira/browse/PDFBOX-1473 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: redig_test_textAdded.pdf There is embedded font in this PDF which pdfbox/fontbox does not handle properly. This OpenType font contains CFF data. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juraj Lonc updated PDFBOX-1473:
---
Attachment: CFFParser.patch

I made a fix in CFFParser so that it can properly read CFF data from an OpenType font.

Incorrect handling of OpenType fonts
Key: PDFBOX-1473
URL: https://issues.apache.org/jira/browse/PDFBOX-1473
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Attachments: CFFParser.patch, redig_test_textAdded.pdf

There is an embedded font in this PDF which pdfbox/fontbox does not handle properly. This OpenType font contains CFF data.
[jira] [Updated] (PDFBOX-1473) Incorrect handling of OpenType fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juraj Lonc updated PDFBOX-1473:
---
Attachment: PDType1CFont.patch

I made an enhancement to PDType1CFont so that the proper font is selected. An OpenType font can contain multiple fonts.

Incorrect handling of OpenType fonts
Key: PDFBOX-1473
URL: https://issues.apache.org/jira/browse/PDFBOX-1473
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Attachments: CFFParser.patch, PDType1CFont.patch, redig_test_textAdded.pdf

There is an embedded font in this PDF which pdfbox/fontbox does not handle properly. This OpenType font contains CFF data.
[jira] [Commented] (PDFBOX-1473) Incorrect handling of OpenType fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535850#comment-13535850 ]

Juraj Lonc commented on PDFBOX-1473:

Partly yes. I detected 3 problems and provided patch files for 2 of them. The last one is the same as the mentioned PDFBOX-969.

Incorrect handling of OpenType fonts
Key: PDFBOX-1473
URL: https://issues.apache.org/jira/browse/PDFBOX-1473
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Attachments: CFFParser.patch, PDType1CFont.patch, redig_test_textAdded.pdf

There is an embedded font in this PDF which pdfbox/fontbox does not handle properly. This OpenType font contains CFF data.
[jira] [Commented] (PDFBOX-1473) Incorrect handling of OpenType fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535928#comment-13535928 ]

Juraj Lonc commented on PDFBOX-1473:

I think I found where the problem is. It seems that the hintmask command is parsed incorrectly. This results in an incorrect hint count, which in turn causes an incorrect number of following bytes to be read (as part of the hintmask operator). This naturally leads to incorrect parsing of the data following the hintmask.
Commands should be parsed inline, so callsubr and callgsubr should be expanded onto the stack before the following data is parsed.
I am not sure if I explained it understandably ;)

Incorrect handling of OpenType fonts
Key: PDFBOX-1473
URL: https://issues.apache.org/jira/browse/PDFBOX-1473
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Assignee: Andreas Lehmkühler
Attachments: CFFParser.patch, PDType1CFont.patch, redig_test_textAdded.pdf

There is an embedded font in this PDF which pdfbox/fontbox does not handle properly. This OpenType font contains CFF data.
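The dependency between subroutine calls and hintmask can be illustrated with a small, self-contained sketch. It is not the FontBox parser and not the attached patch: `Type2HintSketch` and its methods are hypothetical, only a subset of Type 2 charstring operators is handled, and every unrecognised operator is assumed to clear the operand stack. The point it tries to show is that the number of mask bytes after hintmask is (stems + 7) / 8, so stems declared inside a subroutine must be counted before the mask is read.

```java
import java.util.ArrayList;
import java.util.List;

final class Type2HintSketch {
    private final byte[][] localSubrs;
    private final byte[][] globalSubrs;
    private final List<Integer> stack = new ArrayList<>(); // operands not yet consumed
    private final List<String> ops = new ArrayList<>();    // operators seen, in order
    private int stemCount = 0;

    Type2HintSketch(byte[][] localSubrs, byte[][] globalSubrs) {
        this.localSubrs = localSubrs;
        this.globalSubrs = globalSubrs;
    }

    int stemCount() { return stemCount; }
    List<String> ops() { return ops; }

    /** Subroutine index bias used by Type 2 charstrings. */
    private static int bias(int count) {
        return count < 1240 ? 107 : count < 33900 ? 1131 : 32768;
    }

    /** Parses one charstring; callsubr/callgsubr are followed inline. */
    void parse(byte[] code) {
        int i = 0;
        while (i < code.length) {
            int b = code[i++] & 0xFF;
            if (b >= 32 && b <= 246) {                       // one-byte integer
                stack.add(b - 139);
            } else if (b >= 247 && b <= 250) {               // two-byte positive integer
                stack.add((b - 247) * 256 + (code[i++] & 0xFF) + 108);
            } else if (b >= 251 && b <= 254) {               // two-byte negative integer
                stack.add(-(b - 251) * 256 - (code[i++] & 0xFF) - 108);
            } else if (b == 28 || b == 255) {
                // 28: 16-bit integer. 255: 16.16 fixed point, of which this
                // sketch keeps only the high 16 bits (the integer part).
                stack.add((int) (short) (((code[i] & 0xFF) << 8) | (code[i + 1] & 0xFF)));
                i += (b == 28) ? 2 : 4;
            } else if (b == 10 || b == 29) {                 // callsubr / callgsubr
                byte[][] subrs = (b == 10) ? localSubrs : globalSubrs;
                int index = stack.remove(stack.size() - 1) + bias(subrs.length);
                parse(subrs[index]);                         // expand before reading on
            } else if (b == 11) {                            // return from subroutine
                return;
            } else if (b == 1 || b == 3 || b == 18 || b == 23) { // hstem, vstem, hstemhm, vstemhm
                stemCount += stack.size() / 2;               // an odd leading width operand is ignored
                stack.clear();
                ops.add("stem");
            } else if (b == 19 || b == 20) {                 // hintmask / cntrmask
                stemCount += stack.size() / 2;               // pending operands are an implicit vstemhm
                stack.clear();
                int maskBytes = (stemCount + 7) / 8;
                i += maskBytes;                              // skip the mask data
                ops.add("mask[" + maskBytes + "]");
            } else if (b == 12) {                            // two-byte operator (simplified)
                i++;
                stack.clear();
                ops.add("escape");
            } else {                                         // any other operator (simplified)
                stack.clear();
                ops.add("op" + b);
            }
        }
    }

    public static void main(String[] args) {
        // Local subr 0 declares one stem: operands 10 and 20, then hstemhm, then return.
        byte[][] local = { { (byte) 149, (byte) 159, 18, 11 } };
        Type2HintSketch parser = new Type2HintSketch(local, new byte[0][]);
        // Main charstring: -107 (subr 0 after bias), callsubr, hintmask, one mask byte, endchar.
        parser.parse(new byte[] { 32, 10, 19, (byte) 0x80, 14 });
        System.out.println(parser.ops() + " stems=" + parser.stemCount());
    }
}
```

If the callsubr were skipped instead of expanded, the stem count at the hintmask would be 0, no mask byte would be consumed, and the byte 0x80 would be misread as a number, which matches the kind of misparse described in the comment.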
[jira] [Created] (PDFBOX-1468) Decrypting unencrypted strings
Juraj Lonc created PDFBOX-1468:
---
Summary: Decrypting unencrypted strings
Key: PDFBOX-1468
URL: https://issues.apache.org/jira/browse/PDFBOX-1468
Project: PDFBox
Issue Type: Improvement
Components: PDModel
Affects Versions: 1.7.1
Reporter: Juraj Lonc
Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - Drabikova.pdf

I have received an encrypted PDF which contains several string objects, but not all of them are encrypted. I am not sure whether this is compliant with the PDF reference, but I have created a fix so that pdfbox can handle it. If a string contains only chars between 32-127, then decryption is not necessary (I know this is not true in 100% of cases, but I think it is acceptable).
[jira] [Updated] (PDFBOX-1468) Decrypting unencrypted strings
[ https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1468: --- Attachment: Protokol o kontrole originality - Drabikova.pdf PDFBOX-1468.diff Decrypting unencrypted strings -- Key: PDFBOX-1468 URL: https://issues.apache.org/jira/browse/PDFBOX-1468 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - Drabikova.pdf I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. But I have created fix so pdfbox can handle this. If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PDFBOX-1468) Decrypting unencrypted strings
[ https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531041#comment-13531041 ] Juraj Lonc commented on PDFBOX-1468: that fix was done in pdfbox-1.7.1\org\apache\pdfbox\pdmodel\encryption\SecurityHandler.java Decrypting unencrypted strings -- Key: PDFBOX-1468 URL: https://issues.apache.org/jira/browse/PDFBOX-1468 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - Drabikova.pdf I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. But I have created fix so pdfbox can handle this. If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1468) Decrypting unencrypted strings
[ https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1468: --- Description: I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. But I have created fix so pdfbox can handle this. If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) Some string are encrypted: /CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265) Some are not: /Registry(Adobe) /Ordering(Identity) was: I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. But I have created fix so pdfbox can handle this. If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) Decrypting unencrypted strings -- Key: PDFBOX-1468 URL: https://issues.apache.org/jira/browse/PDFBOX-1468 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - Drabikova.pdf I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. But I have created fix so pdfbox can handle this. 
If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) Some string are encrypted: /CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265) Some are not: /Registry(Adobe) /Ordering(Identity) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PDFBOX-1468) Decrypting unencrypted strings
[ https://issues.apache.org/jira/browse/PDFBOX-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1468: --- Description: I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. But I have created fix so pdfbox can handle this. If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) Some strings are encrypted: /CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265) Some are not: /Registry(Adobe) /Ordering(Identity) was: I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. But I have created fix so pdfbox can handle this. If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) Some string are encrypted: /CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265) Some are not: /Registry(Adobe) /Ordering(Identity) Decrypting unencrypted strings -- Key: PDFBOX-1468 URL: https://issues.apache.org/jira/browse/PDFBOX-1468 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 1.7.1 Reporter: Juraj Lonc Attachments: PDFBOX-1468.diff, Protokol o kontrole originality - Drabikova.pdf I have received encrypted PDF which contains several string objects but not all of them are encrypted. I am not sure whether it is or it is not compliant with pdf reference. 
But I have created fix so pdfbox can handle this. If string contains only chars between 32-127 then decryption is not necessary (I know, this is not true in 100% of cases but I think it is swallowable) Some strings are encrypted: /CreationDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /ModDate(\222\202\376k\003\372\306\236\(IP\327C\215\375k\357) /Producer(\241\350\210\035\001\352\224\3219\(0\247\006\333\2537\225\334\300\232\265) Some are not: /Registry(Adobe) /Ordering(Identity) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
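The heuristic from the description can be sketched as follows. This is a guess at its shape, not the content of PDFBOX-1468.diff: `PlainStringHeuristic` and `looksUnencrypted` are hypothetical names, and the 32..127 range is taken literally from the report. As the reporter notes, an encrypted string could by chance consist only of such bytes, so the check can produce false positives.

```java
import java.nio.charset.StandardCharsets;

final class PlainStringHeuristic {
    /**
     * Returns true if every byte lies in 32..127 (inclusive), the range
     * named in the report; such a string is assumed not to need decryption.
     */
    static boolean looksUnencrypted(byte[] data) {
        for (byte value : data) {
            int c = value & 0xFF; // treat the byte as unsigned
            if (c < 32 || c > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] registry = "Adobe".getBytes(StandardCharsets.ISO_8859_1);
        // First four bytes of the /Producer value quoted in the report (octal escapes).
        byte[] producer = { (byte) 0241, (byte) 0350, (byte) 0210, (byte) 035 };
        System.out.println("Registry plain: " + looksUnencrypted(registry));
        System.out.println("Producer plain: " + looksUnencrypted(producer));
    }
}
```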
[jira] [Created] (PDFBOX-1408) Width of space character is calculated wrong
Juraj Lonc created PDFBOX-1408:
---
Summary: Width of space character is calculated wrong
Key: PDFBOX-1408
URL: https://issues.apache.org/jira/browse/PDFBOX-1408
Project: PDFBox
Issue Type: Bug
Reporter: Juraj Lonc

PDFStreamEngine calculates the width of space (line 357):

    spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 )*glyphSpaceToTextSpaceFactor);

In some cases the result is 0. The problem is that getFontWidth requires the code number of the space character. If there is a ToUnicode mapping for that font, then it is necessary to look up the code number in the CMap and NOT to use 0x20 (space) as the source code does.
[jira] [Updated] (PDFBOX-1408) Width of space character is calculated wrong
[ https://issues.apache.org/jira/browse/PDFBOX-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juraj Lonc updated PDFBOX-1408: --- Description: PDFStreamEngine calculates width of space (line 357): spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 )*glyphSpaceToTextSpaceFactor); In some cases the result is 0. Problem is that getFontWidth requires code number of . If there is ToUnicode mapping for that font that it is necessary to lookup CMap for code number and NOT to use 0x20 (space) as it is in souce code. was: PDFStreamEngine calculates width of space (line 357): spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 )*glyphSpaceToTextSpaceFactor); In some cases it the result is 0. Problem is that getFontWidth requires code number of . If there is ToUnicode mapping for that font that it is necessary to lookup CMap for code number and NOT to use 0x20 (space) as it is in souce code. Width of space character is calculated wrong Key: PDFBOX-1408 URL: https://issues.apache.org/jira/browse/PDFBOX-1408 Project: PDFBox Issue Type: Bug Reporter: Juraj Lonc PDFStreamEngine calculates width of space (line 357): spaceWidthText = (font.getFontWidth( SPACE_BYTES, 0, 1 )*glyphSpaceToTextSpaceFactor); In some cases the result is 0. Problem is that getFontWidth requires code number of . If there is ToUnicode mapping for that font that it is necessary to lookup CMap for code number and NOT to use 0x20 (space) as it is in souce code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
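A minimal sketch of the lookup proposed in the report, under simplifying assumptions: the ToUnicode CMap and the font's width table are modelled as plain maps, and `SpaceWidthSketch`, `spaceCode`, and `spaceWidth` are hypothetical names rather than PDFBox API. The idea is to find the character code that maps to U+0020 and ask for the width of that code, falling back to 0x20 only when no mapping exists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SpaceWidthSketch {
    /**
     * Returns the character code that the (modelled) ToUnicode table maps
     * to a space, or 0x20 when the table has no such entry.
     */
    static int spaceCode(Map<Integer, String> toUnicode) {
        for (Map.Entry<Integer, String> entry : toUnicode.entrySet()) {
            if (" ".equals(entry.getValue())) {
                return entry.getKey();
            }
        }
        return 0x20;
    }

    /** Looks up the width of the space glyph; 0 means the font has no width for that code. */
    static float spaceWidth(Map<Integer, String> toUnicode, Map<Integer, Float> widths) {
        Float width = widths.get(spaceCode(toUnicode));
        return width == null ? 0f : width;
    }

    public static void main(String[] args) {
        // Hypothetical subset font: code 3 is the space, code 36 is "A".
        Map<Integer, String> toUnicode = new LinkedHashMap<>();
        toUnicode.put(3, " ");
        toUnicode.put(36, "A");
        Map<Integer, Float> widths = new LinkedHashMap<>();
        widths.put(3, 250f);
        widths.put(36, 667f);

        Float naive = widths.get(0x20); // what a hard-coded 0x20 lookup would find
        System.out.println("hard-coded 0x20: " + (naive == null ? 0f : naive));
        System.out.println("via ToUnicode:   " + spaceWidth(toUnicode, widths));
    }
}
```

In this model the hard-coded lookup finds no width for code 0x20, which corresponds to the result of 0 mentioned in the report.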