[jira] [Comment Edited] (PDFBOX-2094) Add PrintRequestAttributeSet parameter to silentPrint()
[ https://issues.apache.org/jira/browse/PDFBOX-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038465#comment-14038465 ]

senthuran edited comment on PDFBOX-2094 at 6/20/14 6:04 AM:

[~jahewson] silentPrint() has been implemented without accepting any arguments. It would be appreciated if you could implement silentPrint() to accept the printRequestAttributeSet and printJob as arguments, the same way you have implemented print().

was (Author: stacktome):
[~jahewson] silentPrint() has been implemented without accepting any parameters from outside. It would be appreciated if you could implement silentPrint() to accept the printRequestAttributeSet and printJob as parameters, the same way you have implemented print().

Add PrintRequestAttributeSet parameter to silentPrint()
---
Key: PDFBOX-2094
URL: https://issues.apache.org/jira/browse/PDFBOX-2094
Project: PDFBox
Issue Type: Improvement
Components: PDModel
Affects Versions: 2.0.0
Reporter: senthuran
Assignee: John Hewson
Priority: Minor
Fix For: 2.0.0

The current implementation does not allow us to set the printer or paper attributes. Could you please implement silentPrint() to accept a printRequestAttributeSet as a parameter?
Affected versions: from pdfbox-app-2.0.0-20140506.050443-277jar to pdfbox-app-2.0.0-20140506.050443-301jar .

-- This message was sent by Atlassian JIRA (v6.2#6252)
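For readers unfamiliar with the request, the following is a minimal sketch of what passing print attributes looks like with the plain JDK printing API. It only uses java.awt.print and javax.print classes; the class SilentPrintSketch and its silentPrint(...) helper are hypothetical names for illustration and are not the PDFBox signature, which the issue leaves open.

```java
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.Copies;
import javax.print.attribute.standard.MediaSizeName;
import javax.print.attribute.standard.OrientationRequested;

// Hypothetical illustration class, not part of PDFBox.
class SilentPrintSketch {

    // Builds the kind of attribute set a caller might want to hand in:
    // paper size, orientation and number of copies.
    static PrintRequestAttributeSet a4Portrait(int copies) {
        PrintRequestAttributeSet attrs = new HashPrintRequestAttributeSet();
        attrs.add(MediaSizeName.ISO_A4);
        attrs.add(OrientationRequested.PORTRAIT);
        attrs.add(new Copies(copies));
        return attrs;
    }

    // Assumed shape of a "silent" print: no dialog is shown, the job is
    // sent directly with the caller's attributes.
    static void silentPrint(PrinterJob job, Printable pages, PrintRequestAttributeSet attrs)
            throws PrinterException {
        job.setPrintable(pages);
        job.print(attrs); // JDK overload that accepts the attribute set
    }

    public static void main(String[] args) {
        // Only builds the attributes; nothing is sent to a printer here.
        PrintRequestAttributeSet attrs = a4Portrait(2);
        System.out.println(attrs.size() + " attributes prepared");
    }
}
```

The printer itself would be selected on the PrinterJob (for example via setPrintService) before calling the helper, which is why the request asks for both objects to be accepted.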
PDFBox and XMP - retire jempbox
Hi,

we currently have two libraries handling XMP metadata: jempbox and xmpbox. Part of PDFBOX-1187/PDFBOX-2197 was to remove the direct dependency on jempbox, as XMP metadata can now be generated by any library and added as a stream. This will be available in PDFBox 2.0.0.

I would like to propose that we now retire jempbox, as xmpbox
# is closer to the spec (naming conventions)
# is used for PDF/A validation, where we cannot remove the dependency on XMP handling because checking metadata is necessary for PDF/A compliance.

If there is functionality in jempbox that is missing in xmpbox, it could be added at a later stage upon request.

WDYT?

BR
Maruan
[jira] [Comment Edited] (PDFBOX-2094) Add PrintRequestAttributeSet parameter to silentPrint()
[ https://issues.apache.org/jira/browse/PDFBOX-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038465#comment-14038465 ]

senthuran edited comment on PDFBOX-2094 at 6/20/14 6:14 AM:

[~jahewson] silentPrint() has been implemented both with a printerJob argument and with no arguments. It would be appreciated if you could implement silentPrint() to accept the printRequestAttributeSet and printJob as arguments, the same way you have implemented print().

was (Author: stacktome):
[~jahewson] silentPrint() has been implemented without accepting any arguments. It would be appreciated if you could implement silentPrint() to accept the printRequestAttributeSet and printJob as arguments, the same way you have implemented print().

Add PrintRequestAttributeSet parameter to silentPrint()
---
Key: PDFBOX-2094
URL: https://issues.apache.org/jira/browse/PDFBOX-2094
Project: PDFBox
Issue Type: Improvement
Components: PDModel
Affects Versions: 2.0.0
Reporter: senthuran
Assignee: John Hewson
Priority: Minor
Fix For: 2.0.0

The current implementation does not allow us to set the printer or paper attributes. Could you please implement silentPrint() to accept a printRequestAttributeSet as a parameter?
Affected versions: from pdfbox-app-2.0.0-20140506.050443-277jar to pdfbox-app-2.0.0-20140506.050443-301jar .

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2153) Setting the correct clipping path for shading
Tilman Hausherr created PDFBOX-2153: --- Summary: Setting the correct clipping path for shading Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Reporter: Tilman Hausherr While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet). The following PDFs are rendered correctly with the change: McAfee-ShadingType7.pdf eci_altona-test-suite-v2_technical_H.pdf crestron-p9.pdf (these three found in PDFBOX-1915) PDFBOX-1451.pdf (alfresco) PDFBOX-1940.pdf (chart) PDFBOX-1861-tracemonkey.pdf p.11 Not solved by the change: PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill) PDFBOX-1861-tracemonkey.pdf p.6 (not shading) PDFBOX-1416.pdf (not shading) texample-rgb-triangle.pdf (John has an explanation about that one) WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ? -- This message was sent by Atlassian JIRA (v6.2#6252)
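The effect described in this report can be reproduced with plain Java2D, independent of PDFBox. The sketch below is only an illustration under assumptions: ClipSketch and fillAndSample are hypothetical names, and the rectangles stand in for the type 7 and type 6 shading areas. A clip left on the Graphics2D by an earlier operation makes a later fill() come out blank until the clip is replaced or cleared.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.image.BufferedImage;

// Hypothetical illustration class, not PDFBox code.
class ClipSketch {

    // Fills a 40x40 square at (50,50) on a fresh transparent 100x100 image whose
    // Graphics2D still carries staleClip. If resetClip is true, the clip is first
    // replaced by currentClip (null removes the clip). Returns the ARGB at (60,60).
    static int fillAndSample(Shape staleClip, Shape currentClip, boolean resetClip) {
        BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        g.setClip(staleClip);            // clip left over from a previous operation
        if (resetClip) {
            g.setClip(currentClip);      // the step proposed in the issue
        }
        g.setColor(Color.RED);
        g.fill(new Rectangle(50, 50, 40, 40));
        g.dispose();
        return image.getRGB(60, 60);
    }

    public static void main(String[] args) {
        Shape stale = new Rectangle(0, 0, 10, 10);
        Shape page = new Rectangle(0, 0, 100, 100);
        System.out.println("stale clip kept: " + Integer.toHexString(fillAndSample(stale, page, false)));
        System.out.println("clip replaced:   " + Integer.toHexString(fillAndSample(stale, page, true)));
        System.out.println("clip cleared:    " + Integer.toHexString(fillAndSample(stale, null, true)));
    }
}
```

Replacing the clip with the current clipping path and clearing it with null both let the fill through in this toy case; which of the two is right for PageDrawer depends on whether the fill shape already accounts for the clipping path.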
[jira] [Updated] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2153: Labels: shading shadingpattern (was: ) Setting the correct clipping path for shading - Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Reporter: Tilman Hausherr Labels: shading, shadingpattern While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet). The following PDFs are rendered correctly with the change: McAfee-ShadingType7.pdf eci_altona-test-suite-v2_technical_H.pdf crestron-p9.pdf (these three found in PDFBOX-1915) PDFBOX-1451.pdf (alfresco) PDFBOX-1940.pdf (chart) PDFBOX-1861-tracemonkey.pdf p.11 Not solved by the change: PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill) PDFBOX-1861-tracemonkey.pdf p.6 (not shading) PDFBOX-1416.pdf (not shading) texample-rgb-triangle.pdf (John has an explanation about that one) WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038567#comment-14038567 ]

Petr Slaby commented on PDFBOX-2149:

Attached a file which runs into a NPE in PDFont#isSymbolicFont() now.
{noformat}
Caused by: java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.font.PDFont.isSymbolicFont(PDFont.java:694)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getGIDForCharacterCode(PDTrueTypeFont.java:408)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:378)
    at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidth(PDFont.java:312)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:377)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:44)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:508)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:259)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:226)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:209)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:175)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:227)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPageToGraphics(PDFRenderer.java:190)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPageToGraphics(PDFRenderer.java:174)
{noformat}

Font Refactoring
Key: PDFBOX-2149
URL: https://issues.apache.org/jira/browse/PDFBOX-2149
Project: PDFBox
Issue Type: Improvement
Components: FontBox, PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Attachments: 000467.pdf

To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere.
For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. 
Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. - Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
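Phase 3 above mentions using the current font's CMap to determine the code width. As a rough sketch of that idea, assuming the CMap exposes its codespace ranges as byte-wise start/end pairs, a decoder can try 1 to 4 bytes until one range matches. CodeWidthSketch, CodespaceRange and codeLength are hypothetical names for illustration and are not the PDFBox API; the one-byte fallback for unmatched input is likewise an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not PDFBox code.
class CodeWidthSketch {

    // One codespace range: codes have start.length bytes and every byte
    // must lie within [start[i], end[i]].
    static final class CodespaceRange {
        final int[] start;
        final int[] end;

        CodespaceRange(int[] start, int[] end) {
            this.start = start;
            this.end = end;
        }

        boolean matches(byte[] data, int offset, int length) {
            if (length != start.length || offset + length > data.length) {
                return false;
            }
            for (int i = 0; i < length; i++) {
                int b = data[offset + i] & 0xFF; // treat the byte as unsigned
                if (b < start[i] || b > end[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Returns how many bytes the character code at offset occupies.
    static int codeLength(byte[] data, int offset, List<CodespaceRange> ranges) {
        for (int length = 1; length <= 4; length++) {
            for (CodespaceRange range : ranges) {
                if (range.matches(data, offset, length)) {
                    return length;
                }
            }
        }
        return 1; // assumed fallback: consume a single byte
    }

    public static void main(String[] args) {
        List<CodespaceRange> ranges = Arrays.asList(
                new CodespaceRange(new int[] {0x00}, new int[] {0x80}),
                new CodespaceRange(new int[] {0x81, 0x40}, new int[] {0x9F, 0xFC}));
        byte[] content = {0x41, (byte) 0x81, 0x40};
        System.out.println(codeLength(content, 0, ranges) + " byte(s), then "
                + codeLength(content, 1, ranges) + " byte(s)");
    }
}
```

The point of the design is that the width comes from the font's own CMap rather than from a fixed guess per font type.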
[jira] [Updated] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2149: --- Attachment: 000467.pdf Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. 
- TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. - Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2149: --- Attachment: 39.pdf Here is another one. Hope this helps. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. 
- TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. - Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038579#comment-14038579 ]

Andreas Lehmkühler commented on PDFBOX-2149:

[~jahewson] According to the spec you are totally right, but the real world is quite different. There are a lot of PDF generators which don't care about the spec. And, more importantly, the user doesn't care about the spec either. If the PDF opens in Acrobat, then it has to be opened by any other PDF reader as well. Tilman has an example, Petr as well; have a look at PDFBOX-62 and you'll find another one, and I guess there are a lot more in the wild out there triggering the described NPE. Either you revert your changes to reinstate my workaround or you come up with another/better one yourself. We, the PDFBox community, don't like it, but we've learned that we have to live with such workarounds.

Font Refactoring
Key: PDFBOX-2149
URL: https://issues.apache.org/jira/browse/PDFBOX-2149
Project: PDFBox
Issue Type: Improvement
Components: FontBox, PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Attachments: 39.pdf, 000467.pdf

To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck.

Phase 1
- Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading.
- Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. - Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: PDFBox and XMP - retire jempbox
Hi,

On 20.06.2014 08:05, Maruan Sahyoun wrote:
> Hi,
>
> we currently have two libraries handling XMP metadata: jempbox and xmpbox. Part of PDFBOX-1187/PDFBOX-2197 was to remove the direct dependency on jempbox, as XMP metadata can now be generated by any library and added as a stream. This will be available in PDFBox 2.0.0.
>
> I would like to propose that we now retire jempbox, as xmpbox
> # is closer to the spec (naming conventions)
> # is used for PDF/A validation, where we cannot remove the dependency on XMP handling because checking metadata is necessary for PDF/A compliance.
>
> If there is functionality in jempbox that is missing in xmpbox, it could be added at a later stage upon request.
>
> WDYT?

I've nothing to add, +1

> BR
> Maruan

BR
Andreas Lehmkühler
Re: Travis CI
Hi,

On 19.06.2014 22:03, John Hewson wrote:
> Hi All
>
> The recent instability of Jenkins prompted me to set up Travis CI to build the PDFBox mirror on GitHub. Automatic builds are triggered after every commit, and they can often run much faster than on the busy Jenkins server, so this gives committers an additional means to quickly determine if their build has problems or not.

Good idea.

> The builds are public at: https://travis-ci.org/apache/pdfbox
>
> The Jenkins build is still the “ground truth” and passing that is what counts, it *might* be possible to pass Travis CI and still fail on Jenkins, so that’s something to keep in mind.

Especially as the Travis build uses oraclejdk7 as the compiler. PDFBox has Java 6 as its minimum requirement, and that configuration may hide incompatibilities because of the chosen Java version.

> -- John

BR
Andreas Lehmkühler
[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038601#comment-14038601 ] Andreas Lehmkühler commented on PDFBOX-2153: Hmmm, I thought that it would be sufficient to use the clipping path as argument when calling the fill method, but obviously it isn't. IMHO go ahead Setting the correct clipping path for shading - Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Reporter: Tilman Hausherr Labels: shading, shadingpattern While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet). The following PDFs are rendered correctly with the change: McAfee-ShadingType7.pdf eci_altona-test-suite-v2_technical_H.pdf crestron-p9.pdf (these three found in PDFBOX-1915) PDFBOX-1451.pdf (alfresco) PDFBOX-1940.pdf (chart) PDFBOX-1861-tracemonkey.pdf p.11 Not solved by the change: PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill) PDFBOX-1861-tracemonkey.pdf p.6 (not shading) PDFBOX-1416.pdf (not shading) texample-rgb-triangle.pdf (John has an explanation about that one) WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038604#comment-14038604 ] Petr Slaby commented on PDFBOX-2153: Sounds reasonable. Current clipping path is passed to graphics.fill(), so if the graphics has a clipping path from a previous operation, it might interfere with that. I vote for setClip(null) because setClip() is a time and memory consuming operation if called with a complex path. The change does not show any effect on my test suite documents, it seems that I do not have an example that would be affected. Setting the correct clipping path for shading - Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Reporter: Tilman Hausherr Labels: shading, shadingpattern While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet). 
The following PDFs are rendered correctly with the change: McAfee-ShadingType7.pdf eci_altona-test-suite-v2_technical_H.pdf crestron-p9.pdf (these three found in PDFBOX-1915) PDFBOX-1451.pdf (alfresco) PDFBOX-1940.pdf (chart) PDFBOX-1861-tracemonkey.pdf p.11 Not solved by the change: PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill) PDFBOX-1861-tracemonkey.pdf p.6 (not shading) PDFBOX-1416.pdf (not shading) texample-rgb-triangle.pdf (John has an explanation about that one) WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ? -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: PDFBox and XMP - retire jempbox
Hi,

On 20.06.2014 08:05, Maruan Sahyoun wrote:
> Hi,
>
> we currently have two libraries handling XMP metadata: jempbox and xmpbox. Part of PDFBOX-1187/PDFBOX-2197 was to remove the direct dependency on jempbox, as XMP metadata can now be generated by any library and added as a stream. This will be available in PDFBox 2.0.0.
>
> I would like to propose that we now retire jempbox, as xmpbox
> # is closer to the spec (naming conventions)
> # is used for PDF/A validation, where we cannot remove the dependency on XMP handling because checking metadata is necessary for PDF/A compliance.
>
> If there is functionality in jempbox that is missing in xmpbox, it could be added at a later stage upon request.
>
> WDYT?

+1

Best,
Timo

--
Timo Boehme
OntoChem GmbH
H.-Damerow-Str. 4
06120 Halle/Saale
T: +49 345 4780474
F: +49 345 4780471
timo.boe...@ontochem.com
_
OntoChem GmbH
Managing Director: Dr. Lutz Weber
Registered office: Halle / Saale
Registration court: Stendal
Registration number: HRB 215461
_
Re: [VOTE] Release Apache PDFBox 1.8.6
Hi,

+1

Many thanks for preparing the release.

Best,
Timo

On 19.06.2014 14:28, Andreas Lehmkuehler wrote:
> Hi,
>
> a candidate for the PDFBox 1.8.6 release is available at:
> http://people.apache.org/~lehmi/pdfbox/1.8.6/
>
> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/pdfbox/tags/1.8.6/
>
> The SHA1 checksum of the archive is 543c49ebe34a443654a0c3c264f36acc07983cc6.
>
> Please vote on releasing this package as Apache PDFBox 1.8.6. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.
>
> [ ] +1 Release this package as Apache PDFBox 1.8.6
> [ ] -1 Do not release this package because...
>
> Here is my +1
>
> BR
> Andreas Lehmkühler

--
Timo Boehme
OntoChem GmbH
H.-Damerow-Str. 4
06120 Halle/Saale
T: +49 345 4780474
F: +49 345 4780471
timo.boe...@ontochem.com
_
OntoChem GmbH
Managing Director: Dr. Lutz Weber
Registered office: Halle / Saale
Registration court: Stendal
Registration number: HRB 215461
_
[jira] [Commented] (PDFBOX-2118) Remove ICU4J dependency
[ https://issues.apache.org/jira/browse/PDFBOX-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038860#comment-14038860 ] Andreas Lehmkühler commented on PDFBOX-2118: I've just realized that we don't have to replace the Bidi usage as it isn't used anymore, so that I just removed it from the trunk in revision http://svn.apache.org/r1604181. I've marked the deleted method and class as deprecated in the 1.8 branch in revision http://svn.apache.org/r1604182 Remove ICU4J dependency --- Key: PDFBOX-2118 URL: https://issues.apache.org/jira/browse/PDFBOX-2118 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: Andreas Lehmkühler Assignee: Andreas Lehmkühler Labels: ICU4J Fix For: 2.0.0 The ICU4J lib is quite big and we are just using a small part of it. Both features are provided by the JDK (java.text.Normalizer and java.text.Bidi) since 1.6 so that it should be possible to remove the ICU4J dependency. -- This message was sent by Atlassian JIRA (v6.2#6252)
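The issue description names java.text.Normalizer and java.text.Bidi as the JDK replacements for ICU4J. The following is a small sketch of what those two classes offer, using only JDK calls. JdkTextSketch and its method names are hypothetical, and the choice of NFKC and NFD is an assumption for illustration; it is not confirmed here that these forms match what the ICU4J-based code did.

```java
import java.text.Bidi;
import java.text.Normalizer;

// Hypothetical illustration class, not PDFBox code.
class JdkTextSketch {

    // Compatibility composition folds presentation forms such as the
    // "fi" ligature (U+FB01) into their plain characters.
    static String foldPresentationForms(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    // Canonical decomposition splits a precomposed letter into its base
    // letter followed by combining marks.
    static String decomposeDiacritics(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    // True if the text contains right-to-left content that would need
    // bidirectional analysis.
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        System.out.println(foldPresentationForms("\uFB01le"));
        System.out.println(decomposeDiacritics("\u00E9").length() + " chars after decomposition");
        System.out.println("Hebrew needs bidi: " + needsBidi("\u05D0\u05D1"));
    }
}
```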
Jenkins build is back to normal : PDFBox-trunk #1064
See https://builds.apache.org/job/PDFBox-trunk/1064/changes
Jenkins build is back to normal : PDFBox-trunk » PDFBox parent #1064
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox-parent/1064/changes
[jira] [Commented] (PDFBOX-2118) Remove ICU4J dependency
[ https://issues.apache.org/jira/browse/PDFBOX-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038890#comment-14038890 ] Maruan Sahyoun commented on PDFBOX-2118: [~lehmi] Shouldn’t we deprecate the methods normalizePres() and normalizeDiac() in 1.8 as 2.0 uses normalizePresentationForm() and normalizeDiacritic(). It might also be beneficial to add both methods - using the old code i.e. normalizePresentationForm() calls normalizePres() - to 1.8 so people can start using the new methods in 1.8 already. Remove ICU4J dependency --- Key: PDFBOX-2118 URL: https://issues.apache.org/jira/browse/PDFBOX-2118 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: Andreas Lehmkühler Assignee: Andreas Lehmkühler Labels: ICU4J Fix For: 2.0.0 The ICU4J lib is quite big and we are just using a small part of it. Both features are provided by the JDK (java.text.Normalizer and java.text.Bidi) since 1.6 so that it should be possible to remove the ICU4J dependency. -- This message was sent by Atlassian JIRA (v6.2#6252)
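The migration path suggested in this comment, adding the new method names to 1.8 while the old ones remain and are marked deprecated, could look roughly like the sketch below. The class name TextNormalizerMigrationSketch is hypothetical, only one of the two method pairs is shown, and the NFKC body is a placeholder assumption standing in for "the old code"; it is not the actual 1.8 implementation.

```java
import java.text.Normalizer;

// Hypothetical illustration class, not the PDFBox TextNormalizer.
class TextNormalizerMigrationSketch {

    /**
     * Old entry point, kept so existing callers continue to compile.
     *
     * @deprecated use {@link #normalizePresentationForm(String)} instead.
     */
    @Deprecated
    String normalizePres(String text) {
        // Placeholder for the existing implementation (assumption).
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    // New name, matching the 2.0 method, delegating to the old code for now.
    String normalizePresentationForm(String text) {
        return normalizePres(text);
    }

    public static void main(String[] args) {
        TextNormalizerMigrationSketch normalizer = new TextNormalizerMigrationSketch();
        System.out.println(normalizer.normalizePresentationForm("\uFB01le"));
    }
}
```

Callers that switch to the new name in 1.8 would then need no further change when moving to 2.0, which is the benefit the comment is after.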
[jira] [Commented] (PDFBOX-2118) Remove ICU4J dependency
[ https://issues.apache.org/jira/browse/PDFBOX-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038914#comment-14038914 ] Andreas Lehmkühler commented on PDFBOX-2118: [~msahyoun] I'm not sure if I got your point. Both methods normalizePres() and normalizeDiac() aren't used directly. They are called through the TextNormalizer class but only if the ICU4J lib is present. I've deprecated the whole class which includes both methods. Remove ICU4J dependency --- Key: PDFBOX-2118 URL: https://issues.apache.org/jira/browse/PDFBOX-2118 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: Andreas Lehmkühler Assignee: Andreas Lehmkühler Labels: ICU4J Fix For: 2.0.0 The ICU4J lib is quite big and we are just using a small part of it. Both features are provided by the JDK (java.text.Normalizer and java.text.Bidi) since 1.6 so that it should be possible to remove the ICU4J dependency. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038925#comment-14038925 ]

Tilman Hausherr commented on PDFBOX-2153:

Fixed in rev http://svn.apache.org/r1604192 for the trunk.

Setting the correct clipping path for shading
Key: PDFBOX-2153
URL: https://issues.apache.org/jira/browse/PDFBOX-2153
Project: PDFBox
Issue Type: Bug
Components: Rendering
Reporter: Tilman Hausherr
Labels: shading, shadingpattern

While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding

{code}
graphics.setClip(getGraphicsState().getCurrentClippingPath());
{code}

in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet).

The following PDFs are rendered correctly with the change:
McAfee-ShadingType7.pdf
eci_altona-test-suite-v2_technical_H.pdf
crestron-p9.pdf (these three found in PDFBOX-1915)
PDFBOX-1451.pdf (alfresco)
PDFBOX-1940.pdf (chart)
PDFBOX-1861-tracemonkey.pdf p.11

Not solved by the change:
PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill)
PDFBOX-1861-tracemonkey.pdf p.6 (not shading)
PDFBOX-1416.pdf (not shading)
texample-rgb-triangle.pdf (John has an explanation about that one)

WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ?
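The effect described in this issue (a clip left over from an earlier operation turning a later fill into a blank) can be reproduced with plain Java2D. The sketch below is not PDFBox code: the class StaleClipSketch, its method and the rectangle sizes are made up for illustration, and a plain Rectangle stands in for the value that getGraphicsState().getCurrentClippingPath() would return.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical demo of a stale clip hiding a later fill. */
final class StaleClipSketch {

    /** Fills a red square, optionally resetting the clip first, and returns the RGB value at (x, y). */
    static int fillAndSample(boolean resetClip, int x, int y) {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            // Clip left behind by an earlier drawing operation: a small box in the corner.
            g.setClip(new Rectangle(0, 0, 10, 10));
            // Stand-in for the clipping path held by the current graphics state.
            Rectangle currentClippingPath = new Rectangle(0, 0, 100, 100);
            if (resetClip) {
                g.setClip(currentClippingPath);
            }
            g.setColor(Color.RED);
            g.fill(new Rectangle(50, 50, 40, 40));
        } finally {
            g.dispose();
        }
        return img.getRGB(x, y) & 0xFFFFFF; // drop the alpha byte
    }

    public static void main(String[] args) {
        System.out.printf("stale clip: %06x%n", fillAndSample(false, 60, 60));
        System.out.printf("reset clip: %06x%n", fillAndSample(true, 60, 60));
    }
}
```

With the stale clip the sampled pixel keeps the image's initial black; once the clip is set again before the fill, the pixel is red.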
[jira] [Commented] (PDFBOX-2118) Remove ICU4J dependency
[ https://issues.apache.org/jira/browse/PDFBOX-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038927#comment-14038927 ] Maruan Sahyoun commented on PDFBOX-2118: [~lehmi] I’ve missed that. As TextNormalizer is public, one could use it directly ... Remove ICU4J dependency --- Key: PDFBOX-2118 URL: https://issues.apache.org/jira/browse/PDFBOX-2118 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: Andreas Lehmkühler Assignee: Andreas Lehmkühler Labels: ICU4J Fix For: 2.0.0 The ICU4J lib is quite big and we are just using a small part of it. Both features are provided by the JDK (java.text.Normalizer and java.text.Bidi) since 1.6 so that it should be possible to remove the ICU4J dependency.
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038931#comment-14038931 ] Tilman Hausherr commented on PDFBOX-1915: - Sure, go ahead. I'll look at your code later today or this weekend. I remember I tried inserting a break in your code but took that back for some reason. Please correct PageDrawer.shfill() by inserting graphics.setClip(null); before the last line, see PDFBOX-2153. Then try rendering the eci file and you'll be pleasantly surprised :-) Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Assignee: Shaola Ren Labels: graphical, gsoc2014, java, math, shading Fix For: 2.0.0 Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, eci_altona-test-suite-v2_technical_H.pdf, failedTest.rar, lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png,
updateshading6ContourTest.rar

Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away.

Knowledge prerequisites:
- java, although you don't have to be a java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn them
- maven (basic)
- svn (basic)
- an IDE like Netbeans or Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        // one PNG per page: blah.pdf1.png, blah.pdf2.png, ...
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general.
The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html The tricky parts are: - decide whether a point(x,y) is inside or outside a patch - decide the color of a point within the patch To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository. https://pdfbox.apache.org/downloads.html#scm If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf Testing: I have attached several example PDFs. To see which one has which shading, open them with an editor like NOTEPAD++, and search for /ShadingType (without the quotes). If your images are rendering like the example PDFs, then you were
[jira] [Closed] (PDFBOX-1947) Axial shading doesn't appear
[ https://issues.apache.org/jira/browse/PDFBOX-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-1947. --- Resolution: Duplicate Axial shading doesn't appear Key: PDFBOX-1947 URL: https://issues.apache.org/jira/browse/PDFBOX-1947 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Tilman Hausherr Labels: shading, shadingpattern Attachments: PDFBOX-1940.pdf, pdfbox-1940.pdf-1.png ShadingType 2 (axial shading) doesn't appear in attached file. Maybe related to PDFBOX-1442. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (PDFBOX-1451) Error in converting a pdf to image using convertToImage
[ https://issues.apache.org/jira/browse/PDFBOX-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed PDFBOX-1451. --- Resolution: Duplicate Shading issue fixed in PDFBOX-2153. Error in converting a pdf to image using convertToImage --- Key: PDFBOX-1451 URL: https://issues.apache.org/jira/browse/PDFBOX-1451 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.5.0, 1.7.1, 1.8.6, 2.0.0 Reporter: Emanuele Lombardi Assignee: Andreas Lehmkühler Labels: shading, shadingpattern Attachments: Alfresco_Enterprise4_Mobile.pdf, Alfresco_Enterprise4_Mobile1.5.0.png, Alfresco_Enterprise4_Mobile1.7.1.png Hi, I converted a PDF to an image using class PDPage, API public BufferedImage convertToImage(). In the resulting image the first line of the bulleted list on the right shows strange characters. In addition, with version 1.5.0 the image at the top is missing, and with 1.7.1 I had a strange color issue.
[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039024#comment-14039024 ] Tilman Hausherr commented on PDFBOX-2153: - Fixed in rev 1604211 for the 1.8 branch. Setting the correct clipping path for shading - Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Labels: shading, shadingpattern
[jira] [Updated] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2153: Affects Version/s: 2.0.0 1.8.6 1.8.5 Setting the correct clipping path for shading - Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Labels: shading, shadingpattern
[jira] [Commented] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039084#comment-14039084 ]

Tilman Hausherr commented on PDFBOX-2149:

The file of PDFBOX-2059 also has the NPE.

Font Refactoring
Key: PDFBOX-2149
URL: https://issues.apache.org/jira/browse/PDFBOX-2149
Project: PDFBox
Issue Type: Improvement
Components: FontBox, PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Attachments: 39.pdf, 000467.pdf

To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy-pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps often follows rules which are completely invalid for that font type but mostly work by luck.

Phase 1
- Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading.
- Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding. FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private.
- the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring.
- PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont.
- PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses; as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring.
- TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy-pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs.

Phase 2
- Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated
- May need to alter the class hierarchy w.r.t. CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap. We'll see.

Phase 3
- Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods.
- Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width.

Phase 4
- Add support for generating embedded TTFs with Unicode
Re: log4j
PDFBOX-2151

On 17.06.2014 09:36, Simon Steiner wrote:
> Hi,
> Should pdfbox move the few bits of log4j to commons logging?
> Thanks
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039252#comment-14039252 ] Tilman Hausherr commented on PDFBOX-1915: - I ran my tests; some patch boundaries look more like a line than like a curve, especially tensor-nofunction-CMYK.pdf and tensor-nofunction-RGB.pdf (all at 96dpi). The weird thing is that I could observe these effects only with tensor patches, not with coons patches. The coons patches are 100% identical. Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Assignee: Shaola Ren Labels: graphical, gsoc2014, java, math, shading Fix For: 2.0.0 Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, eci_altona-test-suite-v2_technical_H.pdf, failedTest.rar, lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, updateshading6ContourTest.rar Of the seven 
shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away.

Knowledge prerequisites:
- java, although you don't have to be a java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn them
- maven (basic)
- svn (basic)
- an IDE like Netbeans or Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        // one PNG per page: blah.pdf1.png, blah.pdf2.png, ...
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general.
The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html The tricky parts are: - decide whether a point(x,y) is inside or outside a patch - decide the color of a point within the patch To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository. https://pdfbox.apache.org/downloads.html#scm If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf Testing: I have attached several example PDFs. To see which one has which shading, open them with an editor like NOTEPAD++, and search for /ShadingType (without the quotes). If your images are rendering like the example PDFs, then you were successful. Optional: Review and
[jira] [Assigned] (PDFBOX-1995) AdobePDFSchema.getProducer() returns empty string
[ https://issues.apache.org/jira/browse/PDFBOX-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guillaume Bailleul reassigned PDFBOX-1995:
Assignee: Guillaume Bailleul

AdobePDFSchema.getProducer() returns empty string
Key: PDFBOX-1995
URL: https://issues.apache.org/jira/browse/PDFBOX-1995
Project: PDFBox
Issue Type: Bug
Components: XmpBox
Affects Versions: 1.8.4
Reporter: Alexandre Garino
Assignee: Guillaume Bailleul

I experienced this bug during the PDF/A validation process. The document is not considered valid because the producer value is not in sync with PDDocumentInformation.

{quote}
PDDocumentInformation.getProducer() = ` ' (one space)
AdobePDFSchema.getProducer() = `' (empty)
{quote}

Below the metadata extracted from the PDF document:

{quote}
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="" xmlns:xap="http://ns.adobe.com/xap/1.0/">
<xap:CreatorTool>Canon </xap:CreatorTool>
<xap:CreateDate>2014-01-23T20:09:45+01:00</xap:CreateDate>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer> </pdf:Producer>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
{quote}

As you can see the Producer value should be equal to ` ' (one space). The bug is located within the method DomXmpParser.removeComments. This method is invoked during the unmarshalling process and removes much more than comments, text nodes too! I can fix (badly) MY issue by changing the code base from :

{quote}
Text t = (Text) node;
if (t.getTextContent().trim().length() == 0)
{
    // XXX is there a better way to remove useless Text ?
    node.getParentNode().removeChild(node);
}
{quote}

into :

{quote}
Text t = (Text) node;
if (t.getTextContent().startsWith("\n"))
{
    // XXX is there a better way to remove useless Text ?
    node.getParentNode().removeChild(node);
}
{quote}

But this is not a long term fix. IMHO, the unmarshalling process should be reworked.
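One way to tell "useless" whitespace from a value that happens to be a single space is to look at the siblings: whitespace between elements is indentation, while whitespace that is the only child of an element is that element's value. The following sketch shows the idea with the JDK's DOM API only. It is not the change committed for this issue; the class WhitespaceTextSketch, its methods and the unprefixed element names in the sample are assumptions made for illustration, and the heuristic would also drop whitespace in mixed content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: strips comments and indentation, keeps whitespace-only values. */
final class WhitespaceTextSketch {

    /** Removes comments everywhere and whitespace-only text nodes that have element siblings. */
    static void removeIgnorable(Node parent) {
        NodeList kids = parent.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                break;
            }
        }
        // Walk backwards so that removing a node does not shift the indices still to visit.
        for (int i = kids.getLength() - 1; i >= 0; i--) {
            Node n = kids.item(i);
            short type = n.getNodeType();
            if (type == Node.COMMENT_NODE) {
                parent.removeChild(n);
            } else if (type == Node.TEXT_NODE && hasElementChild
                    && n.getTextContent().trim().isEmpty()) {
                parent.removeChild(n); // indentation between elements
            } else if (type == Node.ELEMENT_NODE) {
                removeIgnorable(n);
            }
        }
    }

    /** Parses the XML string and applies removeIgnorable to the whole document. */
    static Document parseAndClean(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            removeIgnorable(doc);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseAndClean("<root>\n  <!-- note -->\n  <Producer> </Producer>\n</root>");
        String producer = doc.getElementsByTagName("Producer").item(0).getTextContent();
        System.out.println("[" + producer + "]");
    }
}
```

In the sample, the line breaks and the comment around Producer are removed, while the single space inside Producer survives because it is the element's only child.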
[jira] [Resolved] (PDFBOX-1995) AdobePDFSchema.getProducer() returns empty string
[ https://issues.apache.org/jira/browse/PDFBOX-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guillaume Bailleul resolved PDFBOX-1995. Resolution: Fixed Fix Version/s: 2.0.0 Fix and new test added in r1604276 AdobePDFSchema.getProducer() returns empty string - Key: PDFBOX-1995 URL: https://issues.apache.org/jira/browse/PDFBOX-1995 Project: PDFBox Issue Type: Bug Components: XmpBox Affects Versions: 1.8.4 Reporter: Alexandre Garino Assignee: Guillaume Bailleul Fix For: 2.0.0
[jira] [Created] (PDFBOX-2154) NPE while rendering files with type3 fonts
Tilman Hausherr created PDFBOX-2154:

Summary: NPE while rendering files with type3 fonts
Key: PDFBOX-2154
URL: https://issues.apache.org/jira/browse/PDFBOX-2154
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.8.5, 1.8.4, 1.8.3, 1.8.6
Reporter: Tilman Hausherr

I get this NPE with the files of PDFBOX-1145, PDFBOX-1794, PDFBOX-2023 in 1.8 only:

java.lang.NullPointerException
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:210)
    at org.apache.pdfbox.pdmodel.font.Type3StreamParser.createImage(Type3StreamParser.java:59)
    at org.apache.pdfbox.pdmodel.font.PDType3Font.createImageIfNecessary(PDType3Font.java:80)
    at org.apache.pdfbox.pdmodel.font.PDType3Font.drawString(PDType3Font.java:102)
    at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:256)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:499)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:135)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
    at org.apache.pdfbox.util.TestPDFToImage.doTestFile(TestPDFToImage.java:232)
    at org.apache.pdfbox.util.TestPDFToImage.testRenderImage(TestPDFToImage.java:344)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:232)
    at junit.framework.TestSuite.run(TestSuite.java:227)
    at junit.textui.TestRunner.doRun(TestRunner.java:116)
    at junit.textui.TestRunner.start(TestRunner.java:180)
    at junit.textui.TestRunner.main(TestRunner.java:138)
    at org.apache.pdfbox.util.TestPDFToImage.main(TestPDFToImage.java:394)

After fixing PDFStreamEngine.processStream() like this

{code}
if (aPage == null)
{
    graphicsState = new PDGraphicsState();
}
else
{
    graphicsState = new PDGraphicsState(aPage.findCropBox());
}
{code}

I get another NPE:

java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.font.Type3StreamParser.createImage(Type3StreamParser.java:60)
    at org.apache.pdfbox.pdmodel.font.PDType3Font.createImageIfNecessary(PDType3Font.java:80)
    at org.apache.pdfbox.pdmodel.font.PDType3Font.drawString(PDType3Font.java:102)
    at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:256)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:506)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:564)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:275)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:242)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:222)
    at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:135)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
    at org.apache.pdfbox.util.TestPDFToImage.doTestFile(TestPDFToImage.java:232)
    at org.apache.pdfbox.util.TestPDFToImage.testRenderImage(TestPDFToImage.java:344)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at
[jira] [Reopened] (PDFBOX-1940) Faulty pdf-image rendering
[ https://issues.apache.org/jira/browse/PDFBOX-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened PDFBOX-1940: - Assignee: Tilman Hausherr (was: John Hewson) Reopening to apply the fix to 1.8 Faulty pdf-image rendering --- Key: PDFBOX-1940 URL: https://issues.apache.org/jira/browse/PDFBOX-1940 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Daniel Kozimor Assignee: Tilman Hausherr Fix For: 2.0.0 Attachments: input.pdf, output.jpg A particular PDF is producing improper output jpg. The pdf in question, as well as the produced jpg can be found attached to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-1940) Faulty pdf-image rendering
[ https://issues.apache.org/jira/browse/PDFBOX-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-1940: Attachment: PDFBOX-1940-v1.8.jpg Faulty pdf-image rendering --- Key: PDFBOX-1940 URL: https://issues.apache.org/jira/browse/PDFBOX-1940 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Daniel Kozimor Assignee: Tilman Hausherr Fix For: 2.0.0 Attachments: PDFBOX-1940-v1.8.jpg, input.pdf, output.jpg A particular PDF is producing improper output jpg. The pdf in question, as well as the produced jpg can be found attached to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1940) Faulty pdf-image rendering
[ https://issues.apache.org/jira/browse/PDFBOX-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039370#comment-14039370 ] Tilman Hausherr edited comment on PDFBOX-1940 at 6/20/14 9:19 PM: -- Reopening to apply [the fix|https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdfviewer/PageDrawer.java?r1=1571581&r2=1571803&pathrev=1571803&diff_format=h] to 1.8 was (Author: tilman): Reopening to apply the fix to 1.8 Faulty pdf-image rendering --- Key: PDFBOX-1940 URL: https://issues.apache.org/jira/browse/PDFBOX-1940 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Daniel Kozimor Assignee: Tilman Hausherr Fix For: 2.0.0 Attachments: PDFBOX-1940-v1.8.jpg, input.pdf, output.jpg A particular PDF is producing improper output jpg. The pdf in question, as well as the produced jpg can be found attached to this issue.
[jira] [Commented] (PDFBOX-1940) Faulty pdf-image rendering
[ https://issues.apache.org/jira/browse/PDFBOX-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039377#comment-14039377 ] Tilman Hausherr commented on PDFBOX-1940: - Done in rev 1604279 for the 1.8 branch. Faulty pdf-image rendering --- Key: PDFBOX-1940 URL: https://issues.apache.org/jira/browse/PDFBOX-1940 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.6, 2.0.0 Reporter: Daniel Kozimor Assignee: Tilman Hausherr Fix For: 2.0.0 Attachments: PDFBOX-1940-v1.8.jpg, input.pdf, output.jpg A particular PDF is producing improper output jpg. The pdf in question, as well as the produced jpg can be found attached to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-1940) Faulty pdf-image rendering
[ https://issues.apache.org/jira/browse/PDFBOX-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-1940: Affects Version/s: 1.8.6 Faulty pdf-image rendering --- Key: PDFBOX-1940 URL: https://issues.apache.org/jira/browse/PDFBOX-1940 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.6, 2.0.0 Reporter: Daniel Kozimor Assignee: Tilman Hausherr Fix For: 2.0.0 Attachments: PDFBOX-1940-v1.8.jpg, input.pdf, output.jpg A particular PDF is producing improper output jpg. The pdf in question, as well as the produced jpg can be found attached to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
Release Apache PDFBox 1.8.6 - API docs
The API docs for 1.8.6 are available at http://pdfbox.staging.apache.org/docs/1.8.6/javadocs/ and will be put into production upon release.

BR
Maruan Sahyoun

On 19.06.2014 at 14:28, Andreas Lehmkuehler andr...@lehmi.de wrote:

Hi,

a candidate for the PDFBox 1.8.6 release is available at: http://people.apache.org/~lehmi/pdfbox/1.8.6/

The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/1.8.6/

The SHA1 checksum of the archive is 543c49ebe34a443654a0c3c264f36acc07983cc6.

Please vote on releasing this package as Apache PDFBox 1.8.6. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 1.8.6
[ ] -1 Do not release this package because...

Here is my +1

BR
Andreas Lehmkühler
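The SHA1 checksum quoted in the vote mail can be compared against a locally downloaded archive before voting. The following is only a minimal sketch using the JDK's `MessageDigest`; the class name `Sha1CheckSketch` and its helper methods are hypothetical and not part of PDFBox or the release process.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1CheckSketch {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** True if the data hashes to the expected checksum (case-insensitive). */
    static boolean matches(byte[] data, String expectedHex) {
        return sha1Hex(data).equalsIgnoreCase(expectedHex.trim());
    }
}
```

To check the release candidate, one would read the downloaded zip (for example with `Files.readAllBytes`) and pass the bytes together with the checksum from the mail to `matches`.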
[jira] [Commented] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039402#comment-14039402 ] Tilman Hausherr commented on PDFBOX-2141: - Committed in rev 1604282 for the trunk. Shading not applied to text --- Key: PDFBOX-2141 URL: https://issues.apache.org/jira/browse/PDFBOX-2141 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch The attached PDF draws text filled with a horizontal shading going from red to blue. When rendered via PDFBox, the text is completely filled with red. The problem is that AxialShadingContext#getRaster() gets called with positions that fall completely outside of the range stored in its coords[] field. The fix seems to be to set the glyph transform rather than the graphics2d transform in PageDrawer#writeText() as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039490#comment-14039490 ] Tilman Hausherr commented on PDFBOX-2141: - Fixed in the 1.8 version in rev 1604297. While working on the 1.8 version I noticed a comment that relates to PDFBOX-485. In it, [~vbier] reported printing problems with the hp laserjet 8150 and hp laserjet 1320. [~vbier], are you still using PDFBox and these two printers? If yes, could you please test a snapshot version? (I will post the URL tomorrow) Shading not applied to text --- Key: PDFBOX-2141 URL: https://issues.apache.org/jira/browse/PDFBOX-2141 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch The attached PDF draws text filled with a horizontal shading going from red to blue. When rendered via PDFBox, the text is completely filled with red. The problem is that AxialShadingContext#getRaster() gets called with positions that fall completely outside of the range stored in its coords[] field. The fix seems to be to set the glyph transform rather than the graphics2d transform in PageDrawer#writeText() as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
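The difference between transforming the Graphics2D and transforming the glyph can be illustrated with plain Java2D. This is only an analogy under stated assumptions, not the PDFBox code or the attached patch: a `GradientPaint` stands in for the axial shading, a rectangle stands in for the glyph outline, and the class `ShadingTransformSketch` with its methods is hypothetical. Because Java2D evaluates a paint in the user space of the current Graphics2D transform, moving the graphics moves the gradient along with the shape, while moving only the shape leaves the gradient anchored to the page.

```java
import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

public class ShadingTransformSketch {
    // Shading defined in page coordinates: red at x=100, blue at x=300
    static final GradientPaint SHADING =
            new GradientPaint(100f, 0f, Color.RED, 300f, 0f, Color.BLUE);
    // Stand-in for a glyph outline, defined at the origin
    static final Shape GLYPH = new Rectangle2D.Double(0, 0, 200, 50);
    // Stand-in for the text matrix: moves the glyph to x=100..300 on the page
    static final AffineTransform GLYPH_TO_PAGE =
            AffineTransform.getTranslateInstance(100, 20);

    /** Transforms the Graphics2D: the paint is shifted together with the glyph. */
    static BufferedImage fillViaGraphicsTransform() {
        BufferedImage img = new BufferedImage(400, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setPaint(SHADING);
        g.transform(GLYPH_TO_PAGE);
        g.fill(GLYPH);
        g.dispose();
        return img;
    }

    /** Transforms only the glyph: the paint stays in page coordinates. */
    static BufferedImage fillViaGlyphTransform() {
        BufferedImage img = new BufferedImage(400, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setPaint(SHADING);
        g.fill(GLYPH_TO_PAGE.createTransformedShape(GLYPH));
        g.dispose();
        return img;
    }

    /** Blue component (0..255) of the pixel at (x, y). */
    static int blueAt(BufferedImage img, int x, int y) {
        return img.getRGB(x, y) & 0xFF;
    }
}
```

Sampling near the right edge of the filled area shows the effect: with the glyph transform the pixel is almost pure blue, whereas with the graphics transform the gradient has only reached roughly its midpoint there.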
[jira] [Commented] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson commented on PDFBOX-2149: - [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting in a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing it's FontDescriptor - it's related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're replacing it with the default font but we should be synthesising a FontDescriptor from the default font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. 
The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. 
- Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
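The point made in the comment above, that a missing FontDescriptor should be synthesised from the substituted font rather than papered over by returning false, can be sketched as follows. This is a simplified illustration, not PDFBox code: the `Descriptor` class, `synthesize` and `isSymbolicFont` here are hypothetical stand-ins, and the rule that only Symbol and ZapfDingbats count as symbolic is an assumption made for the example. The flag values follow the PDF font descriptor flags (bit 3 Symbolic, bit 6 Nonsymbolic).

```java
public class FontDescriptorSketch {
    // PDF font descriptor flags: bit 3 = Symbolic, bit 6 = Nonsymbolic
    static final int FLAG_SYMBOLIC = 1 << 2;
    static final int FLAG_NONSYMBOLIC = 1 << 5;

    /** Hypothetical, immutable stand-in for a font descriptor. */
    static final class Descriptor {
        final String fontName;
        final int flags;

        Descriptor(String fontName, int flags) {
            this.fontName = fontName;
            this.flags = flags;
        }

        boolean isSymbolic() {
            return (flags & FLAG_SYMBOLIC) != 0;
        }
    }

    /** Builds a descriptor from the substituted font (assumed rule for this sketch). */
    static Descriptor synthesize(String substituteFontName) {
        boolean symbolic = "Symbol".equals(substituteFontName)
                || "ZapfDingbats".equals(substituteFontName);
        return new Descriptor(substituteFontName,
                symbolic ? FLAG_SYMBOLIC : FLAG_NONSYMBOLIC);
    }

    /** Never defaults to false: a missing descriptor is synthesised first. */
    static boolean isSymbolicFont(Descriptor fromPdf, String substituteFontName) {
        Descriptor d = (fromPdf != null) ? fromPdf : synthesize(substituteFontName);
        return d.isSymbolic();
    }
}
```

With this shape, a Symbol font that lacks a descriptor is still reported as symbolic, which a blanket `return false` on null would get wrong.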
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson edited comment on PDFBOX-2149 at 6/21/14 12:26 AM: --- [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're replacing it with the default font but we should be synthesising a FontDescriptor from the default font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. was (Author: jahewson): [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting in a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're replacing it with the default font but we should be synthesising a FontDescriptor from the default font which we loaded from disk. 
In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. 
Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson edited comment on PDFBOX-2149 at 6/21/14 12:25 AM: --- [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting in a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're replacing it with the default font but we should be synthesising a FontDescriptor from the default font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. was (Author: jahewson): [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting in a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing it's FontDescriptor - it's related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're replacing it with the default font but we should be synthesising a FontDescriptor from the default font which we loaded from disk. 
In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. 
Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3 -
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson edited comment on PDFBOX-2149 at 6/21/14 12:27 AM: --- [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're substituting it but we should be synthesising a FontDescriptor from the substituted font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. was (Author: jahewson): [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're replacing it with the default font but we should be synthesising a FontDescriptor from the default font which we loaded from disk. 
In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. 
Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3 - Refactor the
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson edited comment on PDFBOX-2149 at 6/21/14 12:27 AM: --- [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing and has not FontDescriptor we're substituting it but we're not synthesising a FontDescriptor from the substituted font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. was (Author: jahewson): [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing we're substituting it but we should be synthesising a FontDescriptor from the substituted font which we loaded from disk. 
In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. 
Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - it's parent Type0 font is responsible for its CMap. We'll see. Phase 3 -
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson edited comment on PDFBOX-2149 at 6/21/14 12:29 AM: --- [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs is that when a font is missing and has no FontDescriptor we're substituting it but we're not synthesising a FontDescriptor from the substituted font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. was (Author: jahewson): [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs in that when a font is missing and has not FontDescriptor we're substituting it but we're not synthesising a FontDescriptor from the substituted font which we loaded from disk. 
In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. 
Loading needs to be pushed down into the appropriate subclasses; as a starting point, the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t. CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap. We'll
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson edited comment on PDFBOX-2149 at 6/21/14 12:30 AM: --- [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs is that when a font is missing and has no FontDescriptor we're substituting it but we're not synthesising a FontDescriptor from the substituted font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: returning false is not going to produce the correct results. Just because it's possible to get rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. was (Author: jahewson): [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs is that when a font is missing and has no FontDescriptor we're substituting it but we're not synthesising a FontDescriptor from the substituted font which we loaded from disk. 
In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. 
Loading needs to be pushed down into the appropriate subclasses; as a starting point, the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t. CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap.
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039594#comment-14039594 ] John Hewson edited comment on PDFBOX-2149 at 6/21/14 12:29 AM: --- [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs is that when a font is missing and has no FontDescriptor we're substituting it but we're not synthesising a FontDescriptor from the substituted font which we loaded from disk. In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. was (Author: jahewson): [~lehmi], you've missed the point, PDFBox is already equipped to handle the cases where the FontDescriptor is missing by substituting a synthetic FontDescriptor, so we shouldn't be seeing cases where the getFontDescriptor() returns null. It's a bug in PDFBox. Defaulting to returning false from isSymbolicFont() is incorrect, for example if it's the Symbol font which is missing a FontDescriptor - this issue is related to PDFBOX-2140 which I'm trying to fix. What we're seeing in these PDFs is that when a font is missing and has no FontDescriptor we're substituting it but we're not synthesising a FontDescriptor from the substituted font which we loaded from disk. 
In other words the NPE is actually showing us that there is a bug in PDFBox which needs a fix: and returning false is not going to produce the correct results. Just because you got rid of an exception doesn't mean that PDFBox's behaviour has been corrected: there's more work to be done here to synthesise the missing FontDescriptor correctly. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. 
Loading needs to be pushed down into the appropriate subclasses; as a starting point, the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy and pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t. CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap. We'll see.
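The argument in the comment above (that defaulting isSymbolicFont() to false merely hides the missing FontDescriptor) can be illustrated with a small sketch. Everything here is hypothetical: Descriptor, SubstituteFont, looksSymbolic() and both isSymbolic... helpers are invented for illustration and are not PDFBox classes or methods; the name check in looksSymbolic() is only a placeholder for whatever a real implementation would read from the substituted font.

```java
public class DescriptorFallbackDemo {

    /** Hypothetical stand-in for a font descriptor; only the symbolic flag is modelled. */
    public static final class Descriptor {
        final boolean symbolic;

        public Descriptor(boolean symbolic) {
            this.symbolic = symbolic;
        }
    }

    /** Hypothetical stand-in for the substitute font that was loaded from disk. */
    public static final class SubstituteFont {
        final String name;

        public SubstituteFont(String name) {
            this.name = name;
        }

        /** Assumption: placeholder heuristic, a real font would be inspected instead. */
        boolean looksSymbolic() {
            return name.equals("Symbol") || name.equals("ZapfDingbats");
        }
    }

    /** Hides the problem: a missing descriptor is silently treated as non-symbolic. */
    public static boolean isSymbolicDefaultingToFalse(Descriptor embedded) {
        return embedded != null && embedded.symbolic;
    }

    /** Synthesises a descriptor from the substitute font when the PDF supplies none. */
    public static boolean isSymbolicWithSynthesis(Descriptor embedded, SubstituteFont substitute) {
        Descriptor descriptor = embedded != null
                ? embedded
                : new Descriptor(substitute.looksSymbolic());
        return descriptor.symbolic;
    }

    public static void main(String[] args) {
        SubstituteFont symbol = new SubstituteFont("Symbol");
        System.out.println("default to false: " + isSymbolicDefaultingToFalse(null));
        System.out.println("synthesised:      " + isSymbolicWithSynthesis(null, symbol));
    }
}
```

With no embedded descriptor, the first helper reports the Symbol font as non-symbolic without raising any exception, while the second derives the answer from the substituted font.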
[jira] [Closed] (PDFBOX-2094) Add PrintRequestAttributeSet parameter to silentPrint()
[ https://issues.apache.org/jira/browse/PDFBOX-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson closed PDFBOX-2094. --- Resolution: Fixed Yes, I see print() was fixed but not silentPrint(); I've now fixed this in r1604305. Add PrintRequestAttributeSet parameter to silentPrint() --- Key: PDFBOX-2094 URL: https://issues.apache.org/jira/browse/PDFBOX-2094 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: senthuran Assignee: John Hewson Priority: Minor Fix For: 2.0.0 The current implementation does not allow us to set the printer or paper attributes. Could you please change silentPrint() to accept a PrintRequestAttributeSet as a parameter? Affected versions: from pdfbox-app-2.0.0-20140506.050443-277jar to pdfbox-app-2.0.0-20140506.050443-301jar . -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Travis CI
On 20 Jun 2014, at 01:24, Andreas Lehmkuehler andr...@lehmi.de wrote: Hi, On 19.06.2014 22:03, John Hewson wrote: Hi All The recent instability of Jenkins prompted me to set up Travis CI to build the PDFBox mirror on GitHub. Automatic builds are triggered after every commit, and they can often run much faster than on the busy Jenkins server, so this gives committers an additional means to quickly determine if their build has problems or not. Good idea. The builds are public at: https://travis-ci.org/apache/pdfbox The Jenkins build is still the “ground truth” and passing that is what counts, it *might* be possible to pass Travis CI and still fail on Jenkins, so that’s something to keep in mind. Especially as the Travis build uses oraclejdk7 as compiler. PDFBox has Java 6 as its minimum requirement, and that configuration may hide incompatibilities because of the chosen Java version. Good point - I’ve added OpenJDK 6 to the Travis CI build now. -- John BR Andreas Lehmkühler -- John
[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039610#comment-14039610 ] John Hewson commented on PDFBOX-2153: - Yep, this fix makes sense because: {code} graphics.fill(getGraphicsState().getCurrentClippingPath()); {code} Fills a shape, which happens to be the current clipping path, but like any paint operation it's subject to the current clipping path of the Graphics2D which we haven't updated, so it's stale. --- One nitpick: can we leave out comments like: {code} graphics.setClip(null); // PDFBOX-2153 don't use obsolete clipping path {code} because that's what _svn blame_ is for :) Setting the correct clipping path for shading - Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Labels: shading, shadingpattern While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet). 
The following PDFs are rendered correctly with the change: McAfee-ShadingType7.pdf eci_altona-test-suite-v2_technical_H.pdf crestron-p9.pdf (these three found in PDFBOX-1915) PDFBOX-1451.pdf (alfresco) PDFBOX-1940.pdf (chart) PDFBOX-1861-tracemonkey.pdf p.11 Not solved by the change: PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill) PDFBOX-1861-tracemonkey.pdf p.6 (not shading) PDFBOX-1416.pdf (not shading) texample-rgb-triangle.pdf (John has an explanation about that one) WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ? -- This message was sent by Atlassian JIRA (v6.2#6252)
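The behaviour discussed above (a fill is itself limited by whatever clip the Graphics2D currently holds, so a stale clip can blank out a shading) can be reproduced with plain java.awt. This is only a minimal sketch: the class StaleClipDemo and the helper centrePainted are made up for illustration and are not part of PDFBox, and the rectangles merely stand in for the clipping paths of two different shadings.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.image.BufferedImage;

public class StaleClipDemo {

    /**
     * Fills {@code target} on a blank image while {@code clip} is active
     * (null means no clipping) and reports whether the centre pixel of
     * the target was actually painted.
     */
    public static boolean centrePainted(Shape clip, Rectangle target) {
        BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D graphics = image.createGraphics();
        graphics.setClip(clip);        // the clip already held by the Graphics2D
        graphics.setColor(Color.RED);
        graphics.fill(target);         // like any paint operation, limited by that clip
        graphics.dispose();

        int x = target.x + target.width / 2;
        int y = target.y + target.height / 2;
        return image.getRGB(x, y) == Color.RED.getRGB();
    }

    public static void main(String[] args) {
        Rectangle target = new Rectangle(50, 50, 20, 20);   // area the shading should cover
        Rectangle stale = new Rectangle(0, 0, 10, 10);      // clip left over from earlier drawing

        System.out.println("stale clip:   " + centrePainted(stale, target));
        System.out.println("updated clip: " + centrePainted(target, target));
        System.out.println("null clip:    " + centrePainted(null, target));
    }
}
```

In this sketch the fill under the stale clip leaves the target area blank, while either updating the clip or resetting it to null lets the fill through, which matches the two variants mentioned in the issue.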
Re: PDFBox and XMP - retire jempbox
+ 1 -- John On 19 Jun 2014, at 23:05, Maruan Sahyoun sahy...@fileaffairs.de wrote: Hi, we currently have two libraries handling XMP metadata jempbox and xmpbox. Part of PDFBOX-1187/PDFBOX-2197 was to remove a direct dependency from jempbox as now XMP metadata could be generated by any library and added as a stream. This will be available for PDFBox 2.0.0. I would like to propose to now retire jempbox as xmpbox # is closer to the spec (naming conventions) # used for PDF/A validation where we can not remove a dependency on XMP handling as checking metadata is necessary for PDF/A compliance. In case there is functionality in jempbox that is missing in xmpbox that could be added at a later stage upon request. WDYT? BR Maruan
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039614#comment-14039614 ] Shaola Ren commented on PDFBOX-1915: Thanks. Re your last comment, you are absolutely right, hahaha... I thought you might point this out. That's because I changed level = 4 to level = 3 in order to get faster runs in the other test cases, especially lamp_cairo and mcafeeU5, to check that there is nothing wrong with the code, and I know that when I change the level back to 4, everything will remain the same as before. In this update, only the code related to shading type 6 changed relatively more than that of shading type 7; shading type 7 is almost suitable to edit the level parameter at the beginning. Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Assignee: Shaola Ren Labels: graphical, gsoc2014, java, math, shading Fix For: 2.0.0 Attachments: CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, eci_altona-test-suite-v2_technical_H.pdf, failedTest.rar, lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, pass4FlagTest.rar, patchCases.jpg,
patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, updateshading6ContourTest.rar Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away. Knowledge prerequisites: - java, although you don't have to be a java ace, just feel comfortable - math: you should know what cubic Bézier curves, Degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn it - maven (basic) - svn (basic) - an IDE like Netbeans or Eclipse or IntelliJ (basic) - ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics. A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have the shading types that are already implemented. Some simple source code to convert to images: String filename = "blah.pdf"; PDDocument document = PDDocument.loadNonSeq(new File(filename), null); List<PDPage> pdPages = document.getDocumentCatalog().getAllPages(); int page = 0; for (PDPage pdPage : pdPages) { ++page; BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300); ImageIO.write(bim, "png", new File(filename+page+".png")); } document.close(); You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general.
The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html The tricky parts are: - decide whether a point(x,y) is inside or outside a patch - decide the color of a point within the patch To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository. https://pdfbox.apache.org/downloads.html#scm If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf Testing: I
[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039678#comment-14039678 ] Tilman Hausherr commented on PDFBOX-2153: - Yes, but I always see a risk that people change code in the future without doing this. Someone, maybe 5 years from now, might delete setClip(null) with the thought "this is useless!". The comment means "don't touch this, read PDFBOX-2153 first". Setting the correct clipping path for shading - Key: PDFBOX-2153 URL: https://issues.apache.org/jira/browse/PDFBOX-2153 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Labels: shading, shadingpattern While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding {code} graphics.setClip(getGraphicsState().getCurrentClippingPath()); {code} in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet). The following PDFs are rendered correctly with the change: McAfee-ShadingType7.pdf eci_altona-test-suite-v2_technical_H.pdf crestron-p9.pdf (these three found in PDFBOX-1915) PDFBOX-1451.pdf (alfresco) PDFBOX-1940.pdf (chart) PDFBOX-1861-tracemonkey.pdf p.11 Not solved by the change: PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill) PDFBOX-1861-tracemonkey.pdf p.6 (not shading) PDFBOX-1416.pdf (not shading) texample-rgb-triangle.pdf (John has an explanation about that one) WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ?