[jira] [Created] (PDFBOX-4756) ScratchFileBuffer seek beyond the last page
Petr Slaby created PDFBOX-4756: -- Summary: ScratchFileBuffer seek beyond the last page Key: PDFBOX-4756 URL: https://issues.apache.org/jira/browse/PDFBOX-4756 Project: PDFBox Issue Type: Bug Reporter: Petr Slaby Attachments: ScratchFileBuffer.java.patch, ScratchFileBufferRegressionTest.java When rendering a confidential PDF, we get a java.io.EOFException in ScratchFileBuffer.seek(). The problem is demonstrated in the attached test and fixed by the attached patch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4627) Wrong color of uncolored tiling pattern
[ https://issues.apache.org/jira/browse/PDFBOX-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906178#comment-16906178 ] Petr Slaby commented on PDFBOX-4627: We use PDFBox to render PDFs. For our customers, Adobe Reader is the only source of truth. Even if I managed to convince them that a given PDF is not correct according to the specification, they would not be able to change it. The PDFs usually come from a source they have no influence on. I think the argument "but it renders with Adobe Reader" is a valid one. I agree that it is not possible to be 100% compatible with Adobe Reader, but PDFBox should try to head in this direction, rather than attempting to be 100% compatible with the specification and punishing you for every mistake a PDF writer has made. > Wrong color of uncolored tiling pattern > --- > > Key: PDFBOX-4627 > URL: https://issues.apache.org/jira/browse/PDFBOX-4627 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.16 >Reporter: Jiri Kunhart >Priority: Major > Attachments: after_fix.png, before_fix.png, > uncolored_tiling_pattern.patch, uncolored_tiling_pattern.pdf > > > The attached pdf file with uncolored tiling pattern is rendered wrongly (see > "before_fix.png"). The problem is that pattern stream contains > /DevGrayCS cs > which overwrites PDPattern color space stored in > PDGraphicsState#nonStrokingColor. I did a small fix which ignores all > settings of color space inside of uncolored tiling pattern stream and the > result seems to be good (see "after_fix.png"). > Note: the pattern in the png file looks different than in the original pdf > file, but this should probably be handled in another issue.
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623694#comment-16623694 ] Petr Slaby commented on PDFBOX-4309: Trying to load a sun class does not seem to be a good approach to me. Isn't it so that KCMS is only used if Java < 7 or Java in (7, 8) and the system property sun.java2d.cmm is set to sun.java2d.cmm.kcms.KcmsServiceProvider ? In all other cases, LCMS can be assumed. However, there is also IBM Java, JRebel and the like. I do not know what CMS these are using and whether it is "slow" or "fast". > Performance regression in PDColorSpace#toRGBImageAWT Part 2 > --- > > Key: PDFBOX-4309 > URL: https://issues.apache.org/jira/browse/PDFBOX-4309 > Project: PDFBox > Issue Type: Improvement > Components: Rendering >Affects Versions: 2.0.11, 3.0.0 PDFBox >Reporter: Timo Boehme >Assignee: Timo Boehme >Priority: Minor > Labels: optimization > Attachments: ICCImplCheck.java, PDColorSpace.java.patch, > PDICCBased.java.patch > > > This is a continuation of PDFBOX-3569. In a (private) PDF document there are > graphics produced by CorelDraw which are combined by more than 2500(!) > images, each with its own indexed color space based on an ICC color space > (the shadows of graphic objects are created by large number of gray lines > ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) > rendering a single page with one graphic takes 780 seconds. 
The most time is > spent in creating the indexed color space via ICC color space mapping:
> {noformat}
> java.lang.Thread.State: RUNNABLE
> at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
> at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
> at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
> - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
> at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
> at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
> at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
> at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
> at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
> at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
> at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
> at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
> at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
> at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
> at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
> {noformat}
> The call of LittleCMS (LCMS) multi thousand times
is the problem here, taking > way too much time. Unfortunately using kcms via > {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also no > option as the Suse IcedTea OpenJDK seems to not have included it (anymore?) > - in both Java 7 and Java 8. > However the ICC color space (PDICCBased) returns in this case CMYK as > alternate color space and for CMYK we have the alternative rendering via > system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from > PDFBOX-3569. > The idea is now to have an option to force using the alternative color space > instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as > alternative color space it has to be combined with the system property > 'UsePureJavaCMYKConversion'. > Using this approach the rendering time of the page with the problematic > graphic drops from 780 seconds to 1 second! > It is clear that using the alternate color space might return wrong/not exact > colors. Therefore it should be only an option to enable this mode. However > for processing large collections of PDF documents (e.g.
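The rule of thumb from the comment on PDFBOX-4309 (KCMS only on Java before 7, or on Java 7 and 8 when sun.java2d.cmm names the KCMS provider; LCMS otherwise) can be written without loading any sun.* class. The following is only a sketch of that guess, not PDFBox code: the class and method names are hypothetical, and, as the comment itself notes, it says nothing reliable about IBM Java or other vendors.

```java
// Hypothetical helper, not part of PDFBox: encodes the rule of thumb from the comment.
class CmmGuess {
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    /** Maps a java.specification.version value ("1.7", "1.8", "9", "11") to its major number. */
    static int majorVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    /** Returns true if, by the rule of thumb above, KCMS is probably the active CMM. */
    static boolean probablyKcms(String specVersion, String cmmProperty) {
        int major = majorVersion(specVersion);
        if (major < 7) {
            return true;                      // assumption: older Oracle JREs defaulted to KCMS
        }
        if (major <= 8) {
            return KCMS_PROVIDER.equals(cmmProperty);
        }
        return false;                         // assumption: LCMS in all later versions
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        String cmm = System.getProperty("sun.java2d.cmm");
        System.out.println("KCMS probably in use: " + probablyKcms(version, cmm));
    }
}
```

Taking the two values as parameters keeps the decision testable without touching the running JVM's properties.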
[jira] [Commented] (PDFBOX-4245) wrong rendering of the transparency group at the specific position on a page
[ https://issues.apache.org/jira/browse/PDFBOX-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552145#comment-16552145 ] Petr Slaby commented on PDFBOX-4245: Ooops, sorry. Thx. > wrong rendering of the transparency group at the specific position on a page > > > Key: PDFBOX-4245 > URL: https://issues.apache.org/jira/browse/PDFBOX-4245 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.10 >Reporter: Jiri Kunhart >Assignee: Tilman Hausherr >Priority: Major > Labels: patch > Attachments: gs-bugzilla690022-reduced-rotations-cropbox.pdf, > gs-bugzilla690022-reduced-rotations.pdf, gs-bugzilla690022.pdf, > pdfbox-2.0.10-SNAPSHOT_transparency_group_all.patch, > pdfbox-2.0.10-SNAPSHOT_transparency_group_resources.zip, > pdfbox-2.0.10-SNAPSHOT_transparency_group_sources.patch > > > The rendering of the transparency groups works only if the whole page is > rendered. If you try to render only a part of the page where a > transparency group is placed, you will get only a white image or an image with > shifted pixels representing the applied soft mask. A simple fix is attached in > the patch, including the test and the resources used for testing.
[jira] [Commented] (PDFBOX-4245) wrong rendering of the transparency group at the specific position on a page
[ https://issues.apache.org/jira/browse/PDFBOX-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552140#comment-16552140 ] Petr Slaby commented on PDFBOX-4245: [~tilman]: Could you please link or attach the files which had the regression so that we could have a look? We cannot use PDFBox 2.0 in our product until this issue is resolved. > wrong rendering of the transparency group at the specific position on a page > > > Key: PDFBOX-4245 > URL: https://issues.apache.org/jira/browse/PDFBOX-4245 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.10 >Reporter: Jiri Kunhart >Assignee: Tilman Hausherr >Priority: Major > Labels: patch > Attachments: gs-bugzilla690022-reduced-rotations-cropbox.pdf, > gs-bugzilla690022-reduced-rotations.pdf, gs-bugzilla690022.pdf, > pdfbox-2.0.10-SNAPSHOT_transparency_group_all.patch, > pdfbox-2.0.10-SNAPSHOT_transparency_group_resources.zip, > pdfbox-2.0.10-SNAPSHOT_transparency_group_sources.patch > > > The rendering of the transparency groups works only if the whole page is > rendered. If you try to render only a part of the page where a > transparency group is placed, you will get only a white image or an image with > shifted pixels representing the applied soft mask. A simple fix is attached in > the patch, including the test and the resources used for testing.
[jira] [Commented] (PDFBOX-4229) allow user to set FontProvider
[ https://issues.apache.org/jira/browse/PDFBOX-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490348#comment-16490348 ] Petr Slaby commented on PDFBOX-4229: That class is not public, so you have to copy it to your own source base to be able to set the provider (I did not try, not sure if that is even possible without copying a lot of dependencies, too)... The FontMapper topic was discussed some time ago in PDFBOX-2539 and on the mailing list. I (still) share the opinion that the external font mapping customisation needs improvement. In our application, we need a different font configuration for tasks running in parallel threads in an application server. This can only be achieved using a ThreadLocal variable in a custom FontMapper implementation (at least I hope it can work this way, we did not implement it yet). > allow user to set FontProvider > -- > > Key: PDFBOX-4229 > URL: https://issues.apache.org/jira/browse/PDFBOX-4229 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.9 >Reporter: Michael Brackx >Priority: Major > > Allow a user to set FontProvider without "hacking". > Currently when using public interfaces only a complete FontMapper needs to be > implemented.
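The ThreadLocal idea from the comment on PDFBOX-4229 could look roughly like the sketch below. It does not use the PDFBox API: Mapper is a hypothetical one-method stand-in for a font-mapping interface, and all other names are made up for illustration. The point is only the pattern: one globally registered delegating instance that forwards each call to whatever mapper the current thread has selected.

```java
// Illustrative pattern only; Mapper is a hypothetical stand-in, not the PDFBox FontMapper.
class PerThreadMapper {
    /** Hypothetical mapping interface: font name in, font resource location out. */
    interface Mapper {
        String map(String fontName);
    }

    // Fallback used by threads that never selected a configuration.
    private static final Mapper DEFAULT = name -> "default/" + name;

    // Each thread sees its own current mapper.
    private static final ThreadLocal<Mapper> CURRENT = ThreadLocal.withInitial(() -> DEFAULT);

    /** The single instance that would be registered globally; it delegates per thread. */
    static final Mapper DELEGATING = name -> CURRENT.get().map(name);

    /** Selects the mapper for the calling thread. */
    static void use(Mapper mapper) {
        CURRENT.set(mapper);
    }

    /** Clears the selection; important with pooled threads in an application server. */
    static void reset() {
        CURRENT.remove();
    }

    public static void main(String[] args) {
        use(name -> "tenantA/" + name);
        try {
            System.out.println(DELEGATING.map("Helvetica"));
        } finally {
            reset();
        }
        System.out.println(DELEGATING.map("Helvetica"));
    }
}
```

Because application servers reuse threads, the reset in a finally block matters: without it, a later task on the same thread would inherit the previous task's font configuration.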
[jira] [Commented] (PDFBOX-4112) Build and test PDFBox with JDK10
[ https://issues.apache.org/jira/browse/PDFBOX-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378663#comment-16378663 ] Petr Slaby commented on PDFBOX-4112: I get the funny message "Oh-oh, sun.java2d.cmm.kcms.KcmsServiceProvider no longer exists, so image rendering will be much slower :-(" written on console when running the PDFDebugger on java 1.7.0_72-b14. I am not sure if that was the intention. > Build and test PDFBox with JDK10 > > > Key: PDFBOX-4112 > URL: https://issues.apache.org/jira/browse/PDFBOX-4112 > Project: PDFBox > Issue Type: Task >Affects Versions: 2.0.8 >Reporter: Tilman Hausherr >Priority: Major > Labels: jdk10 > > Issue to collect problems and solutions for building and testing PDFBox with > JDK10.
[jira] [Commented] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly
[ https://issues.apache.org/jira/browse/PDFBOX-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293218#comment-16293218 ] Petr Slaby commented on PDFBOX-4038: {quote} But I doubt that the result values must have an increasing order {quote} You are right, that was my misinterpretation of the information I got. The Type1 specification says exactly this: The value associated with BlueValues is an array containing an even number of integers taken in pairs, and which follow a small number of rules: - The first integer in each pair is less than or equal to the second integer in that pair. ... But that is relatively unimportant in this context. The important information is the delta encoding of the integer array in CFF which must be taken into account by the CFFParser. Thanks. > CFF font Blue values and other delta encoded lists read incorrectly > --- > > Key: PDFBOX-4038 > URL: https://issues.apache.org/jira/browse/PDFBOX-4038 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.8 >Reporter: Petr Slaby >Assignee: Tilman Hausherr > Fix For: 2.0.9, 3.0.0 PDFBox > > Attachments: BlueValuesTest.java, CFFParser.java.patch > > > The attached test compares the values retrieved via CFFParser from an > OpenType font with the expected values as seen in FontForge (go to > Element->Font Info->PS Private). > The font NeoSans Black.otf can be found at https://www.wfonts.com/font/neosans > The CFF font specification explaining the encoding of the entries which are > incorrectly parsed by FontBox CFFParser can be found here > https://typekit.files.wordpress.com/2013/05/5176.cff.pdf > We use FontBox to read the font when we need to embed it into a PDF which we > produce via our Apache FOP based software. Adobe validator complains about > incorrect "Blue values" sorting then.
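The delta encoding mentioned in the comment on PDFBOX-4038 stores the first array element as an absolute value and every following element as the difference from its predecessor, so a reader has to accumulate a running sum to recover the real values. A minimal sketch of that decoding step follows; it is not the code from CFFParser.java.patch, the class and method names are hypothetical, and the sample operands are made up rather than taken from the NeoSans font.

```java
import java.util.Arrays;

// Illustrative sketch of CFF "delta" operand decoding; names and sample data are hypothetical.
class DeltaDecode {
    /** Converts delta-encoded operands (first absolute, the rest differences) to absolute values. */
    static int[] decode(int[] deltas) {
        int[] values = new int[deltas.length];
        int running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];   // each operand is an offset from the previous value
            values[i] = running;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] operands = {-15, 15, 500, 15};
        // prints [-15, 0, 500, 515]
        System.out.println(Arrays.toString(decode(operands)));
    }
}
```

Reading the operands as if they were already absolute would yield -15, 15, 500, 15 here, which is not a valid sequence of blue zone pairs and is the kind of result a validator would reject.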
[jira] [Updated] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly
[ https://issues.apache.org/jira/browse/PDFBOX-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-4038: --- Attachment: BlueValuesTest.java > CFF font Blue values and other delta encoded lists read incorrectly > --- > > Key: PDFBOX-4038 > URL: https://issues.apache.org/jira/browse/PDFBOX-4038 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.8 >Reporter: Petr Slaby > Attachments: BlueValuesTest.java, CFFParser.java.patch > > > The attached test compares the values retrieved via CFFParser from an > OpenType font with the expected values as seen in FontForge (go to > Element->Font Info->PS Private). > The font NeoSans Black.otf can be found at https://www.wfonts.com/font/neosans > The CFF font specification explaining the encoding of the entries which are > incorrectly parsed by FontBox CFFParser can be found here > https://typekit.files.wordpress.com/2013/05/5176.cff.pdf > We use FontBox to read the font when we need to embed it into a PDF which we > produce via our Apache FOP based software. Adobe validator complains about > incorrect "Blue values" sorting then.
[jira] [Updated] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly
[ https://issues.apache.org/jira/browse/PDFBOX-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-4038: --- Attachment: (was: BlueValuesTest.java) > CFF font Blue values and other delta encoded lists read incorrectly > --- > > Key: PDFBOX-4038 > URL: https://issues.apache.org/jira/browse/PDFBOX-4038 > Project: PDFBox > Issue Type: Bug > Components: FontBox >Affects Versions: 2.0.8 >Reporter: Petr Slaby > Attachments: CFFParser.java.patch > > > The attached test compares the values retrieved via CFFParser from an > OpenType font with the expected values as seen in FontForge (go to > Element->Font Info->PS Private). > The font NeoSans Black.otf can be found at https://www.wfonts.com/font/neosans > The CFF font specification explaining the encoding of the entries which are > incorrectly parsed by FontBox CFFParser can be found here > https://typekit.files.wordpress.com/2013/05/5176.cff.pdf > We use FontBox to read the font when we need to embed it into a PDF which we > produce via our Apache FOP based software. Adobe validator complains about > incorrect "Blue values" sorting then.
[jira] [Created] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly
Petr Slaby created PDFBOX-4038: -- Summary: CFF font Blue values and other delta encoded lists read incorrectly Key: PDFBOX-4038 URL: https://issues.apache.org/jira/browse/PDFBOX-4038 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 2.0.8 Reporter: Petr Slaby Attachments: BlueValuesTest.java, CFFParser.java.patch The attached test compares the values retrieved via CFFParser from an OpenType font with the expected values as seen in FontForge (go to Element->Font Info->PS Private). The font NeoSans Black.otf can be found at https://www.wfonts.com/font/neosans The CFF font specification explaining the encoding of the entries which are incorrectly parsed by FontBox CFFParser can be found here https://typekit.files.wordpress.com/2013/05/5176.cff.pdf We use FontBox to read the font when we need to embed it into a PDF which we produce via our Apache FOP based software. Adobe validator complains about incorrect "Blue values" sorting then.
[jira] [Commented] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream
[ https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172220#comment-16172220 ] Petr Slaby commented on PDFBOX-3933: Thanks, Tilman. The commit comment should rather be "don't swallow CR at the end of stream if there is *none* at the beginning" as that is what the code does (or "swallow CR at the end of stream if and only if there is one at the beginning") > PDFParser swallows a CR at the end of a stream > -- > > Key: PDFBOX-3933 > URL: https://issues.apache.org/jira/browse/PDFBOX-3933 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 1.8.13 >Reporter: Petr Slaby > Attachments: Beispiel2.pdf, EndlinePrediction2.patch, > EndlinePrediction.patch > > > I have a PDF which I cannot share at the moment, maybe later if I get a > permission from the customer. > The PDF is protected by an empty password, all streams are encrypted using > AES. The PDF consistently uses the LF character for line endings. One of the > streams looks like this: > {code} > 10 0 obj > <> > stream > <0x0D><0x0A> > endstream > {code} > i.e. Length field is a reference to an object, in the content, the length > object is stored immediately after the stream as > {code} > 9 0 obj > 2624 > endobj > {code} > The byte <0x0D> belongs to the stream and is not to be treated as line > separator in this case. The parser is not able to read the length field so it > manually searches for the stream end in the class EndstreamOutputStream. This > class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it > strips off the <0x0D> from this particular stream content. Since the stream > is encrypted, PDFBox runs into a BadPaddingException later on when trying to > decrypt the stream. > The problem is reproducible using org.apache.pdfbox.PDFToImage in current > 1.8.14-SNAPSHOT. The same works fine in current PDFBox 2.0.x, presumably > because it uses the non-sequential parser by default. 
> The proposed fix is to analyze the PDF content while reading it and search > for the CR character only if it was ever encountered as a line separator > prior to the stream being parsed. > Note: I do not exactly know or understand the usage of the other classes > inherited from BaseParser, like PDFObjectStreamParser. Maybe the line ending > heuristic should be kept "as before" in these classes, by setting the new > field BaseParser.hasCR to true already in the constructor. > A patch is attached.
[jira] [Updated] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream
[ https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-3933: --- Attachment: EndlinePrediction2.patch I have attached a new patch which works fine both with your and my file. The line ending to search for is discovered after the "stream" keyword, with the expectation that there will be the same line ending after the "stream" keyword and after the stream content. > PDFParser swallows a CR at the end of a stream > -- > > Key: PDFBOX-3933 > URL: https://issues.apache.org/jira/browse/PDFBOX-3933 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.14 >Reporter: Petr Slaby > Attachments: Beispiel2.pdf, EndlinePrediction2.patch, > EndlinePrediction.patch > > > I have a PDF which I cannot share at the moment, maybe later if I get a > permission from the customer. > The PDF is protected by an empty password, all streams are encrypted using > AES. The PDF consistently uses the LF character for line endings. One of the > streams looks like this: > {code} > 10 0 obj > <> > stream > <0x0D><0x0A> > endstream > {code} > i.e. Length field is a reference to an object, in the content, the length > object is stored immediately after the stream as > {code} > 9 0 obj > 2624 > endobj > {code} > The byte <0x0D> belongs to the stream and is not to be treated as line > separator in this case. The parser is not able to read the length field so it > manually searches for the stream end in the class EndstreamOutputStream. This > class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it > strips off the <0x0D> from this particular stream content. Since the stream > is encrypted, PDFBox runs into a BadPaddingException later on when trying to > decrypt the stream. > The problem is reproducible using org.apache.pdfbox.PDFToImage in current > 1.8.14-SNAPSHOT. The same works fine in current PDFBox 2.0.x, presumably > because it uses the non-sequential parser by default. 
> The proposed fix is to analyze the PDF content while reading it and search > for the CR character only if it was ever encountered as a line separator > prior to the stream being parsed. > Note: I do not exactly know or understand the usage of the other classes > inherited from BaseParser, like PDFObjectStreamParser. Maybe the line ending > heuristic should be kept "as before" in these classes, by setting the new > field BaseParser.hasCR to true already in the constructor. > A patch is attached.
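The heuristic described for EndlinePrediction2.patch (use the line ending seen right after the "stream" keyword to decide what to strip before "endstream") can be sketched as follows. This is not the patch itself: the class, enum and method names are hypothetical, and the sketch assumes the raw bytes between the two keywords have already been collected.

```java
import java.util.Arrays;

// Illustrative sketch only; not the code from EndlinePrediction2.patch.
class EndstreamEol {
    enum Eol { LF, CRLF }

    /** Detects the line ending at data[pos], the first byte after the "stream" keyword. */
    static Eol detectAfterKeyword(byte[] data, int pos) {
        if (pos + 1 < data.length && data[pos] == '\r' && data[pos + 1] == '\n') {
            return Eol.CRLF;
        }
        return Eol.LF;
    }

    /** Strips one trailing line ending from the raw stream bytes, trusting the detected style. */
    static byte[] stripTrailingEol(byte[] raw, Eol eol) {
        int end = raw.length;
        if (end > 0 && raw[end - 1] == '\n') {
            end--;
            // Swallow the CR only if the stream also started with CR LF.
            if (eol == Eol.CRLF && end > 0 && raw[end - 1] == '\r') {
                end--;
            }
        }
        return Arrays.copyOf(raw, end);
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, 0x42, 0x0D, 0x0A};   // payload ends with CR, followed by the LF line ending
        System.out.println(stripTrailingEol(raw, Eol.LF).length);    // 0x0D is kept as content
        System.out.println(stripTrailingEol(raw, Eol.CRLF).length);  // 0x0D is treated as line ending
    }
}
```

For the encrypted stream from the report, where LF follows "stream", the trailing 0x0D stays in the data, so the AES ciphertext keeps its full length.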
[jira] [Updated] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream
[ https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-3933: --- Attachment: Beispiel2.pdf I have got the permission to share the problematic PDF, it is attached now. > PDFParser swallows a CR at the end of a stream > -- > > Key: PDFBOX-3933 > URL: https://issues.apache.org/jira/browse/PDFBOX-3933 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.14 >Reporter: Petr Slaby > Attachments: Beispiel2.pdf, EndlinePrediction.patch > > > I have a PDF which I cannot share at the moment, maybe later if I get a > permission from the customer. > The PDF is protected by an empty password, all streams are encrypted using > AES. The PDF consistently uses the LF character for line endings. One of the > streams looks like this: > {code} > 10 0 obj > <> > stream > <0x0D><0x0A> > endstream > {code} > i.e. Length field is a reference to an object, in the content, the length > object is stored immediately after the stream as > {code} > 9 0 obj > 2624 > endobj > {code} > The byte <0x0D> belongs to the stream and is not to be treated as line > separator in this case. The parser is not able to read the length field so it > manually searches for the stream end in the class EndstreamOutputStream. This > class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it > strips off the <0x0D> from this particular stream content. Since the stream > is encrypted, PDFBox runs into a BadPaddingException later on when trying to > decrypt the stream. > The problem is reproducible using org.apache.pdfbox.PDFToImage in current > 1.8.14-SNAPSHOT. The same works fine in current PDFBox 2.0.x, presumably > because it uses the non-sequential parser by default. > The proposed fix is to analyze the PDF content while reading it and search > for the CR character only if it was ever encountered as a line separator > prior to the stream being parsed. 
> Note: I do not exactly know or understand the usage of the other classes > inherited from BaseParser, like PDFObjectStreamParser. Maybe the line ending > heuristic should be kept "as before" in these classes, by setting the new > field BaseParser.hasCR to true already in the constructor. > A patch is attached.
[jira] [Commented] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream
[ https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171204#comment-16171204 ] Petr Slaby commented on PDFBOX-3933: Yes, it prints 17661 with the proposed change, 17660 without it. The embedded ZIP is the only thing which is using line ending 0D0A in this PDF, so the parser does not know it should search for 0D in this case. This is kind of opposite to my example where all line endings are 0A, there is a 0D0A at the end of a stream, the 0D belongs to the stream content, and only the 0A is the line ending. > PDFParser swallows a CR at the end of a stream > -- > > Key: PDFBOX-3933 > URL: https://issues.apache.org/jira/browse/PDFBOX-3933 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.14 >Reporter: Petr Slaby > Attachments: EndlinePrediction.patch > > > I have a PDF which I cannot share at the moment, maybe later if I get a > permission from the customer. > The PDF is protected by an empty password, all streams are encrypted using > AES. The PDF consistently uses the LF character for line endings. One of the > streams looks like this: > {code} > 10 0 obj > <> > stream > <0x0D><0x0A> > endstream > {code} > i.e. Length field is a reference to an object, in the content, the length > object is stored immediately after the stream as > {code} > 9 0 obj > 2624 > endobj > {code} > The byte <0x0D> belongs to the stream and is not to be treated as line > separator in this case. The parser is not able to read the length field so it > manually searches for the stream end in the class EndstreamOutputStream. This > class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it > strips off the <0x0D> from this particular stream content. Since the stream > is encrypted, PDFBox runs into a BadPaddingException later on when trying to > decrypt the stream. > The problem is reproducible using org.apache.pdfbox.PDFToImage in current > 1.8.14-SNAPSHOT. 
The same works fine in current PDFBox 2.0.x, presumably > because it uses the non-sequential parser by default. > The proposed fix is to analyze the PDF content while reading it and search > for the CR character only if it was ever encountered as a line separator > prior to the stream being parsed. > Note: I do not exactly know or understand the usage of the other classes > inherited from BaseParser, like PDFObjectStreamParser. Maybe the line ending > heuristic should be kept "as before" in these classes, by setting the new > field BaseParser.hasCR to true already in the constructor. > A patch is attached.
[jira] [Updated] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream
[ https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-3933: --- Affects Version/s: 1.8.14 > PDFParser swallows a CR at the end of a stream > -- > > Key: PDFBOX-3933 > URL: https://issues.apache.org/jira/browse/PDFBOX-3933 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.14 >Reporter: Petr Slaby > Attachments: EndlinePrediction.patch > > > I have a PDF which I cannot share at the moment, maybe later if I get a > permission from the customer. > The PDF is protected by an empty password, all streams are encrypted using > AES. The PDF consistently uses the LF character for line endings. One of the > streams looks like this: > {code} > 10 0 obj > <> > stream > <0x0D><0x0A> > endstream > {code} > i.e. Length field is a reference to an object, in the content, the length > object is stored immediately after the stream as > {code} > 9 0 obj > 2624 > endobj > {code} > The byte <0x0D> belongs to the stream and is not to be treated as line > separator in this case. The parser is not able to read the length field so it > manually searches for the stream end in the class EndstreamOutputStream. This > class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it > strips off the <0x0D> from this particular stream content. Since the stream > is encrypted, PDFBox runs into a BadPaddingException later on when trying to > decrypt the stream. > The problem is reproducible using org.apache.pdfbox.PDFToImage in current > 1.8.14-SNAPSHOT. The same works fine in current PDFBox 2.0.x, presumably > because it uses the non-sequential parser by default. > The proposed fix is to analyze the PDF content while reading it and search > for the CR character only if it was ever encountered as a line separator > prior to the stream being parsed. > Note: I do not exactly know or understand the usage of the other classes > inherited from BaseParser, like PDFObjectStreamParser. 
Maybe the line ending > heuristic should be kept "as before" in these classes, by setting the new > field BaseParser.hasCR to true already in the constructor. > A patch is attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream
Petr Slaby created PDFBOX-3933: -- Summary: PDFParser swallows a CR at the end of a stream Key: PDFBOX-3933 URL: https://issues.apache.org/jira/browse/PDFBOX-3933 Project: PDFBox Issue Type: Bug Reporter: Petr Slaby Attachments: EndlinePrediction.patch I have a PDF which I cannot share at the moment, maybe later if I get a permission from the customer. The PDF is protected by an empty password, all streams are encrypted using AES. The PDF consistently uses the LF character for line endings. One of the streams looks like this: {code} 10 0 obj <> stream <0x0D><0x0A> endstream {code} i.e. Length field is a reference to an object, in the content, the length object is stored immediately after the stream as {code} 9 0 obj 2624 endobj {code} The byte <0x0D> belongs to the stream and is not to be treated as line separator in this case. The parser is not able to read the length field so it manually searches for the stream end in the class EndstreamOutputStream. This class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it strips off the <0x0D> from this particular stream content. Since the stream is encrypted, PDFBox runs into a BadPaddingException later on when trying to decrypt the stream. The problem is reproducible using org.apache.pdfbox.PDFToImage in current 1.8.14-SNAPSHOT. The same works fine in current PDFBox 2.0.x, presumably because it uses the non-sequential parser by default. The proposed fix is to analyze the PDF content while reading it and search for the CR character only if it was ever encountered as a line separator prior to the stream being parsed. Note: I do not exactly know or understand the usage of the other classes inherited from BaseParser, like PDFObjectStreamParser. Maybe the line ending heuristic should be kept "as before" in these classes, by setting the new field BaseParser.hasCR to true already in the constructor. A patch is attached. 
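The line-ending heuristic proposed in PDFBOX-3933 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the content of EndlinePrediction.patch: the class `EndstreamTrimSketch` and the method `contentLength` are hypothetical names, and the `hasCR` parameter merely mirrors the field name mentioned in the report.

```java
// Hypothetical illustration of the proposed heuristic; not the attached patch.
class EndstreamTrimSketch {

    /**
     * Returns how many leading bytes of {@code data} belong to the stream
     * content when the end-of-line marker in front of "endstream" has to be
     * guessed. {@code hasCR} is assumed to be true only if a CR was already
     * seen as a line separator earlier in the file.
     */
    static int contentLength(byte[] data, boolean hasCR) {
        int len = data.length;
        if (len > 0 && data[len - 1] == '\n') {
            len--; // a trailing LF is always treated as the end-of-line marker
            if (hasCR && len > 0 && data[len - 1] == '\r') {
                len--; // strip the CR of a CR LF pair only if the file uses CR
            }
        } else if (hasCR && len > 0 && data[len - 1] == '\r') {
            len--; // a lone CR counts only if the file uses CR
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] data = {0x41, 0x0D, 0x0A}; // content byte, then <0x0D><0x0A>
        System.out.println(contentLength(data, false)); // CR stays in the content
        System.out.println(contentLength(data, true));  // CR is stripped as well
    }
}
```

With `hasCR` false, the <0x0D> stays part of the encrypted content, which is the behaviour the report asks for in an LF-only file.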
[jira] [Commented] (PDFBOX-3764) 100 times performance hit on creating images
[ https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981016#comment-15981016 ] Petr Slaby commented on PDFBOX-3764: The bad news is that the -Dsun.java2d.cmm switch seems to be unofficial (at least I did not find it in any Oracle documentation) and the KCMS implementation is likely to disappear completely in one of the future Java versions. The OpenJDK issue https://bugs.openjdk.java.net/browse/JDK-8041125 is closed and the comments in it do not give me much hope that LCMS would get faster again or that the Oracle/OpenJDK guys would even consider working on its performance. All very sad... A workaround implementation in PDFBox would be very welcome because of that. BTW, the page http://www.subshell.com/en/subshell/blog/Wrong-Colors-in-Images-with-Java8-100.html mentioned in the Getting Started Guide does not open. > 100 times performance hit on creating images > > > Key: PDFBOX-3764 > URL: https://issues.apache.org/jira/browse/PDFBOX-3764 > Project: PDFBox > Issue Type: Improvement > Components: Rendering >Affects Versions: 2.0.6 >Reporter: Daniel Persson > Labels: image, performance > Attachments: callstack_1.png, callstack_2.png, test.pdf > > > We found that PDFBox creates a better image than poppler so we wanted to > switch out our environment to get these improvements but found a file that > took about 10 minutes to create one image with PDFBox and only about 6 > seconds with poppler. So a 100 times performance hit if we were to change. > I've done some rudimentary profiling on the code and found that most of the > time is spent in ColorConvertOp.filter. Maybe there is a leaner way to > implement this in order to get a better result? > best regards > Daniel -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-3340) Image decoded twice without a real need
Petr Slaby created PDFBOX-3340: -- Summary: Image decoded twice without a real need Key: PDFBOX-3340 URL: https://issues.apache.org/jira/browse/PDFBOX-3340 Project: PDFBox Issue Type: Bug Reporter: Petr Slaby Priority: Minor Take the pdf from PDFBOX-1708, put a breakpoint into the class CCITTFaxFilter, method decode() and run PDFToImage. You will see the debugger stop twice, even though the pdf contains a single image. The second call arrives when the image is rendered to G2D; this is OK. But the first time, the image is decompressed in the constructor of PDImageXObject - line 147 {noformat} this(stream, resources, stream.createInputStream()); {noformat} just to allow the filter (CCITTFaxFilter in this case) to provide additional dictionary parameters in case something is missing in the input (COLORSPACE would be set to DeviceGray if missing here). I think this is a complete waste. The filter should be able to fix the dictionary without having to decode the image. As far as I can tell, this could be done by implementing a repair method on COSStream and on implementations of Filter. Also, I do not see that the stream created in the above mentioned constructor of PDImageXObject would ever be closed. This seems to be a more general issue. I have put a counter into COSInputStream.create(), at the point where it creates new RandomAccessInputStream(buffer). With the test file from PDFBOX-1708, I end up with 3 unclosed streams when the program finishes. I am not sure whether this is important, but I guess the unclosed streams are uselessly occupying space in the scratch file. Sorry if this is just lack of understanding of the code on my side, but I could not resist reporting what I see. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
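The counter experiment described in PDFBOX-3340 might look roughly like the sketch below. It does not use any PDFBox class: `OpenStreamCounterSketch` and `CountingStream` are hypothetical names, and a plain `ByteArrayInputStream` stands in for the stream created in COSInputStream.create().

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the counter experiment; no PDFBox classes involved.
class OpenStreamCounterSketch {

    // Number of streams that were created but not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();

    static final class CountingStream extends FilterInputStream {
        private boolean closed;

        CountingStream(InputStream in) {
            super(in);
            OPEN.incrementAndGet();
        }

        @Override
        public void close() throws IOException {
            if (!closed) {
                closed = true; // count each stream only once, even if closed twice
                OPEN.decrementAndGet();
            }
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // This stream is never closed, so it stays in the count.
        InputStream leaked = new CountingStream(new ByteArrayInputStream(new byte[4]));
        leaked.read();

        // try-with-resources closes this one when the block ends.
        try (InputStream ok = new CountingStream(new ByteArrayInputStream(new byte[4]))) {
            ok.read();
        }
        System.out.println("unclosed streams: " + OPEN.get());
    }
}
```

A try-with-resources block around the stream is one way to guarantee the close; whether that fits the PDImageXObject constructor is a separate question that the sketch does not answer.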
[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails
[ https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271452#comment-15271452 ] Petr Slaby commented on PDFBOX-3338: OK, here is the patch. I did not manage to make it work with the G3 1D example from PDFBOX-1708, so I left the K=0 path untouched in the end. I have successfully tested a G3 2D example, G4 byte-aligned and G4 w/o byte align. Hope your results will be good as well. > CCITT Fax decoder fails > --- > > Key: PDFBOX-3338 > URL: https://issues.apache.org/jira/browse/PDFBOX-3338 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.12, 2.0.1 >Reporter: Petr Slaby > Attachments: 1.tiff, CCITTFaxFilter.patch, TestCCITTFaxDecoder.java > > > I have a PDF which does not render in PDFBox. It contains pages from a > scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs > into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the > same message just with "white" instead of "black"). Unfortunately, the PDF > contains sensitive data and I cannot share it. > As a test, I have replaced the TIFFFaxDecoder by the class > CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked > fine after that and PDFToImage produced the expected result. > I have extracted the first few bytes of the TIFF to show the problem without > sharing the confidential content. See the attached test program and test file. > I have tested this against latest trunk version of PDFBox, but I think the > decoder implementation is basically the same in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3338) CCITT Fax decoder fails
[ https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-3338: --- Attachment: CCITTFaxFilter.patch > CCITT Fax decoder fails > --- > > Key: PDFBOX-3338 > URL: https://issues.apache.org/jira/browse/PDFBOX-3338 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.12, 2.0.1 >Reporter: Petr Slaby > Attachments: 1.tiff, CCITTFaxFilter.patch, TestCCITTFaxDecoder.java > > > I have a PDF which does not render in PDFBox. It contains pages from a > scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs > into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the > same message just with "white" instead of "black"). Unfortunately, the PDF > contains sensitive data and I cannot share it. > As a test, I have replaced the TIFFFaxDecoder by the class > CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked > fine after that and PDFToImage produced the expected result. > I have extracted the first few bytes of the TIFF to show the problem without > sharing the confidential content. See the attached test program and test file. > I have tested this against latest trunk version of PDFBox, but I think the > decoder implementation is basically the same in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails
[ https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271232#comment-15271232 ] Petr Slaby commented on PDFBOX-3338: I see. I misunderstood your earlier comment then, sorry. I have double-checked this now, the class compiles fine with java compliance set to 1.5. It would compile with older versions, too, except for the few annotations it is using. > CCITT Fax decoder fails > --- > > Key: PDFBOX-3338 > URL: https://issues.apache.org/jira/browse/PDFBOX-3338 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.12, 2.0.1 >Reporter: Petr Slaby > Attachments: 1.tiff, TestCCITTFaxDecoder.java > > > I have a PDF which does not render in PDFBox. It contains pages from a > scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs > into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the > same message just with "white" instead of "black"). Unfortunately, the PDF > contains sensitive data and I cannot share it. > As a test, I have replaced the TIFFFaxDecoder by the class > CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked > fine after that and PDFToImage produced the expected result. > I have extracted the first few bytes of the TIFF to show the problem without > sharing the confidential content. See the attached test program and test file. > I have tested this against latest trunk version of PDFBox, but I think the > decoder implementation is basically the same in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails
[ https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271045#comment-15271045 ] Petr Slaby commented on PDFBOX-3338: {quote} > It has an Apache license, so this isn't a problem. {quote} Cool, that saves me some worries. {quote} I suspect that the encodedByteAlign option isn't supported; one would have to implement it. See in rev 1581603 and 1581602 / PDFBOX-1074. {quote} I can try, it seems to be quite straightforward at first glance. {quote} Another problem in that code is "continue" with label. I've never seen that one before, ever. When was this added to java? {quote} It has been there from the beginning. See e.g. some examples at https://docs.oracle.com/javase/tutorial/java/nutsandbolts/branch.html. I hope you are just exaggerating with the word "problem"? I find the code much better and more readable than the current decoder class in PDFBox. At the least, it does not need to jump back and forth in the input and reads it byte by byte instead. Not that I would really understand what is going on in detail in either of the implementations. For that, one would have to study the standard first. > CCITT Fax decoder fails > --- > > Key: PDFBOX-3338 > URL: https://issues.apache.org/jira/browse/PDFBOX-3338 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.12, 2.0.1 >Reporter: Petr Slaby > Attachments: 1.tiff, TestCCITTFaxDecoder.java > > > I have a PDF which does not render in PDFBox. It contains pages from a > scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs > into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the > same message just with "white" instead of "black"). Unfortunately, the PDF > contains sensitive data and I cannot share it. > As a test, I have replaced the TIFFFaxDecoder by the class > CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked > fine after that and PDFToImage produced the expected result. 
> I have extracted the first few bytes of the TIFF to show the problem without > sharing the confidential content. See the attached test program and test file. > I have tested this against latest trunk version of PDFBox, but I think the > decoder implementation is basically the same in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
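For readers unfamiliar with the labeled `continue` discussed in the comment above, here is a small self-contained illustration. It is unrelated to the CCITTFaxDecoderStream code; the class and method names are made up for this example.

```java
// Illustration of "continue" with a label; not taken from any decoder code.
class LabeledContinueSketch {

    /** Counts the rows that contain no negative value. */
    static int countNonNegativeRows(int[][] rows) {
        int count = 0;
        nextRow:
        for (int[] row : rows) {
            for (int value : row) {
                if (value < 0) {
                    // Abandon the inner loop and go on with the next outer iteration.
                    continue nextRow;
                }
            }
            count++; // reached only if no value in the row was negative
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 2}, {3, -1}, {}};
        System.out.println(countNonNegativeRows(rows));
    }
}
```

Without the label, `continue` would only skip to the next value in the inner loop and the row would still be counted.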
[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails
[ https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270954#comment-15270954 ] Petr Slaby commented on PDFBOX-3338: You mean to test my solution using the Twelve Monkeys implementation? Unfortunately, the decoder class in that library is not public, so for my quick and dirty test I have just copied it with some minor tweaks to avoid copying too many classes. Then, I have used it for the K>1 path only as it was used in my PDF. I believe this is the G3 and G32D variant, depending on the value of tiffOptions. As for G4, it would not be a big deal, except that I do not see a flag for the byte align option in the Twelve Monkeys library. Not sure whether it is not supported there or whether this is just lack of knowledge on my side. Apart from that, I could probably do this. The license of Twelve Monkeys allows copying provided that the copyright notice remains in the copied file. (At least this is how I understand it, but I am not a lawyer) This is no problem for a testing patch. But I do not know whether you could use it if you decide to take the solution instead of the current decoder implementation (which originally comes from Sun ImageIO and was made freely available by Sun some years ago). > CCITT Fax decoder fails > --- > > Key: PDFBOX-3338 > URL: https://issues.apache.org/jira/browse/PDFBOX-3338 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.12, 2.0.1 >Reporter: Petr Slaby > Attachments: 1.tiff, TestCCITTFaxDecoder.java > > > I have a PDF which does not render in PDFBox. It contains pages from a > scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs > into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the > same message just with "white" instead of "black"). Unfortunately, the PDF > contains sensitive data and I cannot share it. 
> As a test, I have replaced the TIFFFaxDecoder by the class > CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked > fine after that and PDFToImage produced the expected result. > I have extracted the first few bytes of the TIFF to show the problem without > sharing the confidential content. See the attached test program and test file. > I have tested this against latest trunk version of PDFBox, but I think the > decoder implementation is basically the same in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3338) CCITT Fax decoder fails
[ https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-3338: --- Attachment: 1.tiff TestCCITTFaxDecoder.java > CCITT Fax decoder fails > --- > > Key: PDFBOX-3338 > URL: https://issues.apache.org/jira/browse/PDFBOX-3338 > Project: PDFBox > Issue Type: Bug >Affects Versions: 1.8.12, 2.0.1 >Reporter: Petr Slaby > Attachments: 1.tiff, TestCCITTFaxDecoder.java > > > I have a PDF which does not render in PDFBox. It contains pages from a > scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs > into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the > same message just with "white" instead of "black"). Unfortunately, the PDF > contains sensitive data and I cannot share it. > As a test, I have replaced the TIFFFaxDecoder by the class > CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked > fine after that and PDFToImage produced the expected result. > I have extracted the first few bytes of the TIFF to show the problem without > sharing the confidential content. See the attached test program and test file. > I have tested this against latest trunk version of PDFBox, but I think the > decoder implementation is basically the same in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-3338) CCITT Fax decoder fails
Petr Slaby created PDFBOX-3338: -- Summary: CCITT Fax decoder fails Key: PDFBOX-3338 URL: https://issues.apache.org/jira/browse/PDFBOX-3338 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.1, 1.8.12 Reporter: Petr Slaby I have a PDF which does not render in PDFBox. It contains pages from a scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs into IOException("TIFFFaxDecoder: EOL encountered in black run.") (or the same message just with "white" instead of "black"). Unfortunately, the PDF contains sensitive data and I cannot share it. As a test, I have replaced the TIFFFaxDecoder by the class CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked fine after that and PDFToImage produced the expected result. I have extracted the first few bytes of the TIFF to show the problem without sharing the confidential content. See the attached test program and test file. I have tested this against latest trunk version of PDFBox, but I think the decoder implementation is basically the same in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-3191) PDFDebugger does not handle cancelling of "Open URL" dialog
[ https://issues.apache.org/jira/browse/PDFBOX-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-3191: --- Summary: PDFDebugger does not handle cancelling of "Open URL" dialog (was: PDFDebugger does not handle cancelling of the "Open URL") > PDFDebugger does not handle cancelling of "Open URL" dialog > --- > > Key: PDFBOX-3191 > URL: https://issues.apache.org/jira/browse/PDFBOX-3191 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Petr Slaby >Priority: Trivial > > In PDFDebugger, click the menu item "Open URL..." and then cancel the dialog. > A MalformedURLException caused by an NPE is thrown. After that, it is not > possible to open any other file nor to close the application, since both > throw an NPE in the code updating the list of most recently used files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
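A guard against the cancelled dialog in PDFBOX-3191 could look something like the sketch below. It assumes that the dialog hands back `null` when cancelled (as `JOptionPane.showInputDialog` does); how PDFDebugger actually obtains the text is not stated in the report, and `OpenUrlInputSketch.parse` is a hypothetical helper, not PDFDebugger code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Optional;

// Hypothetical helper; assumes a cancelled dialog yields null input.
class OpenUrlInputSketch {

    /** Returns the parsed URL, or empty if the dialog was cancelled or the text is unusable. */
    static Optional<URL> parse(String input) {
        if (input == null || input.trim().isEmpty()) {
            return Optional.empty(); // cancelled or nothing typed: do nothing
        }
        try {
            return Optional.of(new URI(input.trim()).toURL());
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return Optional.empty(); // not a usable absolute URL
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(null).isPresent());
        System.out.println(parse("https://example.org/a.pdf").isPresent());
    }
}
```

The caller would update the list of recently used files only when a value is present, so a cancelled dialog leaves the application state untouched.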
[jira] [Commented] (PDFBOX-3046) Specific PDF prints really (REALLY) slow
[ https://issues.apache.org/jira/browse/PDFBOX-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969796#comment-14969796 ] Petr Slaby commented on PDFBOX-3046: I saw this effect in our software using PDFBox a few years ago and I believe this PDF has the same problem. The answer lies in the implementation of sun.print.RasterPrintJob. It lets you render the page first into a dummy PeekGraphics which just searches for the types of graphical objects you want to use. Next, for each transparent bitmap it finds, it creates a BufferedImage having the size and position of that bitmap and calls the Printable.print() method again, passing the BufferedImage g2d surface to it, to "flatten" all layers into the bitmap. Put a breakpoint into PDFPrintable.print() and, with this specific PDF, you will see that you arrive in it a million times, always printing the same page 1 again and again. The one I was analyzing when I first saw the problem used a bitmap font, each character being a single small transparent bitmap. I believe the one attached here will be the same. The only remedy I found is to render the whole page into a bitmap first and only then pass it to the java printing API. Since printing huge bitmaps is not desired in general, I first count the transparent bitmaps in the page to be printed and resort to the full page bitmap printing only if there are "many". > Specific PDF prints really (REALLY) slow > > > Key: PDFBOX-3046 > URL: https://issues.apache.org/jira/browse/PDFBOX-3046 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: Windows 10 >Reporter: Teon Metselaar > Attachments: mspubcol.pdf, mspubcol.prn > > > On Windows 10 I have printed a test page using the MS Publisher Color Printer > (which outputs a Postscript-file) and converted that file to PDF using > GhostScript ps2pdf. 
> The resulting single-page PDF file is printed really, really slowly (180-190 > seconds) while other documents (even generated using ps2pdf) print a lot > faster (some seconds). > I can't figure out why this is. I guess it has something to do with the used > font, but other PDF printing libraries (jPedal, jPDFPrint) are able to print the > same documents in a couple of seconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
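The workaround described in the PDFBOX-3046 comment (render the whole page into one bitmap when there are "many" transparent bitmaps) could be sketched as below. Everything here is an assumption made for illustration: the names `RasterFallbackSketch`, `useFullPageBitmap` and `rasterized`, the threshold value, and the idea of wrapping an arbitrary `Printable`. How the transparent bitmaps are counted is left out, and this is not the commenter's actual code.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Printable;

// Hypothetical sketch of the "full page bitmap" fallback; names and threshold are assumptions.
class RasterFallbackSketch {

    // Arbitrary stand-in for "many"; the comment gives no number.
    static final int MANY = 50;

    static boolean useFullPageBitmap(int transparentBitmapCount) {
        return transparentBitmapCount >= MANY;
    }

    /** Wraps a Printable so that the printing API only ever sees one opaque image per page. */
    static Printable rasterized(Printable delegate, double scale) {
        return (Graphics g, PageFormat pf, int pageIndex) -> {
            int w = (int) Math.ceil(pf.getWidth() * scale);
            int h = (int) Math.ceil(pf.getHeight() * scale);
            BufferedImage image = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D ig = image.createGraphics();
            try {
                ig.setColor(Color.WHITE);
                ig.fillRect(0, 0, w, h); // opaque white page background
                ig.scale(scale, scale);  // render at a higher resolution if requested
                if (delegate.print(ig, pf, pageIndex) == Printable.NO_SUCH_PAGE) {
                    return Printable.NO_SUCH_PAGE;
                }
            } finally {
                ig.dispose();
            }
            // Draw the flattened page scaled back to the page size in points.
            g.drawImage(image, 0, 0,
                    (int) Math.ceil(pf.getWidth()), (int) Math.ceil(pf.getHeight()), null);
            return Printable.PAGE_EXISTS;
        };
    }

    public static void main(String[] args) {
        System.out.println(useFullPageBitmap(3));
        System.out.println(useFullPageBitmap(100000));
    }
}
```

The trade-off is the one named in the comment: the full page bitmap avoids the repeated print() calls but produces large print jobs, so it is chosen only above the threshold.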
[jira] [Commented] (PDFBOX-3000) Transparency Group issues
[ https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940964#comment-14940964 ] Petr Slaby commented on PDFBOX-3000: Being the original author of the transparency groups contribution in PDFBOX-2104, I have tried to look into this again. The good news is that the file attached to PDFBOX-2104 renders fine with the patch from John. It might be correct for the first time ever, the image shadow on the first page is missing in all the 2.0 reference renderings I have in my repository since December 2014. Unfortunately, I have no older renderings, so I cannot tell whether we got it wrong already at the beginning or whether it got broken later. We have this working correctly in our source code based on PDFBox 1.7, but it seems to be too hard for me to figure out what exactly needs to be done to successfully port our implementation to PDFBox 2.0. The only thing I do not like about John's patch is that it creates a full page bitmap to render the transparency group. I have tried to bring back the original idea of creating a bitmap according to the intersection of the bleeding box of the group and of the current clip path. After some trials, I get the same results as John on my test documents using the attached patch. Maybe someone with more insight into all the transforms can use it as a starting point to get this right. > Transparency Group issues > - > > Key: PDFBOX-3000 > URL: https://issues.apache.org/jira/browse/PDFBOX-3000 > Project: PDFBox > Issue Type: Bug > Components: Rendering >Affects Versions: 2.0.0 >Reporter: John Hewson > Fix For: 2.1.0 > > Attachments: softmask-rewrite.patch > > > This is a follow-up issue for transparency group issues from PDFBOX-2423. > More details to come. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-2985) Potential NPE in PDMarkedContent#getMCID()
Petr Slaby created PDFBOX-2985: -- Summary: Potential NPE in PDMarkedContent#getMCID() Key: PDFBOX-2985 URL: https://issues.apache.org/jira/browse/PDFBOX-2985 Project: PDFBox Issue Type: Bug Reporter: Petr Slaby I do not have a test case, but this method in PDMarkedContent is obviously wrong: {noformat} public int getMCID() { return this.getProperties() == null ? null : this.getProperties().getInt(COSName.MCID); } {noformat} If getProperties() returns null, the method tries to convert a null Integer value to an int, which throws a NullPointerException. I believe the intention was rather: {noformat} ... return this.getProperties() == null ? 0 : ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
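The failure mode reported in PDFBOX-2985 can be reproduced without PDFBox. In the sketch below, an `Integer` parameter stands in for the result of getProperties(); the class and method names are made up for the illustration.

```java
// Stand-alone reproduction of the unboxing problem; no PDFBox classes involved.
class TernaryUnboxingSketch {

    // Same shape as the reported method: "cond ? null : int" has type Integer,
    // so returning it as int unboxes null and throws a NullPointerException.
    static int broken(Integer properties) {
        return properties == null ? null : properties.intValue();
    }

    // Shape of the suggested fix: both branches are int, nothing is unboxed.
    static int fixed(Integer properties) {
        return properties == null ? 0 : properties.intValue();
    }

    public static void main(String[] args) {
        try {
            broken(null);
        } catch (NullPointerException e) {
            System.out.println("broken(null) threw NullPointerException");
        }
        System.out.println("fixed(null) = " + fixed(null));
    }
}
```

Whether 0 is the right fallback for a missing MCID is a design question for the maintainers; the sketch only shows why the original expression cannot return normally.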
[jira] [Updated] (PDFBOX-2971) CalGray white rendered as cyan
[ https://issues.apache.org/jira/browse/PDFBOX-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2971: --- Attachment: ExternesDokument_modif1.jpg > CalGray white rendered as cyan > -- > > Key: PDFBOX-2971 > URL: https://issues.apache.org/jira/browse/PDFBOX-2971 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Petr Slaby >Priority: Minor > Attachments: ExternesDokument_modif.pdf, ExternesDokument_modif1.jpg > > > The attached PDF uses CalGray colors. When converted to a jpeg using > PdfToImage, there is a cyan rectangle visible. Acrobat shows the same > rectangle as white. > The PDF uses a CalGray having white point (0.9505, 1, 1.089). The color value > after applying gamma is 1.0, i.e. white was intended. The class PDCalGray > multiplies the value by the white point to get X, Y, Z and sends it to the > java built-in CIEXYZ profile to convert it into sRGB. I believe the problem > is that the white point of CIEXYZ in java is (0.9642, 1., 0.8249) and we > need to adapt the white point before sending the values to it. There are > several methods to do that, but the easiest one is a simple scaling. In our > case it would meant to multiply the color value by the CIEXYZ white point > instead of the white point given in the CalGray. > I would not like to pretend that I am an expert in this area. I found the > information in the internet and in the java sources of ColorSpace and > ICC_ColorSpace and this is how I interpret it. An insight of someone who > really understands the color management stuff would be appreciated. But my > main point is that the result looks different compared to what is shown in > Acrobat. > The PDF originally comes from a customer and contains text above the > rectangles. I have removed the texts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-2971) CalGray white rendered as cyan
[ https://issues.apache.org/jira/browse/PDFBOX-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2971: --- Attachment: ExternesDokument_modif.pdf > CalGray white rendered as cyan > -- > > Key: PDFBOX-2971 > URL: https://issues.apache.org/jira/browse/PDFBOX-2971 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Petr Slaby >Priority: Minor > Attachments: ExternesDokument_modif.pdf > > > The attached PDF uses CalGray colors. When converted to a jpeg using > PdfToImage, there is a cyan rectangle visible. Acrobat shows the same > rectangle as white. > The PDF uses a CalGray having white point (0.9505, 1, 1.089). The color value > after applying gamma is 1.0, i.e. white was intended. The class PDCalGray > multiplies the value by the white point to get X, Y, Z and sends it to the > java built-in CIEXYZ profile to convert it into sRGB. I believe the problem > is that the white point of CIEXYZ in java is (0.9642, 1., 0.8249) and we > need to adapt the white point before sending the values to it. There are > several methods to do that, but the easiest one is a simple scaling. In our > case it would meant to multiply the color value by the CIEXYZ white point > instead of the white point given in the CalGray. > I would not like to pretend that I am an expert in this area. I found the > information in the internet and in the java sources of ColorSpace and > ICC_ColorSpace and this is how I interpret it. An insight of someone who > really understands the color management stuff would be appreciated. But my > main point is that the result looks different compared to what is shown in > Acrobat. > The PDF originally comes from a customer and contains text above the > rectangles. I have removed the texts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Created] (PDFBOX-2971) CalGray white rendered as cyan
Petr Slaby created PDFBOX-2971: -- Summary: CalGray white rendered as cyan Key: PDFBOX-2971 URL: https://issues.apache.org/jira/browse/PDFBOX-2971 Project: PDFBox Issue Type: Bug Affects Versions: 2.0.0 Reporter: Petr Slaby Priority: Minor The attached PDF uses CalGray colors. When converted to a jpeg using PdfToImage, there is a cyan rectangle visible. Acrobat shows the same rectangle as white. The PDF uses a CalGray having white point (0.9505, 1, 1.089). The color value after applying gamma is 1.0, i.e. white was intended. The class PDCalGray multiplies the value by the white point to get X, Y, Z and sends it to the java built-in CIEXYZ profile to convert it into sRGB. I believe the problem is that the white point of CIEXYZ in java is (0.9642, 1., 0.8249) and we need to adapt the white point before sending the values to it. There are several methods to do that, but the easiest one is a simple scaling. In our case it would mean multiplying the color value by the CIEXYZ white point instead of the white point given in the CalGray. I would not like to pretend that I am an expert in this area. I found the information on the internet and in the java sources of ColorSpace and ICC_ColorSpace and this is how I interpret it. Insight from someone who really understands the color management stuff would be appreciated. But my main point is that the result looks different compared to what is shown in Acrobat. The PDF originally comes from a customer and contains text above the rectangles. I have removed the texts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
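The "simple scaling" suggested in PDFBOX-2971 amounts to the arithmetic sketched below. The two white points are the values quoted in the report; the class `CalGrayWhitePointSketch` and the method `toXyz` are hypothetical and are not the PDCalGray implementation. The sRGB values printed by `main` depend on the JDK's color management, so no output is claimed here.

```java
import java.awt.color.ColorSpace;
import java.util.Arrays;

// Hypothetical illustration of the scaling idea; not the PDCalGray code.
class CalGrayWhitePointSketch {

    // White point given in the CalGray color space of the reported PDF.
    static final float[] CALGRAY_WHITE = {0.9505f, 1f, 1.089f};

    // White point of Java's built-in CIEXYZ space, as stated in the report.
    static final float[] JAVA_CIEXYZ_WHITE = {0.9642f, 1f, 0.8249f};

    /** Scales a gamma-corrected gray value by a white point to obtain X, Y, Z. */
    static float[] toXyz(float gray, float[] whitePoint) {
        return new float[] {
            gray * whitePoint[0],
            gray * whitePoint[1],
            gray * whitePoint[2]
        };
    }

    public static void main(String[] args) {
        ColorSpace xyz = ColorSpace.getInstance(ColorSpace.CS_CIEXYZ);
        // Current behaviour as described: scale by the CalGray white point.
        System.out.println(Arrays.toString(xyz.toRGB(toXyz(1f, CALGRAY_WHITE))));
        // Suggested behaviour: scale by the white point of the target CIEXYZ space.
        System.out.println(Arrays.toString(xyz.toRGB(toXyz(1f, JAVA_CIEXYZ_WHITE))));
    }
}
```

This is the crudest form of white point adaptation; the report itself notes that several other methods exist.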
[jira] [Commented] (PDFBOX-2905) Replace PDFReader with PDFDebugger
[ https://issues.apache.org/jira/browse/PDFBOX-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655265#comment-14655265 ] Petr Slaby commented on PDFBOX-2905: Personally, I would never consider the public classes in the tools project to be a public and stable API of any sort. For me, an API of a tool is its main class and the parameters supported by the method main() in that class. A simple note in documentation making this clear would be enough for me. Or is there something in the tools project that is designed to be used in a different way than just calling its main()? Replace PDFReader with PDFDebugger -- Key: PDFBOX-2905 URL: https://issues.apache.org/jira/browse/PDFBOX-2905 Project: PDFBox Issue Type: Improvement Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Priority: Minor Attachments: 007087-payment-due.pdf As discussed on the mailing list: {quote} Here's an idea: if we switch PDFDebugger to using View Pages by default, it will no longer be confusing for casual users. I've found myself using this mode most of the time anyway. We can add page up/down too, of course - preferably using the actual Page Up and Page Down keys rather than the bizarre choice of the +/- keys which are currently used in PDFReader. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class
[ https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492071#comment-14492071 ] Petr Slaby commented on PDFBOX-2692: Yes, that should do nicely. Thanks. Possibility to use our own and/or overwrite PageDrawer class Key: PDFBOX-2692 URL: https://issues.apache.org/jira/browse/PDFBOX-2692 Project: PDFBox Issue Type: Wish Components: Rendering Affects Versions: 2.0.0 Environment: JDK 1.8, Windows 7, PDF-Box - current trunk Reporter: Manfred Pock Assignee: Andreas Lehmkühler Labels: features Fix For: 2.0.0 Attachments: pdfexample.jpg We use PDFBox to render PDFs. Additionally, we have the possibility to add different kinds of annotations (stamps, marks, free text, notes, ...) as in a WYSIWYG editor. To do this, we need to paint these annotations ourselves. Another reason is to avoid painting all parts: for example, we have a PDF with an embedded picture. Behind the picture is the OCR text for this picture. This text is only needed for searching and should not be painted. Thus it would be useful to use our own derived PageDrawer. As I see it, there are some things to change: a.) remove the final from the PageDrawer class, b.) make some global variables (graphics, xform, pageSize...) protected, c.) also make some methods like setRenderingHints protected, d.) maybe add the possibility to tell the PDFRenderer which PageDrawer should be used.
[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class
[ https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485173#comment-14485173 ] Petr Slaby commented on PDFBOX-2692: How would you know that the color in setColor() belongs to the element that is supposed to be green? As far as I understand the description from Daniel Wilson, the application does not change all red colors to green; it changes only some of the elements on the page to green.
[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class
[ https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485169#comment-14485169 ] Petr Slaby commented on PDFBOX-2692: Basically, that means I have to re-implement the PageDrawer myself, right? I need all of its functionality, including transparency groups and all the other logic it contains. I just need to intervene in showFontGlyph, or even drawGlyph2D, to tell the target renderer to draw a character instead of filling a path, if the renderer is capable of handling fonts. So in the end, I would copy/paste the whole PageDrawer, make the copy non-final, inherit from it and override one or two methods. I am fine with that; after all, we copied the whole PDFBox source in 1.8.x. But at the moment, not even that is possible, as TilingPattern and Glyph2D and its implementations are not public, meaning I would have to copy/paste even more classes. Our target format renderers already have a Graphics2D implementation, so passing it to PageDrawer.drawPage() is a perfect fit. I just need something corresponding to Graphics2D.drawGlyphVector() to be called instead of graphics.fill() when rendering a character. For example, declare a special public interface with methods like drawGlyph and fillGlyph, whose parameters are the PDFont, the Glyph2D (or the GeneralPath it produces, but without the transformation applied), the character code and the transformation. These methods would be called instead of graphics.fill() or graphics.draw() in drawGlyph2D() if the graphics instance implements the interface. Passing Glyph2D instead of GeneralPath should be faster, as my renderers need the GeneralPath only once per character, to create that character in the on-the-fly font.
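The opt-in interface proposed in the comment above could look roughly like the following sketch. Everything here is hypothetical: Target stands in for the Graphics2D instance, GlyphAwareTarget for the proposed public interface, and a plain font name replaces PDFont so that the example compiles without PDFBox. It is meant to show the instanceof dispatch only, not how PageDrawer is or should be implemented.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.util.ArrayList;
import java.util.List;

public class GlyphAwareSketch {
    /** Stand-in for the one Graphics2D call that matters here. */
    public interface Target {
        void fill(Shape shape);
    }

    /** Hypothetical opt-in interface for targets that can emit real text. */
    public interface GlyphAwareTarget extends Target {
        void fillGlyph(String fontName, int code, GeneralPath outline, AffineTransform at);
    }

    /** What a drawGlyph2D()-like method could do instead of always filling a path. */
    public static void fillGlyph(Target target, String fontName, int code,
                                 GeneralPath outline, AffineTransform at) {
        if (target instanceof GlyphAwareTarget) {
            // The target gets font, code, untransformed outline and transform.
            ((GlyphAwareTarget) target).fillGlyph(fontName, code, outline, at);
        } else {
            // Fallback: ordinary targets still receive a transformed shape.
            target.fill(at.createTransformedShape(outline));
        }
    }

    public static void main(String[] args) {
        GeneralPath outline = new GeneralPath();
        outline.moveTo(0f, 0f);
        outline.lineTo(1f, 0f);
        outline.lineTo(0f, 1f);
        outline.closePath();

        List<String> log = new ArrayList<>();
        Target plain = shape -> log.add("fill " + shape.getBounds2D());
        fillGlyph(plain, "Helvetica", 65, outline, AffineTransform.getScaleInstance(12, 12));
        System.out.println(log);
    }
}
```

A target that does not implement the extra interface keeps working unchanged, which is the reason for the instanceof check rather than a new abstract method.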
[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class
[ https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486071#comment-14486071 ] Petr Slaby commented on PDFBOX-2692: Yep, we fall back to bitmaps if we are not able to express something in the target format. In the worst case, we produce a bitmap for the whole page. But for a simple PDF containing just text, we want to produce native AFP or PCL code using fonts rather than bitmaps or vectors.
[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class
[ https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375894#comment-14375894 ] Petr Slaby commented on PDFBOX-2692: +1 In our application, we use PDFBox to render PDFs to AFP, PCL and PostScript. When rendering text, the target format renderer needs to know the text and its font to be able to use text operations and fonts in the respective target format language (there is a configurable font mapping to use pre-prepared fonts, or a possibility to generate fonts on the fly). In our clone of PDFBox 1.8.x, we did that by getting the font information from GlyphVector in g2d.drawGlyphVector(). In PDFBox 2.0, text is rendered as Shapes, so the underlying G2D implementation has no chance of knowing that text is being rendered. With the possibility to override PageDrawer, I could intercept showFontGlyph to tell the G2D implementation that the next fill() or draw() is in fact drawing a letter in a given font.
[jira] [Created] (PDFBOX-2727) Cache color space instances
Petr Slaby created PDFBOX-2727: -- Summary: Cache color space instances Key: PDFBOX-2727 URL: https://issues.apache.org/jira/browse/PDFBOX-2727 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby I have a PDF from a customer which contains a lot of calls of SetNonStrokingColorSpace and SetStrokingColorSpace. Each time, an embedded color profile resource is loaded via ICC_Profile.getInstance(InputStream). I have attempted to cache the result in PDResources.java as shown in the attached patch. For this particular PDF, this change improves the performance of PDFToImage from 27 seconds down to 5 seconds (the PDF has two pages). I cannot share the customer PDF, so I have attempted to find a similar free one. Unfortunately, in my test suite, I did not find anything with a comparable improvement. The best example I found is in the attached PDF. There the improvement is from 4.9 seconds without caching to 4.1 with caching. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
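The effect of the proposed caching can be illustrated without PDFBox. The sketch below is not the attached patch: Cache is a hypothetical helper, the String key stands in for whatever identifies the color space resource, and the loader stands in for the expensive ICC_Profile.getInstance(InputStream) call. It only shows that the loader runs once per key, however often the color space is set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ColorSpaceCacheSketch {
    /** Caches one loaded value per key; the loader runs only on a cache miss. */
    public static final class Cache<K, V> {
        private final Map<K, V> entries = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        public Cache(Function<K, V> loader) {
            this.loader = loader;
        }

        public V get(K key) {
            return entries.computeIfAbsent(key, loader);
        }
    }

    public static void main(String[] args) {
        int[] loads = {0};
        // The lambda is a placeholder for parsing an embedded color profile.
        Cache<String, String> cache = new Cache<>(name -> {
            loads[0]++;
            return "profile for " + name;
        });
        // Simulates a content stream that sets the same color space many times.
        for (int i = 0; i < 1000; i++) {
            cache.get("CS0");
        }
        System.out.println("lookups: 1000, loads: " + loads[0]);
    }
}
```

Where such a cache should live (PDResources or something bound to the underlying dictionary) is a separate question that this sketch does not answer.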
[jira] [Updated] (PDFBOX-2727) Cache color space instances
[ https://issues.apache.org/jira/browse/PDFBOX-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2727: --- Attachment: 000435.pdf PDResources.java.patch
[jira] [Updated] (PDFBOX-2727) Cache color space instances
[ https://issues.apache.org/jira/browse/PDFBOX-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2727: --- Priority: Minor (was: Major)
[jira] [Commented] (PDFBOX-2727) Cache color space instances
[ https://issues.apache.org/jira/browse/PDFBOX-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376818#comment-14376818 ] Petr Slaby commented on PDFBOX-2727: Note: In the PDResources constructor, I noticed a TODO comment stating that PDResources should be instantiated and cached on a per-COSDictionary basis, indicating that a proper caching solution might take more than my simple patch. Indeed, the cached color space instances should be bound to the COSDictionary rather than to PDResources, as multiple PDResources instances are created for a single COSDictionary. I have also tried caching fonts created from font resources in the same way, but without any noticeable performance gain in my test suite.
[jira] [Commented] (PDFBOX-2576) Improve code quality
[ https://issues.apache.org/jira/browse/PDFBOX-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376789#comment-14376789 ] Petr Slaby commented on PDFBOX-2576: [~tilman]: Is the static modifier in COSNull.writePDF() intended? It produces a warning in COSWriter.visitFromNull() - static methods should be accessed in a static way. All the other COS objects have a non-static writePDF(), so I assume this one should not be static either? Improve code quality Key: PDFBOX-2576 URL: https://issues.apache.org/jira/browse/PDFBOX-2576 Project: PDFBox Issue Type: Task Affects Versions: 2.0.0 Reporter: Tilman Hausherr Attachments: GraphicsOperatorProcessor.patch, SecuryHandlerFactory.patch, org.apache.fontbox.afm.patch, org.apache.fontbox.cff.cffparser.patch, org.apache.fontbox.cff.patch, org.apache.fontbox.cmap.patch, org.apache.pdfbox.contentstream.operator.state.patch, org.apache.pdfbox.cos.patch, org.apache.pdfbox.filter-2.patch, org.apache.pdfbox.filter.patch, org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.patch, org.apache.pdfbox.pdmodel.documentinterchange.patch, org.apache.pdfbox.preflight.graphic.patch, pdfbox-override-patch.txt, pdfbox-raw-type-patch.txt, pdfcloneutility-patch.txt, pdftextstripperbyarea-patch.txt, ttfsubsetter-2.patch, ttfsubsetter-3.patch, ttfsubsetter-patch.txt This is a longterm issue for the task to improve code quality, by using the [SonarQube report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor], hints in different IDEs, the FindBugs tool and other code quality tools. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-2539) [PATCH] Allow non static FontProvider
[ https://issues.apache.org/jira/browse/PDFBOX-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234168#comment-14234168 ] Petr Slaby commented on PDFBOX-2539: +1 [PATCH] Allow non static FontProvider - Key: PDFBOX-2539 URL: https://issues.apache.org/jira/browse/PDFBOX-2539 Project: PDFBox Issue Type: Bug Components: FontBox Affects Versions: 2.0.0 Reporter: simon steiner Attachments: fontProvider.patch I would like to use multiple instances of fontprovider in thread safe way -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2262) Remove usage of AWT fonts
[ https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128445#comment-14128445 ] Petr Slaby commented on PDFBOX-2262: [~jahewson]: I assume the semicolon at the end of line 133 of CMap.java is not intended? {noformat} if (range.isPartialMatch(bytes.get(i), i)); {noformat} Remove usage of AWT fonts - Key: PDFBOX-2262 URL: https://issues.apache.org/jira/browse/PDFBOX-2262 Project: PDFBox Issue Type: Improvement Components: PDModel, Rendering Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 Attachments: Basiswissen-Vorschriften.pdf, Basiswissen-Vorschriften.pdf-1.png, Basiswissen-Vorschriften.pdf-1.png-diff.png, Basiswissen-Vorschriften.pdf-9.png, Basiswissen-Vorschriften.pdf-9.png-diff.png, ELVIA-Reiserucktritt-Vollschutz.pdf-1.png, FreeSansTest.pdf, PDFBOX-1094-094730.pdf-1.png, PDFBOX-1770.pdf-1.png, PDF_Spec-Shading-23.pdf-1.png, PDF_Spec-Shading-23.pdf-1.png-diff.png, bugzilla867751.pdf-2.png, bugzilla867751.pdf-2.png-diff.png, bugzilla886049.pdf, bugzilla886049.pdf-1.png, test_1fd9a_test.pdf We're still using AWT fonts to render the standard 14 built-in fonts, which causes rendering problems and encoding issues (see PDFBOX-2140). We're also using AWT for some fallback fonts. Removal of these AWT fonts isn't too difficult, we need to load the fonts using the existing PDFFontManager mechanism which has recently been added. All missing TrueType fonts loaded from disk have been using SystemFontManager for a number of weeks now. We should ship some sensible default fonts with PDFBox, such as the Liberation fonts (see PDFBOX-2169, PDFBOX-2263), in case PDFFontManager can't find anything suitable, rather than falling back to the default TTF font, but by default we'll probe the system for suitable fonts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
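Why a trailing semicolon after an if condition matters can be shown with a small standalone example. This is not the CMap code, whose surrounding lines are not quoted in this thread; the method names and the threshold check are invented for illustration.

```java
public class StraySemicolonSketch {
    /** The semicolon ends the if statement, so the block below is unconditional. */
    static int countWithStraySemicolon(int[] values, int threshold) {
        int count = 0;
        for (int v : values) {
            if (v > threshold); // empty statement: nothing is guarded
            {
                count++;        // plain block, runs on every iteration
            }
        }
        return count;
    }

    /** Same loop without the semicolon: the block is guarded by the condition. */
    static int countFixed(int[] values, int threshold) {
        int count = 0;
        for (int v : values) {
            if (v > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] values = {1, 5, 10};
        System.out.println("stray semicolon: " + countWithStraySemicolon(values, 4));
        System.out.println("fixed:           " + countFixed(values, 4));
    }
}
```

The first method counts all three values, the second only the two above the threshold. The compiler accepts both, which is why such a line usually shows up only as an IDE or lint warning.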
[jira] [Commented] (PDFBOX-2262) Remove usage of AWT fonts
[ https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128539#comment-14128539 ] Petr Slaby commented on PDFBOX-2262: [~tilman]: Not really, or not easily. Given the amount of changes in pdfbox and the pile of other work I have, I gave up updating my pdfbox test suite in the last few months. I only noticed the semicolon because of a warning Eclipse shows on that line (empty control flow statement). Either the condition should not be there at all, as it does nothing, or the semicolon should be removed. For the moment, I suggest waiting for John's opinion rather than spending time running test suites. I think he will know what the code is supposed to do.
[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14112945#comment-14112945 ] Petr Slaby commented on PDFBOX-2144: {quote} I think you'll just have to make sure that you don't change the configuration while pages are rendering. {quote} That is not possible. The application is big, and PDF rendering is just a small part of it. I cannot change the whole application for the sake of it. As mentioned before, the application is designed to accept a new config at any time. The new config is supposed to be valid for jobs started after its activation, while jobs that were started earlier and have not yet finished have to continue using the old one. {quote} it'll play nice with PDFBox's internal static state {quote} You scare me. What is static and where? I believed that state is bound to instances of PageDrawer or PDGraphicsState and the like. I do not have enough insight to really understand all your reasons, but... if FileSystemFontProvider is implemented as a singleton, then it will do the same as it does now when called from a static method of ExternalFonts. Just replace your static methods with a factory and bind the instance somewhere (I thought PageDrawer or PDFStreamEngine would be the right place), and we are in the same boat :-) However, as mentioned before, I think I will be able to bind my font configurations to thread instances, so this is not such a big issue for me and your current solution should be fine. Provide a pluggable font manager Key: PDFBOX-2144 URL: https://issues.apache.org/jira/browse/PDFBOX-2144 Project: PDFBox Issue Type: Improvement Components: Rendering Reporter: Petr Slaby Assignee: John Hewson Attachments: FontManager.patch Our J2EE application has all fonts and resources configured and stored in its database. No files are accessed directly from the file system or from the system environment. To make PDFBox compatible with this philosophy, we need the FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the attached patch. The proposal defines a FontManager interface and a default implementation, which is the original one. FontManager then needs to be configured on and propagated from PDFStreamEngine and PageDrawer. It should also be configurable on PDFRenderer, which is not shown in the patch. There I would suggest introducing a configuration object which would take care of all the current and future options of PDFRenderer.
[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101938#comment-14101938 ] Petr Slaby commented on PDFBOX-2144: {quote} Regarding the static configuration, what aspects of the configuration were you expecting to change while PDFs are being processed? Are you talking about using a specific FontProvider for a given PDF? If so why? This is certainly something we could think about if I can get my head around the use case. {quote} Our application runs in an application server, and many things can happen in parallel there. Our configuration is stored in a database and can be changed while the application is running. When the configuration changes, the application might be in the middle of a rendering (or even in the middle of many renderings). It is expected that renderings which are already running finish the job with the old configuration, while anything started after the commit of a new configuration uses the new one. The configuration contains many settings, among them the fonts to be used to render PDFs via PDFBox. I have to be able to change the fonts available to FontProvider at runtime, in a way that keeps the original configuration untouched for renderings that have already been started.
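The requirement described above (running jobs keep the old configuration, later jobs see the new one) is commonly met by treating the configuration as an immutable snapshot that each job captures when it starts. The sketch below assumes nothing about PDFBox; FontConfig, RenderJob and activate are hypothetical names, and the single fallbackFont field stands in for a real font configuration.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ConfigSnapshotSketch {
    /** Immutable configuration; a change produces a new instance. */
    public static final class FontConfig {
        public final String fallbackFont;

        public FontConfig(String fallbackFont) {
            this.fallbackFont = fallbackFont;
        }
    }

    private static final AtomicReference<FontConfig> ACTIVE =
            new AtomicReference<>(new FontConfig("OldSans"));

    /** Called when a new configuration is committed. */
    public static void activate(FontConfig config) {
        ACTIVE.set(config);
    }

    /** A job captures the active configuration once, at construction time. */
    public static final class RenderJob {
        private final FontConfig config = ACTIVE.get();

        public String fontForMissingGlyph() {
            return config.fallbackFont;
        }
    }

    public static void main(String[] args) {
        RenderJob running = new RenderJob();       // started under the old config
        activate(new FontConfig("NewSans"));       // config changes mid-run
        RenderJob later = new RenderJob();         // started under the new config
        System.out.println("running job uses " + running.fontForMissingGlyph());
        System.out.println("later job uses   " + later.fontForMissingGlyph());
    }
}
```

This only works if the font lookup can reach the job's snapshot, which is exactly what a static lookup makes difficult.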
[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100421#comment-14100421 ] Petr Slaby commented on PDFBOX-2144: [~jahewson]: Thanks, that's basically exactly what I need. Just two questions. In the old solution, AWT fonts were used to provide all missing fonts. The new FontProvider interface has separate methods for substitution of ttf, cff and type 1 fonts. Is it so that a PDF references an external type 1 font and I have to provide a type 1 font then? Or is this used just when creating PDFs and the normal way of getting an external font for rendering is ExternalFonts.getType1EquivalentFont() which can return any of the flavors? Also, I do not like the static methods in ExternalFonts so much. In our environment, the configuration can change while the application is running. In such case, renderings which have already been started have to use the old configuration, renderings which will start later should use the new one. For that, I would need the ExternalFonts to have an instance bound to PageDrawer. It is a minor problem, though. I can probably solve it by binding the active font configuration to current thread while rendering. Provide a pluggable font manager Key: PDFBOX-2144 URL: https://issues.apache.org/jira/browse/PDFBOX-2144 Project: PDFBox Issue Type: Improvement Components: Rendering Reporter: Petr Slaby Attachments: FontManager.patch Our J2EE application has all fonts and resources configured and stored in its database. No files are accessed directly from file system or from system environment. To make PDFBox compatible with this philosophy, we need the FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the attached patch. The proposal defines a FontManager interface and default implementation which is the original one. FontManager then needs to be configured on and propagated from PDFStreamEngine and PageDrawer. 
It should also be configurable on PDFRenderer, which is not shown in the patch. There I would suggest introducing a configuration object that takes care of all current and future options of PDFRenderer. -- This message was sent by Atlassian JIRA (v6.2#6252)
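The thread-binding workaround mentioned in the comment above could look roughly like the following. This is only a sketch under stated assumptions: FontConfigHolder and the Map-based configuration are hypothetical names for illustration and are not part of PDFBox.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical holder that binds a font configuration to the rendering thread.
 * Neither this class nor the Map-based configuration exists in PDFBox.
 */
final class FontConfigHolder {
    private static final ThreadLocal<Map<String, String>> ACTIVE = new ThreadLocal<>();

    private FontConfigHolder() { }

    /** Runs a rendering task with the given configuration bound to the current thread. */
    static <T> T withConfig(Map<String, String> config, Supplier<T> rendering) {
        Map<String, String> previous = ACTIVE.get();
        ACTIVE.set(config);
        try {
            return rendering.get();
        } finally {
            // Restore the previous binding so nested calls and pooled threads stay clean.
            if (previous == null) {
                ACTIVE.remove();
            } else {
                ACTIVE.set(previous);
            }
        }
    }

    /** Would be called from a static font lookup; returns null when no rendering is active. */
    static Map<String, String> current() {
        return ACTIVE.get();
    }
}
```

A rendering that has already started keeps the configuration it was given, while a later call to withConfig can pass a newer one.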
[jira] [Commented] (PDFBOX-2255) Text not rendered bold
[ https://issues.apache.org/jira/browse/PDFBOX-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085343#comment-14085343 ] Petr Slaby commented on PDFBOX-2255: I think bold is produced using text rendering mode fill + stroke. As far as I can tell, the file renders fine with the patch I have proposed in PDFBOX-678 Text not rendered bold --- Key: PDFBOX-2255 URL: https://issues.apache.org/jira/browse/PDFBOX-2255 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: simon steiner File from PDFBOX-265 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage PDFBOX265-problem.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs
[ https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063320#comment-14063320 ] Petr Slaby commented on PDFBOX-2210: We have a similar problem. In our application, we produce (among others) PCL and AFP output. For PDFBox, we have a PCL and AFP specific implementation of Graphics2D which produces the commands in the respective printer language. In the old solution, fillGlyphVector or drawGlyphVector was called for printing characters using AWT fonts. From the glyph vector, we were able to get at the AWT font and the character(s) being printed. From that, we were able to pick an existing PCL or AFP font if an equivalent for the AWT font was configured, or produce an on-the-fly font and embed it into the output. With the current solution, we just get a shape and do not even know that it is coming from rendering of text. I did not try to solve this yet, but I think I will probably need PageDrawer.drawGlyph2D() to become part of the API (make it protected instead of private) so that I can intercept it and call something else on our special G2D implementation. When producing on-the-fly fonts, we need some font metrics information - like ascent, descent and width of each character, etc. For that, I would need to put some more information into Glyph2D, e.g. have a reference to the underlying PDFont. [PATCH] Allow caching of glyphs --- Key: PDFBOX-2210 URL: https://issues.apache.org/jira/browse/PDFBOX-2210 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: simon steiner Assignee: John Hewson Attachments: drawglyphs.patch If you separate transform from glyph it means we can reuse glyphs in fop postscript output and get smaller output files -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2117) AxialShadingContext is slow
[ https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058549#comment-14058549 ] Petr Slaby commented on PDFBOX-2117: Just a hint or question. At the end of getRaster(), the cached values are always unnormalized by e.g. (int) (values[0] * 255). Why not cache the unnormalized values right away, then? You could put the three values into a single int to reduce memory consumption and to avoid the c.clone() in ColorRGB. But maybe I missed something, I did not really try to change the code this way. As for the comparison of the three methods of implementing AxialShadingContext, the scan line precomputation is the fastest of course, especially as it only counts the values at positions rounded to an int. I have run the test again on three files that use the axial shading and measured total time spent in the constructor and getRaster(). The times are in milliseconds.
||File||Trunk||Shaola||My patch||
|shading_pattern.pdf|67055|557|1534|
|color_gradient.pdf|72622|1002|2461|
|missing_image.pdf|34897|376|29672|
AxialShadingContext is slow --- Key: PDFBOX-2117 URL: https://issues.apache.org/jira/browse/PDFBOX-2117 Project: PDFBox Issue Type: Sub-task Components: Rendering Reporter: Petr Slaby Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, AxialShading1.patch, AxialShadingContext.java.getrgbimage, GWG061_Shading_x1a.pdf, GWG061_Shading_x1a.pdf-1.png, GWG061_Shading_x1a.pdf-1.png-diff.png, Shading2Function2.pdf, Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, color_gradient.pdf, shading_pattern.pdf AxialShadingContext#getRaster() is on top of profiler hot spots in documents that use an axial shading. Inside it, the slowest part is calling PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2117) AxialShadingContext is slow
[ https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058731#comment-14058731 ] Petr Slaby commented on PDFBOX-2117: Shaola, another idea in the similar direction - what about using a simple array instead of the HashMap? As far as I can tell, you have an array of values from zero to n. Why not put them into an array and use the array index instead of HashMap key? AxialShadingContext is slow --- Key: PDFBOX-2117 URL: https://issues.apache.org/jira/browse/PDFBOX-2117 Project: PDFBox Issue Type: Sub-task Components: Rendering Reporter: Petr Slaby Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, AxialShading1.patch, AxialShadingContext.java.getrgbimage, GWG061_Shading_x1a.pdf, GWG061_Shading_x1a.pdf-1.png, GWG061_Shading_x1a.pdf-1.png-diff.png, Shading2Function2.pdf, Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, color_gradient.pdf, shading_pattern.pdf AxialShadingContext#getRaster() is on top of profiler hot spots in documents that use an axial shading. Inside it, the slowest part is calling PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order). -- This message was sent by Atlassian JIRA (v6.2#6252)
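The two suggestions in this thread (caching the already unnormalized values packed into a single int, and indexing a plain array instead of a HashMap) can be illustrated as follows. This is a sketch only: PackedColorTable is a hypothetical name, and the actual layout of the cache in AxialShadingContext is assumed, not taken from the PDFBox sources.

```java
/**
 * Illustrative only: a hypothetical colour cache indexed by position 0..n.
 * It stores each RGB triple as one packed int instead of a float[] in a HashMap.
 */
final class PackedColorTable {
    private final int[] packed;

    /** rgb[i] holds three normalized components in [0, 1] for position i. */
    PackedColorTable(float[][] rgb) {
        packed = new int[rgb.length];
        for (int i = 0; i < rgb.length; i++) {
            // Unnormalize once, at cache time, rather than on every getRaster() call.
            int r = (int) (rgb[i][0] * 255);
            int g = (int) (rgb[i][1] * 255);
            int b = (int) (rgb[i][2] * 255);
            packed[i] = (r << 16) | (g << 8) | b;
        }
    }

    int red(int index)   { return (packed[index] >> 16) & 0xFF; }
    int green(int index) { return (packed[index] >> 8) & 0xFF; }
    int blue(int index)  { return packed[index] & 0xFF; }
}
```

The array lookup avoids boxing the key and hashing, and no defensive clone of a component array is needed because an int is returned by value.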
[jira] [Commented] (PDFBOX-678) Support missing Text Rendering Modes when rendering a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052274#comment-14052274 ] Petr Slaby commented on PDFBOX-678: --- Implementing this seems to be fairly easy in current trunk (with the exception of Type3 fonts), see the attached patch. Why not do it? Support missing Text Rendering Modes when rendering a PDF - Key: PDFBOX-678 URL: https://issues.apache.org/jira/browse/PDFBOX-678 Project: PDFBox Issue Type: Improvement Components: Rendering Reporter: Maruan Sahyoun Attachments: Java Printing.pdf, TextRenderingModes.java.patch Of the 7 different Text Rendering Modes only mode 0 (Fill Text) is correctly implemented. Mode 1 (Stroke Text) falls back to Mode 0 and the others are not implemented. I'm looking to implement the missing modes (at least some of them). Before doing so I'm proposing a structural change to when rendering really occurs. Currently it's done within the PDxxxFont classes. I'd rather implement the (AWT) text output in PageDrawer (or helper classes within the same package) and use the font classes to return an AWT font by adding a getAwtFont method. Doing so we get a better separation between the PDF related stuff (PDxxx) and applications like PageDrawer. The current rendering specific code within the PDxxxFont classes can be retained for compatibility and marked deprecated at a later stage. WDYT? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-678) Support missing Text Rendering Modes when rendering a PDF
[ https://issues.apache.org/jira/browse/PDFBOX-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-678: -- Attachment: TextRenderingModes.java.patch Support missing Text Rendering Modes when rendering a PDF - Key: PDFBOX-678 URL: https://issues.apache.org/jira/browse/PDFBOX-678 Project: PDFBox Issue Type: Improvement Components: Rendering Reporter: Maruan Sahyoun Attachments: Java Printing.pdf, TextRenderingModes.java.patch Of the 7 different Text Rendering Modes only mode 0 (Fill Text) is correctly implemented. Mode 1 (Stroke Text) falls back to Mode 0 and the others are not implemented. I'm looking to implement the missing modes (at least some of them). Before doing so I'm proposing a structural change to when rendering really occurs. Currently it's done within the PDxxxFont classes. I'd rather implement the (AWT) text output in PageDrawer (or helper classes within the same package) and use the font classes to return an AWT font by adding a getAwtFont method. Doing so we get a better separation between the PDF related stuff (PDxxx) and applications like PageDrawer. The current rendering specific code within the PDxxxFont classes can be retained for compatibility and marked deprecated at a later stage. WDYT? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2185) Rotation and skew not applied on rectangles
Petr Slaby created PDFBOX-2185: -- Summary: Rotation and skew not applied on rectangles Key: PDFBOX-2185 URL: https://issues.apache.org/jira/browse/PDFBOX-2185 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby When rendering the attached example, rotation and skew of rectangles are not applied properly. The reason is that AppendRectangleToPath transforms only the start and end points and builds a non-rotated, non-skewed result from them. Instead, each corner of the rectangle has to be transformed separately as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
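To illustrate the point about transforming each corner, here is a minimal sketch using only java.awt.geom. It is not the attached patch; RectangleToPath and transformedRectangle are hypothetical names, and the way the operator obtains the current transformation matrix is left out.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.awt.geom.Point2D;

/** Hypothetical helper, not PDFBox code: builds a rectangle path that honours rotation and skew. */
final class RectangleToPath {
    private RectangleToPath() { }

    static Path2D.Double transformedRectangle(AffineTransform ctm,
                                              double x, double y, double w, double h) {
        // Transform all four corners; two opposite corners are not enough
        // once the matrix contains rotation or skew.
        Point2D p0 = ctm.transform(new Point2D.Double(x, y), null);
        Point2D p1 = ctm.transform(new Point2D.Double(x + w, y), null);
        Point2D p2 = ctm.transform(new Point2D.Double(x + w, y + h), null);
        Point2D p3 = ctm.transform(new Point2D.Double(x, y + h), null);

        Path2D.Double path = new Path2D.Double();
        path.moveTo(p0.getX(), p0.getY());
        path.lineTo(p1.getX(), p1.getY());
        path.lineTo(p2.getX(), p2.getY());
        path.lineTo(p3.getX(), p3.getY());
        path.closePath();
        return path;
    }
}
```

With a 45 degree rotation, a unit square becomes a diamond; building an axis-aligned rectangle from only the transformed start and end points would lose that shape.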
[jira] [Updated] (PDFBOX-2185) Rotation and skew not applied on rectangles
[ https://issues.apache.org/jira/browse/PDFBOX-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2185: --- Attachment: AppendRectangleToPath.java.patch example_013.pdf Rotation and skew not applied on rectangles --- Key: PDFBOX-2185 URL: https://issues.apache.org/jira/browse/PDFBOX-2185 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: AppendRectangleToPath.java.patch, example_013.pdf When rendering the attached example, rotation and skew of rectangles are not applied properly. The reason is that AppendRectangleToPath transforms only the start and end points and builds a non-rotated, non-skewed result from them. Instead, each corner of the rectangle has to be transformed separately as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1997) CIE LAB item missing in rendering
[ https://issues.apache.org/jira/browse/PDFBOX-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051903#comment-14051903 ] Petr Slaby commented on PDFBOX-1997: Works fine for me, thanks. But I found just a single file using LAB color space, so that is no proof :-) CIE LAB item missing in rendering - Key: PDFBOX-1997 URL: https://issues.apache.org/jira/browse/PDFBOX-1997 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Tilman Hausherr Labels: regression Attachments: text_graphic_image.pdf, text_graphic_image.pdf-1.png The file from PDFBOX-1681 is missing the CIELAB output, it was there a few weeks ago. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051940#comment-14051940 ] Petr Slaby commented on PDFBOX-1915: In my test suite, I have one rendering fixed and no regressions. Cool, thanks. My only complaint is the performance. The attached file needs several minutes to render, especially the second page needs way too long. Without really understanding the algorithms, I had a look at PatchMeshesShadingContext#getRaster(). Could you perhaps sort the triangles and search for them instead of looping through all and checking which one contains the current point? The loop continues even after a matching triangle has been found. Could you at least break there? Also, the row/col loops always shift the current point by one. Isn't it likely that the same triangle or its neighbor will get a hit? Just ideas, keep up the good work. Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Assignee: Shaola Ren Labels: graphical, gsoc2014, java, math, shading Fix For: 2.0.0 Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, eci_altona-test-suite-v2_technical_H.pdf, 
example_030.pdf, failedTest.rar, lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, updateshading6ContourTest.rar Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away. Knowledge prerequisites: - java, although you don't have to be a java ace, just feel comfortable - math: you should know what cubic Bézier curves, Degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn it - maven (basic) - svn (basic) - an IDE like Netbeans or Eclipse or IntelliJ (basic) - ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics. A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have the shading types that are already implemented. Some simple source code to convert to images: String filename = "blah.pdf"; PDDocument document = PDDocument.loadNonSeq(new File(filename), null); List<PDPage> pdPages = document.getDocumentCatalog().getAllPages(); int page = 0; for (PDPage pdPage : pdPages) { ++page; BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300); ImageIO.write(bim, "png", new File(filename+page+".png")); } document.close(); You are not starting from scratch. 
The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general. The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html The tricky parts are: - decide whether a point(x,y) is inside or outside a patch - decide the color of a point within the patch To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the
[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051940#comment-14051940 ] Petr Slaby edited comment on PDFBOX-1915 at 7/3/14 9:42 PM: In my test suite, I have one rendering fixed and no regressions. Cool, thanks. My only complaint is the performance. The attached file (example_030.pdf) needs several minutes to render, especially the second page needs way too long. Without really understanding the algorithms, I had a look at PatchMeshesShadingContext#getRaster(). Could you perhaps sort the triangles and search for them instead of looping through all and checking which one contains the current point? The loop continues even after a matching triangle has been found. Could you at least break there? Also, the row/col loops always shift the current point by one. Isn't it likely that the same triangle or its neighbor will get a hit? Just ideas, keep up the good work. was (Author: pslabycz): In my test suite, I have one rendering fixed and no regressions. Cool, thanks. My only complaint is the performance. The attached file needs several minutes to render, especially the second page needs way too long. Without really understanding the algorithms, I had a look at PatchMeshesShadingContext#getRaster(). Could you perhaps sort the triangles and search for them instead of looping through all and checking which one contains the current point? The loop continues even after a matching triangle has been found. Could you at least break there? Also, the row/col loops always shift the current point by one. Isn't it likely that the same triangle or its neighbor will get a hit? Just ideas, keep up the good work. 
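The ideas in this comment (stop at the first matching triangle, and try the previously matched triangle first because neighbouring pixels usually fall into the same one) can be sketched as below. This is a hedged illustration, not the PatchMeshesShadingContext code: TriangleLookup and its nested Triangle are hypothetical, and the containment test is a generic sign-of-cross-product check.

```java
import java.util.List;

/** Hypothetical lookup helper; not taken from PDFBox. */
final class TriangleLookup {

    /** Minimal triangle with a sign-based containment test (edges count as inside). */
    static final class Triangle {
        final double ax, ay, bx, by, cx, cy;

        Triangle(double ax, double ay, double bx, double by, double cx, double cy) {
            this.ax = ax; this.ay = ay;
            this.bx = bx; this.by = by;
            this.cx = cx; this.cy = cy;
        }

        boolean contains(double px, double py) {
            double d1 = cross(ax, ay, bx, by, px, py);
            double d2 = cross(bx, by, cx, cy, px, py);
            double d3 = cross(cx, cy, ax, ay, px, py);
            boolean hasNeg = d1 < 0 || d2 < 0 || d3 < 0;
            boolean hasPos = d1 > 0 || d2 > 0 || d3 > 0;
            // Inside when the point is on the same side of all three edges.
            return !(hasNeg && hasPos);
        }

        private static double cross(double x1, double y1, double x2, double y2,
                                    double px, double py) {
            return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1);
        }
    }

    private final List<Triangle> triangles;
    private Triangle lastHit;

    TriangleLookup(List<Triangle> triangles) {
        this.triangles = triangles;
    }

    /** Returns the first triangle containing the point, or null if there is none. */
    Triangle find(double x, double y) {
        // Adjacent pixels usually hit the same triangle, so check it first.
        if (lastHit != null && lastHit.contains(x, y)) {
            return lastHit;
        }
        for (Triangle t : triangles) {
            if (t.contains(x, y)) {
                lastHit = t;
                return t; // stop at the first match instead of scanning the rest
            }
        }
        return null;
    }
}
```

A sorted or spatially indexed structure would go further, but even this early exit avoids scanning the full list for most pixels of a scan line.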
Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Tilman Hausherr Assignee: Shaola Ren Labels: graphical, gsoc2014, java, math, shading Fix For: 2.0.0 Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, eci_altona-test-suite-v2_technical_H.pdf, example_030.pdf, failedTest.rar, lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, updateshading6ContourTest.rar Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away. 
Knowledge prerequisites: - java, although you don't have to be a java ace, just feel comfortable - math: you should know what cubic Bézier curves, Degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn it - maven (basic) - svn (basic) - an IDE like Netbeans or Eclipse or IntelliJ (basic) - ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics. A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have the shading types that are already implemented. Some simple source code to convert to images: String filename = "blah.pdf"; PDDocument document = PDDocument.loadNonSeq(new File(filename), null); List<PDPage> pdPages = document.getDocumentCatalog().getAllPages(); int page = 0; for (PDPage pdPage : pdPages) { ++page; BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300); ImageIO.write(bim, "png", new
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049697#comment-14049697 ] Petr Slaby commented on PDFBOX-2126: The fix for PDFBOX-1772 is to reset the lastClip before restoring the graphics in the TransparencyGroup constructor, as the clipping was applied to a different graphics than what we are going to use now. Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.patch, PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, pdfbox-1772.pdf-1-good.png, screenshot.png As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for, the implementation rather depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2126: --- Attachment: example_014.pdf Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, pdfbox-1772.pdf-1-good.png, screenshot.png As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for, the implementation rather depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2126: --- Attachment: ClipPath.2.patch Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, pdfbox-1772.pdf-1-good.png, screenshot.png As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for, the implementation rather depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049857#comment-14049857 ] Petr Slaby commented on PDFBOX-2126: ... or save and restore lastClip as shown in the attached ClipPath.2.patch. The patch also resets the lastClip before calling processSubStream in annotation processing. This is necessary e.g. for the attached example_014.pdf containing an AcroForm. Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, pdfbox-1772.pdf-1-good.png, screenshot.png As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for, the implementation rather depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049857#comment-14049857 ] Petr Slaby edited comment on PDFBOX-2126 at 7/2/14 12:19 PM: - ... or save and restore lastClip as shown in the attached ClipPath.2.patch. The patch also resets the lastClip before calling processSubStream in annotation processing. This is necessary e.g. for the attached example_014.pdf containing an AcroForm. was (Author: pslabycz): ... or save and restore lastClip as shown in the attached ClipPath.2.patch. The patch also resets the lastClip before in processSubStream when painting annotation. This is necessary e.g. for the attached example_014.pdf containing an AcroForm. Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, pdfbox-1772.pdf-1-good.png, screenshot.png As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for, the implementation rather depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. 
To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
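The caching scheme described in this issue (apply the clip only when the clipping path object has changed, compare by identity, and reset the cached value manually where the graphics or its transform changes) could be sketched like this. ClipCache is a hypothetical name and the Consumer stands in for Graphics2D#setClip(Shape); the attached patches may be structured differently.

```java
import java.awt.Shape;
import java.util.function.Consumer;

/** Hypothetical sketch of the lastClip idea; not the code from the attached patches. */
final class ClipCache {
    private final Consumer<Shape> setClip; // e.g. graphics::setClip for a Graphics2D
    private Shape lastClip;

    ClipCache(Consumer<Shape> setClip) {
        this.setClip = setClip;
    }

    /** Applies the clip only if it is a different object than the one applied last. */
    void apply(Shape clippingPath) {
        // Identity comparison on purpose: it relies on the clipping path object
        // never being mutated after it has been stored in the graphics state.
        if (clippingPath != lastClip) {
            setClip.accept(clippingPath);
            lastClip = clippingPath;
        }
    }

    /** Must be called whenever the target graphics or its transform has changed. */
    void reset() {
        lastClip = null;
    }
}
```

The identity check is cheap, but it is only correct as long as every change to the clipping path produces a new object, which is the caveat raised in the issue text.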
[jira] [Updated] (PDFBOX-2176) Ignore IllegalArgumentException when reading an ICCProfile
[ https://issues.apache.org/jira/browse/PDFBOX-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2176: --- Attachment: 49.pdf Ignore IllegalArgumentException when reading an ICCProfile -- Key: PDFBOX-2176 URL: https://issues.apache.org/jira/browse/PDFBOX-2176 Project: PDFBox Issue Type: Bug Components: PDModel, Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 49.pdf A java.lang.IllegalArgumentException: Invalid ICC Profile Data is thrown from PDICCBase#loadICCProfile() when rendering the attached PDF. The code already checks for and ignores ProfileDataException and CMMException at this place. IllegalArgumentException is thrown if the profile header data is completely corrupt, either there is not even the 128 header bytes or the profile size found in header does not match the size of data. The exception is ignored in 1.8, in 2.0 it is re-thrown. I think ignoring the exception and using an alternate color space is better and consistent with the handling of the other two expected exceptions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-2176) Ignore IllegalArgumentException when reading an ICCProfile
Petr Slaby created PDFBOX-2176: -- Summary: Ignore IllegalArgumentException when reading an ICCProfile Key: PDFBOX-2176 URL: https://issues.apache.org/jira/browse/PDFBOX-2176 Project: PDFBox Issue Type: Bug Components: PDModel, Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 49.pdf A java.lang.IllegalArgumentException: Invalid ICC Profile Data is thrown from PDICCBase#loadICCProfile() when rendering the attached PDF. The code already checks for and ignores ProfileDataException and CMMException at this place. IllegalArgumentException is thrown if the profile header data is completely corrupt, either there is not even the 128 header bytes or the profile size found in header does not match the size of data. The exception is ignored in 1.8, in 2.0 it is re-thrown. I think ignoring the exception and using an alternate color space is better and consistent with the handling of the other two expected exceptions. -- This message was sent by Atlassian JIRA (v6.2#6252)
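The handling proposed here can be illustrated with the plain JDK API. This is a sketch under assumptions: IccProfileLoader and loadOrNull are hypothetical names, and returning null is only a stand-in for "fall back to the alternate color space"; it is not the PDICCBase#loadICCProfile() code.

```java
import java.awt.color.CMMException;
import java.awt.color.ICC_Profile;
import java.awt.color.ProfileDataException;

/** Hypothetical helper showing the three exceptions treated the same way. */
final class IccProfileLoader {
    private IccProfileLoader() { }

    /** Returns the parsed profile, or null so that the caller can use an alternate color space. */
    static ICC_Profile loadOrNull(byte[] data) {
        try {
            return ICC_Profile.getInstance(data);
        } catch (IllegalArgumentException | ProfileDataException | CMMException e) {
            // IllegalArgumentException is what ICC_Profile.getInstance(byte[]) documents
            // for data that is not a valid ICC profile, e.g. a truncated header.
            return null;
        }
    }
}
```

A caller would check for null and switch to the alternate color space, the same as for the two exceptions that are already ignored.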
[jira] [Created] (PDFBOX-2180) LAB color space produces wrong colors
Petr Slaby created PDFBOX-2180: -- Summary: LAB color space produces wrong colors Key: PDFBOX-2180 URL: https://issues.apache.org/jira/browse/PDFBOX-2180 Project: PDFBox Issue Type: Bug Components: PDModel, Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Priority: Minor The attached example uses LAB colors. When rendering it using PDFToImage, the result is kind of violet, instead of black text and yellow rectangles (see the attached jpeg). When comparing 1.8 sources with current trunk, the incoming values are scaled to range in trunk PDLab#toRGB(), while this was not the case in 1.8 PDColorState and ColorSpaceLab. As far as I can tell, in 1.8 the values were only clipped to range in ColorSpaceLab#toCIEXYZ(). If I remove the scaling in 2.0 the rendering is correct. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2180) LAB color space produces wrong colors
[ https://issues.apache.org/jira/browse/PDFBOX-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2180: --- Attachment: 0003521.jpg 000352.pdf LAB color space produces wrong colors - Key: PDFBOX-2180 URL: https://issues.apache.org/jira/browse/PDFBOX-2180 Project: PDFBox Issue Type: Bug Components: PDModel, Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 000352.pdf, 0003521.jpg The attached example uses LAB colors. When rendering it using PDFToImage, the result is kind of violet, instead of black text and yellow rectangles (see the attached jpeg). When comparing 1.8 sources with current trunk, the incoming values are scaled to range in trunk PDLab#toRGB(), while this was not the case in 1.8 PDColorState and ColorSpaceLab. As far as I can tell, in 1.8 the values were only clipped to range in ColorSpaceLab#toCIEXYZ(). If I remove the scaling in 2.0 the rendering is correct. -- This message was sent by Atlassian JIRA (v6.2#6252)
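For reference, the difference between clipping and scaling can be illustrated with the standard L*a*b* to XYZ conversion from the PDF specification. The sketch below is an assumption-laden illustration, not the code of PDLab or ColorSpaceLab: the class name `LabToXyz`, the parameter layout, and the white point used in the usage note are made up for the example. The incoming a* and b* values are only clamped to the Range entry, never rescaled.

```java
/** Hypothetical illustration of Lab to XYZ with clipping; not PDFBox code. */
final class LabToXyz {
    private LabToXyz() {
    }

    /**
     * @param whitePoint {Xw, Yw, Zw}
     * @param range      {aMin, aMax, bMin, bMax}
     */
    static double[] toXyz(double l, double a, double b, double[] whitePoint, double[] range) {
        // Clip out-of-range components to the nearest valid value; do not rescale them.
        double lc = clamp(l, 0.0, 100.0);
        double ac = clamp(a, range[0], range[1]);
        double bc = clamp(b, range[2], range[3]);

        double m = (lc + 16.0) / 116.0;
        double lx = m + ac / 500.0;
        double n = m - bc / 200.0;

        return new double[] {
            whitePoint[0] * g(lx),
            whitePoint[1] * g(m),
            whitePoint[2] * g(n)
        };
    }

    /** Inverse of the CIE companding function. */
    private static double g(double x) {
        return x >= 6.0 / 29.0 ? x * x * x : (108.0 / 841.0) * (x - 4.0 / 29.0);
    }

    private static double clamp(double v, double min, double max) {
        return Math.max(min, Math.min(max, v));
    }
}
```

Calling `LabToXyz.toXyz(100, 0, 0, whitePoint, range)` returns the white point itself, and an a* of 500 with a Range of -100..100 gives the same result as an a* of 100.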
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049136#comment-14049136 ] Petr Slaby commented on PDFBOX-2126: [~jahewson]: In my original code, I was resetting the clipping path (lastClip = null;) just before processSubStream in drawPage, because that's exactly where the G2D transform changes. Your commits did not have that, maybe that was the reason for the regression? I must say I am not able to understand how your last commit works. It seems just to check whether the clip has changed in G2D, but not whether a new clip has been set in PDGraphicsState? Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.patch, PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, pdfbox-1772.pdf-1-good.png, screenshot.png As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for; the implementation instead depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. 
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047723#comment-14047723 ] Petr Slaby commented on PDFBOX-2126: I have removed clippingPath.clone() in my patch, the cloned PDGraphicsState uses a pointer to the same clipping path then. A new clipping object is only created in setClippingPath() (resp. intersectClippingPath()). This enables me to use the lastClip == currentClip condition in PageDrawer.applyClipping() to avoid applying the clip if it did not change. After the change from storing GeneralPath to storing Area, I thought the clippingPath.clone() in PDGraphicsState.clone() would be inevitable, but I can shift it to intersectClippingPath() as well. I will post an updated patch again as soon as possible, but unfortunately I am quite overwhelmed by my daily business right now. Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for, the implementation rather depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. 
This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
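The lastClip == currentClip idea can be sketched in isolation as follows. This is an illustration under stated assumptions, not the attached patch: `ClipCache`, `apply`, `reset` and `applyCount` are hypothetical names, and the sketch presumes that the clipping object is never mutated in place, which is the precondition the description above calls out.

```java
import java.awt.Graphics2D;
import java.awt.Shape;

/** Hypothetical sketch of caching the last applied clip; not the PDFBox patch. */
final class ClipCache {
    private Shape lastClip;
    private int applyCount;

    /** Calls setClip() only when a different clip object is passed in. */
    void apply(Graphics2D graphics, Shape currentClip) {
        // Identity comparison is only safe if clip objects are never modified in place.
        if (currentClip != lastClip) {
            graphics.setClip(currentClip);
            lastClip = currentClip;
            applyCount++;
        }
    }

    /** Must be called whenever the Graphics2D transform changes. */
    void reset() {
        lastClip = null;
    }

    /** Number of setClip() calls actually made, for diagnostics. */
    int applyCount() {
        return applyCount;
    }
}
```

Applying the same `Area` instance twice triggers a single `setClip()` call; after `reset()`, the next `apply()` sets the clip again.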
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046964#comment-14046964 ] Petr Slaby commented on PDFBOX-2126: I have already tried to replace the GeneralPath field by an Area field in PDGraphicsState myself, but then reverted it again. The problem is that Area is a slow and hungry beast. It seems that with a complex clipping path, it is better to give it back to the garbage collector as soon as possible. Also, looking at the implementation of SunGraphics2D, the first thing that happens in clip() is that the shape is transformed using AffineTransform.createTransformedShape(). This is optimized (a little) for Path2D shapes, but not for Area. I have also tried using an alternative implementation of Area found at https://javagraphics.java.net/areax/. Compared to java.awt.geom.Area, it is much faster and needs less memory when it comes to complex areas. It seems to be a little bit slower with simple rectangle areas. It has a modified BSD license that is fine for me, but I am personally not yet convinced whether it is worth having yet another third party library in the product. I am not sure whether it is compatible with the Apache license, but for sure it is worth having a look at it just out of interest. It is amazing how a smart guy outperforms a big team at Sun or Oracle (albeit in a very small part of the library, of course). As for my original idea of applying the clipping path only if it has changed since being applied for the last time - your commit is a bit problematic for that. Because of the clippingPath.clone() in PDGraphicsState.clone(), which is now necessary, I cannot use a simple comparison using lastClip == currentClip. I can probably solve it by having a counter in PDGraphicsState.intersectClippingPath() to keep track of how many times the clip has changed and what was the last change that was applied to Graphics2D. 
I will try to compare the performance of the code using the GeneralPath with the current one using Area again. I think that my code where I already tried that was quite similar to yours, but we will see. Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for; the implementation instead depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
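One possible reading of the counter idea from the comment above, sketched under assumptions: the class `VersionedClip` and its methods are hypothetical, and the sketch uses a shared, ever-increasing id instead of a per-state counter. A per-state counter could produce the same number for two different clips derived from the same parent state, so a shared source of ids seems the safer variant.

```java
import java.awt.geom.Area;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a clip that carries a change id which survives cloning. */
final class VersionedClip {
    // Shared id source, so that two different clips never carry the same id.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final Area clip;
    private long clipId;

    VersionedClip(Area initial) {
        this(new Area(initial), NEXT_ID.incrementAndGet());
    }

    private VersionedClip(Area clip, long clipId) {
        this.clip = clip;
        this.clipId = clipId;
    }

    /** Copies the Area (as a state clone would) but keeps the id, since the geometry is unchanged. */
    VersionedClip copy() {
        return new VersionedClip(new Area(clip), clipId);
    }

    /** Every real change of the clip gets a fresh id. */
    void intersect(Area other) {
        clip.intersect(other);
        clipId = NEXT_ID.incrementAndGet();
    }

    long clipId() {
        return clipId;
    }

    Area clip() {
        return clip;
    }
}
```

A drawer would then remember the id of the clip it last passed to Graphics2D and call setClip() only when the current id differs.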
[jira] [Updated] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2126: --- Attachment: ClipPath.1.patch Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not explicitly checked for; the implementation instead depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045190#comment-14045190 ] Petr Slaby commented on PDFBOX-2126: Attached updated patch against latest trunk. I have moved the intersection of current clipping path with new clipping into PDGraphicsState#setCurrentClippingPath() to avoid duplicate code on places where this method is called (a second call has been added in PDFBOX-1875 now). There is a performance optimization in the computation of intersection trying to avoid creation of Area wherever possible. I do not insist on that, but it brought a big performance and memory consumption improvement on a corner case PDF where a very complex clipping path is used. It brought also some one pixel differences on a few of my test files, but I am not able to decide whether the before or after is correct. I am still hesitant to make the clip path immutable and private to the PDGraphicsState as that would mean to make a clone of input in setClipPath() and either a clone of output in getClipPath() or an introduction of methods like applyClipping(Graphics2D) and fillClipPath(Graphics2D) in PDGraphicsState. Optimize clipping - Key: PDFBOX-2126 URL: https://issues.apache.org/jira/browse/PDFBOX-2126 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Petr Slaby Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf As already stated in a TODO comment in PageDrawer, the call of Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document, e.g. the attached one renders in 10.5s without the optimization and in 5.5 seconds in the optimized version. The clipping has to be re-applied whenever the transform in Graphics2D changes. 
This is not explicitly checked for; the implementation instead depends on the cached value being reset manually. Currently this is only needed at one place when processing annotations (AcroForms). Also, the implementation relies upon the clipping path object stored in PDGraphicsState to never change so that a comparison using == can be used. This works fine, but needs a bit of awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044035#comment-14044035 ] Petr Slaby commented on PDFBOX-2144: Yes, exactly. Provide a pluggable font manager Key: PDFBOX-2144 URL: https://issues.apache.org/jira/browse/PDFBOX-2144 Project: PDFBox Issue Type: Improvement Components: Rendering Reporter: Petr Slaby Attachments: FontManager.patch Our J2EE application has all fonts and resources configured and stored in its database. No files are accessed directly from file system or from system environment. To make PDFBox compatible with this philosophy, we need the FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the attached patch. The proposal defines a FontManager interface and default implementation which is the original one. FontManager then needs to be configured on and propagated from PDFStreamEngine and PageDrawer. It should also be configurable on PDFRenderer, which is not shown in the patch. There I would suggest to introduce a configuration object which would take care about all the current and future options of PDFRenderer. -- This message was sent by Atlassian JIRA (v6.2#6252)
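The general shape of such a pluggable lookup, with a default and an application-supplied implementation plus a configuration object, might be sketched like this. All names here (`FontLookup`, `InMemoryFontLookup`, `RenderConfig`) are hypothetical and are not taken from the attached patch or from PDFBox; the in-memory map merely stands in for a database-backed store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical plug-in point: resolves a font name to raw font bytes. */
interface FontLookup {
    Optional<byte[]> findFontData(String fontName);
}

/** Hypothetical implementation backed by a map, standing in for a database. */
final class InMemoryFontLookup implements FontLookup {
    private final Map<String, byte[]> fonts;

    InMemoryFontLookup(Map<String, byte[]> fonts) {
        this.fonts = new HashMap<>(fonts);
    }

    @Override
    public Optional<byte[]> findFontData(String fontName) {
        return Optional.ofNullable(fonts.get(fontName));
    }
}

/** Hypothetical configuration object that carries renderer options. */
final class RenderConfig {
    // Default lookup finds nothing; a real default would search system fonts.
    private FontLookup fontLookup = name -> Optional.empty();

    RenderConfig withFontLookup(FontLookup lookup) {
        this.fontLookup = lookup;
        return this;
    }

    FontLookup fontLookup() {
        return fontLookup;
    }
}
```

A single configuration object of this kind keeps further renderer options from widening the renderer's constructor or setter list.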
[jira] [Commented] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041157#comment-14041157 ] Petr Slaby commented on PDFBOX-2141: I see. Why not do the same, then - apply the transform to the path instead of the graphics? The following works like magic on your test file. I am just not sure whether it might negatively affect performance and I tested it with this single file only (pattern-shading-2-4-idMatrix.pdf). Alternatively, we might either be able to compute the right transformation from the deviceBounds/userBounds in AxialShadingPaint#createContext() (I am not sure, I just think the information we need might be in there), or use a custom rendering hint key and pass the additional transform along with the rendering hints.
{noformat}
private void drawGlyph2D(Glyph2D glyph2D, int[] codePoints, AffineTransform at) throws IOException
{
    graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
    for (int i = 0; i < codePoints.length; i++)
    {
        GeneralPath path = glyph2D.getPathForCharacterCode(codePoints[i]);
        if (path != null)
        {
            if (!at.isIdentity())
            {
                // transform a copy of the glyph path instead of the graphics
                path = (GeneralPath) path.clone();
                path.transform(at);
            }
            graphics.fill(path);
        }
    }
}
{noformat}
Shading not applied to text --- Key: PDFBOX-2141 URL: https://issues.apache.org/jira/browse/PDFBOX-2141 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 1.8.7, 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch, pattern-shading-2-4-idMatrix.pdf, pattern-shading-2-4-idMatrix.pdf, pattern-shading-2-4-idMatrix1.jpg, pattern-shading-2-4-noMatrix.pdf, pattern-shading-2-4.ps, pattern-shading-2-4.ps The attached PDF draws a text filled with horizontal shading going 
from red to blue. When rendered via PDFBox, the text is completely filled with red. The problem is that AxialShadingContext#getRaster() gets called with positions that completely fell outside of the range stored in its coords[] field. The fix seems to be to set glyph transform rather than graphics2d transform in PageDrawer#writeText() as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040233#comment-14040233 ] Petr Slaby commented on PDFBOX-2141: {quote} pattern-shading-2-4-idMatrix.pdf ... {quote} But the problem does not seem to be related to this issue. At least I get an identical rendering before and after the change made in revision 1604282. Shading not applied to text --- Key: PDFBOX-2141 URL: https://issues.apache.org/jira/browse/PDFBOX-2141 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 1.8.7, 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch, pattern-shading-2-4-idMatrix.pdf, pattern-shading-2-4-idMatrix1.jpg, pattern-shading-2-4.ps The attached PDF draws a text filled with horizontal shading going from red to blue. When rendered via PDFBox, the text is completely filled with red. The problem is that AxialShadingContext#getRaster() gets called with positions that completely fell outside of the range stored in its coords[] field. The fix seems to be to set glyph transform rather than graphics2d transform in PageDrawer#writeText() as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039959#comment-14039959 ] Petr Slaby commented on PDFBOX-2141: My observation was that the coordinates arriving at the AxialShadingContext in getRaster() were not what the shading expects. Not applying the transform to the graphics fixed it. The problem is probably that, regardless of whether the transform is applied to the graphics or to the glyphs, the coordinates arriving at getRaster() are always the same. However, the transform applied to the coords field in the constructor of AxialShadingContext is the one that was set to the graphics, i.e. it is a different one if the graphics was transformed. Shading not applied to text --- Key: PDFBOX-2141 URL: https://issues.apache.org/jira/browse/PDFBOX-2141 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch The attached PDF draws a text filled with horizontal shading going from red to blue. When rendered via PDFBox, the text is completely filled with red. The problem is that AxialShadingContext#getRaster() gets called with positions that completely fell outside of the range stored in its coords[] field. The fix seems to be to set glyph transform rather than graphics2d transform in PageDrawer#writeText() as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039959#comment-14039959 ] Petr Slaby edited comment on PDFBOX-2141 at 6/21/14 9:46 PM: - My observation was that the coordinates arriving at the AxialShadingContext in getRaster() were not what the shading expects. Not applying the transform to the graphics fixed it. The problem is probably that, regardless of whether the transform is applied to the graphics or to the glyphs, the coordinates arriving at getRaster() are always the same. However, the transform applied to the coords field in the constructor of AxialShadingContext is the one that was set to the graphics, i.e. it is a different one if the graphics was transformed. So yes, I agree. was (Author: pslabycz): My observation was that the coordinates arriving at the AxialShadingContext in getRaster() were not what the shading expects. Not applying the transform to the graphics fixed it. The problem is probably that, regardless of whether the transform is applied to the graphics or to the glyphs, the coordinates arriving at getRaster() are always the same. However, the transform applied to the coords field in the constructor of AxialShadingContext is the one that was set to the graphics, i.e. it is a different one if the graphics was transformed. Shading not applied to text --- Key: PDFBOX-2141 URL: https://issues.apache.org/jira/browse/PDFBOX-2141 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.8.5, 1.8.6, 2.0.0 Reporter: Petr Slaby Priority: Minor Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch The attached PDF draws a text filled with horizontal shading going from red to blue. When rendered via PDFBox, the text is completely filled with red. 
The problem is that AxialShadingContext#getRaster() gets called with positions that completely fell outside of the range stored in its coords[] field. The fix seems to be to set glyph transform rather than graphics2d transform in PageDrawer#writeText() as shown in the attached patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038567#comment-14038567 ] Petr Slaby commented on PDFBOX-2149: Attached a file which runs into a NPE in PDFont#isSymbolicFont() now. {noformat} Caused by: java.lang.NullPointerException at org.apache.pdfbox.pdmodel.font.PDFont.isSymbolicFont(PDFont.java:694) at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getGIDForCharacterCode(PDTrueTypeFont.java:408) at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:378) at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidth(PDFont.java:312) at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:377) at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:44) at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:508) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:259) at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:226) at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:209) at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:175) at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:227) at org.apache.pdfbox.rendering.PDFRenderer.renderPageToGraphics(PDFRenderer.java:190) at org.apache.pdfbox.rendering.PDFRenderer.renderPageToGraphics(PDFRenderer.java:174) {noformat} Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. 
For example, TTFGlyph2D does its own decoding, and this code is copy-pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy-pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap. We'll see. 
Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. - Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2149: --- Attachment: 000467.pdf Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. - PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. 
- TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy-pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap. We'll see. Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. - Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Petr Slaby updated PDFBOX-2149:
---
Attachment: 39.pdf

Here is another one. Hope this helps.

Font Refactoring
Key: PDFBOX-2149
URL: https://issues.apache.org/jira/browse/PDFBOX-2149
Project: PDFBox
Issue Type: Improvement
Components: FontBox, PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Attachments: 39.pdf, 000467.pdf

To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example, TTFGlyph2D does its own decoding, and this code is copied and pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps often follows rules which are completely invalid for that font type but mostly work by luck.

Phase 1
- Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading.
- Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding. FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private.
- The Encoding class and EncodingManager could do with some cleaning up prior to further refactoring.
- PDSimpleFont does not do anything; its functionality should be moved into its superclass, PDFont.
- PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses. As a starting point, the relevant code should at least be copied into the relevant subclasses, ready for further refactoring.
- TTFGlyph2D does its own decoding of char codes rather than using the font's #encode method (fair enough, because #encode is broken), and there's a copied-and-pasted version of the same code in PDTrueTypeFont. We need to consolidate this code into PDTrueTypeFont, where it belongs.

Phase 2
- Refactor loading of CMaps and Encodings from font dictionaries. This will involve changes to PDFont and its subclasses to delegate loading to subclasses, where it can be properly encapsulated.
- May need to alter the class hierarchy w.r.t. CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap. We'll see.

Phase 3
- Refactor the decoding of character codes by PDFont and its subclasses. This will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods.
- Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width.

Phase 4
- Add support for generating embedded TTFs with Unicode

--
This message was sent by Atlassian JIRA (v6.2#6252)
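The Phase 1 goals of "no public setXXX methods" and "loading pushed down into the appropriate subclasses" can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names SketchFont, SketchSimpleFont and SketchCompositeFont are hypothetical and are not PDFBox classes, and the maps go straight from character code to text, which is a simplification of what real Encodings and CMaps do.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base class: no setters, so a loaded font cannot be corrupted later. */
abstract class SketchFont
{
    private final String name;

    protected SketchFont(String name)
    {
        this.name = name;
    }

    final String getName()
    {
        return name;
    }

    /** Returns the text for a character code, or null if the code is unmapped. */
    abstract String toUnicode(int code);
}

/** Simple fonts resolve codes through an Encoding only; no CMap is consulted. */
final class SketchSimpleFont extends SketchFont
{
    private final Map<Integer, String> encoding;

    SketchSimpleFont(String name, Map<Integer, String> encoding)
    {
        super(name);
        // Defensive copy keeps the font immutable after construction.
        this.encoding = Collections.unmodifiableMap(new HashMap<>(encoding));
    }

    @Override
    String toUnicode(int code)
    {
        return encoding.get(code);
    }
}

/** Composite fonts resolve codes through a CMap only; no Encoding is consulted. */
final class SketchCompositeFont extends SketchFont
{
    private final Map<Integer, String> cmap;

    SketchCompositeFont(String name, Map<Integer, String> cmap)
    {
        super(name);
        this.cmap = Collections.unmodifiableMap(new HashMap<>(cmap));
    }

    @Override
    String toUnicode(int code)
    {
        return cmap.get(code);
    }
}
```

Because each subclass owns exactly one lookup mechanism, a simple font can no longer pick up a CMap by accident, which is the kind of "works by luck" behaviour the issue describes.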
[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading
[ https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038604#comment-14038604 ]

Petr Slaby commented on PDFBOX-2153:

Sounds reasonable. The current clipping path is passed to graphics.fill(), so if the graphics has a clipping path from a previous operation, it might interfere with that. I vote for setClip(null) because setClip() is a time- and memory-consuming operation if called with a complex path. The change does not show any effect on my test suite documents; it seems that I do not have an example that would be affected.

Setting the correct clipping path for shading
Key: PDFBOX-2153
URL: https://issues.apache.org/jira/browse/PDFBOX-2153
Project: PDFBox
Issue Type: Bug
Components: Rendering
Reporter: Tilman Hausherr
Labels: shading, shadingpattern

While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the clipping region) operator of a type 7 shading I got a lot more correct shadings (type 6 and lower). It looked like PDFBox had been using the clipping of the type 7 when drawing the type 6, which is just a rectangle above in that rendering. This resulted in a blank. By adding
{code}
graphics.setClip(getGraphicsState().getCurrentClippingPath());
{code}
in PageDrawer.shfill() just before the graphics.fill() I get several files to render correctly that I hadn't before. (Setting null will probably do the same, didn't test that yet).

The following PDFs are rendered correctly with the change:
McAfee-ShadingType7.pdf
eci_altona-test-suite-v2_technical_H.pdf
crestron-p9.pdf
(these three found in PDFBOX-1915)
PDFBOX-1451.pdf (alfresco)
PDFBOX-1940.pdf (chart)
PDFBOX-1861-tracemonkey.pdf p.11

Not solved by the change:
PDFBOX-2098-asyTUG.pdf p.6 (this one doesn't use shfill)
PDFBOX-1861-tracemonkey.pdf p.6 (not shading)
PDFBOX-1416.pdf (not shading)
texample-rgb-triangle.pdf (John has an explanation about that one)

WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() ?
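The effect described above, a clip left over from an earlier operation blanking out a later fill, can be reproduced with plain Java2D. This is a minimal sketch and not PDFBox code: the class StaleClipDemo, its method and the chosen rectangles are made up for illustration, and it exercises only the setClip(null) variant.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Sketch: a clip left over from an earlier operation hides a later fill. */
final class StaleClipDemo
{
    private StaleClipDemo()
    {
    }

    /**
     * Fills the rectangle (10,10)-(30,30) in red after a stale clip covering
     * (0,0)-(5,5) has been set, then returns the ARGB value at (20,20).
     */
    static int fillAndSample(boolean resetClip)
    {
        BufferedImage image = new BufferedImage(40, 40, BufferedImage.TYPE_INT_ARGB);
        Graphics2D graphics = image.createGraphics();
        try
        {
            // Stale clip left behind by a previous drawing operation.
            graphics.setClip(new Rectangle(0, 0, 5, 5));
            if (resetClip)
            {
                // Drop the stale clip before filling.
                graphics.setClip(null);
            }
            graphics.setColor(Color.RED);
            graphics.fill(new Rectangle(10, 10, 20, 20));
        }
        finally
        {
            graphics.dispose();
        }
        return image.getRGB(20, 20);
    }
}
```

Without the reset the sampled pixel stays fully transparent; with the reset it is opaque red. In a renderer the shape passed to fill() would itself be limited by the current clipping path, which is why clearing the Graphics2D clip can be enough.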
[jira] [Commented] (PDFBOX-2145) Clean up PDFStreamEngine and PDFTextStripper
[ https://issues.apache.org/jira/browse/PDFBOX-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036392#comment-14036392 ]

Petr Slaby commented on PDFBOX-2145:

After the change to TextPosition, the fields x and y are useless. Previously, they were used to cache the values computed in getX() and getY(), respectively.

Clean up PDFStreamEngine and PDFTextStripper
Key: PDFBOX-2145
URL: https://issues.apache.org/jira/browse/PDFBOX-2145
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Priority: Minor

PDFStreamEngine and PDFTextStripper don't really meet our coding conventions and have several unused methods and deprecated code which can safely be removed. This should clear the way to fixing some bugs in PDFStreamEngine, PDFTextStripper and the various PDFont classes related to text encoding.
[jira] [Commented] (PDFBOX-2145) Clean up PDFStreamEngine and PDFTextStripper
[ https://issues.apache.org/jira/browse/PDFBOX-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034373#comment-14034373 ] Petr Slaby commented on PDFBOX-2145: Speaking about clean up - the way my Eclipse is configured, I get about 200 warnings on pdfbox code - unused imports, potential null pointer access, redundant null check, undocumented empty blocks, unnecessary semicolon - to name but a few. Which of these warnings to enable is a matter of personal taste or team rules - what are yours? Do you intend to clean up at least some of these warnings? Mostly, this does not bring any real improvement, but still: e.g. following the warnings in PDSeedValue reveals a duplicated null check which does not make any obvious sense.
[jira] [Commented] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034393#comment-14034393 ]

Petr Slaby commented on PDFBOX-2104:

Cool, thanks.

Implement transparency groups
Key: PDFBOX-2104
URL: https://issues.apache.org/jira/browse/PDFBOX-2104
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: John Hewson
Labels: transparency
Fix For: 2.0.0
Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.1.patch, TransparencyGroups.2.patch, TransparencyGroups.3.patch, TransparencyGroups.patch

The attached PDF uses transparency groups, blending and soft masks to create the rounded corners and shades behind images. It appears that these features are not implemented in PDFBox. An implementation proposal is attached in TransparencyGroups.patch. The basic idea is to create a buffered image, draw the transparency group content onto it and then use the result to produce the soft mask or draw the image on the original g2d.

Note: I am not the (only) author of the proposed change. It was developed in our company a few years ago in sources based on a 1.7.x version of PDFBox, mostly by a guy who has already left. Over the years, merging the work done in PDFBox main stream into our source base has become impossible because of the many refactorings and other far-reaching changes. Now we would like to go the opposite way and, where possible, bring the changes and fixes we have made into PDFBox main stream and start to use it in our installations.
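The "basic idea" from the issue description, drawing the group onto its own buffered image and then drawing that image on the original g2d, can be sketched with plain Java2D. This is only an illustration under stated assumptions: the class TransparencyGroupSketch and its methods are hypothetical, the group content is a single blue rectangle, and soft masks and blend modes are left out entirely.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Sketch: render a group offscreen, then composite it onto the page as one unit. */
final class TransparencyGroupSketch
{
    private TransparencyGroupSketch()
    {
    }

    /** Draws the (made-up) group content onto its own transparent image. */
    static BufferedImage renderGroup(int width, int height)
    {
        BufferedImage group = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = group.createGraphics();
        try
        {
            // Only the left half is painted; the right half stays transparent.
            g.setColor(Color.BLUE);
            g.fillRect(0, 0, width / 2, height);
        }
        finally
        {
            g.dispose();
        }
        return group;
    }

    /** Paints a white 20x20 page and draws the group image on it with the given alpha. */
    static BufferedImage compositeOntoPage(float groupAlpha)
    {
        BufferedImage page = new BufferedImage(20, 20, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = page.createGraphics();
        try
        {
            g2d.setColor(Color.WHITE);
            g2d.fillRect(0, 0, 20, 20);
            BufferedImage group = renderGroup(20, 20);
            // The alpha applies to the finished group, not to each object inside it.
            g2d.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, groupAlpha));
            g2d.drawImage(group, 0, 0, null);
        }
        finally
        {
            g2d.dispose();
        }
        return page;
    }
}
```

Compositing the finished image in one step is what distinguishes a group from drawing its objects individually: overlapping objects inside the group do not show through each other when the group as a whole is made translucent.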
[jira] [Updated] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2104: --- Attachment: TransparencyGroups.3.patch
[jira] [Commented] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032363#comment-14032363 ] Petr Slaby commented on PDFBOX-2104: I have refactored the code according to your suggestions. Only {quote} * In PageDrawer the following graphics state is constructed, but it is never used: {quote} This code does not construct a new graphics state, it changes some settings in the current one.
[jira] [Created] (PDFBOX-2144) Provide a pluggable font manager
Petr Slaby created PDFBOX-2144:
--
Summary: Provide a pluggable font manager
Key: PDFBOX-2144
URL: https://issues.apache.org/jira/browse/PDFBOX-2144
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Reporter: Petr Slaby
Attachments: FontManager.patch

Our J2EE application has all fonts and resources configured and stored in its database. No files are accessed directly from the file system or from the system environment. To make PDFBox compatible with this philosophy, we need the FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the attached patch. The proposal defines a FontManager interface and a default implementation, which is the original one. FontManager then needs to be configured on and propagated from PDFStreamEngine and PageDrawer. It should also be configurable on PDFRenderer, which is not shown in the patch. For that, I would suggest introducing a configuration object that takes care of all current and future options of PDFRenderer.
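A minimal sketch of what "pluggable" could look like follows. It is not taken from the attached patch: the names FontSource, InMemoryFontSource and RenderOptions are hypothetical, the in-memory map merely stands in for the database mentioned above, and the default source here knows no fonts at all, whereas a real default would keep the existing lookup behaviour.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup interface; not the interface from the attached patch. */
interface FontSource
{
    /** Returns the raw font program for the given name, or null if unknown. */
    byte[] findFontData(String fontName);
}

/** Stand-in for a database-backed store: fonts are held in a map. */
final class InMemoryFontSource implements FontSource
{
    private final Map<String, byte[]> fonts = new HashMap<>();

    void register(String fontName, byte[] data)
    {
        // Copy on the way in so later changes by the caller have no effect.
        fonts.put(fontName, data.clone());
    }

    @Override
    public byte[] findFontData(String fontName)
    {
        byte[] data = fonts.get(fontName);
        return data == null ? null : data.clone();
    }
}

/** Hypothetical configuration object that carries renderer options. */
final class RenderOptions
{
    // Placeholder default that knows no fonts.
    private FontSource fontSource = fontName -> null;

    RenderOptions withFontSource(FontSource source)
    {
        this.fontSource = source;
        return this;
    }

    FontSource getFontSource()
    {
        return fontSource;
    }
}
```

A caller would build the options once, for example `new RenderOptions().withFontSource(source)`, and hand them to the renderer, so new options can be added later without changing constructor signatures.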
[jira] [Updated] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2144: --- Attachment: FontManager.patch
[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032896#comment-14032896 ] Petr Slaby commented on PDFBOX-2144: Sorry, that was not intended. Anyway, the patch just shows what I need. Someone more familiar with pdfbox API design and its intentions has to decide whether or how such a feature can be implemented.
[jira] [Updated] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2144: --- Attachment: (was: FontManager.patch)
[jira] [Updated] (PDFBOX-2144) Provide a pluggable font manager
[ https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2144: --- Attachment: FontManager.patch Fixed the license agreements in file headers. Sorry again, will try to be more careful next time.
[jira] [Updated] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2104: --- Attachment: TransparencyGroups.2.patch
[jira] [Commented] (PDFBOX-2104) Implement transparency groups
[ https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030435#comment-14030435 ] Petr Slaby commented on PDFBOX-2104: It is not easy to merge changes from 1.7.x into 2.0... You are right, ImagePaintTiling is not used any more and ImagePaint can be replaced by TexturePaint. Modified patch is attached, also with slight changes towards PDFBox coding style (we use the prefix m on all field names, sorry if I forget it somewhere) and a potential NPE fixed in PDFormXObject#createPageDrawerGroup (matrix can be null).
[jira] [Created] (PDFBOX-2141) Shading not applied to text
Petr Slaby created PDFBOX-2141:
--
Summary: Shading not applied to text
Key: PDFBOX-2141
URL: https://issues.apache.org/jira/browse/PDFBOX-2141
Project: PDFBox
Issue Type: Bug
Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor

The attached PDF draws text filled with a horizontal shading going from red to blue. When rendered via PDFBox, the text is completely filled with red. The problem is that AxialShadingContext#getRaster() gets called with positions that fall completely outside the range stored in its coords[] field. The fix seems to be to set the glyph transform rather than the Graphics2D transform in PageDrawer#writeText(), as shown in the attached patch.
[jira] [Updated] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petr Slaby updated PDFBOX-2141: --- Attachment: 04_ShadingPatternTextPDF.pdf PageDrawer.writeFont.java.patch
[jira] [Commented] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031228#comment-14031228 ]

Petr Slaby commented on PDFBOX-2141:

The following seems to do the trick, though it has been tested on a single file so far:
{noformat}
private void writeFont(final AffineTransform at, final GlyphVector glyphs)
{
    // Convert from PDF, where glyphs are upright when direction is from
    // bottom to top, to AWT, where this is the other way around
    at.scale(1, -1);
    for (int i = 0; i < glyphs.getNumGlyphs(); i++)
    {
        AffineTransform glyphTransform = glyphs.getGlyphTransform(i);
        Point2D glyphPos = glyphs.getGlyphPosition(i);
        AffineTransform applyTransform;
        if (glyphTransform != null || glyphPos.getX() != 0 || glyphPos.getY() != 0)
        {
            // Fold the glyph position and any glyph transform into one
            // transform, then apply the text transform on top of it
            AffineTransform translate = AffineTransform.getTranslateInstance(glyphPos.getX(), glyphPos.getY());
            if (glyphTransform != null)
            {
                translate.concatenate(glyphTransform);
            }
            translate.preConcatenate(at);
            applyTransform = translate;
            glyphs.setGlyphPosition(i, new Point2D.Float(0, 0));
        }
        else
        {
            applyTransform = at;
        }
        glyphs.setGlyphTransform(i, applyTransform);
    }
    // The graphics transform itself is left untouched
    graphics.drawGlyphVector(glyphs, 0, 0);
}
{noformat}

Shading not applied to text
Key: PDFBOX-2141
URL: https://issues.apache.org/jira/browse/PDFBOX-2141
Project: PDFBox
Issue Type: Bug
Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor
Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch

The attached PDF draws text filled with a horizontal shading going from red to blue. When rendered via PDFBox, the text is completely filled with red. The problem is that AxialShadingContext#getRaster() gets called with positions that fall completely outside the range stored in its coords[] field. The fix seems to be to set the glyph transform rather than the Graphics2D transform in PageDrawer#writeText(), as shown in the attached patch.
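The order in which the snippet above combines the three transforms matters, so here is a small standalone sketch of just that step. The class GlyphTransformOrder and its combine method are hypothetical helpers written for illustration. They rely only on the documented behaviour of java.awt.geom.AffineTransform: concatenate(Tx) makes Tx apply to a point first, and preConcatenate(Tx) makes Tx apply last.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Sketch: glyph transform first, then glyph position, then the text transform. */
final class GlyphTransformOrder
{
    private GlyphTransformOrder()
    {
    }

    /**
     * Builds at * translate(glyphPos) * glyphTransform, so that a point is
     * mapped by glyphTransform, then moved to glyphPos, then mapped by at.
     * A null glyphTransform is treated as the identity.
     */
    static AffineTransform combine(AffineTransform at, Point2D glyphPos, AffineTransform glyphTransform)
    {
        AffineTransform combined =
                AffineTransform.getTranslateInstance(glyphPos.getX(), glyphPos.getY());
        if (glyphTransform != null)
        {
            // Applied to points before the translation.
            combined.concatenate(glyphTransform);
        }
        // Applied to points after everything else.
        combined.preConcatenate(at);
        return combined;
    }
}
```

For example, with at = scale(2, -2) and a glyph position of (3, 4), the point (1, 1) is first moved to (4, 5) and then scaled to (8, -10). Because the result is a general affine matrix, a rotation or skew in at is carried along by the same concatenation.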
[jira] [Commented] (PDFBOX-2141) Shading not applied to text
[ https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031274#comment-14031274 ] Petr Slaby commented on PDFBOX-2141: I am just not sure whether the incoming transform can also contain rotation or skew and whether you need to take that into account, too. For sure you should expect more than two glyphs, although I did not see such an example in my test suite. My code seems more generic - it concatenates everything into the glyph transform and sets its position to 0,0. On the other hand, less code is usually better... It is getting too late in the night for me now. I will retest both solutions with my test suite on Monday, but I assume they will render the same results.