[jira] [Created] (PDFBOX-4756) ScratchFileBuffer seek beyond the last page

2020-01-30 Thread Petr Slaby (Jira)
Petr Slaby created PDFBOX-4756:
--

 Summary: ScratchFileBuffer seek beyond the last page
 Key: PDFBOX-4756
 URL: https://issues.apache.org/jira/browse/PDFBOX-4756
 Project: PDFBox
  Issue Type: Bug
Reporter: Petr Slaby
 Attachments: ScratchFileBuffer.java.patch, 
ScratchFileBufferRegressionTest.java

When rendering a confidential PDF, we get a java.io.EOFException in 
ScratchFileBuffer.seek(). The problem is demonstrated in the attached test and 
fixed by the attached patch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4627) Wrong color of uncolored tiling pattern

2019-08-13 Thread Petr Slaby (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906178#comment-16906178
 ] 

Petr Slaby commented on PDFBOX-4627:


We use PDFBox to render PDFs. For our customers, Adobe Reader is the only 
source of truth. Even if I managed to convince them that a given PDF is not 
correct according to the specification, they would not be able to change it. 
The PDFs usually come from a source they have no influence over.

I think the argument "but it renders with Adobe Reader" is a valid one. I agree 
that it is not possible to be 100% compatible with Adobe Reader, but PDFBox 
should head in this direction rather than attempting to be 100% compatible 
with the specification and punishing users for every mistake a PDF writer 
has made. 

> Wrong color of uncolored tiling pattern
> ---
>
> Key: PDFBOX-4627
> URL: https://issues.apache.org/jira/browse/PDFBOX-4627
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.16
>Reporter: Jiri Kunhart
>Priority: Major
> Attachments: after_fix.png, before_fix.png, 
> uncolored_tiling_pattern.patch, uncolored_tiling_pattern.pdf
>
>
> The attached pdf file with uncolored tiling pattern is rendered wrongly (see 
> "before_fix.png"). The problem is that pattern stream contains
> /DevGrayCS cs
> which overwrites PDPattern color space stored in 
> PDGraphicsState#nonStrokingColor. I did a small fix which ignores all 
> settings of color space inside of uncolored tiling pattern stream and the 
> result seems to be good (see "after_fix.png").
> Note: the pattern in the png file looks different from the one in the 
> original pdf file, but this should probably be handled in a separate issue.
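
The fix described in the issue skips colour-setting operators while an uncolored tiling pattern's content stream is processed. A rough sketch of that idea follows. It is not the attached patch and does not use PDFBox classes: `Instruction`, `UncoloredPatternFilter` and `dropColorOperators` are hypothetical names, and the operator set is an assumption based on the colour operators listed in the PDF specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UncoloredPatternFilter {
    /** Operators that set a color space or a color (assumed list, see lead-in). */
    static final Set<String> COLOR_OPERATORS = new HashSet<>(Arrays.asList(
            "CS", "cs", "SC", "SCN", "sc", "scn", "G", "g", "RG", "rg", "K", "k"));

    private UncoloredPatternFilter() {
    }

    /** Returns the instructions of an uncolored pattern with all color operators removed. */
    static List<Instruction> dropColorOperators(List<Instruction> stream) {
        List<Instruction> kept = new ArrayList<>();
        for (Instruction instruction : stream) {
            if (!COLOR_OPERATORS.contains(instruction.operator)) {
                kept.add(instruction);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Instruction> stream = Arrays.asList(
                new Instruction("cs", "/DevGrayCS"),
                new Instruction("re", "0", "0", "10", "10"),
                new Instruction("f"));
        for (Instruction instruction : dropColorOperators(stream)) {
            System.out.println(instruction.operands + " " + instruction.operator);
        }
    }
}

/** Hypothetical stand-in for one parsed content stream instruction. */
final class Instruction {
    final String operator;
    final List<String> operands;

    Instruction(String operator, String... operands) {
        this.operator = operator;
        this.operands = Arrays.asList(operands);
    }
}
```

With the filter in place, the colour stored by the caller for the pattern stays untouched while the pattern cell is drawn.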






[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-21 Thread Petr Slaby (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623694#comment-16623694
 ] 

Petr Slaby commented on PDFBOX-4309:


Trying to load a Sun class does not seem like a good approach to me. Isn't 
KCMS only used if Java < 7, or if Java is 7 or 8 and the system property 
sun.java2d.cmm is set to sun.java2d.cmm.kcms.KcmsServiceProvider? In all other 
cases, LCMS can be assumed.

However, there is also IBM Java, JRebel and the like. I do not know what CMS 
these are using and whether it is "slow" or "fast". 
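
The version-based check suggested in this comment could be sketched as follows. It only encodes the rule as stated above, which is itself phrased as a question and not a verified fact, and it ignores vendor differences such as the IBM Java case just mentioned. `CmmGuess`, `majorVersion` and `isKcmsLikely` are hypothetical names, not PDFBox API.

```java
final class CmmGuess {
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    private CmmGuess() {
    }

    /** Extracts the major version from strings such as "1.7.0_72" or "11.0.2". */
    static int majorVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Old scheme: "1.7.0_72" means Java 7.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Applies the rule from the comment: below 7 always, 7 or 8 only if the property selects KCMS. */
    static boolean isKcmsLikely(String javaVersion, String cmmProperty) {
        int major = majorVersion(javaVersion);
        if (major < 7) {
            return true;
        }
        if (major <= 8) {
            return KCMS_PROVIDER.equals(cmmProperty);
        }
        return false;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String cmm = System.getProperty("sun.java2d.cmm");
        System.out.println("KCMS likely: " + isKcmsLikely(version, cmm));
    }
}
```

A check like this avoids loading any sun.* class, at the price of guessing wrong on runtimes that do not follow the assumed rule.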

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
> {noformat}
> The problem here is that LittleCMS (LCMS) is called several thousand times, 
> which takes way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an 
> option either, as the Suse IcedTea OpenJDK does not seem to include it 
> (anymore?) - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space 
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as 
> alternative color space it has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 second!
> It is clear that using the alternate color space might return wrong/not exact 
> colors. Therefore it should be only an option to enable this mode. However 
> for processing large collections of PDF documents (e.g. 

[jira] [Commented] (PDFBOX-4245) wrong rendering of the transparency group at the specific position on a page

2018-07-22 Thread Petr Slaby (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552145#comment-16552145
 ] 

Petr Slaby commented on PDFBOX-4245:


Ooops, sorry. Thx.

> wrong rendering of the transparency group at the specific position on a page
> 
>
> Key: PDFBOX-4245
> URL: https://issues.apache.org/jira/browse/PDFBOX-4245
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.10
>Reporter: Jiri Kunhart
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: patch
> Attachments: gs-bugzilla690022-reduced-rotations-cropbox.pdf, 
> gs-bugzilla690022-reduced-rotations.pdf, gs-bugzilla690022.pdf, 
> pdfbox-2.0.10-SNAPSHOT_transparency_group_all.patch, 
> pdfbox-2.0.10-SNAPSHOT_transparency_group_resources.zip, 
> pdfbox-2.0.10-SNAPSHOT_transparency_group_sources.patch
>
>
> Rendering of transparency groups works only if the whole page is 
> rendered. If you try to render only a part of the page where a 
> transparency group is placed, you get only a white image or an image with 
> shifted pixels representing the applied soft mask. A simple fix is attached in 
> the patch, including the test and the resources used for testing.






[jira] [Commented] (PDFBOX-4245) wrong rendering of the transparency group at the specific position on a page

2018-07-22 Thread Petr Slaby (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552140#comment-16552140
 ] 

Petr Slaby commented on PDFBOX-4245:


[~tilman]: Could you please link or attach the files which had the regression 
so that we could have a look? We cannot use PDFBox 2.0 in our product until 
this issue is resolved.







[jira] [Commented] (PDFBOX-4229) allow user to set FontProvider

2018-05-25 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490348#comment-16490348
 ] 

Petr Slaby commented on PDFBOX-4229:


That class is not public, so you have to copy it into your own code base to be 
able to set the provider (I did not try it, and I am not sure that is even 
possible without copying a lot of dependencies, too)...

The FontMapper topic was discussed some time ago in PDFBOX-2539 and on the 
mailing list. I (still) share the opinion that external font mapping 
customisation needs improvement. In our application, we need a different 
font configuration for tasks running in parallel threads in an application 
server. This can only be achieved using a ThreadLocal variable in a custom 
FontMapper implementation (at least I hope it can work this way; we have not 
implemented it yet).
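
A minimal sketch of the ThreadLocal idea, assuming a single globally registered mapper that delegates to whatever configuration the current thread has installed. To stay self-contained it does not implement the PDFBox FontMapper interface: `FontLookup`, `ThreadLocalFontLookup`, `resolve` and `runWith` are hypothetical names, and a real mapper would return font objects, not strings.

```java
/** One shared instance that delegates to a per-thread configuration. */
final class ThreadLocalFontLookup implements FontLookup {
    private final FontLookup fallback;
    private final ThreadLocal<FontLookup> current = new ThreadLocal<>();

    ThreadLocalFontLookup(FontLookup fallback) {
        this.fallback = fallback;
    }

    /** Runs the task with the given lookup active on the calling thread only. */
    void runWith(FontLookup lookup, Runnable task) {
        FontLookup previous = current.get();
        current.set(lookup);
        try {
            task.run();
        } finally {
            // Pooled application server threads are reused, so always restore the old state.
            if (previous == null) {
                current.remove();
            } else {
                current.set(previous);
            }
        }
    }

    @Override
    public String resolve(String baseFont) {
        FontLookup active = current.get();
        return (active != null ? active : fallback).resolve(baseFont);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalFontLookup shared = new ThreadLocalFontLookup(name -> "default:" + name);
        Thread a = new Thread(() -> shared.runWith(name -> "configA:" + name,
                () -> System.out.println(shared.resolve("Helvetica"))));
        Thread b = new Thread(() -> shared.runWith(name -> "configB:" + name,
                () -> System.out.println(shared.resolve("Helvetica"))));
        a.start();
        b.start();
        a.join();
        b.join();
        // No per-thread configuration is active here, so the fallback answers.
        System.out.println(shared.resolve("Helvetica"));
    }
}

/** Hypothetical stand-in for a font mapping interface; not the PDFBox API. */
interface FontLookup {
    String resolve(String baseFont);
}
```

The try/finally in `runWith` matters in an application server: without it, a configuration set for one task would leak into the next task that happens to run on the same pooled thread.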

> allow user to set FontProvider
> --
>
> Key: PDFBOX-4229
> URL: https://issues.apache.org/jira/browse/PDFBOX-4229
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.9
>Reporter: Michael Brackx
>Priority: Major
>
> Allow a user to set FontProvider without "hacking".
> Currently, when using only public interfaces, a complete FontMapper has to be 
> implemented.






[jira] [Commented] (PDFBOX-4112) Build and test PDFBox with JDK10

2018-02-27 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378663#comment-16378663
 ] 

Petr Slaby commented on PDFBOX-4112:


I get the funny message "Oh-oh, sun.java2d.cmm.kcms.KcmsServiceProvider no 
longer exists, so image rendering will be much slower :-(" written on console 
when running the PDFDebugger on java 1.7.0_72-b14. I am not sure if that was 
the intention.

> Build and test PDFBox with JDK10
> 
>
> Key: PDFBOX-4112
> URL: https://issues.apache.org/jira/browse/PDFBOX-4112
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
>  Labels: jdk10
>
> Issue to collect problems and solutions for building and testing PDFBox with 
> JDK10.






[jira] [Commented] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly

2017-12-15 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293218#comment-16293218
 ] 

Petr Slaby commented on PDFBOX-4038:


{quote}
But I doubt that the result values must have an increasing order 
{quote}

You are right, that was my misinterpretation of the information I got. The 
Type1 specification says exactly this:

The value associated with BlueValues is an array containing an even number of 
integers taken in pairs, and which follow a small number of rules:
- The first integer in each pair is less than or equal to the second integer in 
that pair.
...

But that is relatively unimportant in this context. The important information 
is the delta encoding of the integer array in CFF which must be taken into 
account by the CFFParser. 
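
For illustration, the delta encoding works like this: the first number is stored as-is and every following number is stored as the difference to its predecessor, so the reader has to accumulate them. Below is a small sketch of that decoding step. `CffDelta` and `decode` are hypothetical names, not the CFFParser code, and the sample numbers are made up rather than taken from the NeoSans font.

```java
import java.util.Arrays;

/** Decodes a CFF "delta" operand list into absolute values. */
final class CffDelta {
    private CffDelta() {
    }

    /** Turns a0, (a1 - a0), (a2 - a1), ... back into a0, a1, a2, ... */
    static int[] decode(int[] deltas) {
        int[] values = new int[deltas.length];
        int running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            values[i] = running;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] raw = {-15, 15, 500, 15, 200, 12};
        // prints [-15, 0, 500, 515, 715, 727]
        System.out.println(Arrays.toString(decode(raw)));
    }
}
```

Reading the raw operands as absolute values would give -15, 15, 500, 15, ..., which is not sorted and is the kind of result a validator would reject.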

Thanks.


> CFF font Blue values and other delta encoded lists read incorrectly
> ---
>
> Key: PDFBOX-4038
> URL: https://issues.apache.org/jira/browse/PDFBOX-4038
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.8
>Reporter: Petr Slaby
>Assignee: Tilman Hausherr
> Fix For: 2.0.9, 3.0.0 PDFBox
>
> Attachments: BlueValuesTest.java, CFFParser.java.patch
>
>
> The attached test compares the values retrieved via CFFParser from an 
> OpenType font with the expected values as seen in FontForge (go to 
> Element->Font Info->PS Private). 
> The font NeoSans Black.otf can be found at https://www.wfonts.com/font/neosans
> The CFF font specification explaining the encoding of the entries which are 
> incorrectly parsed by FontBox CFFParser can be found here 
> https://typekit.files.wordpress.com/2013/05/5176.cff.pdf
> We use FontBox to read the font when we need to embed it into a PDF which we 
> produce via our Apache FOP based software. The Adobe validator then complains 
> about incorrect "Blue values" sorting.






[jira] [Updated] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly

2017-12-15 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-4038:
---
Attachment: BlueValuesTest.java

> CFF font Blue values and other delta encoded lists read incorrectly
> ---
>
> Key: PDFBOX-4038
> URL: https://issues.apache.org/jira/browse/PDFBOX-4038
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.8
>Reporter: Petr Slaby
> Attachments: BlueValuesTest.java, CFFParser.java.patch
>
>






[jira] [Updated] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly

2017-12-15 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-4038:
---
Attachment: (was: BlueValuesTest.java)

> CFF font Blue values and other delta encoded lists read incorrectly
> ---
>
> Key: PDFBOX-4038
> URL: https://issues.apache.org/jira/browse/PDFBOX-4038
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.8
>Reporter: Petr Slaby
> Attachments: CFFParser.java.patch
>
>






[jira] [Created] (PDFBOX-4038) CFF font Blue values and other delta encoded lists read incorrectly

2017-12-15 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-4038:
--

 Summary: CFF font Blue values and other delta encoded lists read 
incorrectly
 Key: PDFBOX-4038
 URL: https://issues.apache.org/jira/browse/PDFBOX-4038
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.8
Reporter: Petr Slaby
 Attachments: BlueValuesTest.java, CFFParser.java.patch

The attached test compares the values retrieved via CFFParser from an OpenType 
font with the expected values as seen in FontForge (go to Element->Font 
Info->PS Private). 

The font NeoSans Black.otf can be found at https://www.wfonts.com/font/neosans

The CFF font specification explaining the encoding of the entries which are 
incorrectly parsed by FontBox CFFParser can be found here 
https://typekit.files.wordpress.com/2013/05/5176.cff.pdf

We use FontBox to read the font when we need to embed it into a PDF which we 
produce via our Apache FOP based software. The Adobe validator then complains 
about incorrect "Blue values" sorting.






[jira] [Commented] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream

2017-09-19 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172220#comment-16172220
 ] 

Petr Slaby commented on PDFBOX-3933:


Thanks, Tilman. The commit comment should rather be "don't swallow CR at the 
end of stream if there is *none* at the beginning" as that is what the code 
does (or "swallow CR at the end of stream if and only if there is one at the 
beginning")

> PDFParser swallows a CR at the end of a stream
> --
>
> Key: PDFBOX-3933
> URL: https://issues.apache.org/jira/browse/PDFBOX-3933
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 1.8.13
>Reporter: Petr Slaby
> Attachments: Beispiel2.pdf, EndlinePrediction2.patch, 
> EndlinePrediction.patch
>
>
> I have a PDF which I cannot share at the moment, maybe later if I get 
> permission from the customer. 
> The PDF is protected by an empty password, all streams are encrypted using 
> AES. The PDF consistently uses the LF character for line endings. One of the 
> streams looks like this:
> {code}
> 10 0 obj
> <>
> stream
> <0x0D><0x0A>
> endstream
> {code}
> i.e. the Length field is a reference to an object; in the file content, the 
> length object is stored immediately after the stream as
> {code}
> 9 0 obj
> 2624
> endobj
> {code}
> The byte <0x0D> belongs to the stream and is not to be treated as line 
> separator in this case. The parser is not able to read the length field so it 
> manually searches for the stream end in the class EndstreamOutputStream. This 
> class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it 
> strips off the <0x0D> from this particular stream content. Since the stream 
> is encrypted, PDFBox runs into a BadPaddingException later on when trying to 
> decrypt the stream.
> The problem is reproducible using org.apache.pdfbox.PDFToImage in current 
> 1.8.14-SNAPSHOT. The same works fine in current PDFBox 2.0.x, presumably 
> because it uses the non-sequential parser by default.
> The proposed fix is to analyze the PDF content while reading it and search 
> for the CR character only if it was ever encountered as a line separator 
> prior to the stream being parsed.
> Note: I do not exactly know or understand the usage of the other classes 
> inherited from BaseParser, like PDFObjectStreamParser. Maybe the line ending 
> heuristic should be kept "as before" in these classes, by setting the new 
> field BaseParser.hasCR to true already in the constructor.
> A patch is attached.






[jira] [Updated] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream

2017-09-19 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-3933:
---
Attachment: EndlinePrediction2.patch

I have attached a new patch which works fine both with your and my file. The 
line ending to search for is discovered after the "stream" keyword, with the 
expectation that there will be the same line ending after the "stream" keyword 
and after the stream content. 
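
The approach can be sketched roughly as below. This is not the attached patch: `StreamEol`, `detectAfterKeyword` and `contentEnd` are hypothetical names, and the sketch assumes the whole file is available as a byte array and that the positions of the two keywords are already known.

```java
import java.nio.charset.StandardCharsets;

final class StreamEol {
    /** Line ending found directly after the "stream" keyword. */
    enum Eol { CRLF, LF, NONE }

    private StreamEol() {
    }

    /** Inspects the bytes at pos, the first position after the "stream" keyword. */
    static Eol detectAfterKeyword(byte[] data, int pos) {
        if (pos < data.length && data[pos] == '\n') {
            return Eol.LF;
        }
        if (pos + 1 < data.length && data[pos] == '\r' && data[pos + 1] == '\n') {
            return Eol.CRLF;
        }
        return Eol.NONE;
    }

    /** Returns the exclusive end of the stream content, given where "endstream" starts. */
    static int contentEnd(byte[] data, int endstreamPos, Eol eol) {
        int end = endstreamPos;
        if (eol != Eol.NONE && end > 0 && data[end - 1] == '\n') {
            end--;
            // Only a CRLF file gets its CR stripped; in an LF file the CR is stream data.
            if (eol == Eol.CRLF && end > 0 && data[end - 1] == '\r') {
                end--;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        byte[] data = "stream\nAB\r\nendstream".getBytes(StandardCharsets.ISO_8859_1);
        int afterKeyword = "stream".length();
        Eol eol = detectAfterKeyword(data, afterKeyword);
        int end = contentEnd(data, data.length - "endstream".length(), eol);
        // The content is A, B and the CR: prints "LF 3"
        System.out.println(eol + " " + (end - (afterKeyword + 1)));
    }
}
```

In the example the file uses LF after "stream", so the CR before the final LF is kept as part of the stream content, which is what an encrypted stream needs in order to decrypt without a padding error.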

> PDFParser swallows a CR at the end of a stream
> --
>
> Key: PDFBOX-3933
> URL: https://issues.apache.org/jira/browse/PDFBOX-3933
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.14
>Reporter: Petr Slaby
> Attachments: Beispiel2.pdf, EndlinePrediction2.patch, 
> EndlinePrediction.patch
>
>






[jira] [Updated] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream

2017-09-19 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-3933:
---
Attachment: Beispiel2.pdf

I have got permission to share the problematic PDF; it is attached now.

> PDFParser swallows a CR at the end of a stream
> --
>
> Key: PDFBOX-3933
> URL: https://issues.apache.org/jira/browse/PDFBOX-3933
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.14
>Reporter: Petr Slaby
> Attachments: Beispiel2.pdf, EndlinePrediction.patch
>
>






[jira] [Commented] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream

2017-09-19 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171204#comment-16171204
 ] 

Petr Slaby commented on PDFBOX-3933:


Yes, it prints 17661 with the proposed change, 17660 without it. The embedded 
ZIP is the only thing which uses the line ending 0D0A in this PDF, so the 
parser does not know it should search for 0D in this case. This is more or 
less the opposite of my example, where all line endings are 0A, there is a 
0D0A at the end of a stream, the 0D belongs to the stream content, and only 
the 0A is the line ending.

> PDFParser swallows a CR at the end of a stream
> --
>
> Key: PDFBOX-3933
> URL: https://issues.apache.org/jira/browse/PDFBOX-3933
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.14
>Reporter: Petr Slaby
> Attachments: EndlinePrediction.patch
>
>






[jira] [Updated] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream

2017-09-18 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-3933:
---
Affects Version/s: 1.8.14

> PDFParser swallows a CR at the end of a stream
> --
>
> Key: PDFBOX-3933
> URL: https://issues.apache.org/jira/browse/PDFBOX-3933
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.14
>Reporter: Petr Slaby
> Attachments: EndlinePrediction.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Created] (PDFBOX-3933) PDFParser swallows a CR at the end of a stream

2017-09-18 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-3933:
--

 Summary: PDFParser swallows a CR at the end of a stream
 Key: PDFBOX-3933
 URL: https://issues.apache.org/jira/browse/PDFBOX-3933
 Project: PDFBox
  Issue Type: Bug
Reporter: Petr Slaby
 Attachments: EndlinePrediction.patch

I have a PDF which I cannot share at the moment; maybe later, if I get 
permission from the customer. 

The PDF is protected by an empty password, all streams are encrypted using AES. 
The PDF consistently uses the LF character for line endings. One of the streams 
looks like this:
{code}
10 0 obj
<>
stream
<0x0D><0x0A>
endstream
{code}
i.e. the Length field is a reference to an object. In the file, the length 
object is stored immediately after the stream as
{code}
9 0 obj
2624
endobj
{code}

The byte <0x0D> belongs to the stream and is not to be treated as line 
separator in this case. The parser is not able to read the length field so it 
manually searches for the stream end in the class EndstreamOutputStream. This 
class searches both for the pair <0x0D><0x0A> and the single <0x0A>, so it 
strips off the <0x0D> from this particular stream content. Since the stream is 
encrypted, PDFBox runs into a BadPaddingException later on when trying to 
decrypt the stream.

The problem is reproducible using org.apache.pdfbox.PDFToImage in current 
1.8.14-SNAPSHOT. The same works fine in current PDFBox 2.0.x, presumably 
because it uses the non-sequential parser by default.

The proposed fix is to analyze the PDF content while reading it and search for 
the CR character only if it was ever encountered as a line separator prior to 
the stream being parsed.
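
The proposed heuristic can be sketched as follows. This is a minimal illustration, not the attached patch: the class and method names are hypothetical, and only the flag name hasCR is taken from the report. Note that 2624 is a multiple of 16, so dropping a single byte leaves the AES data misaligned, which fits the reported BadPaddingException.

```java
final class EndOfStreamTrim {
    private static final byte CR = 0x0D;
    private static final byte LF = 0x0A;

    /**
     * Returns how many leading bytes of data belong to the stream once the
     * end-of-line marker in front of "endstream" has been removed.
     * hasCR says whether CR was ever seen as a line separator earlier in
     * the file (assumption: the caller tracks this while parsing).
     */
    static int contentLength(byte[] data, boolean hasCR) {
        int n = data.length;
        if (n > 0 && data[n - 1] == LF) {
            n--; // the trailing LF is treated as the separator
            if (hasCR && n > 0 && data[n - 1] == CR) {
                n--; // strip the CR of a CR LF pair only if the file uses CR
            }
        } else if (hasCR && n > 0 && data[n - 1] == CR) {
            n--; // lone CR separator
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] encrypted = {0x11, 0x22, 0x33, CR, LF};
        System.out.println(contentLength(encrypted, false)); // 4: CR kept as data
        System.out.println(contentLength(encrypted, true));  // 3: CR stripped as before
    }
}
```

With hasCR preset to true, the method behaves like the old search for both <0x0D><0x0A> and <0x0A>, which is what the note below suggests for the other parser classes.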

Note: I do not exactly know or understand the usage of the other classes 
inherited from BaseParser, like PDFObjectStreamParser. Maybe the line ending 
heuristic should be kept "as before" in these classes, by setting the new field 
BaseParser.hasCR to true already in the constructor.

A patch is attached.







[jira] [Commented] (PDFBOX-3764) 100 times performance hit on creating images

2017-04-24 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981016#comment-15981016
 ] 

Petr Slaby commented on PDFBOX-3764:


The bad news is that the -Dsun.java2d.cmm switch seems to be unofficial (at 
least I did not find it in any Oracle documentation) and the KCMS 
implementation is likely to disappear completely in one of the future Java 
versions. The OpenJDK issue https://bugs.openjdk.java.net/browse/JDK-8041125 is 
closed and the comments in it do not give me much hope that the LCMS would get 
faster again or that the Oracle/OpenJDK guys would even consider working on its 
performance. All very sad... A workaround implementation in PDFBox would be 
very welcome because of that.
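
For reference, the switch can also be set programmatically. The sketch below assumes the provider class name commonly quoted for Java 8, sun.java2d.cmm.kcms.KcmsServiceProvider; that value is not stated in this comment, and on a JDK that does not ship KCMS the setting cannot help. The class name CmmSwitch is hypothetical.

```java
final class CmmSwitch {
    static final String PROPERTY = "sun.java2d.cmm";
    // Assumed value; equivalent to passing -Dsun.java2d.cmm=... on the command line.
    static final String KCMS = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    /** Must run before the first color conversion, or the setting is ignored. */
    static void preferKcms() {
        System.setProperty(PROPERTY, KCMS);
    }

    public static void main(String[] args) {
        preferKcms();
        System.out.println(System.getProperty(PROPERTY));
    }
}
```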

BTW, the page 
http://www.subshell.com/en/subshell/blog/Wrong-Colors-in-Images-with-Java8-100.html
 mentioned in the Getting Started Guide does not open.

> 100 times performance hit on creating images
> 
>
> Key: PDFBOX-3764
> URL: https://issues.apache.org/jira/browse/PDFBOX-3764
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.6
>Reporter: Daniel Persson
>  Labels: image, performance
> Attachments: callstack_1.png, callstack_2.png, test.pdf
>
>
> We found that PDFBox creates a better image than poppler so we wanted to 
> switch out our environment to get these improvements but found a file that 
> took about 10 minutes to create one image with PDFBox and only about 6 
> seconds with poppler. So a 100 times performance hit if we were to change.
> I've done some rudimentary profiling on the code and found that most of the 
> time is spent in ColorConvertOp.filter. Maybe there is a leaner way to 
> implement this in order to get a better result?
> best regards
> Daniel






[jira] [Created] (PDFBOX-3340) Image decoded twice without a real need

2016-05-05 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-3340:
--

 Summary: Image decoded twice without a real need
 Key: PDFBOX-3340
 URL: https://issues.apache.org/jira/browse/PDFBOX-3340
 Project: PDFBox
  Issue Type: Bug
Reporter: Petr Slaby
Priority: Minor


Take the pdf from PDFBOX-1708, put a breakpoint into the class CCITTFaxFilter, 
method decode() and run PDFToImage. You will see the debugger stop twice, even 
if the pdf contains a single image. 

The second call arrives when the image is rendered to G2D; this is OK. But 
the first time, the image is decompressed in the constructor of 
PDImageXObject - line 147 
{noformat}
this(stream, resources, stream.createInputStream());
{noformat}
just to allow the filter (CCITTFaxFilter in this case) to provide additional 
dictionary parameters in case something is missing in the input (COLORSPACE 
would be set to DeviceGray if missing here).

I think this is a complete waste. The filter should be able to fix the 
dictionary without having to decode the image. As far as I can tell, this could 
be done by implementing a repair method on COSStream and on implementations of 
Filter.

Also, I do not see that the stream created in the above-mentioned constructor 
of PDImageXObject is ever closed. This seems to be a more general issue. 
I have put a counter into COSInputStream.create(), at the point where it creates 
new RandomAccessInputStream(buffer). With the test file from PDFBOX-1708, I end 
up with 3 unclosed streams when the program finishes. I am not sure whether this 
is important, but I guess the unclosed streams are uselessly occupying space in 
the scratch file.
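
A counter of this kind can be sketched with JDK classes only. The wrapper below is a hypothetical stand-in, not the code that was put into COSInputStream.create(); it only illustrates how leaked streams show up in the count, and how try-with-resources avoids them.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

final class CountingInputStream extends FilterInputStream {
    private static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    CountingInputStream(InputStream in) {
        super(in);
        OPEN.incrementAndGet(); // one more stream that somebody has to close
    }

    @Override
    public void close() throws IOException {
        if (!closed) { // count each stream once, even if close() is called twice
            closed = true;
            OPEN.decrementAndGet();
        }
        super.close();
    }

    /** Number of streams created but not yet closed. */
    static int openCount() {
        return OPEN.get();
    }

    public static void main(String[] args) throws IOException {
        InputStream leaked = new CountingInputStream(new ByteArrayInputStream(new byte[1]));
        try (InputStream closedProperly =
                new CountingInputStream(new ByteArrayInputStream(new byte[1]))) {
            closedProperly.read();
        }
        System.out.println(openCount()); // 1: only the leaked stream remains
        leaked.close();
    }
}
```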

Sorry if this is just a lack of understanding of the code on my side, but I 
could not resist reporting what I see. 






[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails

2016-05-04 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271452#comment-15271452
 ] 

Petr Slaby commented on PDFBOX-3338:


OK, here is the patch. I did not manage to make it work with the G3 1D example 
from PDFBOX-1708, so I left the K=0 path untouched in the end. I have 
successfully tested a G3 2D example, G4 byte-aligned and G4 w/o byte align. 
Hope your results will be good as well.

> CCITT Fax decoder fails
> ---
>
> Key: PDFBOX-3338
> URL: https://issues.apache.org/jira/browse/PDFBOX-3338
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12, 2.0.1
>Reporter: Petr Slaby
> Attachments: 1.tiff, CCITTFaxFilter.patch, TestCCITTFaxDecoder.java
>
>
> I have a PDF which does not render in PDFBox. It contains pages from a 
> scanner, encoded as CCITT Fax Tiffs. On each page, the decoder always runs 
> into IOException("TIFFFaxDecoder: EOL encountered in black run.")  (or the 
> same message just with "white" instead of "black"). Unfortunately, the PDF 
> contains sensitive data and I cannot share it.
> As a test, I have replaced the TIFFFaxDecoder by the class 
> CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked 
> fine after that and PDFToImage produced the expected result. 
> I have extracted the first few bytes of the TIFF to show the problem without 
> sharing the confidential content. See the attached test program and test file.
> I have tested this against latest trunk version of PDFBox, but I think the 
> decoder implementation is basically the same in all versions. 






[jira] [Updated] (PDFBOX-3338) CCITT Fax decoder fails

2016-05-04 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-3338:
---
Attachment: CCITTFaxFilter.patch

> CCITT Fax decoder fails
> ---
>
> Key: PDFBOX-3338
> URL: https://issues.apache.org/jira/browse/PDFBOX-3338
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12, 2.0.1
>Reporter: Petr Slaby
> Attachments: 1.tiff, CCITTFaxFilter.patch, TestCCITTFaxDecoder.java
>
>



[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails

2016-05-04 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271232#comment-15271232
 ] 

Petr Slaby commented on PDFBOX-3338:


I see. I misunderstood your earlier comment then, sorry. I have double-checked 
this now, the class compiles fine with java compliance set to 1.5. It would 
compile with older versions, too, except for the few annotations it is using.

> CCITT Fax decoder fails
> ---
>
> Key: PDFBOX-3338
> URL: https://issues.apache.org/jira/browse/PDFBOX-3338
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12, 2.0.1
>Reporter: Petr Slaby
> Attachments: 1.tiff, TestCCITTFaxDecoder.java
>
>



[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails

2016-05-04 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271045#comment-15271045
 ] 

Petr Slaby commented on PDFBOX-3338:


{quote}
> It has an Apache license, so this isn't a problem.
{quote}
Cool, that saves me some sorrows.

{quote}
I suspect that the encodedByteAlign option isn't supported; one would have to 
implement it. See rev 1581603 and 1581602 / PDFBOX-1074.
{quote}
I can try, seems to be quite straightforward at a first glance.

{quote}
Another problem in that code is "continue" with label. I've never seen that one 
before, ever. When was this added to java?
{quote}
It has been there forever. See e.g. some examples at 
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/branch.html. I hope 
you are just exaggerating with the word "problem"? I find the code much better 
and more readable than the current decoder class in PDFBox. At the least, it 
does not need to jump back and forth in the input and reads it byte by byte 
instead. Not that I really understand what is going on in detail in 
either of the implementations. For that, one would have to study the standard 
first. 
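
For readers who have not met the construct, here is a small self-contained illustration of "continue" with a label. It is unrelated to the decoder code itself; the class and method names are made up for the example.

```java
final class LabeledContinueDemo {
    /** Counts the rows that contain no negative value. */
    static int countNonNegativeRows(int[][] rows) {
        int count = 0;
        nextRow:
        for (int[] row : rows) {
            for (int value : row) {
                if (value < 0) {
                    continue nextRow; // abandon this row, go on with the next one
                }
            }
            count++; // reached only if the inner loop found no negative value
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 2}, {3, -1}, {5}};
        System.out.println(countNonNegativeRows(rows)); // prints 2
    }
}
```

Without the label, the same logic needs a boolean flag that is set in the inner loop and checked after it.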


> CCITT Fax decoder fails
> ---
>
> Key: PDFBOX-3338
> URL: https://issues.apache.org/jira/browse/PDFBOX-3338
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12, 2.0.1
>Reporter: Petr Slaby
> Attachments: 1.tiff, TestCCITTFaxDecoder.java
>
>



[jira] [Commented] (PDFBOX-3338) CCITT Fax decoder fails

2016-05-04 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270954#comment-15270954
 ] 

Petr Slaby commented on PDFBOX-3338:


You mean to test my solution using the Twelve Monkeys implementation? 
Unfortunately, the decoder class in that library is not public, so for my quick 
and dirty test I have just copied it with some minor tweaks to avoid copying 
too many classes. Then, I have used it for the K>1 path only as it was used in 
my PDF. I believe this is the G3 and G32D variant, depending on the value of 
tiffOptions. As for G4, it would not be a big deal, except that I do not see a 
flag for the byte align option in the Twelve Monkeys library. Not sure whether 
it is not supported there or whether this is just lack of knowledge on my side.

Apart from that, I could probably do this. The license of Twelve Monkeys allows 
copying provided that the copyright notice remains in the copied file. (At 
least this is how I understand it, but I am not a lawyer.) This is no problem 
for a testing patch. But I do not know whether you could use it if you decide 
to take the solution instead of the current decoder implementation (which 
originally comes from Sun ImageIO and was made freely available by Sun some 
years ago).


> CCITT Fax decoder fails
> ---
>
> Key: PDFBOX-3338
> URL: https://issues.apache.org/jira/browse/PDFBOX-3338
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12, 2.0.1
>Reporter: Petr Slaby
> Attachments: 1.tiff, TestCCITTFaxDecoder.java
>
>



[jira] [Updated] (PDFBOX-3338) CCITT Fax decoder fails

2016-05-04 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-3338:
---
Attachment: 1.tiff
TestCCITTFaxDecoder.java

> CCITT Fax decoder fails
> ---
>
> Key: PDFBOX-3338
> URL: https://issues.apache.org/jira/browse/PDFBOX-3338
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.8.12, 2.0.1
>Reporter: Petr Slaby
> Attachments: 1.tiff, TestCCITTFaxDecoder.java
>
>



[jira] [Created] (PDFBOX-3338) CCITT Fax decoder fails

2016-05-04 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-3338:
--

 Summary: CCITT Fax decoder fails
 Key: PDFBOX-3338
 URL: https://issues.apache.org/jira/browse/PDFBOX-3338
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.1, 1.8.12
Reporter: Petr Slaby


I have a PDF which does not render in PDFBox. It contains pages from a scanner, 
encoded as CCITT Fax Tiffs. On each page, the decoder always runs into 
IOException("TIFFFaxDecoder: EOL encountered in black run.")  (or the same 
message just with "white" instead of "black"). Unfortunately, the PDF contains 
sensitive data and I cannot share it.

As a test, I have replaced the TIFFFaxDecoder by the class 
CCITTFaxDecoderStream from the Twelve Monkeys ImageIO library. All worked fine 
after that and PDFToImage produced the expected result. 

I have extracted the first few bytes of the TIFF to show the problem without 
sharing the confidential content. See the attached test program and test file.

I have tested this against latest trunk version of PDFBox, but I think the 
decoder implementation is basically the same in all versions. 







[jira] [Updated] (PDFBOX-3191) PDFDebugger does not handle cancelling of "Open URL" dialog

2016-01-12 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-3191:
---
Summary: PDFDebugger does not handle cancelling of "Open URL" dialog  (was: 
PDFDebugger does not handle cancelling of the "Open URL")

> PDFDebugger does not handle cancelling of "Open URL" dialog
> ---
>
> Key: PDFBOX-3191
> URL: https://issues.apache.org/jira/browse/PDFBOX-3191
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Petr Slaby
>Priority: Trivial
>
> In PDFDebugger, click the menu item "Open URL..." and then cancel the dialog. 
> A MalformedURLException caused by an NPE is thrown. After that, it is not 
> possible to open any other file or to close the application, since both 
> throw an NPE in the code updating the list of recently used files.






[jira] [Commented] (PDFBOX-3046) Specific PDF prints really (REALLY) slow

2015-10-22 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969796#comment-14969796
 ] 

Petr Slaby commented on PDFBOX-3046:


I saw this effect in our software using PDFBox a few years ago and I 
believe this PDF has the same problem. The answer lies in the implementation of 
sun.print.RasterPrinterJob. It lets you render the page first into a dummy 
PeekGraphics which just searches for the types of graphical objects you want to 
use. Next, for each transparent bitmap it finds, it creates a BufferedImage 
having the size and position of that bitmap and calls the Printable.print() 
method again, passing the BufferedImage g2d surface to it, to "flatten" all 
layers into the bitmap. Put a breakpoint into PDFPrintable.print() and, with 
this specific PDF, you will see that you arrive in it a million times, always 
printing the same page 1 again and again. The PDF I was analyzing when I first 
saw the problem used a bitmap font, each character being a single small 
transparent bitmap. I believe the one attached here is the same.

The only remedy I found is to render the whole page into a bitmap first and 
only then pass it to the java printing API. Since printing huge bitmaps is not 
desired in general, I count the transparent bitmaps in the page to be printed 
first and resort to the full page bitmap printing only if there are "many".
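
The remedy can be sketched with JDK classes only, as a wrapper around any Printable. This is an illustration under assumptions, not the code from our software: the names RasterizingPrintable and shouldRasterize, the DPI parameter and the threshold are hypothetical, and counting the transparent bitmaps of a page is not shown.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Printable;
import java.awt.print.PrinterException;

final class RasterizingPrintable implements Printable {
    private final Printable delegate;
    private final int dpi;

    RasterizingPrintable(Printable delegate, int dpi) {
        this.delegate = delegate;
        this.dpi = dpi;
    }

    /** Decide per page; the threshold is a tuning value, not a known constant. */
    static boolean shouldRasterize(int transparentBitmaps, int threshold) {
        return transparentBitmaps > threshold;
    }

    @Override
    public int print(Graphics g, PageFormat pf, int pageIndex) throws PrinterException {
        double scale = dpi / 72.0; // page coordinates are in 1/72 inch
        int width = (int) Math.ceil(pf.getWidth() * scale);
        int height = (int) Math.ceil(pf.getHeight() * scale);
        // Opaque RGB image: the print job sees one opaque bitmap per page.
        BufferedImage page = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D pageGraphics = page.createGraphics();
        try {
            pageGraphics.setColor(Color.WHITE);
            pageGraphics.fillRect(0, 0, width, height);
            pageGraphics.setColor(Color.BLACK);
            pageGraphics.scale(scale, scale);
            if (delegate.print(pageGraphics, pf, pageIndex) == NO_SUCH_PAGE) {
                return NO_SUCH_PAGE;
            }
        } finally {
            pageGraphics.dispose();
        }
        // Draw the finished bitmap scaled back to the page size.
        g.drawImage(page, 0, 0, (int) Math.round(pf.getWidth()),
                (int) Math.round(pf.getHeight()), null);
        return PAGE_EXISTS;
    }
}
```

A caller would wrap the real Printable only for pages where shouldRasterize returns true, so that ordinary pages are still printed as vector graphics.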

> Specific PDF prints really (REALLY) slow
> 
>
> Key: PDFBOX-3046
> URL: https://issues.apache.org/jira/browse/PDFBOX-3046
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: Windows 10
>Reporter: Teon Metselaar
> Attachments: mspubcol.pdf, mspubcol.prn
>
>
> On Windows 10 I have printed a test page using the MS Publisher Color Printer 
> (which outputs a Postscript-file) and converted that file to PDF using 
> GhostScript ps2pdf.
> The resulting single-page PDF file is printed really, really slow (180-190 
> seconds) while other documents (even generated using ps2pdf) print a lot 
> faster (some seconds).
> I can't figure out why this is. I guess it has something to do with the used 
> font, but other PDF printing libraries (jPedal, jPDFPrint) are able to print the 
> same documents in a couple of seconds.






[jira] [Commented] (PDFBOX-3000) Transparency Group issues

2015-10-02 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940964#comment-14940964
 ] 

Petr Slaby commented on PDFBOX-3000:


Being the original author of the transparency groups contribution in 
PDFBOX-2104, I have tried to look into this again. The good news is that the 
file attached to PDFBOX-2104 renders fine with the patch from John. It might be 
correct for the first time ever, the image shadow on the first page is missing 
in all the 2.0 reference renderings I have in my repository since December 
2014. Unfortunately, I have no older renderings, so I cannot tell whether we 
got it wrong already at the beginning or whether it got broken later. We have 
this working correctly in our source code based on PDFBox 1.7, but it seems to 
be too hard for me to figure out what exactly needs to be done to successfully 
port our implementation to PDFBox 2.0.

The only thing I do not like about John's patch is that it creates a full page 
bitmap to render the transparency group. I have tried to bring back the 
original idea of creating a bitmap according to the intersection of the 
bleeding box of the group and of the current clip path. After some trials, I 
get the same results as John on my test documents using the attached patch. 
Maybe someone with more insight into all the transforms can use it as a 
starting point to get this right.
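
The basic idea, stripped of all PDFBox specifics, might look like the sketch below. It is not the attached patch. It assumes that "bleeding box" means the group's bounding box, that the clip is already in device space, and that a single AffineTransform maps the group to device space; all names are hypothetical.

```java
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

final class GroupBitmapBounds {
    /**
     * Device-space pixel bounds for the group bitmap: the transformed
     * bounding box of the group, intersected with the bounds of the clip.
     */
    static Rectangle compute(Rectangle2D groupBBox, AffineTransform toDevice, Shape deviceClip) {
        Rectangle2D transformed = toDevice.createTransformedShape(groupBBox).getBounds2D();
        Rectangle2D visible = transformed.createIntersection(deviceClip.getBounds2D());
        if (visible.isEmpty()) {
            return new Rectangle(); // nothing visible, no bitmap needed
        }
        return visible.getBounds(); // smallest integer rectangle enclosing the area
    }

    public static void main(String[] args) {
        Rectangle2D bbox = new Rectangle2D.Double(0, 0, 100, 50);
        AffineTransform scale2 = AffineTransform.getScaleInstance(2, 2);
        Shape clip = new Rectangle(50, 20, 100, 500);
        System.out.println(compute(bbox, scale2, clip));
    }
}
```

Using the clip's bounds rather than the clip shape itself can over-estimate the area for non-rectangular clips, but the bitmap stays smaller than a full page in the common case.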

> Transparency Group issues
> -
>
> Key: PDFBOX-3000
> URL: https://issues.apache.org/jira/browse/PDFBOX-3000
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.0
>Reporter: John Hewson
> Fix For: 2.1.0
>
> Attachments: softmask-rewrite.patch
>
>
> This is a follow-up issue for transparency group issues from PDFBOX-2423. 
> More details to come.






[jira] [Created] (PDFBOX-2985) Potential NPE in PDMarkedContent#getMCID()

2015-09-22 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2985:
--

 Summary: Potential NPE in PDMarkedContent#getMCID()
 Key: PDFBOX-2985
 URL: https://issues.apache.org/jira/browse/PDFBOX-2985
 Project: PDFBox
  Issue Type: Bug
Reporter: Petr Slaby


I do not have a test case, but this method in PDMarkedContent is obviously 
wrong:

{noformat}
public int getMCID()
{
    return this.getProperties() == null ? null :
        this.getProperties().getInt(COSName.MCID);
}
{noformat}

If getProperties() is null, the method tries to convert a null Integer value to 
an int. I believe the intention was rather:
{noformat}
...
return this.getProperties() == null ? 0 :
...
{noformat}
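
The effect can be reproduced without PDFBox. In the sketch below, Props is a hypothetical stand-in for the properties dictionary; the two methods mirror the reported code and the suggested change. Whether 0 is the right value for a missing dictionary is a design decision that this example does not settle.

```java
final class TernaryUnboxingDemo {
    /** Hypothetical stand-in for the properties dictionary. */
    static final class Props {
        int getInt() {
            return 7;
        }
    }

    // Same shape as the reported method: the conditional expression has type
    // Integer, so a null result is unboxed on return and throws an NPE.
    static int broken(Props p) {
        return p == null ? null : p.getInt();
    }

    // Suggested variant: both branches are int, nothing is unboxed.
    static int fixed(Props p) {
        return p == null ? 0 : p.getInt();
    }

    public static void main(String[] args) {
        System.out.println(fixed(null)); // prints 0
        try {
            broken(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing null");
        }
    }
}
```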









[jira] [Updated] (PDFBOX-2971) CalGray white rendered as cyan

2015-09-16 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2971:
---
Attachment: ExternesDokument_modif1.jpg

> CalGray white rendered as cyan
> --
>
> Key: PDFBOX-2971
> URL: https://issues.apache.org/jira/browse/PDFBOX-2971
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Petr Slaby
>Priority: Minor
> Attachments: ExternesDokument_modif.pdf, ExternesDokument_modif1.jpg
>
>
> The attached PDF uses CalGray colors. When converted to a jpeg using 
> PDFToImage, there is a cyan rectangle visible. Acrobat shows the same 
> rectangle as white.
> The PDF uses a CalGray having white point (0.9505, 1, 1.089). The color value 
> after applying gamma is 1.0, i.e. white was intended. The class PDCalGray 
> multiplies the value by the white point to get X, Y, Z and sends it to the 
> java built-in CIEXYZ profile to convert it into sRGB. I believe the problem 
> is that the white point of CIEXYZ in java is (0.9642, 1.0, 0.8249) and we 
> need to adapt the white point before sending the values to it. There are 
> several methods to do that, but the easiest one is a simple scaling. In our 
> case it would mean multiplying the color value by the CIEXYZ white point 
> instead of the white point given in the CalGray.
> I would not like to pretend that I am an expert in this area. I found the 
> information on the internet and in the java sources of ColorSpace and 
> ICC_ColorSpace and this is how I interpret it. Insight from someone who 
> really understands the color management stuff would be appreciated. But my 
> main point is that the result looks different compared to what is shown in 
> Acrobat.
> The PDF originally comes from a customer and contains text above the 
> rectangles. I have removed the texts.






[jira] [Updated] (PDFBOX-2971) CalGray white rendered as cyan

2015-09-16 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2971:
---
Attachment: ExternesDokument_modif.pdf

> CalGray white rendered as cyan
> --
>
> Key: PDFBOX-2971
> URL: https://issues.apache.org/jira/browse/PDFBOX-2971
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Petr Slaby
>Priority: Minor
> Attachments: ExternesDokument_modif.pdf
>
>



[jira] [Created] (PDFBOX-2971) CalGray white rendered as cyan

2015-09-16 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2971:
--

 Summary: CalGray white rendered as cyan
 Key: PDFBOX-2971
 URL: https://issues.apache.org/jira/browse/PDFBOX-2971
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor


The attached PDF uses CalGray colors. When converted to a jpeg using 
PdfToImage, there is a cyan rectangle visible. Acrobat shows the same rectangle 
as white.

The PDF uses a CalGray color space with white point (0.9505, 1, 1.089). The color value 
after applying gamma is 1.0, i.e. white was intended. The class PDCalGray 
multiplies the value by the white point to get X, Y, Z and sends it to the Java 
built-in CIEXYZ profile to convert it into sRGB. I believe the problem is that 
the white point of CIEXYZ in Java is (0.9642, 1.0, 0.8249) and we need to 
adapt the white point before sending the values to it. There are several 
methods to do that, but the easiest one is a simple scaling. In our case it 
would mean multiplying the color value by the CIEXYZ white point instead of 
the white point given in the CalGray.

I would not like to pretend that I am an expert in this area. I found the 
information on the internet and in the Java sources of ColorSpace and 
ICC_ColorSpace and this is how I interpret it. Insight from someone who really 
understands the color management stuff would be appreciated. But my main point 
is that the result looks different compared to what is shown in Acrobat.

The PDF originally comes from a customer and contains text above the 
rectangles. I have removed the texts.
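
A minimal sketch of the simple scaling described above, in Java. This is not the PDCalGray code: the class and method names are hypothetical, and the D50 values are the CIEXYZ white point quoted in this report. It only shows the arithmetic, assuming the gray component A has its gamma applied and is then scaled per channel.

```java
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox.
final class CalGrayScalingSketch {
    // White point of Java's built-in CIEXYZ space, as quoted in the report.
    static final float[] D50 = {0.9642f, 1.0f, 0.8249f};

    // Behaviour as described in the report: A^gamma times the CalGray white point.
    static float[] toXyzUnadapted(float a, float gamma, float[] whitePoint) {
        float v = (float) Math.pow(a, gamma);
        return new float[] {whitePoint[0] * v, whitePoint[1] * v, whitePoint[2] * v};
    }

    // Simple scaling: rescale each channel by D50 / source white point, which
    // in effect multiplies A^gamma by the D50 white point instead.
    static float[] toXyzD50(float a, float gamma, float[] whitePoint) {
        float[] xyz = toXyzUnadapted(a, gamma, whitePoint);
        for (int i = 0; i < 3; i++) {
            xyz[i] = xyz[i] * D50[i] / whitePoint[i];
        }
        return xyz;
    }

    public static void main(String[] args) {
        float[] calGrayWhite = {0.9505f, 1.0f, 1.089f};
        System.out.println(Arrays.toString(toXyzUnadapted(1f, 1f, calGrayWhite)));
        System.out.println(Arrays.toString(toXyzD50(1f, 1f, calGrayWhite)));
    }
}
```

With the scaled values, a gray value of 1.0 lands on the white point the CIEXYZ space expects. More elaborate chromatic adaptation methods exist; this is only the easiest one mentioned above.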






[jira] [Commented] (PDFBOX-2905) Replace PDFReader with PDFDebugger

2015-08-05 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655265#comment-14655265
 ] 

Petr Slaby commented on PDFBOX-2905:


Personally, I would never consider the public classes in the tools project to 
be a public and stable API of any sort. For me, an API of a tool is its main 
class and the parameters supported by the method main() in that class. A simple 
note in documentation making this clear would be enough for me. Or is there 
something in the tools project that is designed to be used in a different way 
than just calling its main()? 

 Replace PDFReader with PDFDebugger
 --

 Key: PDFBOX-2905
 URL: https://issues.apache.org/jira/browse/PDFBOX-2905
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Priority: Minor
 Attachments: 007087-payment-due.pdf


 As discussed on the mailing list:
 {quote}
 Here's an idea: if we switch PDFDebugger to using View Pages by default, it 
 will no longer be confusing for casual users. I've found myself using this 
 mode most of the time anyway. We can add page up/down too, of course - 
 preferably using the actual Page Up and Page Down keys rather than the 
 bizarre choice of the +/- keys which are currently used in PDFReader.
 {quote}






[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class

2015-04-13 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492071#comment-14492071
 ] 

Petr Slaby commented on PDFBOX-2692:


Yes, that should do nicely. Thanks.

 Possibility to use our own and/or overwrite PageDrawer class
 

 Key: PDFBOX-2692
 URL: https://issues.apache.org/jira/browse/PDFBOX-2692
 Project: PDFBox
  Issue Type: Wish
  Components: Rendering
Affects Versions: 2.0.0
 Environment: JDK 1.8, Windows 7, PDF-Box - current trunk
Reporter: Manfred Pock
Assignee: Andreas Lehmkühler
  Labels: features
 Fix For: 2.0.0

 Attachments: pdfexample.jpg


 We use PDFBox to render PDFs. Additionally, we have the possibility to add 
 different kinds of annotations (stamps, marks, free text, notes...) as in a 
 WYSIWYG editor. To do this, we need to paint these annotations 
 ourselves.
 Another reason is to avoid painting all parts: for example, we have a PDF with an 
 embedded picture. Behind the picture is the OCR text for this picture. 
 This text is only needed for searching and should not be painted.
 Thus it would be useful to be able to use our own derived PageDrawer. As far as I can see, 
 some things need to change:
 a.) remove the final modifier from the PageDrawer class,
 b.) make some member variables (graphics, xform, pageSize...) protected,
 c.) make some methods, such as setRenderingHints, protected as well,
 d.) possibly provide a way to tell the PDFRenderer which PageDrawer should be 
 used.






[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class

2015-04-08 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485173#comment-14485173
 ] 

Petr Slaby commented on PDFBOX-2692:


How would you know that the color in setColor() belongs to the element that is 
supposed to be green? As far as I understand the description from Daniel 
Wilson, the application does not really change all red colors to green, but 
changes only some of the elements on the page to green.




[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class

2015-04-08 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485169#comment-14485169
 ] 

Petr Slaby commented on PDFBOX-2692:


Basically, that means I have to re-implement the PageDrawer myself, doesn't it? I 
need all of its functionality, including transparency groups and all the other 
logic it contains. I just need to intervene in showFontGlyph, or even 
drawGlyph2D, to tell the target renderer to draw a character instead of filling a 
path - if the renderer is capable of handling fonts. So in the end, I would 
copy/paste the whole PageDrawer, make the copy non-final, inherit from it and 
override one or two methods. I am fine with that - after all, we copied 
the whole PDFBox source in 1.8.x. But at the moment, not even that is possible, 
as TilingPattern and Glyph2D and its implementations are not public, meaning I 
would have to copy/paste even more classes.

Our target format renderers already have a Graphics2D implementation, passing 
it to PageDrawer.drawPage() is a perfect fit. I just need something 
corresponding to Graphics2D.drawGlyphVector() to be called instead of 
graphics.fill() when rendering a character. E.g. declare a special public 
interface, having methods like drawGlyph and fillGlyph with the parameters 
being PDFont, Glyph2D (or the GeneralPath it produces, but without the 
transformation being applied), character code and the transformation. The 
methods would be called instead of graphics.fill() or graphics.draw() in 
drawGlyph2D() if the graphics instance implements the interface. Passing 
Glyph2D instead of GeneralPath should be faster as my renderers only need the 
GeneralPath once for each character to create it in the on-the-fly font.
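
A rough sketch of the kind of interface meant above, in Java. Everything here is hypothetical: no interface called GlyphAwareGraphics exists in PDFBox, the font is passed as a plain name instead of a PDFont, and a java.awt.Shape stands in for the Glyph2D path. It only illustrates the instanceof dispatch that drawGlyph2D() could perform.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.AffineTransform;

// Hypothetical interface a target-format Graphics2D could implement
// in order to receive glyphs as text instead of filled paths.
interface GlyphAwareGraphics {
    void fillGlyph(String fontName, int code, Shape outline, AffineTransform at);
}

final class GlyphDispatchSketch {
    // Returns true if the target handled the glyph as text, false if the
    // outline was filled as an ordinary shape.
    static boolean fillGlyph(Graphics2D g, String fontName, int code,
                             Shape outline, AffineTransform at) {
        if (g instanceof GlyphAwareGraphics) {
            // The untransformed outline and the transform are passed separately,
            // so the renderer can build its font once per character.
            ((GlyphAwareGraphics) g).fillGlyph(fontName, code, outline, at);
            return true;
        }
        // Fallback: the behaviour for any ordinary Graphics2D.
        g.fill(at.createTransformedShape(outline));
        return false;
    }
}
```

A Graphics2D that does not implement the interface is unaffected, so existing renderers would keep working unchanged.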




[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class

2015-04-08 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486071#comment-14486071
 ] 

Petr Slaby commented on PDFBOX-2692:


Yep, we fall back to bitmaps if we are not able to express something in the 
target format. In the worst case scenario, we produce a bitmap for the whole 
page. But for a simple PDF containing just text, we want to produce native AFP 
or PCL code using fonts rather than bitmaps or vectors.




[jira] [Commented] (PDFBOX-2692) Possibility to use our own and/or overwrite PageDrawer class

2015-03-23 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375894#comment-14375894
 ] 

Petr Slaby commented on PDFBOX-2692:


+1
In our application, we use PDFBox to render PDFs to AFP, PCL and PostScript. 
When rendering text, the target format renderer needs to know the text and its 
font to be able to use text operations and fonts in the respective target 
format language (there is a configurable font mapping to use pre-prepared fonts, 
or fonts can be generated on the fly). In our clone of PDFBox 1.8.x, 
we did that by getting the font information from the GlyphVector in 
g2d.drawGlyphVector(). In PDFBox 2.0, text is rendered as Shapes, so the 
underlying G2D implementation has no way of knowing that text is 
being rendered. With the possibility to override PageDrawer, I could intercept 
showFontGlyph to tell the G2D implementation that the next fill() or draw() is 
in fact drawing a letter in a given font.




[jira] [Created] (PDFBOX-2727) Cache color space instances

2015-03-23 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2727:
--

 Summary: Cache color space instances
 Key: PDFBOX-2727
 URL: https://issues.apache.org/jira/browse/PDFBOX-2727
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby


I have a PDF from a customer which contains many calls to 
SetNonStrokingColorSpace and SetStrokingColorSpace. Each time, an embedded 
color profile resource is loaded via ICC_Profile.getInstance(InputStream). I 
have attempted to cache the result in PDResources.java as shown in the attached 
patch. For this particular PDF, this change reduces the running time of 
PDFToImage from 27 seconds to 5 seconds (the PDF has two pages). I cannot 
share the customer PDF, so I tried to find a similar freely available one. 
Unfortunately, I did not find anything in my test suite with a comparable 
improvement. The best example I found is the attached PDF, where the running 
time improves from 4.9 seconds without caching to 4.1 seconds with caching.
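
To illustrate the idea only (this is not the attached PDResources patch), here is a minimal load-once cache in Java. The class name and the generic loader are assumptions; in the real case the loader would be whatever parses the embedded profile, e.g. a call to ICC_Profile.getInstance(InputStream), and the key would identify the color space resource.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache: runs an expensive loader at most once per key.
final class ProfileCacheSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ProfileCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // The first request for a key invokes the loader; later requests for
    // the same key return the stored value without loading again.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        ProfileCacheSketch<String, String> cache =
                new ProfileCacheSketch<>(name -> "parsed profile for " + name);
        System.out.println(cache.get("cs0"));
        System.out.println(cache.get("cs0")); // served from the cache
    }
}
```

The gain depends entirely on how often the same resource is requested, which would explain why only some PDFs benefit.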






[jira] [Updated] (PDFBOX-2727) Cache color space instances

2015-03-23 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2727:
---
Attachment: 000435.pdf
PDResources.java.patch




[jira] [Updated] (PDFBOX-2727) Cache color space instances

2015-03-23 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2727:
---
Priority: Minor  (was: Major)




[jira] [Commented] (PDFBOX-2727) Cache color space instances

2015-03-23 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376818#comment-14376818
 ] 

Petr Slaby commented on PDFBOX-2727:


Note: In the PDResources constructor, I noticed a TODO comment stating 
that PDResources should be instantiated and cached on a per-COSDictionary 
basis, indicating that a proper caching solution might need more than my simple 
patch. Indeed, the cached color space instances should be bound to the 
COSDictionary rather than to PDResources, as multiple PDResources instances are 
created for a single COSDictionary.

I also tried caching fonts created from font resources in the same 
way, but without any noticeable performance gain in my test suite.




[jira] [Commented] (PDFBOX-2576) Improve code quality

2015-03-23 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376789#comment-14376789
 ] 

Petr Slaby commented on PDFBOX-2576:


[~tilman]: Is the static modifier in COSNull.writePDF() intended? It produces 
a warning in COSWriter.visitFromNull() - static methods should be accessed in 
a static way. All the other COS objects have a non-static writePDF(), so I 
assume this one should not be static either?

 Improve code quality
 

 Key: PDFBOX-2576
 URL: https://issues.apache.org/jira/browse/PDFBOX-2576
 Project: PDFBox
  Issue Type: Task
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
 Attachments: GraphicsOperatorProcessor.patch, 
 SecuryHandlerFactory.patch, org.apache.fontbox.afm.patch, 
 org.apache.fontbox.cff.cffparser.patch, org.apache.fontbox.cff.patch, 
 org.apache.fontbox.cmap.patch, 
 org.apache.pdfbox.contentstream.operator.state.patch, 
 org.apache.pdfbox.cos.patch, org.apache.pdfbox.filter-2.patch, 
 org.apache.pdfbox.filter.patch, 
 org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure.patch, 
 org.apache.pdfbox.pdmodel.documentinterchange.patch, 
 org.apache.pdfbox.preflight.graphic.patch, pdfbox-override-patch.txt, 
 pdfbox-raw-type-patch.txt, pdfcloneutility-patch.txt, 
 pdftextstripperbyarea-patch.txt, ttfsubsetter-2.patch, ttfsubsetter-3.patch, 
 ttfsubsetter-patch.txt


 This is a longterm issue for the task to improve code quality, by using the 
 [SonarQube 
 report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
  hints in different IDEs, the FindBugs tool and other code quality tools.






[jira] [Commented] (PDFBOX-2539) [PATCH] Allow non static FontProvider

2014-12-04 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234168#comment-14234168
 ] 

Petr Slaby commented on PDFBOX-2539:


+1

 [PATCH] Allow non static FontProvider
 -

 Key: PDFBOX-2539
 URL: https://issues.apache.org/jira/browse/PDFBOX-2539
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Affects Versions: 2.0.0
Reporter: simon steiner
 Attachments: fontProvider.patch


 I would like to use multiple instances of FontProvider in a thread-safe way.





[jira] [Commented] (PDFBOX-2262) Remove usage of AWT fonts

2014-09-10 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128445#comment-14128445
 ] 

Petr Slaby commented on PDFBOX-2262:


[~jahewson]: I assume the semicolon at the end of line 133 of CMap.java is not 
intended? 
{noformat}
if (range.isPartialMatch(bytes.get(i), i));
{noformat}
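
For readers wondering why the semicolon matters: in Java, a semicolon directly after the condition is an empty statement that forms the entire body of the if, so whatever follows runs unconditionally. The sketch below is a generic illustration with made-up names; it is not the CMap code, whose surrounding lines are not shown here.

```java
// Generic illustration of an if statement with a stray semicolon.
final class EmptyIfSketch {
    static int countWithStraySemicolon(int[] values) {
        int count = 0;
        for (int v : values) {
            if (v > 0);      // the empty statement is the whole if body
            {
                count++;     // plain block: runs for every element
            }
        }
        return count;
    }

    static int countIntended(int[] values) {
        int count = 0;
        for (int v : values) {
            if (v > 0) {     // no semicolon: the block is the if body
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] values = {1, -1, 2};
        System.out.println(countWithStraySemicolon(values));
        System.out.println(countIntended(values));
    }
}
```

The first method counts every element, the second only the positive ones.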

 Remove usage of AWT fonts
 -

 Key: PDFBOX-2262
 URL: https://issues.apache.org/jira/browse/PDFBOX-2262
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel, Rendering
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
 Fix For: 2.0.0

 Attachments: Basiswissen-Vorschriften.pdf, 
 Basiswissen-Vorschriften.pdf-1.png, 
 Basiswissen-Vorschriften.pdf-1.png-diff.png, 
 Basiswissen-Vorschriften.pdf-9.png, 
 Basiswissen-Vorschriften.pdf-9.png-diff.png, 
 ELVIA-Reiserucktritt-Vollschutz.pdf-1.png, FreeSansTest.pdf, 
 PDFBOX-1094-094730.pdf-1.png, PDFBOX-1770.pdf-1.png, 
 PDF_Spec-Shading-23.pdf-1.png, PDF_Spec-Shading-23.pdf-1.png-diff.png, 
 bugzilla867751.pdf-2.png, bugzilla867751.pdf-2.png-diff.png, 
 bugzilla886049.pdf, bugzilla886049.pdf-1.png, test_1fd9a_test.pdf


 We're still using AWT fonts to render the standard 14 built-in fonts, which 
 causes rendering problems and encoding issues (see  PDFBOX-2140). We're also 
 using AWT for some fallback fonts.
 Removal of these AWT fonts isn't too difficult, we need to load the fonts 
 using the existing PDFFontManager mechanism which has recently been added. 
 All missing TrueType fonts loaded from disk have been using SystemFontManager 
 for a number of weeks now. 
 We should ship some sensible default fonts with PDFBox, such as the 
 Liberation fonts (see PDFBOX-2169, PDFBOX-2263), in case PDFFontManager can't 
 find anything suitable, rather than falling back to the default TTF font, but 
 by default we'll probe the system for suitable fonts.





[jira] [Commented] (PDFBOX-2262) Remove usage of AWT fonts

2014-09-10 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128539#comment-14128539
 ] 

Petr Slaby commented on PDFBOX-2262:


[~tilman]: Not really, or not easily. Given the amount of changes in pdfbox and 
the pile of other work I have, I gave up updating my pdfbox test suite in the 
last few months. I only noticed the semicolon because of a warning that 
Eclipse shows on that line (empty control flow statement). Either the condition 
should not be there at all, as it does nothing, or the semicolon should 
be removed. For the moment, I suggest waiting for John's opinion rather than 
spending time running test suites. I think he will know what the code is 
supposed to do.



[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager

2014-08-27 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112945#comment-14112945
 ] 

Petr Slaby commented on PDFBOX-2144:


{quote}
I think you'll just have to make sure that you don't change the configuration 
while pages are rendering.
{quote}
That is not possible. The application is big, and PDF rendering is just a small 
part of it. I cannot change the whole application for the sake of it. As mentioned 
before, the application is designed to accept a new config at any time. The new 
config is supposed to be valid for jobs started after its activation, while 
jobs that were started earlier and have not yet finished have to continue using 
the old one.

{quote}
 it'll play nice with PDFBox's internal static state 
{quote}
You scare me. What is static and where? I believed that state is bound to 
instances of PageDrawer or PDGraphicsState and the like.

I do not have enough insight to really understand all your reasons, but... if 
FileSystemFontProvider is implemented as a singleton, then it will do the same 
as it does now when called from a static method of ExternalFonts. 
Just replace your static methods with a factory and bind the instance 
somewhere (I thought PageDrawer or PDFStreamEngine would be the right place) - 
and we are in the same boat :-)

However, as mentioned before, I think I will be able to bind my font 
configurations to thread instances so that this is not such a big issue for me 
and your current solution should be fine.

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
Assignee: John Hewson
 Attachments: FontManager.patch


 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from file system or from system 
 environment. To make PDFBox compatible with this philosophy, we need the 
 FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and default implementation which 
 is the original one. FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest introducing a configuration object which would take care of all 
 the current and future options of PDFRenderer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager

2014-08-19 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101938#comment-14101938
 ] 

Petr Slaby commented on PDFBOX-2144:


{quote}
Regarding the static configuration, what aspects of the configuration were you 
expecting to change while PDFs are being processed? Are you talking about using 
a specific FontProvider for a given PDF? If so why? This is certainly something 
we could think about if I can get my head around the use case.
{quote}
Our application runs in an application server, where many things can happen in 
parallel. Our configuration is stored in a database and can be changed 
while the application is running. When the configuration changes, the 
application might be in the middle of a rendering (or even in the middle of many 
renderings). Renderings that are already running are expected to finish the job 
with the old configuration, while anything started after the 
commit of a new configuration uses the new one. The configuration contains many 
settings, among them the fonts to be used to render PDFs via PDFBox. I have to be 
able to change the fonts available to FontProvider at runtime, in a way that 
keeps the original configuration untouched for renderings that have already 
been started.
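
One way to get this behavior is an immutable configuration snapshot that each rendering captures once when it starts. The following is only a minimal sketch of that idea; FontConfig and FontConfigRegistry are hypothetical names used for illustration and are not part of PDFBox.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of the font settings; not a PDFBox class.
final class FontConfig {
    private final Map<String, String> fontsByName;

    FontConfig(Map<String, String> fontsByName) {
        // Defensive copy so later changes to the caller's map cannot leak in.
        this.fontsByName = Collections.unmodifiableMap(new HashMap<>(fontsByName));
    }

    // Returns the configured font for the given name, or null if none is configured.
    String lookup(String name) {
        return fontsByName.get(name);
    }
}

// Hypothetical holder of the currently active configuration.
final class FontConfigRegistry {
    private final AtomicReference<FontConfig> active;

    FontConfigRegistry(FontConfig initial) {
        this.active = new AtomicReference<>(initial);
    }

    // Called when a new configuration is committed.
    void activate(FontConfig next) {
        active.set(next);
    }

    // Called once at the start of a rendering; the job keeps the returned snapshot.
    FontConfig snapshot() {
        return active.get();
    }
}
```

A rendering that called snapshot() before activate() keeps seeing the old settings, while renderings started afterwards pick up the new ones. The open question is where the snapshot is attached so that the font lookup can reach it.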

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: FontManager.patch


 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from file system or from system 
 environment. To make PDFBox compatible with this philosophy, we need the 
 FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and default implementation which 
 is the original one. FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest to introduce a configuration object which would take care about all 
 the current and future options of PDFRenderer.





[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager

2014-08-18 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100421#comment-14100421
 ] 

Petr Slaby commented on PDFBOX-2144:


[~jahewson]: Thanks, that's basically exactly what I need. Just two questions. 

In the old solution, AWT fonts were used to provide all missing fonts. The new 
FontProvider interface has separate methods for substitution of TTF, CFF and 
Type 1 fonts. Does this mean that when a PDF references an external Type 1 font, 
I have to provide a Type 1 font? Or is this used just when creating PDFs, and the 
normal way of getting an external font for rendering is 
ExternalFonts.getType1EquivalentFont(), which can return any of the flavors?

Also, I do not like the static methods in ExternalFonts so much. In our 
environment, the configuration can change while the application is running. In 
such a case, renderings which have already been started have to use the old 
configuration, while renderings which start later should use the new one. For 
that, I would need ExternalFonts to have an instance bound to PageDrawer. 
It is a minor problem, though. I can probably solve it by binding the active 
font configuration to the current thread while rendering.
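
The thread-binding workaround could look roughly like the sketch below. It assumes that a whole rendering runs on a single thread and that a custom font provider reads the binding from that thread; ThreadBoundConfig is a hypothetical helper, not a PDFBox class.

```java
import java.util.function.Supplier;

// Hypothetical helper that binds a configuration object to the calling thread.
final class ThreadBoundConfig<T> {
    private final ThreadLocal<T> current = new ThreadLocal<>();

    // Returns the configuration bound to the calling thread, or null if none is bound.
    T get() {
        return current.get();
    }

    // Runs the task with the given configuration bound to this thread and
    // restores the previous binding afterwards, even if the task throws.
    <R> R callWith(T config, Supplier<R> task) {
        T previous = current.get();
        current.set(config);
        try {
            return task.get();
        } finally {
            if (previous == null) {
                current.remove();
            } else {
                current.set(previous);
            }
        }
    }
}
```

The rendering call would be wrapped in callWith(activeConfig, ...), and a statically registered provider would call get() to find the configuration of the job it is serving. This breaks down as soon as rendering work is handed to another thread, which is why an instance bound to PageDrawer would be the more robust design.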

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: FontManager.patch


 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from file system or from system 
 environment. To make PDFBox compatible with this philosophy, we need the 
 FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and default implementation which 
 is the original one. FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest to introduce a configuration object which would take care about all 
 the current and future options of PDFRenderer.





[jira] [Commented] (PDFBOX-2255) Text not rendered bold

2014-08-04 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085343#comment-14085343
 ] 

Petr Slaby commented on PDFBOX-2255:


I think bold is produced using text rendering mode fill + stroke. As far as 
I can tell, the file renders fine with the patch I have proposed in PDFBOX-678.
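
To illustrate why fill + stroke looks bold: stroking the glyph outline adds half the line width on each side of the filled shape. The sketch below uses plain Java2D with an arbitrary Shape standing in for a glyph outline; it is not the code from the PDFBOX-678 patch, and the class name is made up for illustration.

```java
import java.awt.BasicStroke;
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.image.BufferedImage;

// Hypothetical illustration of text rendering modes 0 (fill) and 2 (fill, then stroke).
final class TextRenderingModeSketch {

    // Mode 0: only the interior of the outline is painted.
    static void fillOnly(Graphics2D g, Shape glyphOutline) {
        g.fill(glyphOutline);
    }

    // Mode 2: the interior is painted, then the outline is stroked on top of it.
    // Here both use the current Graphics2D color; PDF keeps separate fill and stroke colors.
    static void fillThenStroke(Graphics2D g, Shape glyphOutline, float lineWidth) {
        g.fill(glyphOutline);
        g.setStroke(new BasicStroke(lineWidth));
        g.draw(glyphOutline);
    }

    // Counts pixels with a non-zero alpha value, i.e. pixels that were painted.
    static int countPaintedPixels(BufferedImage image) {
        int count = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if ((image.getRGB(x, y) >>> 24) != 0) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Painting the same outline with fillThenStroke covers more pixels than fillOnly, which is the thickening that reads as bold.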

 Text not rendered bold 
 ---

 Key: PDFBOX-2255
 URL: https://issues.apache.org/jira/browse/PDFBOX-2255
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner

 File from PDFBOX-265
 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage 
 PDFBOX265-problem.pdf





[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs

2014-07-16 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063320#comment-14063320
 ] 

Petr Slaby commented on PDFBOX-2210:


We have a similar problem. In our application, we produce (among others) PCL 
and AFP output. For PDFBox, we have PCL- and AFP-specific implementations of 
Graphics2D which produce the commands in the respective printer language. In 
the old solution, fillGlyphVector or drawGlyphVector was called for printing 
characters using AWT fonts. From the glyph vector, we were able to get at the 
AWT font and the character(s) being printed. From that, we were able to pick an 
existing PCL or AFP font if an equivalent for the AWT font was configured, or 
produce an on-the-fly font and embed it into the output. With the current 
solution, we just get a shape and do not even know that it comes from 
rendering text. I have not tried to solve this yet, but I think I will probably 
need PageDrawer.drawGlyph2D() to become part of the API (make it protected 
instead of private) so that I can intercept it and call something else on our 
special G2D implementation. When producing on-the-fly fonts, we need some font 
metrics information, such as the ascent, descent and width of each character. 
For that, I would need to put some more information into Glyph2D, e.g. have a 
reference to the underlying PDFont.

 [PATCH] Allow caching of glyphs
 ---

 Key: PDFBOX-2210
 URL: https://issues.apache.org/jira/browse/PDFBOX-2210
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: John Hewson
 Attachments: drawglyphs.patch


 If you separate the transform from the glyph, we can reuse glyphs in FOP 
 PostScript output and get smaller output files.





[jira] [Commented] (PDFBOX-2117) AxialShadingContext is slow

2014-07-11 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058549#comment-14058549
 ] 

Petr Slaby commented on PDFBOX-2117:


Just a hint or question. At the end of getRaster(), the cached values are 
always unnormalized by e.g. (int) (values[0] * 255). Why not cache the 
unnormalized values right away, then? You could put the three values into a 
single int to reduce memory consumption and to avoid the c.clone() in ColorRGB. 
But maybe I missed something, I did not really try to change the code this way.
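
A minimal sketch of what caching the unnormalized values in a single int could look like. It uses the same (int) (value * 255) conversion quoted above; the class and method names are hypothetical, and I have not checked how this fits into the actual AxialShadingContext code.

```java
// Hypothetical helper: stores three color components (each 0..1) as one packed int.
final class PackedRgb {

    // Converts each normalized component with (int) (v * 255) and packs them as 0xRRGGBB.
    static int pack(float r, float g, float b) {
        int ri = (int) (r * 255);
        int gi = (int) (g * 255);
        int bi = (int) (b * 255);
        return (ri << 16) | (gi << 8) | bi;
    }

    static int red(int packed) {
        return (packed >> 16) & 0xFF;
    }

    static int green(int packed) {
        return (packed >> 8) & 0xFF;
    }

    static int blue(int packed) {
        return packed & 0xFF;
    }
}
```

Because an int is a primitive, the cache no longer hands out a mutable array, so no clone() is needed on lookup.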

As for the comparison of the three methods of implementing AxialShadingContext, 
the scan line precomputation is the fastest of course, especially as it only 
counts the values at positions rounded to an int. I have run the test again on 
three files that use the axial shading and measured total time spent in the 
constructor and getRaster(). The times are in milliseconds.

||File||Trunk||Shaola||My patch||
|shading_pattern.pdf|67055|557|1534|
|color_gradient.pdf|72622|1002|2461|
|missing_image.pdf|34897|376|29672|



 AxialShadingContext is slow
 ---

 Key: PDFBOX-2117
 URL: https://issues.apache.org/jira/browse/PDFBOX-2117
 Project: PDFBox
  Issue Type: Sub-task
  Components: Rendering
Reporter: Petr Slaby
 Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, 
 AxialShading1.patch, AxialShadingContext.java.getrgbimage, 
 GWG061_Shading_x1a.pdf, GWG061_Shading_x1a.pdf-1.png, 
 GWG061_Shading_x1a.pdf-1.png-diff.png, Shading2Function2.pdf, 
 Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, 
 color_gradient.pdf, shading_pattern.pdf


 AxialShadingContext#getRaster() is on top of profiler hot spots in documents 
 that use an axial shading. Inside it, the slowest part is calling 
 PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order).
   





[jira] [Commented] (PDFBOX-2117) AxialShadingContext is slow

2014-07-11 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058731#comment-14058731
 ] 

Petr Slaby commented on PDFBOX-2117:


Shaola, another idea in a similar direction - what about using a simple array 
instead of the HashMap? As far as I can tell, the keys are just the integers 
from zero to n. Why not put the values into an array and use the array index 
instead of a HashMap key?
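
As a rough sketch of the array idea, assuming the keys really are the dense integers 0..n-1 (ColorTable is a hypothetical name, and the compute function stands in for whatever evaluates the shading function and color space):

```java
import java.util.function.IntUnaryOperator;

// Hypothetical lookup table indexed directly by position instead of a HashMap key.
final class ColorTable {
    private final int[] packedByIndex;

    // Precomputes one packed color per index 0..size-1.
    ColorTable(int size, IntUnaryOperator compute) {
        packedByIndex = new int[size];
        for (int i = 0; i < size; i++) {
            packedByIndex[i] = compute.applyAsInt(i);
        }
    }

    // Plain array access: no boxing of the key and no hashing.
    int get(int index) {
        return packedByIndex[index];
    }
}
```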

 AxialShadingContext is slow
 ---

 Key: PDFBOX-2117
 URL: https://issues.apache.org/jira/browse/PDFBOX-2117
 Project: PDFBox
  Issue Type: Sub-task
  Components: Rendering
Reporter: Petr Slaby
 Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, 
 AxialShading1.patch, AxialShadingContext.java.getrgbimage, 
 GWG061_Shading_x1a.pdf, GWG061_Shading_x1a.pdf-1.png, 
 GWG061_Shading_x1a.pdf-1.png-diff.png, Shading2Function2.pdf, 
 Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, 
 color_gradient.pdf, shading_pattern.pdf


 AxialShadingContext#getRaster() is on top of profiler hot spots in documents 
 that use an axial shading. Inside it, the slowest part is calling 
 PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order).
   





[jira] [Commented] (PDFBOX-678) Support missing Text Rendering Modes when rendering a PDF

2014-07-04 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052274#comment-14052274
 ] 

Petr Slaby commented on PDFBOX-678:
---

Implementing this seems to be fairly easy in current trunk (with the exception 
of Type3 fonts), see the attached patch. Why not do it?

 Support missing Text Rendering Modes when rendering a PDF
 -

 Key: PDFBOX-678
 URL: https://issues.apache.org/jira/browse/PDFBOX-678
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Maruan Sahyoun
 Attachments: Java Printing.pdf, TextRenderingModes.java.patch


 Of the 7 different Text Rendering Modes only mode 0 (Fill Text) is correctly 
 implemented. Mode 1 (Stroke Text) falls back to Mode 0 and the others are not 
 implemented. I'm looking to implement the missing modes (at least some of 
 them).
 Before doing so I'm proposing a structural change to when rendering really 
 occurs. Currently it's done within the PDxxxFont classes. I'd rather 
 implement the (AWT) text output in PageDrawer (or helper classes within the 
 same package) and use the font classes to return an AWT font by adding a 
 getAwtFont method. Doing so we get a better separation between the PDF 
 related stuff (PDxxx) and applications like PageDrawer. The current rendering 
 specific code within the PDxxxFont classes can be retained for compatibility 
 and marked deprecated at a later stage.
 WDYT?





[jira] [Updated] (PDFBOX-678) Support missing Text Rendering Modes when rendering a PDF

2014-07-04 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-678:
--

Attachment: TextRenderingModes.java.patch

 Support missing Text Rendering Modes when rendering a PDF
 -

 Key: PDFBOX-678
 URL: https://issues.apache.org/jira/browse/PDFBOX-678
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Maruan Sahyoun
 Attachments: Java Printing.pdf, TextRenderingModes.java.patch


 Of the 7 different Text Rendering Modes only mode 0 (Fill Text) is correctly 
 implemented. Mode 1 (Stroke Text) falls back to Mode 0 and the others are not 
 implemented. I'm looking to implement the missing modes (at least some of 
 them).
 Before doing so I'm proposing a structural change to when rendering really 
 occurs. Currently it's done within the PDxxxFont classes. I'd rather 
 implement the (AWT) text output in PageDrawer (or helper classes within the 
 same package) and use the font classes to return an AWT font by adding a 
 getAwtFont method. Doing so we get a better separation between the PDF 
 related stuff (PDxxx) and applications like PageDrawer. The current rendering 
 specific code within the PDxxxFont classes can be retained for compatibility 
 and marked deprecated at a later stage.
 WDYT?





[jira] [Created] (PDFBOX-2185) Rotation and skew not applied on rectangles

2014-07-04 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2185:
--

 Summary: Rotation and skew not applied on rectangles
 Key: PDFBOX-2185
 URL: https://issues.apache.org/jira/browse/PDFBOX-2185
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby


When rendering the attached example, rotation and skew of rectangles are not 
applied properly. The reason is that AppendRectangleToPath transforms only the 
start and end points and builds a non-rotated, non-skewed rectangle from them. 
Instead, each corner of the rectangle has to be transformed separately, as shown 
in the attached patch.
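
The corner-by-corner approach can be sketched with plain Java2D as follows. This is an illustration of the idea, not the attached patch; RectanglePath is a hypothetical name and the transform parameter stands in for the current transformation matrix.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.awt.geom.Point2D;

// Hypothetical helper that builds a rectangle path under an arbitrary affine transform.
final class RectanglePath {

    // Transforms all four corners separately, so rotation and skew are preserved.
    static Path2D.Double transformedRectangle(AffineTransform ctm,
                                              double x, double y, double w, double h) {
        Point2D p0 = ctm.transform(new Point2D.Double(x, y), null);
        Point2D p1 = ctm.transform(new Point2D.Double(x + w, y), null);
        Point2D p2 = ctm.transform(new Point2D.Double(x + w, y + h), null);
        Point2D p3 = ctm.transform(new Point2D.Double(x, y + h), null);

        Path2D.Double path = new Path2D.Double();
        path.moveTo(p0.getX(), p0.getY());
        path.lineTo(p1.getX(), p1.getY());
        path.lineTo(p2.getX(), p2.getY());
        path.lineTo(p3.getX(), p3.getY());
        path.closePath();
        return path;
    }
}
```

For a unit square rotated by 45 degrees, the two opposite corners (0,0) and (1,1) end up on the same vertical line, so an axis-aligned rectangle built from only those two points would have zero width, whereas the four-corner path is the expected diamond.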





[jira] [Updated] (PDFBOX-2185) Rotation and skew not applied on rectangles

2014-07-04 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2185:
---

Attachment: AppendRectangleToPath.java.patch
example_013.pdf

 Rotation and skew not applied on rectangles
 ---

 Key: PDFBOX-2185
 URL: https://issues.apache.org/jira/browse/PDFBOX-2185
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: AppendRectangleToPath.java.patch, example_013.pdf


 When rendering the attached example, rotation and skew of rectangles are not 
 applied properly. The reason is that AppendRectangleToPath transforms only the 
 start and end points and builds a non-rotated, non-skewed rectangle from them. 
 Instead, each corner of the rectangle has to be transformed separately, as 
 shown in the attached patch.





[jira] [Commented] (PDFBOX-1997) CIE LAB item missing in rendering

2014-07-03 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051903#comment-14051903
 ] 

Petr Slaby commented on PDFBOX-1997:


Works fine for me, thanks. But I found just a single file using LAB color 
space, so that is no proof :-)

 CIE LAB item missing in rendering
 -

 Key: PDFBOX-1997
 URL: https://issues.apache.org/jira/browse/PDFBOX-1997
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
  Labels: regression
 Attachments: text_graphic_image.pdf, text_graphic_image.pdf-1.png


 The file from PDFBOX-1681 is missing the CIELAB output, it was there a few 
 weeks ago.





[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-03 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051940#comment-14051940
 ] 

Petr Slaby commented on PDFBOX-1915:


In my test suite, I have one rendering fixed and no regressions. Cool, thanks. 

My only complaint is the performance. The attached file needs several minutes 
to render; the second page in particular takes far too long. Without really 
understanding the algorithms, I had a look at 
PatchMeshesShadingContext#getRaster(). Could you perhaps sort the triangles and 
search for them instead of looping through all of them and checking which one 
contains the current point? The loop continues even after a matching triangle 
has been found. Could you at least break there? Also, the row/col loops always 
shift the current point by one. Isn't it likely that the same triangle or its 
neighbor will get a hit?
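
To make the last two suggestions concrete, here is a small sketch that checks the previously matched triangle first and stops at the first match. It is independent of the real PatchMeshesShadingContext code; TriangleLookup and its array layout are assumptions made for illustration.

```java
// Hypothetical lookup over triangles stored as {x0, y0, x1, y1, x2, y2}.
final class TriangleLookup {
    private final double[][] triangles;
    private int lastHit = -1; // index of the triangle that matched the previous point

    TriangleLookup(double[][] triangles) {
        this.triangles = triangles;
    }

    // Returns the index of a triangle containing the point, or -1 if there is none.
    int find(double px, double py) {
        // Neighboring pixels usually fall into the same triangle, so try it first.
        if (lastHit >= 0 && contains(triangles[lastHit], px, py)) {
            return lastHit;
        }
        for (int i = 0; i < triangles.length; i++) {
            if (contains(triangles[i], px, py)) {
                lastHit = i;
                return i; // stop at the first match instead of scanning the rest
            }
        }
        return -1;
    }

    // Point-in-triangle test: the point lies on the same side of all three edges.
    static boolean contains(double[] t, double px, double py) {
        double d1 = cross(t[0], t[1], t[2], t[3], px, py);
        double d2 = cross(t[2], t[3], t[4], t[5], px, py);
        double d3 = cross(t[4], t[5], t[0], t[1], px, py);
        boolean hasNegative = d1 < 0 || d2 < 0 || d3 < 0;
        boolean hasPositive = d1 > 0 || d2 > 0 || d3 > 0;
        return !(hasNegative && hasPositive);
    }

    // Cross product of edge (a -> b) with (a -> p); its sign tells the side of the edge.
    private static double cross(double ax, double ay, double bx, double by,
                                double px, double py) {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }
}
```

If triangles overlap and the paint order matters, trying the last hit first can pick a different triangle than a plain first-match scan would, so that would need to be checked against the shading semantics.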

Just ideas, keep up the good work. 

 Implement shading with Coons and tensor-product patch meshes
 

 Key: PDFBOX-1915
 URL: https://issues.apache.org/jira/browse/PDFBOX-1915
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tilman Hausherr
Assignee: Shaola Ren
  Labels: graphical, gsoc2014, java, math, shading
 Fix For: 2.0.0

 Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, 
 CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, 
 McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, 
 _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, 
 asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, 
 coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, 
 coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, 
 coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, 
 coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, 
 coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, 
 eci_altona-test-suite-v2_technical_H.pdf, example_030.pdf, failedTest.rar, 
 lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, 
 lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, 
 pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, 
 shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, 
 tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, 
 tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, 
 updateshading6ContourTest.rar


 Of the seven shading methods described in the PDF specification, type 6 
 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been 
 implemented. I have done type 1, 4 and 5, but I don't know the math for type 
 6 and 7. My math days are decades away.
 Knowledge prerequisites: 
 - java, although you don't have to be a java ace, just feel comfortable
 - math: you should know what cubic Bézier curves, Degenerate Bézier 
 curves, bilinear interpolation, tensor-product, affine transform 
 matrix and Bernstein polynomials are, or be able to learn it
 - maven (basic)
 - svn (basic)
 - an IDE like Netbeans or Eclipse or IntelliJ (basic)
 - ideally, you are either a math student who likes to program, or a computer 
 science student who is specializing in graphics.
 A first look at PDFBOX: try the command utility here:
 https://pdfbox.apache.org/commandline/#pdfToImage
 and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have 
 the shading types that are already implemented.
 Some simple source code to convert to images:
 String filename = "blah.pdf";
 PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
 List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
 int page = 0;
 for (PDPage pdPage : pdPages)
 {
     ++page;
     BufferedImage bim = RenderUtil.convertToImage(pdPage, 
         BufferedImage.TYPE_BYTE_BINARY, 300);
     ImageIO.write(bim, "png", new File(filename + page + ".png"));
 }
 document.close();
 You are not starting from scratch. The implementation of type 4 and 5 shows 
 you how to read parameters from the PDF and set the graphics. You don't have 
 to learn the complete PDF spec, only 15 pages related to the two shading 
 types, and 6 pages about shading in general. The PDF specification is here:
 http://www.adobe.com/devnet/pdf/pdf_reference.html
 The tricky parts are:
 - decide whether a point(x,y) is inside or outside a patch
 - decide the color of a point within the patch
 To get an idea about the code, look at the classes GouraudTriangle, 
 GouraudShadingContext, Type4ShadingContext and Vertex here
 https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/
 or download the whole project from the 

[jira] [Comment Edited] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-03 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051940#comment-14051940
 ] 

Petr Slaby edited comment on PDFBOX-1915 at 7/3/14 9:42 PM:


In my test suite, I have one rendering fixed and no regressions. Cool, thanks. 

My only complaint is the performance. The attached file (example_030.pdf) needs 
several minutes to render; the second page in particular takes far too long. 
Without really understanding the algorithms, I had a look at 
PatchMeshesShadingContext#getRaster(). Could you perhaps sort the triangles and 
search for them instead of looping through all of them and checking which one 
contains the current point? The loop continues even after a matching triangle 
has been found. Could you at least break there? Also, the row/col loops always 
shift the current point by one. Isn't it likely that the same triangle or its 
neighbor will get a hit?

Just ideas, keep up the good work. 


 Implement shading with Coons and tensor-product patch meshes
 

 Key: PDFBOX-1915
 URL: https://issues.apache.org/jira/browse/PDFBOX-1915
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tilman Hausherr
Assignee: Shaola Ren
  Labels: graphical, gsoc2014, java, math, shading
 Fix For: 2.0.0

 Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, 
 CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, 
 McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, 
 _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, 
 asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, 
 coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, 
 coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, 
 coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, 
 coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, 
 coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, 
 eci_altona-test-suite-v2_technical_H.pdf, example_030.pdf, failedTest.rar, 
 lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, 
 lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, 
 pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, 
 shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, 
 tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, 
 tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, 
 updateshading6ContourTest.rar


 Of the seven shading methods described in the PDF specification, type 6 
 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been 
 implemented. I have done type 1, 4 and 5, but I don't know the math for type 
 6 and 7. My math days are decades away.
 Knowledge prerequisites: 
 - java, although you don't have to be a java ace, just feel comfortable
 - math: you should know what cubic Bézier curves, Degenerate Bézier 
 curves, bilinear interpolation, tensor-product, affine transform 
 matrix and Bernstein polynomials are, or be able to learn it
 - maven (basic)
 - svn (basic)
 - an IDE like Netbeans or Eclipse or IntelliJ (basic)
 - ideally, you are either a math student who likes to program, or a computer 
 science student who is specializing in graphics.
 A first look at PDFBOX: try the command utility here:
 https://pdfbox.apache.org/commandline/#pdfToImage
 and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have 
 the shading types that are already implemented.
 Some simple source code to convert to images:
 String filename = "blah.pdf";
 PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
 List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
 int page = 0;
 for (PDPage pdPage : pdPages)
 {
     ++page;
     BufferedImage bim = RenderUtil.convertToImage(pdPage, 
         BufferedImage.TYPE_BYTE_BINARY, 300);
     ImageIO.write(bim, "png", new 

[jira] [Commented] (PDFBOX-2126) Optimize clipping

2014-07-02 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049697#comment-14049697
 ] 

Petr Slaby commented on PDFBOX-2126:


The fix for PDFBOX-1772 is to reset the lastClip before restoring the graphics 
in the TransparencyGroup constructor, as the clipping was applied to a 
different graphics than what we are going to use now. 

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.patch, PDFBOX-1772.pdf, 
 PDFBOX-1772.pdf-1-bad.png, example_010.pdf, pdfbox-1772.pdf-1-good.png, 
 screenshot.png


 As already stated in a TODO comment in PageDrawer, the call of 
 Graphics2D#setClip() is time and memory consuming. The attached patch 
 optimizes clipping by calling Graphics2D#setClip() only if the clipping path 
 has changed. The effect depends on the document, e.g. the attached one 
 renders in 10.5s without the optimization and in 5.5 seconds in the optimized 
 version.
 The clipping has to be re-applied whenever the transform in Graphics2D 
 changes. This is not explicitly checked for, the implementation rather 
 depends on the cached value being reset manually. Currently this is only 
 needed at one place when processing annotations (AcroForms). Also, the 
 implementation relies upon the clipping path object stored in PDGraphicsState 
 to never change so that a comparison using == can be used. This works fine, 
 but needs a bit of awareness in future changes. To make the design 
 cleaner, the clipping path could be made private to PDGraphicsState and thus 
 really immutable from outside.
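
The caching scheme described above can be sketched as follows. This is a simplified illustration, not the attached patch: ClipCache is a hypothetical class, and in the patch the cached value lives in the drawer itself.

```java
import java.awt.Graphics2D;
import java.awt.Shape;

// Hypothetical cache that avoids redundant Graphics2D#setClip() calls.
final class ClipCache {
    private Shape lastClip;   // the clipping path object applied most recently
    private int setClipCalls; // counts real setClip() calls, for illustration only

    // Applies the clipping path only if it is a different object than last time.
    void apply(Graphics2D g, Shape clippingPath) {
        // Identity comparison is intentional: it is cheap, but it relies on the
        // stored clipping path object never being modified in place.
        if (clippingPath != lastClip) {
            g.setClip(clippingPath);
            lastClip = clippingPath;
            setClipCalls++;
        }
    }

    // Must be called whenever the Graphics2D or its transform changes,
    // because the cached clip was applied under the old state.
    void reset() {
        lastClip = null;
    }

    int setClipCalls() {
        return setClipCalls;
    }
}
```

Applying the same path object twice results in a single setClip() call; after reset(), or when a different object is passed (even an equal one), the clip is set again.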





[jira] [Updated] (PDFBOX-2126) Optimize clipping

2014-07-02 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2126:
---

Attachment: example_014.pdf

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, 
 PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, 
 pdfbox-1772.pdf-1-good.png, screenshot.png


 As already stated in a TODO comment in PageDrawer, the call of 
 Graphics2D#setClip() is time and memory consuming. The attached patch 
 optimizes clipping by calling Graphics2D#setClip() only if the clipping path 
 has changed. The effect depends on the document, e.g. the attached one 
 renders in 10.5s without the optimization and in 5.5 seconds in the optimized 
 version.
 The clipping has to be re-applied whenever the transform in Graphics2D 
 changes. This is not explicitly checked for, the implementation rather 
 depends on the cached value being reset manually. Currently this is only 
 needed at one place when processing annotations (AcroForms). Also, the 
 implementation relies upon the clipping path object stored in PDGraphicsState 
 to never change so that a comparison using == can be used. This works fine, 
 but needs a bit of awareness in future changes. To make the design 
 cleaner, the clipping path could be made private to PDGraphicsState and thus 
 really immutable from outside.





[jira] [Updated] (PDFBOX-2126) Optimize clipping

2014-07-02 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2126:
---

Attachment: ClipPath.2.patch

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, 
 PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, 
 pdfbox-1772.pdf-1-good.png, screenshot.png


 As already stated in a TODO comment in PageDrawer, calling 
 Graphics2D#setClip() is time- and memory-consuming. The attached patch 
 optimizes clipping by calling Graphics2D#setClip() only if the clipping path 
 has changed. The effect depends on the document; e.g. the attached one 
 renders in 10.5 seconds without the optimization and in 5.5 seconds with it.
 The clipping has to be re-applied whenever the transform in Graphics2D 
 changes. This is not explicitly checked for; instead, the implementation 
 depends on the cached value being reset manually. Currently this is only 
 needed in one place, when processing annotations (AcroForms). Also, the 
 implementation relies on the clipping path object stored in PDGraphicsState 
 never changing, so that a comparison using == can be used. This works fine, 
 but needs a bit of awareness in future changes. To make the design 
 cleaner, the clipping path could be made private to PDGraphicsState and thus 
 truly immutable from outside.
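
A minimal sketch of the caching idea described above, not the code from the attached patch. The class name ClipCache, the setClipCalls counter and the reset() method are hypothetical names chosen for illustration; only Graphics2D#setClip() is a real API.

```java
import java.awt.Graphics2D;
import java.awt.Shape;

/** Hypothetical helper; stands in for the clip handling in PageDrawer. */
final class ClipCache
{
    private Shape lastClip;   // clip object most recently passed to setClip()
    int setClipCalls;         // counts the expensive calls, for illustration only

    /** Calls setClip() only when a different clip object is supplied. */
    void applyClipping(Graphics2D g, Shape currentClip)
    {
        // Identity comparison: valid only while clip objects are never mutated
        // after they have been stored in the graphics state.
        if (currentClip != lastClip)
        {
            g.setClip(currentClip);
            lastClip = currentClip;
            setClipCalls++;
        }
    }

    /** Has to be called manually whenever the Graphics2D transform changes. */
    void reset()
    {
        lastClip = null;
    }
}
```

The == check is what makes the shortcut cheap, and it is also why the stored clip object must never be modified in place.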





[jira] [Commented] (PDFBOX-2126) Optimize clipping

2014-07-02 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049857#comment-14049857
 ] 

Petr Slaby commented on PDFBOX-2126:


... or save and restore lastClip as shown in the attached ClipPath.2.patch. The 
patch also resets lastClip before calling processSubStream when painting 
annotations. This is necessary e.g. for the attached example_014.pdf containing 
an AcroForm.

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, 
 PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, 
 pdfbox-1772.pdf-1-good.png, screenshot.png




[jira] [Comment Edited] (PDFBOX-2126) Optimize clipping

2014-07-02 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049857#comment-14049857
 ] 

Petr Slaby edited comment on PDFBOX-2126 at 7/2/14 12:19 PM:
-

... or save and restore lastClip as shown in the attached ClipPath.2.patch. The 
patch also resets the lastClip before calling processSubStream in annotation 
processing. This is necessary e.g. for the attached example_014.pdf containing 
an AcroForm.


was (Author: pslabycz):
... or save and restore lastClip as shown in the attached ClipPath.2.patch. The 
patch also resets the lastClip before in processSubStream when painting 
annotation. This is necessary e.g. for the attached example_014.pdf containing 
an AcroForm.

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.2.patch, ClipPath.patch, 
 PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, example_014.pdf, 
 pdfbox-1772.pdf-1-good.png, screenshot.png




[jira] [Updated] (PDFBOX-2176) Ignore IllegalArgumentException when reading an ICCProfile

2014-07-02 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2176:
---

Attachment: 49.pdf

 Ignore IllegalArgumentException when reading an ICCProfile
 --

 Key: PDFBOX-2176
 URL: https://issues.apache.org/jira/browse/PDFBOX-2176
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel, Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 49.pdf




[jira] [Created] (PDFBOX-2176) Ignore IllegalArgumentException when reading an ICCProfile

2014-07-02 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2176:
--

 Summary: Ignore IllegalArgumentException when reading an ICCProfile
 Key: PDFBOX-2176
 URL: https://issues.apache.org/jira/browse/PDFBOX-2176
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel, Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 49.pdf

A java.lang.IllegalArgumentException: Invalid ICC Profile Data is thrown from 
PDICCBase#loadICCProfile() when rendering the attached PDF. The code already 
checks for and ignores ProfileDataException and CMMException at this point. 

IllegalArgumentException is thrown if the profile header data is completely 
corrupt: either the data is shorter than the 128 header bytes, or the profile 
size found in the header does not match the size of the data.

The exception is ignored in 1.8; in 2.0 it is re-thrown. I think ignoring the 
exception and using an alternate color space is better and consistent with the 
handling of the other two expected exceptions.
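
A minimal sketch of the proposed handling, assuming the profile bytes are parsed with java.awt.color.ICC_Profile.getInstance(byte[]). The class and method names (IccProfileLoader, loadOrNull) are hypothetical and not taken from PDICCBase; returning null stands in for "fall back to the alternate color space".

```java
import java.awt.color.CMMException;
import java.awt.color.ICC_Profile;
import java.awt.color.ProfileDataException;

/** Hypothetical helper; not the PDICCBase implementation. */
final class IccProfileLoader
{
    /**
     * Returns the parsed profile, or null if the data is unusable, so that
     * the caller can switch to an alternate color space.
     */
    static ICC_Profile loadOrNull(byte[] profileData)
    {
        try
        {
            return ICC_Profile.getInstance(profileData);
        }
        catch (ProfileDataException | CMMException | IllegalArgumentException e)
        {
            // All three are treated alike: the embedded profile is ignored.
            return null;
        }
    }
}
```

Data shorter than the ICC header, e.g. `IccProfileLoader.loadOrNull(new byte[10])`, is expected to end up in the catch block rather than propagate to the renderer.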





[jira] [Created] (PDFBOX-2180) LAB color space produces wrong colors

2014-07-02 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2180:
--

 Summary: LAB color space produces wrong colors
 Key: PDFBOX-2180
 URL: https://issues.apache.org/jira/browse/PDFBOX-2180
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel, Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor


The attached example uses LAB colors. When rendering it using PDFToImage, the 
result is mostly violet instead of black text and yellow rectangles (see the 
attached JPEG). 

When comparing the 1.8 sources with the current trunk, the incoming values are 
scaled to the range in trunk PDLab#toRGB(), while this was not the case in 1.8 
PDColorState and ColorSpaceLab. As far as I can tell, in 1.8 the values were 
only clipped to the range in ColorSpaceLab#toCIEXYZ(). 

If I remove the scaling in 2.0, the rendering is correct.
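
To illustrate the difference between clipping and scaling, here is a sketch of the Lab to XYZ step as defined in the PDF specification, with the inputs clamped to their ranges and not rescaled. This is not the PDLab or ColorSpaceLab code; the class LabToXyz and its parameter layout are assumptions made for the example.

```java
/** Hypothetical helper; follows the Lab formulas of the PDF specification. */
final class LabToXyz
{
    private static double clamp(double v, double min, double max)
    {
        return Math.max(min, Math.min(max, v));
    }

    /** Inverse companding function g(x) used by the Lab definition. */
    private static double g(double x)
    {
        return x >= 6.0 / 29.0 ? x * x * x : (108.0 / 841.0) * (x - 4.0 / 29.0);
    }

    /**
     * @param lab   L*, a*, b* as found in the content stream (not rescaled)
     * @param range aMin, aMax, bMin, bMax from the color space dictionary
     * @param white XW, YW, ZW of the white point
     * @return X, Y, Z
     */
    static double[] toXyz(double[] lab, double[] range, double[] white)
    {
        // Out-of-range components are clipped, never stretched.
        double lStar = clamp(lab[0], 0, 100);
        double aStar = clamp(lab[1], range[0], range[1]);
        double bStar = clamp(lab[2], range[2], range[3]);

        double l = (lStar + 16.0) / 116.0;
        double m = l + aStar / 500.0;
        double n = l - bStar / 200.0;
        return new double[] { white[0] * g(m), white[1] * g(l), white[2] * g(n) };
    }
}
```

With L* = 100 and a* = b* = 0 the result is the white point itself, which is a quick sanity check for any variant of this conversion.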





[jira] [Updated] (PDFBOX-2180) LAB color space produces wrong colors

2014-07-02 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2180:
---

Attachment: 0003521.jpg
000352.pdf

 LAB color space produces wrong colors
 -

 Key: PDFBOX-2180
 URL: https://issues.apache.org/jira/browse/PDFBOX-2180
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel, Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 000352.pdf, 0003521.jpg




[jira] [Commented] (PDFBOX-2126) Optimize clipping

2014-07-01 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049136#comment-14049136
 ] 

Petr Slaby commented on PDFBOX-2126:


[~jahewson]: In my original code, I was resetting the clipping path (lastClip = 
null;) just before processSubStream in drawPage, because that's exactly where 
the G2D transform changes. Your commits did not have that; maybe that was the 
reason for the regression? 

I must say I do not understand how your last commit works. It seems to check 
only whether the clip has changed in G2D, but not whether a new clip 
has been set in PDGraphicsState?

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.patch, PDFBOX-1772.pdf, 
 PDFBOX-1772.pdf-1-bad.png, example_010.pdf, pdfbox-1772.pdf-1-good.png, 
 screenshot.png




[jira] [Commented] (PDFBOX-2126) Optimize clipping

2014-06-30 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047723#comment-14047723
 ] 

Petr Slaby commented on PDFBOX-2126:


I have removed clippingPath.clone() in my patch, so the cloned PDGraphicsState 
refers to the same clipping path object. A new clipping object is only 
created in setClippingPath() (or intersectClippingPath()). This enables me 
to use the lastClip == currentClip condition in PageDrawer.applyClipping() to 
avoid applying the clip if it did not change. 

After the change from storing GeneralPath to storing Area, I thought the 
clippingPath.clone() in PDGraphicsState.clone() would be unavoidable, but I can 
move it to intersectClippingPath() as well.

I will post an updated patch as soon as possible, but unfortunately I am 
quite overwhelmed by my daily business right now. 
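
A reduced sketch of the copy-on-intersect idea from this comment. ClipState is a hypothetical stand-in for PDGraphicsState, and the method names only mirror the ones mentioned above; the real class has many more fields.

```java
import java.awt.Shape;
import java.awt.geom.Area;

/** Hypothetical, reduced stand-in for PDGraphicsState. */
final class ClipState implements Cloneable
{
    private Area clippingPath;

    ClipState(Shape initialClip)
    {
        clippingPath = new Area(initialClip);
    }

    Area getClippingPath()
    {
        return clippingPath;
    }

    /** Leaves the shared Area untouched and installs a fresh one. */
    void intersectClippingPath(Shape shape)
    {
        Area result = new Area(clippingPath);   // the copy happens only here
        result.intersect(new Area(shape));
        clippingPath = result;
    }

    @Override
    public ClipState clone()
    {
        try
        {
            // Shallow copy: the clone refers to the same Area object,
            // so lastClip == currentClip still holds after save/restore.
            return (ClipState) super.clone();
        }
        catch (CloneNotSupportedException e)
        {
            throw new AssertionError(e);
        }
    }
}
```

Because the Area is replaced and never modified in place, a saved state keeps its clip even after the clone has been intersected further.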

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf




[jira] [Commented] (PDFBOX-2126) Optimize clipping

2014-06-28 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046964#comment-14046964
 ] 

Petr Slaby commented on PDFBOX-2126:


I have already tried to replace the GeneralPath field with an Area field in 
PDGraphicsState myself, but then reverted it. The problem is that Area is 
a slow and hungry beast. It seems that with a complex clipping path, it is 
better to give it back to the garbage collector as soon as possible. Also, 
looking at the implementation of SunGraphics2D, the first thing that happens 
in clip() is that the shape is transformed using 
AffineTransform.createTransformedShape(). This is optimized (a little) for 
Path2D shapes, but not for Area.

I have also tried an alternative implementation of Area found at 
https://javagraphics.java.net/areax/. Compared to java.awt.Area, it is much 
faster and needs less memory when it comes to complex areas. It seems to be a 
little slower with simple rectangular areas. It has a modified BSD license 
that is fine for me, but I am personally not yet convinced that it is worth 
having yet another third-party library in the product. I am not sure whether it 
is compatible with the Apache License, but it is certainly worth a look 
just out of interest. It is amazing how one smart guy outperforms a big 
team at Sun or Oracle (albeit in a very small part of the library, of course).

As for my original idea of applying the clipping path only if it has changed 
since it was last applied - your commit is a bit problematic for 
that. Because of the clippingPath.clone() in PDGraphicsState.clone(), which is 
now necessary, I cannot use a simple comparison using lastClip == currentClip. 
I can probably solve it by having a counter in 
PDGraphicsState.intersectClippingPath() to keep track of how many times the 
clip has changed and which change was the last one applied to Graphics2D.

I will try to compare the performance of the code using the GeneralPath with 
the current one using Area again. I think that my code where I already tried 
that was quite similar to yours, but we will see.
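
A rough sketch of the counter idea, with one assumption that goes beyond the comment: the counter is shared by all states and not kept per state, so that two different clips created after a save/restore pair can never end up with the same number. All class and method names are hypothetical, and the clip geometry is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical graphics state that tags every clip change with a unique id. */
final class VersionedClipState
{
    // Shared sequence (an assumption of this sketch): ids are never reused.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    private long clipId;

    VersionedClipState()
    {
        clipId = SEQUENCE.incrementAndGet();
    }

    private VersionedClipState(long clipId)
    {
        this.clipId = clipId;
    }

    /** Plays the role of clone(): the copy still describes the same clip. */
    VersionedClipState copy()
    {
        return new VersionedClipState(clipId);
    }

    /** Geometry omitted; only the bookkeeping is shown. */
    void intersectClippingPath()
    {
        clipId = SEQUENCE.incrementAndGet();
    }

    long getClipId()
    {
        return clipId;
    }
}

/** Hypothetical drawer-side bookkeeping. */
final class ClipTracker
{
    private long lastAppliedId = -1;

    /** Returns true if the clip of the given state still has to be applied. */
    boolean markApplied(VersionedClipState state)
    {
        if (state.getClipId() == lastAppliedId)
        {
            return false;
        }
        lastAppliedId = state.getClipId();
        return true;
    }
}
```

Comparing ids replaces the == comparison on the clip object, so it keeps working even when the clip itself is cloned together with the state.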

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf




[jira] [Updated] (PDFBOX-2126) Optimize clipping

2014-06-26 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2126:
---

Attachment: ClipPath.1.patch

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf




[jira] [Commented] (PDFBOX-2126) Optimize clipping

2014-06-26 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045190#comment-14045190
 ] 

Petr Slaby commented on PDFBOX-2126:


Attached updated patch against latest trunk. 

I have moved the intersection of the current clipping path with the new one into 
PDGraphicsState#setCurrentClippingPath() to avoid duplicate code in the places 
where this method is called (a second call has now been added in PDFBOX-1875). 

There is a performance optimization in the computation of the intersection that 
tries to avoid creating an Area wherever possible. I do not insist on it, but 
it brought a big improvement in performance and memory consumption on a 
corner-case PDF where a very complex clipping path is used. It also caused some 
one-pixel differences in a few of my test files, but I am not able to decide 
whether the before or the after rendering is correct. 

I am still hesitant to make the clip path immutable and private to 
PDGraphicsState, as that would mean cloning the input in setClipPath() 
and either cloning the output in getClipPath() or introducing methods 
like applyClipping(Graphics2D) and fillClipPath(Graphics2D) in PDGraphicsState. 
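
The comment does not say how the patch avoids creating an Area. One possible shortcut, shown here purely as an assumption, is to intersect two rectangles directly and to fall back to Area for everything else. ClipIntersection is a hypothetical name.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

/** Hypothetical helper; one possible way to skip Area for simple clips. */
final class ClipIntersection
{
    static Shape intersect(Shape current, Shape added)
    {
        if (current instanceof Rectangle2D && added instanceof Rectangle2D)
        {
            // Cheap path: the intersection of two rectangles is a rectangle.
            // For disjoint inputs the result has a negative width or height,
            // which Java2D treats as empty.
            return ((Rectangle2D) current).createIntersection((Rectangle2D) added);
        }
        // General path: constructive area geometry, slow for complex outlines.
        Area area = new Area(current);
        area.intersect(new Area(added));
        return area;
    }
}
```

Two code paths computing the same intersection can round slightly differently, which may be related to the kind of one-pixel differences mentioned above.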

 Optimize clipping
 -

 Key: PDFBOX-2126
 URL: https://issues.apache.org/jira/browse/PDFBOX-2126
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
 Attachments: ClipPath.1.patch, ClipPath.patch, example_010.pdf




[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager

2014-06-25 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044035#comment-14044035
 ] 

Petr Slaby commented on PDFBOX-2144:


Yes, exactly.

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: FontManager.patch


 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from the file system or from the 
 system environment. To make PDFBox compatible with this philosophy, we need 
 the FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and a default implementation, 
 which is the original one. FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest introducing a configuration object that takes care of all 
 the current and future options of PDFRenderer.
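
A sketch of what such a pluggable lookup plus configuration object could look like. None of these types exist in PDFBox or in the attached patch: FontSource, InMemoryFontSource (a map standing in for the database) and RenderConfig are hypothetical names for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lookup contract; plays the role of the proposed interface. */
interface FontSource
{
    Optional<byte[]> findFontData(String fontName);
}

/** Hypothetical implementation backed by a map instead of the file system. */
final class InMemoryFontSource implements FontSource
{
    private final Map<String, byte[]> fonts = new ConcurrentHashMap<>();

    void register(String fontName, byte[] data)
    {
        fonts.put(fontName.toLowerCase(Locale.ROOT), data.clone());
    }

    @Override
    public Optional<byte[]> findFontData(String fontName)
    {
        return Optional.ofNullable(fonts.get(fontName.toLowerCase(Locale.ROOT)));
    }
}

/** Hypothetical configuration object collecting renderer options. */
final class RenderConfig
{
    // Default: no fonts are found; a real default would scan system fonts.
    private FontSource fontSource = name -> Optional.empty();

    RenderConfig withFontSource(FontSource source)
    {
        this.fontSource = source;
        return this;
    }

    FontSource getFontSource()
    {
        return fontSource;
    }
}
```

A renderer would receive one RenderConfig and hand its FontSource down to the stream engine and page drawer, so new options can be added without changing constructor signatures.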





[jira] [Commented] (PDFBOX-2141) Shading not applied to text

2014-06-23 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041157#comment-14041157
 ] 

Petr Slaby commented on PDFBOX-2141:


I see. Why not do the same, then - apply the transform to the path instead of 
the graphics? The following works like magic on your test file. I am just not 
sure whether it might negatively affect performance and I tested it with this 
single file only (pattern-shading-2-4-idMatrix.pdf). 

Alternatively, we might either be able to compute the right transformation from 
the deviceBounds/userBounds in AxialShadingPaint#createContext()  (I am not 
sure, I just think the information we need might be in there), or use a custom 
rendering hint key and pass the additional transform along with the rendering 
hints.

{noformat}
private void drawGlyph2D(Glyph2D glyph2D, int[] codePoints, AffineTransform at)
        throws IOException
{
    graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
            RenderingHints.VALUE_ANTIALIAS_ON);
    for (int i = 0; i < codePoints.length; i++)
    {
        GeneralPath path = glyph2D.getPathForCharacterCode(codePoints[i]);
        if (path != null)
        {
            if (!at.isIdentity())
            {
                // transform a copy of the glyph outline, not the Graphics2D
                path = (GeneralPath) path.clone();
                path.transform(at);
            }
            graphics.fill(path);
        }
    }
}
{noformat}
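
For the second alternative mentioned above, a custom rendering hint key could look roughly like the following. ExtraTransformKey is a hypothetical name and the key number is arbitrary; only RenderingHints.Key and its isCompatibleValue() contract are real API.

```java
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;

/** Hypothetical hint key that carries an additional transform to the Paint. */
final class ExtraTransformKey extends RenderingHints.Key
{
    static final ExtraTransformKey KEY = new ExtraTransformKey();

    private ExtraTransformKey()
    {
        super(1);   // private key number, unique within this subclass
    }

    @Override
    public boolean isCompatibleValue(Object value)
    {
        return value instanceof AffineTransform;
    }
}
```

The drawer would call `graphics.setRenderingHint(ExtraTransformKey.KEY, at)` before filling, and a Paint implementation could read the value from the RenderingHints argument of createContext().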
 

 Shading not applied to text
 ---

 Key: PDFBOX-2141
 URL: https://issues.apache.org/jira/browse/PDFBOX-2141
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 1.8.7, 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, 
 PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, 
 PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, 
 PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch, 
 pattern-shading-2-4-idMatrix.pdf, pattern-shading-2-4-idMatrix.pdf, 
 pattern-shading-2-4-idMatrix1.jpg, pattern-shading-2-4-noMatrix.pdf, 
 pattern-shading-2-4.ps, pattern-shading-2-4.ps


 The attached PDF draws text filled with a horizontal shading going from red 
 to blue. When rendered via PDFBox, the text is completely filled with red. 
 The problem is that AxialShadingContext#getRaster() gets called with 
 positions that fall completely outside the range stored in its coords[] 
 field. The fix seems to be to set the glyph transform rather than the 
 Graphics2D transform in PageDrawer#writeText(), as shown in the attached patch.





[jira] [Commented] (PDFBOX-2141) Shading not applied to text

2014-06-22 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040233#comment-14040233
 ] 

Petr Slaby commented on PDFBOX-2141:


{quote}
pattern-shading-2-4-idMatrix.pdf ...
{quote}
But the problem does not seem to be related to this issue. At least I get an 
identical rendering before and after the change made in revision 1604282.

 Shading not applied to text
 ---

 Key: PDFBOX-2141
 URL: https://issues.apache.org/jira/browse/PDFBOX-2141
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 1.8.7, 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, 
 PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, 
 PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, 
 PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch, 
 pattern-shading-2-4-idMatrix.pdf, pattern-shading-2-4-idMatrix1.jpg, 
 pattern-shading-2-4.ps



[jira] [Commented] (PDFBOX-2141) Shading not applied to text

2014-06-21 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039959#comment-14039959
 ] 

Petr Slaby commented on PDFBOX-2141:


My observation was that the coordinates arriving at the AxialShadingContext in 
getRaster() were not what the shading expects. Not applying the transform to 
the graphics fixed it. 

The problem is probably that, regardless of whether the transform is applied to 
the graphics or to the glyphs, the coordinates arriving at getRaster() are 
always the same. However, the transform applied to the coords field in the 
constructor of AxialShadingContext is the one that was set on the graphics, 
i.e. it is a different one if the graphics was transformed. 
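
The effect can be reproduced outside PDFBox with a small recording Paint. This is only a demonstration of the Java2D behaviour as I understand it; RecordingPaint and ShadingTransformDemo are hypothetical names, and a plain red fill replaces the real shading.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Paint;
import java.awt.PaintContext;
import java.awt.Rectangle;
import java.awt.RenderingHints;
import java.awt.Shape;
import java.awt.Transparency;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;

/** Records the transform handed to createContext(), then paints plain red. */
final class RecordingPaint implements Paint
{
    AffineTransform seen;

    @Override
    public PaintContext createContext(ColorModel cm, Rectangle deviceBounds,
            Rectangle2D userBounds, AffineTransform xform, RenderingHints hints)
    {
        seen = new AffineTransform(xform);
        return Color.RED.createContext(cm, deviceBounds, userBounds, xform, hints);
    }

    @Override
    public int getTransparency()
    {
        return Transparency.OPAQUE;
    }
}

final class ShadingTransformDemo
{
    /** Returns the transforms seen by the paint in both variants. */
    static AffineTransform[] run()
    {
        AffineTransform at = AffineTransform.getTranslateInstance(100, 50);
        Shape glyph = new Rectangle2D.Double(0, 0, 10, 10);
        BufferedImage img = new BufferedImage(200, 200, BufferedImage.TYPE_INT_RGB);

        // Variant 1: the transform is applied to the graphics.
        RecordingPaint onGraphics = new RecordingPaint();
        Graphics2D g1 = img.createGraphics();
        g1.setPaint(onGraphics);
        g1.transform(at);
        g1.fill(glyph);
        g1.dispose();

        // Variant 2: the transform is applied to the path only.
        RecordingPaint onPath = new RecordingPaint();
        Graphics2D g2 = img.createGraphics();
        g2.setPaint(onPath);
        g2.fill(at.createTransformedShape(glyph));
        g2.dispose();

        return new AffineTransform[] { onGraphics.seen, onPath.seen };
    }
}
```

Both variants cover the same device pixels, but only the first one hands the glyph transform to the paint, which is where a shading context would then map its coords differently.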

 Shading not applied to text
 ---

 Key: PDFBOX-2141
 URL: https://issues.apache.org/jira/browse/PDFBOX-2141
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, 
 PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, 
 PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, 
 PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch




[jira] [Comment Edited] (PDFBOX-2141) Shading not applied to text

2014-06-21 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039959#comment-14039959
 ] 

Petr Slaby edited comment on PDFBOX-2141 at 6/21/14 9:46 PM:
-

My observation was that the coordinates arriving at the AxialShadingContext in 
getRaster() were not what the shading expects. Not applying the transform to 
the graphics fixed it. 

The problem is probably that, regardless of whether the transform is applied to 
the graphics or to the glyphs, the coordinates arriving at getRaster() are 
always the same. However, the transform applied to the coords field in the 
constructor of AxialShadingContext is the one that was set on the graphics, 
i.e. it is a different one if the graphics was transformed. 

So yes, I agree.


was (Author: pslabycz):
My observation was that the coordinates arriving at the AxialShadingContext in 
getRaster() were not what the shading expects. Not applying the transform to 
the graphics fixed it. 

The problem is probably that, regardless of whether the transform is applied to 
the graphics or to the glyphs, the coordinates arriving at getRaster() are 
always the same. However, the transform applied to the coords field in the 
constructor of AxialShadingContext is the one that was set on the graphics, 
i.e. it is a different one if the graphics was transformed. 

 Shading not applied to text
 ---

 Key: PDFBOX-2141
 URL: https://issues.apache.org/jira/browse/PDFBOX-2141
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, 
 PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, 
 PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, 
 PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch




[jira] [Commented] (PDFBOX-2149) Font Refactoring

2014-06-20 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038567#comment-14038567
 ] 

Petr Slaby commented on PDFBOX-2149:


Attached a file which runs into a NPE in PDFont#isSymbolicFont() now.
{noformat}
Caused by: java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.font.PDFont.isSymbolicFont(PDFont.java:694)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getGIDForCharacterCode(PDTrueTypeFont.java:408)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:378)
    at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidth(PDFont.java:312)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:377)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:44)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:508)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:259)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:226)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:209)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:175)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:227)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPageToGraphics(PDFRenderer.java:190)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPageToGraphics(PDFRenderer.java:174)
{noformat}

 Font Refactoring
 

 Key: PDFBOX-2149
 URL: https://issues.apache.org/jira/browse/PDFBOX-2149
 Project: PDFBox
  Issue Type: Improvement
  Components: FontBox, PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
 Attachments: 000467.pdf


 To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need 
 to sort out long-standing font/text encoding issues. The main issue is that 
 encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, 
 sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this 
 code is copy & pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and 
 Encodings despite the fact that these two encoding methods are mutually 
 exclusive. The end result is that the process of reading Encodings/CMaps is 
 often following rules which are completely invalid for that font type but 
 mostly work by luck.
 Phase 1
 - Refactor PDFont subclasses to remove setXXX methods which allow the object 
 to be corrupted. Proper use of inheritance can remove all cases where public 
 setXXX methods are used during font loading.
 - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF 
 embedding, FontBox's TrueTypeFont class is externally mutable via setXXX 
 methods used only by TTFParser: these can be made package-private.
 - the Encoding class and EncodingManager could do with some cleaning up prior 
 to further refactoring.
 - PDSimpleFont does not do anything, its functionality should be moved into 
 its superclass, PDFont.
 - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, 
 and vice versa. Loading needs to be pushed down into the appropriate 
 subclasses, as a starting point the relevant code should at least be copied 
 into the relevant subclasses ready for further refactoring.
 - TTFGlyph2D does its own decoding of char codes, rather than using the 
 font's #encode method (fair enough because #encode is broken) and there's a 
 copy and pasted version of the same code in PDTrueTypeFont - we need to 
 consolidate this code into PDTrueTypeFont where it belongs.
 Phase 2
 - Refactor loading of CMaps and Encodings from font dictionaries, this will 
 involve changes to PDFont and its subclasses to delegate loading to 
 subclasses where it can be properly encapsulated
 - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as 
 CIDFont isn't really a PDFont - its parent Type0 font is responsible for its 
 CMap. We'll see.
 Phase 3
 - Refactor the decoding of character codes by PDFont and its subclasses, this 
 will involve replacing the #getCodeFromArray, #encode and #encodeToCID 
 methods.
 - Fix decoding of content stream character codes in PDFStreamEngine, using 
 the newly refactored PDFont and using the current font's CMap to determine 
 the code width.
 Phase 4
 - Add support for generating embedded TTFs with Unicode





[jira] [Updated] (PDFBOX-2149) Font Refactoring

2014-06-20 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2149:
---

Attachment: 000467.pdf

 Font Refactoring
 

 Key: PDFBOX-2149
 URL: https://issues.apache.org/jira/browse/PDFBOX-2149
 Project: PDFBox
  Issue Type: Improvement
  Components: FontBox, PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
 Attachments: 000467.pdf


 To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need 
 to sort out long-standing font/text encoding issues. The main issue is that 
 encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, 
 sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this 
 code is copy & pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and 
 Encodings despite the fact that these two encoding methods are mutually 
 exclusive. The end result is that the process of reading Encodings/CMaps is 
 often following rules which are completely invalid for that font type but 
 mostly work by luck.
 Phase 1
 - Refactor PDFont subclasses to remove setXXX methods which allow the object 
 to be corrupted. Proper use of inheritance can remove all cases where public 
 setXXX methods are used during font loading.
 - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF 
 embedding, FontBox's TrueTypeFont class is externally mutable via setXXX 
 methods used only by TTFParser: these can be made package-private.
 - the Encoding class and EncodingManager could do with some cleaning up prior 
 to further refactoring.
 - PDSimpleFont does not do anything, its functionality should be moved into 
 its superclass, PDFont.
 - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, 
 and vice versa. Loading needs to be pushed down into the appropriate 
 subclasses, as a starting point the relevant code should at least be copied 
 into the relevant subclasses ready for further refactoring.
 - TTFGlyph2D does its own decoding of char codes, rather than using the 
 font's #encode method (fair enough because #encode is broken) and there's a 
 copy and pasted version of the same code in PDTrueTypeFont - we need to 
 consolidate this code into PDTrueTypeFont where it belongs.
 Phase 2
 - Refactor loading of CMaps and Encodings from font dictionaries, this will 
 involve changes to PDFont and its subclasses to delegate loading to 
 subclasses where it can be properly encapsulated
 - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as 
 CIDFont isn't really a PDFont - its parent Type0 font is responsible for its 
 CMap. We'll see.
 Phase 3
 - Refactor the decoding of character codes by PDFont and its subclasses, this 
 will involve replacing the #getCodeFromArray, #encode and #encodeToCID 
 methods.
 - Fix decoding of content stream character codes in PDFStreamEngine, using 
 the newly refactored PDFont and using the current font's CMap to determine 
 the code width.
 Phase 4
 - Add support for generating embedded TTFs with Unicode





[jira] [Updated] (PDFBOX-2149) Font Refactoring

2014-06-20 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2149:
---

Attachment: 39.pdf

Here is another one. Hope this helps.

 Font Refactoring
 

 Key: PDFBOX-2149
 URL: https://issues.apache.org/jira/browse/PDFBOX-2149
 Project: PDFBox
  Issue Type: Improvement
  Components: FontBox, PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
 Attachments: 39.pdf, 000467.pdf


 To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need 
 to sort out long-standing font/text encoding issues. The main issue is that 
 encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, 
 sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this 
 code is copy & pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and 
 Encodings despite the fact that these two encoding methods are mutually 
 exclusive. The end result is that the process of reading Encodings/CMaps is 
 often following rules which are completely invalid for that font type but 
 mostly work by luck.
 Phase 1
 - Refactor PDFont subclasses to remove setXXX methods which allow the object 
 to be corrupted. Proper use of inheritance can remove all cases where public 
 setXXX methods are used during font loading.
 - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF 
 embedding, FontBox's TrueTypeFont class is externally mutable via setXXX 
 methods used only by TTFParser: these can be made package-private.
 - the Encoding class and EncodingManager could do with some cleaning up prior 
 to further refactoring.
 - PDSimpleFont does not do anything, its functionality should be moved into 
 its superclass, PDFont.
 - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, 
 and vice versa. Loading needs to be pushed down into the appropriate 
 subclasses, as a starting point the relevant code should at least be copied 
 into the relevant subclasses ready for further refactoring.
 - TTFGlyph2D does its own decoding of char codes, rather than using the 
 font's #encode method (fair enough because #encode is broken) and there's a 
 copy and pasted version of the same code in PDTrueTypeFont - we need to 
 consolidate this code into PDTrueTypeFont where it belongs.
 Phase 2
 - Refactor loading of CMaps and Encodings from font dictionaries, this will 
 involve changes to PDFont and its subclasses to delegate loading to 
 subclasses where it can be properly encapsulated
 - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as 
 CIDFont isn't really a PDFont - its parent Type0 font is responsible for its 
 CMap. We'll see.
 Phase 3
 - Refactor the decoding of character codes by PDFont and its subclasses, this 
 will involve replacing the #getCodeFromArray, #encode and #encodeToCID 
 methods.
 - Fix decoding of content stream character codes in PDFStreamEngine, using 
 the newly refactored PDFont and using the current font's CMap to determine 
 the code width.
 Phase 4
 - Add support for generating embedded TTFs with Unicode





[jira] [Commented] (PDFBOX-2153) Setting the correct clipping path for shading

2014-06-20 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038604#comment-14038604
 ] 

Petr Slaby commented on PDFBOX-2153:


Sounds reasonable. The current clipping path is passed to graphics.fill(), so if 
the graphics has a clipping path from a previous operation, it might interfere 
with that. I vote for setClip(null) because setClip() is a time- and memory-
consuming operation if called with a complex path.

The change does not show any effect on my test suite documents; it seems that I 
do not have an example that would be affected. 
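
For illustration, here is a minimal Java2D sketch of the interference (plain 
AWT, not PDFBox code; ClipResetDemo is a hypothetical name). A clip left over 
from an earlier operation blanks a later fill that lies outside it, and 
setClip(null) removes the clip again.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical demo class, not part of PDFBox. */
class ClipResetDemo
{
    /**
     * Fills a blue square at (50,50)-(80,80) on a fresh transparent image
     * and returns the ARGB value found at (x, y) afterwards.
     */
    static int fillAndSample(boolean resetClip, int x, int y)
    {
        BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        // Clip left over from an earlier operation: top-left corner only.
        g.setClip(new Rectangle(0, 0, 20, 20));
        if (resetClip)
        {
            // Passing null removes the clip entirely.
            g.setClip(null);
        }
        g.setColor(Color.BLUE);
        g.fill(new Rectangle(50, 50, 30, 30));
        g.dispose();
        return image.getRGB(x, y);
    }

    public static void main(String[] args)
    {
        System.out.println("stale clip: " + Integer.toHexString(fillAndSample(false, 60, 60)));
        System.out.println("clip reset: " + Integer.toHexString(fillAndSample(true, 60, 60)));
    }
}
```

With the stale clip the sampled pixel stays transparent; after the reset it is 
blue. Whether null or the current clipping path of the graphics state is the 
right value for shfill is a separate question that this sketch does not 
answer.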

 Setting the correct clipping path for shading
 -

 Key: PDFBOX-2153
 URL: https://issues.apache.org/jira/browse/PDFBOX-2153
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Reporter: Tilman Hausherr
  Labels: shading, shadingpattern

 While doing tests with the file eci_altona-test-suite-v2_technical_H.pdf 
 (uncompressed) of PDFBOX-1915 I noticed that by removing a W (modifies the 
 clipping region) operator of a type 7 shading I got a lot more correct 
 shadings (type 6 and lower). It looked like PDFBox had been using the 
 clipping of the type 7 when drawing the type 6, which is just a rectangle 
 above in that rendering. This resulted in a blank.
 By adding 
 {code}
 graphics.setClip(getGraphicsState().getCurrentClippingPath());
 {code}
 in PageDrawer.shfill() just before the graphics.fill() I get several files to 
 render correctly that I hadn't before.
 (Setting null will probably do the same, didn't test that yet).
 The following PDFs are rendered correctly with the change:
 McAfee-ShadingType7.pdf
 eci_altona-test-suite-v2_technical_H.pdf
 crestron-p9.pdf  (these three found in PDFBOX-1915)
 PDFBOX-1451.pdf (alfresco)
 PDFBOX-1940.pdf (chart)
 PDFBOX-1861-tracemonkey.pdf p.11
 Not solved by the change:
 PDFBOX-2098-asyTUG.pdf p.6  (this one doesn't use shfill)
 PDFBOX-1861-tracemonkey.pdf p.6 (not shading)
 PDFBOX-1416.pdf (not shading)
 texample-rgb-triangle.pdf (John has an explanation about that one)
 WDYT? Is there any reason NOT to set the clipping path in PageDrawer.shFill() 
 ?





[jira] [Commented] (PDFBOX-2145) Clean up PDFStreamEngine and PDFTextStripper

2014-06-18 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036392#comment-14036392
 ] 

Petr Slaby commented on PDFBOX-2145:


After the change to TextPosition, the fields x and y are useless. Previously, 
they were used to cache the values returned by getX() and getY(), respectively.

 Clean up PDFStreamEngine and PDFTextStripper
 

 Key: PDFBOX-2145
 URL: https://issues.apache.org/jira/browse/PDFBOX-2145
 Project: PDFBox
  Issue Type: Improvement
  Components: Text extraction
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Priority: Minor

 PDFStreamEngine and PDFTextStripper don't really meet our coding conventions 
 and have several unused methods and deprecated code which can safely be 
 removed.
 This should clear the way to fixing some bugs in PDFStreamEngine, 
 PDFTextStripper and the various PDFont classes related to text encoding.





[jira] [Commented] (PDFBOX-2145) Clean up PDFStreamEngine and PDFTextStripper

2014-06-17 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034373#comment-14034373
 ] 

Petr Slaby commented on PDFBOX-2145:


Speaking of clean-up - the way my Eclipse is configured, I get about 200 
warnings on pdfbox code - unused imports, potential null pointer access, 
redundant null checks, undocumented empty blocks, unnecessary semicolons - to 
name but a few. The use of such warnings is a matter of personal taste or team 
rules - what are yours? Do you intend to clean up at least some of these 
warnings? Mostly, this does not bring any real improvement, but still. E.g. 
following the warnings in PDSeedValue reveals a duplicated null check which 
does not make any obvious sense. 

 Clean up PDFStreamEngine and PDFTextStripper
 

 Key: PDFBOX-2145
 URL: https://issues.apache.org/jira/browse/PDFBOX-2145
 Project: PDFBox
  Issue Type: Improvement
  Components: Text extraction
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Priority: Minor

 PDFStreamEngine and PDFTextStripper don't really meet our coding conventions 
 and have several unused methods and deprecated code which can safely be 
 removed.





[jira] [Commented] (PDFBOX-2104) Implement transparency groups

2014-06-17 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034393#comment-14034393
 ] 

Petr Slaby commented on PDFBOX-2104:


Cool, thanks.

 Implement transparency groups
 -

 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: John Hewson
  Labels: transparency
 Fix For: 2.0.0

 Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.1.patch, 
 TransparencyGroups.2.patch, TransparencyGroups.3.patch, 
 TransparencyGroups.patch


 The attached PDF uses transparency groups, blending and soft masks to create 
 the rounded corners and shades behind images. It appears that these features 
 are not implemented in PDFBox. An implementation proposal is attached in 
 TransparencyGroups.patch. The basic idea is to create a buffered image, draw 
 the transparency group content onto it and then use the result to produce the 
 soft mask or draw the image on the original g2d.
 Note: I am not the (only) author of the proposed change. It was developed in 
 our company a few years ago in sources based on a 1.7.x version of PDFBox, 
 mostly by a guy who has already left. Over the years, merging the work done in 
 PDFBox mainstream into our source base has become impossible due to the many 
 refactorings and other far-reaching changes. Now we would like to go the 
 opposite way and, where possible, bring the changes and fixes we have made into 
 PDFBox mainstream and start using it in our installations.
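
 The basic idea can be sketched with plain Java2D as follows. This is not the 
 code from the patch: TransparencyGroupDemo is a hypothetical name, and a 
 constant 50 % alpha stands in for whatever the graphics state or soft mask 
 would supply. The group content is drawn opaquely onto its own buffered 
 image, and only the finished image is composited onto the original g2d.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Hypothetical demo class, not taken from the attached patch. */
class TransparencyGroupDemo
{
    static BufferedImage renderPage()
    {
        // The "page": a white 40x40 canvas.
        BufferedImage page = new BufferedImage(40, 40, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = page.createGraphics();
        g2d.setColor(Color.WHITE);
        g2d.fillRect(0, 0, 40, 40);

        // 1. Draw the group content onto its own, initially transparent buffer.
        BufferedImage group = new BufferedImage(40, 40, BufferedImage.TYPE_INT_ARGB);
        Graphics2D groupGraphics = group.createGraphics();
        groupGraphics.setColor(Color.RED);
        groupGraphics.fillRect(10, 10, 20, 20);
        groupGraphics.setColor(Color.BLUE);
        // Overlaps the red square; inside the group the blue square simply covers it.
        groupGraphics.fillRect(20, 20, 20, 20);
        groupGraphics.dispose();

        // 2. Composite the finished group onto the page in a single step.
        g2d.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.5f));
        g2d.drawImage(group, 0, 0, null);
        g2d.dispose();
        return page;
    }

    public static void main(String[] args)
    {
        BufferedImage page = renderPage();
        System.out.println("background: " + Integer.toHexString(page.getRGB(5, 5)));
        System.out.println("red part:   " + Integer.toHexString(page.getRGB(15, 15)));
        System.out.println("overlap:    " + Integer.toHexString(page.getRGB(25, 25)));
    }
}
```

 Because the alpha is applied to the group as a whole, the red square does not 
 shine through the blue one in the overlapping area, which would happen if 
 both squares were drawn directly onto the page at 50 % alpha.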





[jira] [Updated] (PDFBOX-2104) Implement transparency groups

2014-06-16 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2104:
---

Attachment: TransparencyGroups.3.patch

 Implement transparency groups
 -

 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: John Hewson
 Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.1.patch, 
 TransparencyGroups.2.patch, TransparencyGroups.3.patch, 
 TransparencyGroups.patch


 The attached PDF uses transparency groups, blending and soft masks to create 
 the rounded corners and shades behind images. It appears that these features 
 are not implemented in PDFBox. An implementation proposal is attached in 
 TransparencyGroups.patch. The basic idea is to create a buffered image, draw 
 the transparency group content onto it and then use the result to produce the 
 soft mask or draw the image on the original g2d.
 Note: I am not the (only) author of the proposed change. It was developed in 
 our company a few years ago in sources based on a 1.7.x version of PDFBox, 
 mostly by a guy who has already left. Over the years, merging the work done in 
 PDFBox mainstream into our source base has become impossible due to the many 
 refactorings and other far-reaching changes. Now we would like to go the 
 opposite way and, where possible, bring the changes and fixes we have made into 
 PDFBox mainstream and start using it in our installations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2104) Implement transparency groups

2014-06-16 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032363#comment-14032363
 ] 

Petr Slaby commented on PDFBOX-2104:


I have refactored the code according to your suggestions. Only
{quote}
* In PageDrawer the following graphics state is constructed, but it is never 
used: 

{quote}

This code does not construct a new graphics state, it changes some settings in 
the current one. 

 Implement transparency groups
 -

 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: John Hewson
 Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.1.patch, 
 TransparencyGroups.2.patch, TransparencyGroups.3.patch, 
 TransparencyGroups.patch


 The attached PDF uses transparency groups, blending and soft masks to create 
 the rounded corners and shades behind images. It appears that these features 
 are not implemented in PDFBox. An implementation proposal is attached in 
 TransparencyGroups.patch. The basic idea is to create a buffered image, draw 
 the transparency group content onto it and then use the result to produce the 
 soft mask or draw the image on the original g2d.
 Note: I am not the (only) author of the proposed change. It was developed in 
 our company a few years ago in sources based on a 1.7.x version of PDFBox, 
 mostly by a guy who has already left. Over the years, merging the work done in 
 PDFBox mainstream into our source base has become impossible due to the many 
 refactorings and other far-reaching changes. Now we would like to go the 
 opposite way and, where possible, bring the changes and fixes we have made into 
 PDFBox mainstream and start using it in our installations.





[jira] [Created] (PDFBOX-2144) Provide a pluggable font manager

2014-06-16 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2144:
--

 Summary: Provide a pluggable font manager
 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: FontManager.patch

Our J2EE application has all fonts and resources configured and stored in its 
database. No files are accessed directly from file system or from system 
environment. To make PDFBox compatible with this philosophy, we need the 
FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
attached patch.

The proposal defines a FontManager interface and a default implementation, which 
is the original one. The FontManager then needs to be configured on and propagated 
from PDFStreamEngine and PageDrawer. It should also be configurable on 
PDFRenderer, which is not shown in the patch. There I would suggest 
introducing a configuration object which would take care of all the current 
and future options of PDFRenderer.
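
To make the request more concrete, here is a rough sketch of the shape such an 
API could take. All names in it (FontLookup, InMemoryFontLookup, 
RendererOptions) are hypothetical and are not taken from the attached patch 
or from PDFBox. A map stands in for our database.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/** Hypothetical contract for a pluggable font source. */
interface FontLookup
{
    /** Returns the raw font program for the given name, or null if it is unknown. */
    byte[] findFont(String fontName);
}

/** Stand-in for a database-backed implementation: fonts are kept in a map. */
final class InMemoryFontLookup implements FontLookup
{
    private final Map<String, byte[]> fonts = new HashMap<>();

    void register(String fontName, byte[] data)
    {
        fonts.put(normalize(fontName), data.clone());
    }

    @Override
    public byte[] findFont(String fontName)
    {
        byte[] data = fonts.get(normalize(fontName));
        // Hand out a copy so callers cannot modify the stored font.
        return data == null ? null : data.clone();
    }

    /** Case-insensitive lookup that ignores spaces in the font name. */
    private static String normalize(String name)
    {
        return name.toLowerCase(Locale.ROOT).replace(" ", "");
    }
}

/** Hypothetical configuration object bundling renderer options. */
final class RendererOptions
{
    private final FontLookup fontLookup;

    RendererOptions(FontLookup fontLookup)
    {
        this.fontLookup = Objects.requireNonNull(fontLookup, "fontLookup");
    }

    FontLookup getFontLookup()
    {
        return fontLookup;
    }
}
```

A renderer would receive a RendererOptions instance and pass its FontLookup 
down to the stream engine and page drawer, so that no component has to touch 
the file system on its own.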






[jira] [Updated] (PDFBOX-2144) Provide a pluggable font manager

2014-06-16 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2144:
---

Attachment: FontManager.patch

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: FontManager.patch


 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from file system or from system 
 environment. To make PDFBox compatible with this philosophy, we need the 
 FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and a default implementation, which 
 is the original one. The FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest introducing a configuration object which would take care of all 
 the current and future options of PDFRenderer.





[jira] [Commented] (PDFBOX-2144) Provide a pluggable font manager

2014-06-16 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032896#comment-14032896
 ] 

Petr Slaby commented on PDFBOX-2144:


Sorry, that was not intended. Anyway, the patch just shows what I need. Someone 
more familiar with pdfbox API design and its intentions has to decide whether 
or how such a feature can be implemented.

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: FontManager.patch


 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from file system or from system 
 environment. To make PDFBox compatible with this philosophy, we need the 
 FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and a default implementation, which 
 is the original one. The FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest introducing a configuration object which would take care of all 
 the current and future options of PDFRenderer.





[jira] [Updated] (PDFBOX-2144) Provide a pluggable font manager

2014-06-16 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2144:
---

Attachment: (was: FontManager.patch)

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby

 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from file system or from system 
 environment. To make PDFBox compatible with this philosophy, we need the 
 FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and a default implementation, which 
 is the original one. The FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest introducing a configuration object which would take care of all 
 the current and future options of PDFRenderer.





[jira] [Updated] (PDFBOX-2144) Provide a pluggable font manager

2014-06-16 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2144:
---

Attachment: FontManager.patch

Fixed the license agreements in file headers. Sorry again, will try to be more 
careful next time.

 Provide a pluggable font manager
 

 Key: PDFBOX-2144
 URL: https://issues.apache.org/jira/browse/PDFBOX-2144
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Reporter: Petr Slaby
 Attachments: FontManager.patch


 Our J2EE application has all fonts and resources configured and stored in its 
 database. No files are accessed directly from file system or from system 
 environment. To make PDFBox compatible with this philosophy, we need the 
 FontManager in pdfbox and fontbox to be pluggable, e.g. as shown in the 
 attached patch.
 The proposal defines a FontManager interface and a default implementation, which 
 is the original one. The FontManager then needs to be configured on and 
 propagated from PDFStreamEngine and PageDrawer. It should also be 
 configurable on PDFRenderer, which is not shown in the patch. There I would 
 suggest introducing a configuration object which would take care of all 
 the current and future options of PDFRenderer.





[jira] [Updated] (PDFBOX-2104) Implement transparency groups

2014-06-13 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2104:
---

Attachment: TransparencyGroups.2.patch

 Implement transparency groups
 -

 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: John Hewson
 Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.1.patch, 
 TransparencyGroups.2.patch, TransparencyGroups.patch


 The attached PDF uses transparency groups, blending and soft masks to create 
 the rounded corners and shades behind images. It appears that these features 
 are not implemented in PDFBox. An implementation proposal is attached in 
 TransparencyGroups.patch. The basic idea is to create a buffered image, draw 
 the transparency group content onto it and then use the result to produce the 
 soft mask or draw the image on the original g2d.
 Note: I am not the (only) author of the proposed change. It was developed in 
 our company a few years ago in sources based on a 1.7.x version of PDFBox, 
 mostly by a guy who has already left. Over the years, merging the work done in 
 PDFBox mainstream into our source base has become impossible due to the many 
 refactorings and other far-reaching changes. Now we would like to go the 
 opposite way and, where possible, bring the changes and fixes we have made into 
 PDFBox mainstream and start using it in our installations.





[jira] [Commented] (PDFBOX-2104) Implement transparency groups

2014-06-13 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030435#comment-14030435
 ] 

Petr Slaby commented on PDFBOX-2104:


It is not easy to merge changes from 1.7.x into 2.0... 

You are right, ImagePaintTiling is not used any more and ImagePaint can be 
replaced by TexturePaint. A modified patch is attached, also with slight changes 
towards the PDFBox coding style (we use the prefix m on all field names, sorry if 
I forgot it somewhere) and with a potential NPE fixed in 
PDFormXObject#createPageDrawerGroup (matrix can be null).
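
For reference, this is roughly what tiling an image with the standard 
java.awt.TexturePaint looks like. It is a standalone sketch with a 
hypothetical class name (TexturePaintDemo), not the code from the patch. An 
8x8 tile made of four coloured quadrants is repeated over a 16x16 target.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.TexturePaint;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

/** Hypothetical demo class, not taken from the attached patch. */
class TexturePaintDemo
{
    /** Builds an 8x8 tile: red, green (top row), blue, white (bottom row) quadrants. */
    static BufferedImage createTile()
    {
        BufferedImage tile = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = tile.createGraphics();
        g.setColor(Color.RED);
        g.fillRect(0, 0, 4, 4);
        g.setColor(Color.GREEN);
        g.fillRect(4, 0, 4, 4);
        g.setColor(Color.BLUE);
        g.fillRect(0, 4, 4, 4);
        g.setColor(Color.WHITE);
        g.fillRect(4, 4, 4, 4);
        g.dispose();
        return tile;
    }

    /** Fills a width x height image by repeating the tile from the origin. */
    static BufferedImage fillWithTile(int width, int height)
    {
        BufferedImage tile = createTile();
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        // The anchor rectangle defines where one copy of the tile sits in user space.
        Rectangle2D anchor = new Rectangle2D.Double(0, 0, tile.getWidth(), tile.getHeight());
        g.setPaint(new TexturePaint(tile, anchor));
        g.fillRect(0, 0, width, height);
        g.dispose();
        return target;
    }

    public static void main(String[] args)
    {
        BufferedImage result = fillWithTile(16, 16);
        System.out.println(Integer.toHexString(result.getRGB(1, 1) & 0xFFFFFF));
        System.out.println(Integer.toHexString(result.getRGB(9, 1) & 0xFFFFFF));
    }
}
```

The anchor rectangle plays the role of the pattern cell; scaling or offsetting 
it changes the size and phase of the tiling without touching the image.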

 Implement transparency groups
 -

 Key: PDFBOX-2104
 URL: https://issues.apache.org/jira/browse/PDFBOX-2104
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Assignee: John Hewson
 Attachments: 01_MTEXT_CS6.pdf, TransparencyGroups.1.patch, 
 TransparencyGroups.2.patch, TransparencyGroups.patch


 The attached PDF uses transparency groups, blending and soft masks to create 
 the rounded corners and shades behind images. It appears that these features 
 are not implemented in PDFBox. An implementation proposal is attached in 
 TransparencyGroups.patch. The basic idea is to create a buffered image, draw 
 the transparency group content onto it and then use the result to produce the 
 soft mask or draw the image on the original g2d.
 Note: I am not the (only) author of the proposed change. It was developed in 
 our company a few years ago in sources based on a 1.7.x version of PDFBox, 
 mostly by a guy who has already left. Over the years, merging the work done in 
 PDFBox mainstream into our source base has become impossible due to the many 
 refactorings and other far-reaching changes. Now we would like to go the 
 opposite way and, where possible, bring the changes and fixes we have made into 
 PDFBox mainstream and start using it in our installations.





[jira] [Created] (PDFBOX-2141) Shading not applied to text

2014-06-13 Thread Petr Slaby (JIRA)
Petr Slaby created PDFBOX-2141:
--

 Summary: Shading not applied to text
 Key: PDFBOX-2141
 URL: https://issues.apache.org/jira/browse/PDFBOX-2141
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor


The attached PDF draws text filled with a horizontal shading going from red to 
blue. When rendered via PDFBox, the text is completely filled with red. The 
problem is that AxialShadingContext#getRaster() gets called with positions that 
fall completely outside of the range stored in its coords[] field. The fix 
seems to be to set the glyph transform rather than the graphics2d transform in 
PageDrawer#writeText(), as shown in the attached patch.
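
To illustrate why out-of-range positions end up as a single colour, here is a minimal sketch of how an axial shading maps a point to a colour. It is not the AxialShadingContext code: the class AxialShadingSketch and its methods are hypothetical, the coordinates in the usage note are made up, and it assumes the shading extends at both ends, so the parameter is clamped to [0, 1].

```java
// Hypothetical sketch; not PDFBox code.
final class AxialShadingSketch
{
    /**
     * Projects (x, y) onto the axis coords = {x0, y0, x1, y1} and returns the
     * shading parameter clamped to [0, 1] (assumes extension at both ends).
     */
    static double axialParameter(double[] coords, double x, double y)
    {
        double dx = coords[2] - coords[0];
        double dy = coords[3] - coords[1];
        double t = ((x - coords[0]) * dx + (y - coords[1]) * dy) / (dx * dx + dy * dy);
        return Math.max(0.0, Math.min(1.0, t));
    }

    /** Linear red-to-blue interpolation, packed as 0xRRGGBB. */
    static int redToBlue(double t)
    {
        int red = (int) Math.round(255 * (1 - t));
        int blue = (int) Math.round(255 * t);
        return (red << 16) | blue;
    }

    public static void main(String[] args)
    {
        double[] coords = {100, 0, 200, 0};
        // Positions inside the axis range vary in colour ...
        System.out.printf("%06x%n", redToBlue(axialParameter(coords, 150, 0)));
        // ... positions far before the start all collapse to the start colour
        System.out.printf("%06x%n", redToBlue(axialParameter(coords, -500, 0)));
        System.out.printf("%06x%n", redToBlue(axialParameter(coords, -400, 0)));
    }
}
```

If every position handed to the shading lies before the start of the axis, the clamped parameter is 0 for all of them, and every pixel gets the start colour, which would match the all-red text.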





[jira] [Updated] (PDFBOX-2141) Shading not applied to text

2014-06-13 Thread Petr Slaby (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Petr Slaby updated PDFBOX-2141:
---

Attachment: 04_ShadingPatternTextPDF.pdf
PageDrawer.writeFont.java.patch






[jira] [Commented] (PDFBOX-2141) Shading not applied to text

2014-06-13 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031228#comment-14031228
 ] 

Petr Slaby commented on PDFBOX-2141:


The following seems to do the trick, though it has been tested on a single file only so far:
{noformat}
// Needs java.awt.font.GlyphVector, java.awt.geom.AffineTransform and
// java.awt.geom.Point2D; "graphics" is the Graphics2D field of PageDrawer
private void writeFont(final AffineTransform at, final GlyphVector glyphs)
{
    // Convert from PDF, where glyphs are upright when direction is from
    // bottom to top, to AWT, where this is the other way around
    at.scale(1, -1);
    for (int i = 0; i < glyphs.getNumGlyphs(); i++)
    {
        AffineTransform glyphTransform = glyphs.getGlyphTransform(i);
        Point2D glyphPos = glyphs.getGlyphPosition(i);
        AffineTransform applyTransform;
        if (glyphTransform != null || glyphPos.getX() != 0 || glyphPos.getY() != 0)
        {
            // Fold the glyph position and the glyph's own transform into a
            // single transform, then apply the incoming transform on top
            AffineTransform translate =
                    AffineTransform.getTranslateInstance(glyphPos.getX(), glyphPos.getY());
            if (glyphTransform != null)
            {
                translate.concatenate(glyphTransform);
            }
            translate.preConcatenate(at);
            applyTransform = translate;
            // The position is now part of the transform, so reset it
            glyphs.setGlyphPosition(i, new Point2D.Float(0, 0));
        }
        else
        {
            applyTransform = at;
        }
        glyphs.setGlyphTransform(i, applyTransform);
    }
    graphics.drawGlyphVector(glyphs, 0, 0);
}
{noformat}

 Shading not applied to text
 ---

 Key: PDFBOX-2141
 URL: https://issues.apache.org/jira/browse/PDFBOX-2141
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.0
Reporter: Petr Slaby
Priority: Minor
 Attachments: 04_ShadingPatternTextPDF.pdf, PDFBOX-1917.pdf-1.png, 
 PDFBOX-1917.pdf-1.png-diff.png, PDFBOX-1917.pdf-9.png, 
 PDFBOX-1917.pdf-9.png-diff.png, PDFBOX-2135.pdf-2.png, 
 PDFBOX-2135.pdf-2.png-diff.png, PageDrawer.writeFont.java.patch







[jira] [Commented] (PDFBOX-2141) Shading not applied to text

2014-06-13 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031274#comment-14031274
 ] 

Petr Slaby commented on PDFBOX-2141:


I am just not sure whether the incoming transform can also contain rotation or 
skew and whether you need to take that into account, too. For sure you should 
expect more than two glyphs, although I did not see such an example in my test 
suite. My code seems more generic - it concatenates everything into the glyph 
transform and sets its position to 0,0. On the other hand, less code is usually 
better...
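
To make the "concatenate everything into the glyph transform" point concrete, here is a small standalone sketch of the composition order, using only java.awt.geom. The class GlyphTransformSketch and the method combine are hypothetical names for illustration; the rotation and scale values are made up to show that a rotated incoming transform is carried through as well.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical sketch; mirrors the composition order of the snippet posted earlier.
final class GlyphTransformSketch
{
    /**
     * Returns at * translate(glyphPos) * glyphTransform, i.e. a point is first
     * mapped by the glyph's own transform, then moved to the glyph position,
     * then mapped by the incoming transform.
     */
    static AffineTransform combine(AffineTransform at, Point2D glyphPos,
                                   AffineTransform glyphTransform)
    {
        AffineTransform result =
                AffineTransform.getTranslateInstance(glyphPos.getX(), glyphPos.getY());
        if (glyphTransform != null)
        {
            result.concatenate(glyphTransform);
        }
        result.preConcatenate(at);
        return result;
    }

    public static void main(String[] args)
    {
        // Incoming transform with a 90 degree rotation: (x, y) -> (-y, x)
        AffineTransform at = AffineTransform.getQuadrantRotateInstance(1);
        AffineTransform glyphScale = AffineTransform.getScaleInstance(2, 2);
        AffineTransform combined = combine(at, new Point2D.Double(10, 0), glyphScale);
        System.out.println(combined.transform(new Point2D.Double(1, 0), null));
    }
}
```

Because the glyph position ends up inside the combined transform, the position itself can be reset to 0,0 afterwards without moving the glyph.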

It is getting too late in the night for me now. I will retest both solutions 
with my test suite on Monday, but I assume they will render the same results.





