John

On 19.03.2014 at 19:10, John Hewson <j...@jahewson.com> wrote:

> Maruan,
> 
>>>> From how I understand the rendering in PDF, Form, Text, Image and Pattern 
>>>> each maintain their own matrix to map to user space, which is then 
>>>> transformed by the CTM to device space, so handling them specifically is 
>>>> fine and in line with the spec.
>>> 
>>> No, that’s not right, what I said was:
>>> 
>>>>> My problem is that tiling patterns are defined in their parent stream’s 
>>>>> initial coordinate space, rather than the
>>>>> coordinate space defined by the CTM.
>>> 
>>> So patterns should *not* be using the CTM, which is what I’m trying to 
>>> achieve.
>>> 
>> 
>> I think you misunderstood what I wrote - patterns have their own matrix - so 
>> I think we are on the same page here. IMHO according to the spec CTM 
>> transforms from user space to device space. So it’s pattern space -> user 
>> space -> device space.
> 
> Nope, as I said, that’s what PDFBox currently does and it’s wrong. As you say 
> the CTM transforms from user space to device space, but it’s not the only way 
> to do so, and it is not used by patterns.

As the processing is defined in the spec, the spec is the reference here, so 
there is no need to discuss this further. Of course, different people may come 
to different conclusions when reading and interpreting the spec.
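
To make the two readings concrete, here is a minimal sketch using 
java.awt.geom.AffineTransform. It is not PDFBox code: the class name, the 
helper toDevice and all matrix values are made up for illustration, and the 
parent stream’s initial matrix is assumed to be the identity. It only shows 
that concatenating the pattern matrix with the CTM and concatenating it with 
the initial matrix give different device-space results once the CTM has been 
changed.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical illustration only; none of these names exist in PDFBox.
class PatternSpaceSketch {

    // Maps a pattern-space point to device space: the pattern matrix is
    // applied first, then the given base matrix.
    static Point2D toDevice(AffineTransform base, AffineTransform patternMatrix, Point2D p) {
        AffineTransform t = new AffineTransform(base);
        t.concatenate(patternMatrix); // t = base x patternMatrix
        return t.transform(p, null);
    }

    public static void main(String[] args) {
        // Assumed values: identity initial matrix, CTM scaled by 2 via "cm".
        AffineTransform initial = new AffineTransform();
        AffineTransform ctm = AffineTransform.getScaleInstance(2, 2);
        // Assumed pattern matrix: shift the pattern origin 10 units right.
        AffineTransform pattern = AffineTransform.getTranslateInstance(10, 0);
        Point2D origin = new Point2D.Double(0, 0);

        // Reading 1: pattern space -> current user space (CTM) -> device space.
        System.out.println("via CTM:            " + toDevice(ctm, pattern, origin));
        // Reading 2: pattern space -> parent stream's initial space -> device space.
        System.out.println("via initial matrix: " + toDevice(initial, pattern, origin));
    }
}
```

With these assumed values the pattern origin lands at x = 20 under the first 
reading and at x = 10 under the second, so the choice is visible as soon as a 
content stream changes the CTM before painting with a pattern.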

> 
>> I didn’t mean that we should only reference the spec, but that we should 
>> use the same terms as described by the spec. Adding references to the spec 
>> is an add-on, not a replacement.
> 
> I don’t see what value this adds, given that the references will just go 
> out-of-date when the next spec is released. We already use the same 
> terminology as the PDF spec, so Ctrl+F can be used for quick look-ups that 
> won’t go out-of-date.

You are not obliged to add the information.

> 
>>> This isn’t possible, as I said it “will necessarily be a breaking change”. 
>>> This is because in 2.0 PDFStreamEngine needs to know the parent of each 
>>> stream, but processStream and processSubStream do not provide this 
>>> information. That’s why I’m discussing this on the mailing list.
>> 
>> I don’t understand why this shouldn’t be possible. It’s more effort, 
>> agreed, but beneficial.
> 
> 
> What’s not to understand? PDFStreamEngine *needs* to know the parent of each 
> stream, and the old methods don’t provide this. Passing a null parent will 
> not work because we need that information later in order to correctly process 
> the stream. If we allowed a null parent to be passed, the result would be 
> silently broken rendering - there’s no value in providing a 
> backwards-compatible API if it can only produce broken results.

We won’t reach the same conclusion here (nor, I think, on the other topics 
above).

> 
> -- John
> 
> On 19 Mar 2014, at 10:31, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
> 
>> John,
>> 
>> On 19.03.2014 at 18:15, John Hewson <j...@jahewson.com> wrote:
>> 
>>> Maruan
>>> 
>>>> From how I understand the rendering in PDF, Form, Text, Image and Pattern 
>>>> each maintain their own matrix to map to user space, which is then 
>>>> transformed by the CTM to device space, so handling them specifically is 
>>>> fine and in line with the spec.
>>> 
>>> No, that’s not right, what I said was:
>>> 
>>>>> My problem is that tiling patterns are defined in their parent stream’s 
>>>>> initial coordinate space, rather than the
>>>>> coordinate space defined by the CTM.
>>> 
>>> So patterns should *not* be using the CTM, which is what I’m trying to 
>>> achieve.
>>> 
>> 
>> I think you misunderstood what I wrote - patterns have their own matrix - so 
>> I think we are on the same page here. IMHO according to the spec CTM 
>> transforms from user space to device space. So it’s pattern space -> user 
>> space -> device space.
>> 
>> 
>>>> I’d suggest that we make sure that the different ‘spaces’ are defined 
>>>> properly within the code and refer to the PDF spec so that the code is 
>>>> easier to read if this is not already the case. With so many changes it’s 
>>>> a good opportunity to enhance the documentation within the source code. 
>>>> Some of the old code enjoys very little documentation.
>>> 
>>> 
>>> I disagree, in general I don’t think that references to the PDF spec are a 
>>> good form of documentation (there are some exceptions). References to the 
>>> spec are meaningless to the reader unless they take the time to look them 
>>> up in a 700 page PDF document. I would argue that by just linking back to 
>>> the spec, we have *failed* to document PDFBox, not succeeded.
>>> 
>>> References to the PDF spec have another major flaw: they go out-of-date. 
>>> For example a Pattern Colour Space will always be called “Pattern Colour 
>>> Space” in future versions of the PDF spec but it may not be described in 
>>> paragraph 8.6.6.2 or on page 156. The existing code contains many 
>>> references to the PDF 1.6 and 1.7 specs as well as the ISO PDF32000 spec, 
>>> which means that I need three 700 page PDF files open at all times in order 
>>> to look up PDFBox references. With the new version of the PDF spec due this 
>>> year, this situation is going to get worse.
>>> 
>> 
>> I didn’t mean that we should only reference the spec, but that we should 
>> use the same terms as described by the spec. Adding references to the spec 
>> is an add-on, not a replacement.
>> 
>>> I agree that some of the existing code needs more documentation, and I 
>>> often add documentation to old files which I’m working on. However, my 
>>> approach is to just paste in a sentence or two from the PDF spec (fair 
>>> use). That way the reader does not ever need to look at the PDF spec. 
>>> Because we use the same terminology in PDFBox as in the spec, if someone 
>>> really wants to look something up, it’s as simple as Ctrl+F, no reference 
>>> needed, and it’s guaranteed not to go out-of-date.
>>> 
>>>> I wouldn’t remove processStream and processSubStream but deprecate them 
>>>> and remove them in the next major release though as to keep the changes to 
>>>> a minimum.
>>> 
>>> This isn’t possible, as I said it “will necessarily be a breaking change”. 
>>> This is because in 2.0 PDFStreamEngine needs to know the parent of each 
>>> stream, but processStream and processSubStream do not provide this 
>>> information. That’s why I’m discussing this on the mailing list.
>> 
>> I don’t understand why this shouldn’t be possible. It’s more effort, 
>> agreed, but beneficial.
>> 
>>> 
>>>> For the rendering what might have been missed is taking the UserUnit entry 
>>>> in the page dictionary into account which might change the default user 
>>>> space. This was introduced in PDF 1.6. A good opportunity to read that 
>>>> entry and make sure that we handle it appropriately.
>>> 
>>> Yes, I have this as a “todo” in my working copy, however, if we put the 
>>> UserUnit in the matrix then we should also put the page Rotation into the 
>>> matrix, but that’s a significant change.
>>> 
>>> -- John
> 
