Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards

John Hewson Tue, 11 Mar 2014 03:29:34 -0700

Great. One more thing...

> To get that completed we need to revisit the PD model as not all features of 
> PDF are reflected in the matching PD model. That could be done when 
> implementing the profiles.


All the PD classes provide access to the underlying COS model, so there’s no 
need to expose low-level details in the PD model.
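
As a rough sketch of what I mean (plain Java only; CosDict, PdPage, Profile and
the key names are hypothetical stand-ins for illustration, not the real PDFBox
classes): a check can reach through the PD object to the underlying dictionary,
so the PD class itself needs no extra low-level accessors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProfileSketch {

    /** Hypothetical stand-in for a low-level COS dictionary. */
    static final class CosDict {
        private final Map<String, Object> entries = new LinkedHashMap<>();

        CosDict put(String key, Object value) {
            entries.put(key, value);
            return this;
        }

        Set<String> keys() {
            return entries.keySet();
        }
    }

    /** Hypothetical stand-in for a PD class wrapping a dictionary. */
    static final class PdPage {
        private final CosDict dict;

        PdPage(CosDict dict) {
            this.dict = dict;
        }

        /** The PD object hands out its underlying low-level object. */
        CosDict getCosObject() {
            return dict;
        }
    }

    /** Hypothetical profile: the keys permitted for one version/standard. */
    static final class Profile {
        private final String name;
        private final Set<String> permittedKeys;

        Profile(String name, Set<String> permittedKeys) {
            this.name = name;
            this.permittedKeys = permittedKeys;
        }

        /** Returns the keys of the page dictionary that the profile does not permit. */
        List<String> disallowedKeys(PdPage page) {
            List<String> result = new ArrayList<>();
            for (String key : page.getCosObject().keys()) {
                if (!permittedKeys.contains(key)) {
                    result.add(key);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // Illustrative key set only; not taken from any PDF specification.
        Profile profile = new Profile("demo",
                new HashSet<>(Arrays.asList("Type", "MediaBox", "Contents")));
        PdPage page = new PdPage(new CosDict()
                .put("Type", "Page")
                .put("SomeNewerKey", 1));
        System.out.println(profile.name + ": " + profile.disallowedKeys(page));
    }
}
```

The check lives outside both the parser and the PD class, and the same Profile
could be consulted before writing as well.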

-- John

On 11 Mar 2014, at 00:24, Maruan Sahyoun <[email protected]> wrote:

> 
>> 
>>> OK - I wasn’t precise enough - token types didn’t change, but newer 
>>> tokens have been introduced. 
>> 
>> Yes.
>> 
>>> As the syntax has changed do we need version and standards support in the 
>>> parsing phase then?
>> 
>> I don’t think so, no. I don’t know what the use-case would be. You’d have to 
>> go back and read all seven versions of the PDF Reference and make sure that 
>> the parser implements the correct handling for each version. That’s an awful 
>> lot of work.
> 
> OK - so the parser should concentrate on getting the parsing done according 
> to the spec (which is mostly the case with NonSequentialParser today). We 
> would also have two ways of parsing files where the base syntax is not 
> correct: standards-compliant parsing, which needs to catch such errors (we 
> don’t have this in core, only in the PDF/A project), and relaxed parsing, 
> which would ignore such errors if they can be corrected. 
> 
>> 
>>> The other way would be to parse what’s in there and do validation etc. 
>>> purely on the parsing result (COS model, PD model). We need to do that anyway.
>> 
>> Yes, I prefer this approach. You can always write a tool which inspects a 
>> PDDocument and determines whether or not it uses features available in a 
>> given PDF version. It seems better to do this as a separate feature than to 
>> try and build it into the parser or the PD model directly.
> 
> Fine for me - would be something like a ‘profile’ per standard which could be 
> used for validation as well as writing.
> 
> To get that completed we need to revisit the PD model as not all features of 
> PDF are reflected in the matching PD model. That could be done when 
> implementing the profiles.
> 
>> 
>>> What about writing?
>> 
>> Yes, we want versions for writing, because a user may want to generate e.g. a 
>> PDF 1.6 file. This is going to be even more important in the near future 
>> because the PDF 2.0 standard is supposed to be introduced in 2014.
> 
> There are some base features missing in writing a PDF today but I think 
> Andreas has something in the works. The ‘profile’ mentioned above could be 
> used for writing too e.g. to check if PD model keys are permitted for a 
> certain standard/version or not.
> 
>> 
>> -- John
>
