Re: Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association : The Apache Software Foundation Blog

Louis S Wed, 04 Feb 2015 11:14:21 -0800


Louis


> On 4 Feb 2015, at 13:55, jan i <[email protected]> wrote:
> 
>> On 4 February 2015 at 19:51, Louis S <[email protected]> wrote:
>> 
>> I posted on this to see if pdfbox could offer insight as it is taken up.
>> Dave pointed out that the functionality of pdfbox was interesting to his
>> company.
>> 
> 
> And I think your posting was interesting information (such information is
> needed to see what moves out there). But I do not think we currently should
> think about putting it into Corinthia.
> 
No objections.

> rgds
> jan i.
> 
> 
>> Louis
>> 
>>> On 4 Feb 2015, at 12:03, jan i <[email protected]> wrote:
>>> 
>>> On Wednesday, February 4, 2015, Peter Kelly <[email protected]> wrote:
>>> 
>>>>> On 4 Feb 2015, at 5:47 pm, Edward Zimmermann <[email protected]> wrote:
>>>>> 
>>>>> Does this have anything to do with Corinthia? No. Corinthia is about
>>>>> content and especially word processing formats (OOXML, ODF etc.).
>>>>> Corinthia is at its core about pragmatic fidelity. The point of the
>>>>> bidirectional transformation model is to be able to reduce fidelity
>>>>> demands. Unless the project wants to get sidetracked into HiFi rendering
>>>>> (of DOCX or ODT) it's completely outside of the scope.
>>>> 
>>>> I think of PDF in the same way as I do PNG. It’s intended as an output
>>>> format, not an input format. I know there are tools out there which are
>>>> effectively half of an OCR system which can reconstruct a source document
>>>> by inferring the logical structure from the layout (e.g. where a paragraph
>>>> begins and ends), though this is quite a difficult problem and I’m not sure
>>>> that it’d be within the scope of Corinthia (though if someone has ideas on
>>>> this and wants to work on it, I’m all for it - it’s just a very difficult
>>>> and very different task to writing filters for all the other formats we’ve
>>>> discussed).
>>> 
>>> +1 I think we currently have other more important tasks in corinthia.
>>> 
>>> 
>>> rgds
>>> jan i
>>> 
>>>> 
>>>> On the other side is output to PDF - that is, typesetting. This is
>>>> something I also think would be outside the scope of the project (at least
>>>> based on my understanding of people’s interests to date). We basically rely
>>>> on separate programs to do the typesetting of a document produced by the
>>>> library, e.g. LaTeX, WebKit/other browser engines.
>>>> 
>>>> --
>>>> Dr. Peter M. Kelly
>>>> [email protected]
>>>> http://www.kellypmk.net/
>>>> 
>>>> PGP key: http://www.kellypmk.net/pgp-key
>>>> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>>> 
>>> --
>>> Sent from My iPad, sorry for any misspellings.
>>
