On Wednesday, February 4, 2015, Dave Fisher <[email protected]> wrote:
> Yes, it is interesting to me. I know that PDF is a markup that is based on
> a set of PostScript functions and an object layout specification. It is not
> like PNG - that's a raster bitmap. It is a vector drawing spec. My interest
> is pulling out the content - both text and shapes into a useful set of
> objects. I am not so interested at this time in other features like forms,
> embedded files, and output.
>
> I can read the PDF into an object structure and output HTML5. I can also
> output the objects into roughly equivalent PPTX slides using Apache POI.
>
> Corinthia comes in two ways for me.
>
> (1) An HTML5 format that is targeting interchange with Office Document
> formats.
>
> (2) An intermediate format that may be exported in any format that makes
> sense.
>
> So I am looking for Corinthia to allow pluggable DocFormats.

Pluggable filters are something I tried to persuade Peter of earlier; maybe
it will be easier when the new core API is ready.

rgds
jan i

>
> Regards,
> Dave
>
> On Feb 4, 2015, at 11:13 AM, Louis S wrote:
>
> >
> >
> > Louis
> >
> >> On 4 Feb 2015, at 13:55, jan i <[email protected]> wrote:
> >>
> >>> On 4 February 2015 at 19:51, Louis S <[email protected]> wrote:
> >>>
> >>> I posted on this to see if pdfbox could offer insight as it is taken up.
> >>> Dave pointed out that the functionality of pdfbox was interesting to his
> >>> company.
> >>>
> >>
> >> And I think your posting was interesting information (such information is
> >> needed to see what moves out there). But I do not think we currently should
> >> think about putting it into Corinthia.
> >>
> > No objections.
> >
> >> rgds
> >> jan i.
> >>
> >>
> >>> Louis
> >>>
> >>>> On 4 Feb 2015, at 12:03, jan i <[email protected]> wrote:
> >>>>
> >>>> On Wednesday, February 4, 2015, Peter Kelly <[email protected]> wrote:
> >>>>
> >>>>>> On 4 Feb 2015, at 5:47 pm, Edward Zimmermann <[email protected]> wrote:
> >>>>>>
> >>>>>> Does this have anything to do with Corinthia? No. Corinthia is about
> >>>>> content and especially word processing formats (OOXML, ODF, etc.).
> >>>>> Corinthia is at its core about pragmatic fidelity. The point of the
> >>>>> bidirectional transformation model is to be able to reduce fidelity
> >>>>> demands. Unless the project wants to get sidetracked into HiFi rendering
> >>>>> (of DOCX or ODT) it's completely outside of the scope….
> >>>>>
> >>>>> I think of PDF in the same way as I do PNG. It’s intended as an output
> >>>>> format, not an input format. I know there are tools out there which are
> >>>>> effectively half of an OCR system which can reconstruct a source document
> >>>>> by inferring the logical structure from the layout (e.g. where a paragraph
> >>>>> begins and ends), though this is quite a difficult problem and I’m not sure
> >>>>> that it’d be within the scope of Corinthia (though if someone has ideas on
> >>>>> this and wants to work on it, I’m all for it - it’s just a very difficult
> >>>>> and very different task to writing filters for all the other formats we’ve
> >>>>> discussed).
> >>>>
> >>>> +1 I think we currently have other more important tasks in Corinthia.
> >>>>
> >>>>
> >>>> rgds
> >>>> jan i
> >>>>
> >>>>>
> >>>>> On the other side is output to PDF - that is, typesetting. This is
> >>>>> something I also think would be outside the scope of the project (at least
> >>>>> based on my understanding of people’s interests to date). We basically rely
> >>>>> on separate programs to do the typesetting of a document produced by the
> >>>>> library, e.g. LaTeX, WebKit/other browser engines.
> >>>>>
> >>>>> --
> >>>>> Dr. Peter M. Kelly
> >>>>> [email protected]
> >>>>> http://www.kellypmk.net/
> >>>>>
> >>>>> PGP key: http://www.kellypmk.net/pgp-key
> >>>>> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
> >>>>
> >>>> --
> >>>> Sent from My iPad, sorry for any misspellings.
> >>>
> >

--
Sent from My iPad, sorry for any misspellings.
