On Wednesday, February 4, 2015, Dave Fisher <[email protected]> wrote:

> Yes, it is interesting to me. I know that PDF is a markup that is based on
> a set of PostScript functions and an object layout specification. It is not
> like PNG - that's a raster bitmap. It is a vector drawing spec. My interest
> is pulling out the content - both text and shapes into a useful set of
> objects. I am not so interested at this time in other features like forms,
> embedded files, and output.
>
> I can read the PDF into an object structure and output HTML5. I can also
> output the objects into roughly equivalent PPTX slides using Apache POI.
>
> Corinthia comes in for me in two ways.
>
> (1) An HTML5 format that is targeting interchange with Office Document
> formats.
>
> (2) An intermediate format that may be exported to any format that makes
> sense.
>
> So I am looking for Corinthia to allow pluggable DocFormats.


Pluggable filters are something I tried to persuade Peter of earlier; maybe
it will be easier when the new core API is ready.

rgds
jan i

>
> Regards,
> Dave
>
> On Feb 4, 2015, at 11:13 AM, Louis S wrote:
>
> >
> >
> > Louis
> >
> >> On 4 Feb 2015, at 13:55, jan i <[email protected]> wrote:
> >>
> >>> On 4 February 2015 at 19:51, Louis S <[email protected]> wrote:
> >>>
> >>> I posted on this to see if PDFBox could offer insight as it is taken up.
> >>> Dave pointed out that the functionality of PDFBox was interesting to his
> >>> company.
> >>>
> >>
> >> And I think your posting contained interesting information (such
> >> information is needed to see what is moving out there). But I do not
> >> think we should currently think about putting it into Corinthia.
> >>
> > No objections.
> >
> >> rgds
> >> jan i.
> >>
> >>
> >>> Louis
> >>>
> >>>> On 4 Feb 2015, at 12:03, jan i <[email protected]> wrote:
> >>>>
> >>>> On Wednesday, February 4, 2015, Peter Kelly <[email protected]> wrote:
> >>>>
> >>>>>> On 4 Feb 2015, at 5:47 pm, Edward Zimmermann <[email protected]> wrote:
> >>>>>>
> >>>>>> Does this have anything to do with Corinthia? No. Corinthia is about
> >>>>>> content and especially word processing formats (OOXML, ODF, etc.).
> >>>>>> Corinthia is at its core about pragmatic fidelity. The point of the
> >>>>>> bidirectional transformation model is to be able to reduce fidelity
> >>>>>> demands. Unless the project wants to get sidetracked into HiFi rendering
> >>>>>> (of DOCX or ODT), it's completely outside the scope.
> >>>>>
> >>>>> I think of PDF in the same way as I do PNG. It’s intended as an output
> >>>>> format, not an input format. I know there are tools out there which are
> >>>>> effectively half of an OCR system, which can reconstruct a source document
> >>>>> by inferring the logical structure from the layout (e.g. where a paragraph
> >>>>> begins and ends), though this is quite a difficult problem and I’m not sure
> >>>>> that it’d be within the scope of Corinthia (though if someone has ideas on
> >>>>> this and wants to work on it, I’m all for it - it’s just a very difficult
> >>>>> and very different task from writing filters for all the other formats
> >>>>> we’ve discussed).
> >>>>
> >>>> +1. I think we currently have other, more important tasks in Corinthia.
> >>>>
> >>>>
> >>>> rgds
> >>>> jan i
> >>>>
> >>>>>
> >>>>> On the other side is output to PDF - that is, typesetting. This is
> >>>>> something I also think would be outside the scope of the project (at least
> >>>>> based on my understanding of people’s interests to date). We basically rely
> >>>>> on separate programs to do the typesetting of a document produced by the
> >>>>> library, e.g. LaTeX or WebKit and other browser engines.
> >>>>>
> >>>>> --
> >>>>> Dr. Peter M. Kelly
> >>>>> [email protected]
> >>>>> http://www.kellypmk.net/
> >>>>>
> >>>>> PGP key: http://www.kellypmk.net/pgp-key
> >>>>> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
> >>>>
> >>>> --
> >>>> Sent from My iPad, sorry for any misspellings.
> >>>
>
>

