Re: Get the good stuff out of PDFs

Joachim Baran Tue, 21 May 2013 06:33:22 -0700

Hello,

On 21 May 2013 09:15, Alexander Garcia Castro <alexgarc...@gmail.com> wrote:


> Do you have tools that may help us to extract information from PDFs?
> send us an email so that we can include them in the hackathon.

[...]
> Would you like to have XML/RDF for scholarly PDFs? What if you could
> have access to the actual content of the PDF for supporting the Web of
> Data?
>
  Have a look at BioInterchange: http://www.biointerchange.org

  One of the supported formats is the output of PDFx
(http://pdfx.cs.man.ac.uk/), which we turn into RDF N-Triples. We make use
of Dublin Core and the Semanticscience Integrated Ontology. Geraint (CC'd
here) did the actual implementation.
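
  To give a rough idea of what such a conversion involves, here is a
minimal Python sketch. It is not BioInterchange's code. It assumes the
PDFx XML contains <article-title> and <abstract> elements, and it uses a
made-up paper URI (http://example.org/paper/1). Only two Dublin Core
terms are emitted; the Semanticscience Integrated Ontology terms are left
out because the mapping is not described here.

```python
import xml.etree.ElementTree as ET

# Dublin Core terms used as predicates.
DC_TITLE = "http://purl.org/dc/terms/title"
DC_ABSTRACT = "http://purl.org/dc/terms/abstract"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def pdfx_to_ntriples(xml_text, paper_uri):
    """Return N-Triples lines for the title and abstract of one paper.

    The element names below are an assumption about the PDFx output.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for tag, predicate in (("article-title", DC_TITLE),
                           ("abstract", DC_ABSTRACT)):
        element = root.find(f".//{tag}")
        if element is None:
            continue
        # Collect nested text and collapse runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if text:
            triples.append(
                f'<{paper_uri}> <{predicate}> "{escape_literal(text)}" .'
            )
    return triples


# Hypothetical, hand-written input standing in for real PDFx output.
sample = """<pdfx><article><front>
<title-group><article-title>An Example Paper</article-title></title-group>
<abstract>We describe "something".</abstract>
</front></article></pdfx>"""

for line in pdfx_to_ntriples(sample, "http://example.org/paper/1"):
    print(line)
```

  Each printed line is one triple: the paper URI, a Dublin Core
predicate, and a quoted literal, terminated by " .".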

  We are currently writing up a paper on BioInterchange. I can send you a
draft if you are considering using it as a framework for RDFization.

Best wishes,
Joachim
