Hello,

On 21 May 2013 09:15, Alexander Garcia Castro <alexgarc...@gmail.com> wrote:
> Do you have tools that may help us to extract information from PDFs?
> send us an email so that we can include them in the hackathon.
[...]
> Would you like to have XML/RDF for scholarly PDFs? What if you could
> have access to the actual content of the PDF for supporting the Web of
> Data?

Have a look at BioInterchange: http://www.biointerchange.org

One of the supported formats is output from PDFx (http://pdfx.cs.man.ac.uk/), which we turn into RDF N-Triples. We make use of Dublin Core and the Semanticscience Integrated Ontology.

Geraint (CC'd here) did the actual implementation.

We are currently writing up a paper on BioInterchange. I can send you a draft of it, if you consider using it as a framework for RDFization.

Best wishes,
Joachim