Re: scientific publishing process (was Re: Cost and access)

Norman Gray Tue, 07 Oct 2014 02:43:35 -0700

Kingsley and all, hello.

On 2014 Oct 7, at 02:18, Kingsley Idehen <kide...@openlinksw.com> wrote:

> On 10/6/14 2:49 PM, Peter F. Patel-Schneider wrote:
>> 
>> 
>> On 10/06/2014 11:03 AM, Kingsley Idehen wrote:
>>> On 10/6/14 12:48 PM, Peter F. Patel-Schneider wrote:
>>>> It's not hard to query PDFs with SPARQL.  All you have to do is extract the
>> 
>> Huh?  Every single PDF reader that I use can extract the PDF metadata and 
>> display it.
> 
> Again, this isn't about metadata.

With all respect to the larger goal of having fully semanticked-up documents, I 
think the question _is_ all about metadata.  The original spark to the thread 
was a lament that SW and LD conferences don't mandate something XMLish for 
submissions because X(HT)ML is clearly better for... well ... dammit, it's 
Better.

_One_ thing it would be better for is supporting the sort of full-scale 
RDF-everything view that you've described so eloquently.  But if that's your 
goal, then lexing the source text is really going to be the least of your 
problems.

A more modest goal, which is still valuable and _much_ more achievable, is to 
get at least some RDF out of submitted articles.  That practically means 
metadata, plus perhaps some document structure, plus, if you're keen and can 
get the authors to invest their effort, some argumentation.  That's available 
for free (and right now) from LaTeX authors, and available from XHTML authors 
depending on how hard it would be to get them to put @profile attributes in the 
right places.

So no, not just about 'metadata' in the narrow sense, but I think this thread 
is about what RDF you can in practice extract from the materials that authors 
can in practice be induced or obliged to submit to conference proceedings.

That original lament has overlapped with a parallel lament that PDF is a 
dead-end format -- it's not 'webby'.  I believe that the demo in my earlier 
message undermines that claim as far as RDF goes.

>>> 1. The extractors are platform specific -- AWWW is about platform 
>>> agnosticism
>>> (I don't want to mandate an OS for experiencing the power of Linked Open 
>>> Data
>>> transformers / rdfizers)
>> 
>> Well, the extractors would be specific to PDF, but that's hardly surprising, 
>> I think.

[I've lost track of whose comment this is...]

The extractor I demoed wasn't PDF-specific.

>>> We want to leverage the productivity and simplicity that AWWW brings to data
>>> representation, access, interaction, and integration.
>> 
>> Sure, but the additional costs, if any, on paper authors, reviewers, and 
>> readers have to be considered.  If these costs are eliminated or at least 
>> minimized then this good is much more likely to be realized.
> 
> With some help from Adobe we can have the best of all worlds here. I am going 
> to take a look at their latest cloud offerings and associated APIs.

I forgot to attach the extractor I wrote -- done.  The demo didn't use any 
Adobe API, either to put the XMP into the PDF or to extract the RDF from it.

All the best,

Norman

-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

extract-xmp.c
Description: Binary data
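
[The attachment's contents are not reproduced in this archive.  As a rough 
illustration of why such an extractor needs neither an Adobe API nor any 
PDF-specific parsing, here is a minimal sketch in C.  It is not the attached 
extract-xmp.c; the function names find_xmp_packet and extract_xmp_file are 
hypothetical.  It relies on the XMP packet wrapper, which brackets the RDF/XML 
with "<?xpacket begin=...?>" and "<?xpacket end=...?>" processing instructions 
so that a plain byte scan can find it in any container.  It assumes the packet 
is stored uncompressed and in UTF-8, and it only looks for the first packet.]

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find the first occurrence of the NUL-terminated needle in hay[0..hlen).
   memmem() is not standard C, so this is a simple portable stand-in. */
static const unsigned char *find_bytes(const unsigned char *hay, size_t hlen,
                                       const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || hlen < nlen)
        return NULL;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return hay + i;
    return NULL;
}

/* Locate the first XMP packet in buf (hypothetical helper).
   Returns a pointer to "<?xpacket begin=" and stores in *plen the packet
   length up to and including the "?>" that closes the trailer.
   Returns NULL if no complete packet is found. */
const unsigned char *find_xmp_packet(const unsigned char *buf, size_t len,
                                     size_t *plen)
{
    assert(plen != NULL);
    const unsigned char *start = find_bytes(buf, len, "<?xpacket begin=");
    if (!start)
        return NULL;
    const unsigned char *trailer =
        find_bytes(start, len - (size_t)(start - buf), "<?xpacket end=");
    if (!trailer)
        return NULL;
    const unsigned char *close =
        find_bytes(trailer, len - (size_t)(trailer - buf), "?>");
    if (!close)
        return NULL;
    *plen = (size_t)(close - start) + 2;
    return start;
}

/* Read a whole file into memory and write its first XMP packet to out.
   Returns 0 on success, -1 on I/O error or if no packet is present. */
int extract_xmp_file(const char *path, FILE *out)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;
    unsigned char *buf = NULL;
    size_t len = 0, cap = 0;
    for (;;) {
        if (len == cap) {                      /* grow the buffer */
            size_t ncap = cap ? cap * 2 : 65536;
            unsigned char *nbuf = realloc(buf, ncap);
            if (!nbuf) {
                free(buf);
                fclose(in);
                return -1;
            }
            buf = nbuf;
            cap = ncap;
        }
        size_t n = fread(buf + len, 1, cap - len, in);
        if (n == 0)
            break;
        len += n;
    }
    fclose(in);

    size_t plen = 0;
    const unsigned char *p = find_xmp_packet(buf, len, &plen);
    int rc = -1;
    if (p && fwrite(p, 1, plen, out) == plen)
        rc = 0;
    free(buf);
    return rc;
}
```

[Calling extract_xmp_file(path, stdout) from a small main() would print the 
packet, whose body is RDF/XML that can be handed to any RDF parser.  Because 
the scan never interprets the surrounding file, the same code would apply to 
other formats that embed an XMP packet.]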
