GRDDL (split off from: Structured vs. Unstructured)

John Madden Tue, 14 Feb 2006 09:49:10 -0800

When I first raised my "confusion" (not "objection") over Davide's "unstructured-to-structured" wording, my intention was to clarify what kind of problems the bioRDF group attempts to tackle. More specifically, I was referring to it within the context of "GRDDL" because, IMHO, I don't think GRDDL is designed to help RDF-ize natural language; GRDDL is designed specifically to target XML-based documents. Because the draft proposal of the bioRDF only says: "Learn about GRDDL, SPARQL, OWL, etc.", I want to clarify where they are heading.

There are several threads ongoing here, and I'm going to split this one off from "Structured vs. Unstructured".

Xiaoshu, like you, my focus of interest here is GRDDL specifically. Let me just give you my "take" on GRDDL, and hopefully Eric Miller and/or others can help correct any misconceptions I have.

My understanding of GRDDL is that it was originally proposed in the (X)HTML community. The problem it was intended to address is that there is no way of validating arbitrary RDF using XML Schema (in other words, there is no XSD for RDF, because XML Schema is insufficiently expressive). Consequently, for XML instances that are intended to be validated according to some schema--and this could include (X)HTML--RDF embedding requires some kind of "expedient"; otherwise the RDF will "break" the schema and render the instance non-validatable.

Many "expedients" for embedding the RDF will work--for example separating out the RDF into an appinfo element, attaching it as a separate file, hiding it inside CDATA--and all of these have been tried successfully in one or another application setting. But the (X)HTML community wanted a *web-standard* way of embedding RDF in such a way that the semantic intent ("I hereby officially declare to the WWW that this RDF is inseparably part of the semantics of this XML instance.") would be clear.

GRDDL allows the instance author to make the public declaration above by referencing the URL of some XML transform, which the author thereby publicly identifies as the "key" to extract the intended RDF from his/her instance. In this very nice way, GRDDL allows the instance author the freedom to package his/her RDF any way he/she pleases, so long as he/she also provides the "decoder ring" of an XML transform to extract it. Furthermore, the author's statement of semantic inseparability is explicitly entailed by his/her use of the GRDDL standard to render the RDF.
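
To make the mechanism concrete, here is a minimal sketch in Python of how a GRDDL-aware client might discover the transform an author has declared. It assumes the declaration is carried as a grddl:transformation attribute on the root element, which is how the GRDDL specification handles general XML; the "record" document, its namespace, and the transform URL are all hypothetical. The sketch only locates the "decoder ring"; actually running it would need an XSLT processor, which is left out here.

```python
import xml.etree.ElementTree as ET

# Namespace the GRDDL specification uses for its attributes.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

# Hypothetical instance document: the element names, the example.org
# namespace, and the transform URL are invented for illustration.
INSTANCE = """\
<record xmlns="http://example.org/healthcare"
        xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/xslt/record-to-rdf.xsl">
  <patient id="p1">...</patient>
</record>
"""


def declared_transforms(xml_text):
    """Return the transform URLs declared on the root element, if any."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced attributes as "{namespace}localname".
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    # The attribute may list several whitespace-separated URLs.
    return value.split()


print(declared_transforms(INSTANCE))
```

An instance with no such attribute yields an empty list, meaning the author has made no GRDDL declaration and a client has no sanctioned way to extract RDF from it.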


Eric, once again, if I'm getting any of this wrong, correct me...

It's always been my understanding that the primary use case for GRDDL is the one where the instance author explicitly has in mind a "finished" set of RDF triples that he/she wants to embed. He/she "encodes" these triples, packages them into the instance XML, assigns the intended extraction transform a URL, attaches that, and sends the resulting instance document off into the world. Easy peasy.

But now here's the part that I (and I think maybe also Xiaoshu) am not sure about.

Question #1 (which Eric has already answered in the affirmative): Will this work for non-(X)HTML too? Answer: yes. And this is important because most healthcare records documents aren't (X)HTML.

Question #2: Will this work for the case where the instance author **doesn't** explicitly know the actual RDF triple set up front, and the referenced extraction transform is actually acting as a "language processor" to generate triples "that thereby see the light for the first time"?

Question #3: If the answer to #2 is "yes", then is there a conceivable extension to GRDDL where the GRDDL URL is not just an XML transform but, for example, a web service fronting for some kind of natural language processor?
