When I first raised my "confusion" (not "objection") over Davide's
"unstructured-to-structured" wording, my intention was to clarify what kinds of problems the bioRDF group is attempting to tackle. More specifically, I was referring to it in the context of GRDDL because, IMHO, GRDDL is not designed to help RDF-ize natural language; it is specifically designed to target XML-based documents. Because the bioRDF draft proposal says only "Learn about GRDDL, SPARQL, OWL, etc.", I wanted to
clarify where the group is heading.

There are several threads ongoing here, and I'm going to split this one off from "Structured vs. Unstructured".

Xiaoshu, like you, my focus of interest here is GRDDL specifically. Let me just give you my "take" on GRDDL, and hopefully Eric Miller and/or others can help correct any misconceptions I have.

My understanding of GRDDL is that it was originally proposed in the (X)HTML community. The problem it was intended to address is that there is no way of validating arbitrary RDF using XML Schema (in other words, there is no XSD for RDF, because XML Schema is insufficiently expressive). Consequently, for XML instances that are intended to be validated against some schema--and this could include (X)HTML--embedding RDF requires some kind of "expedient"; otherwise the RDF will "break" the schema and render the instance non-validatable.

Many "expedients" for embedding the RDF will work--for example, separating out the RDF into an appinfo element, attaching it as a separate file, or hiding it inside CDATA--and all of these have been tried successfully in one application setting or another. But the (X)HTML community wanted a *web-standard* way of embedding RDF so that the semantic intent ("I hereby officially declare to the WWW that this RDF is inseparably part of the semantics of this XML instance.") would be clear.

GRDDL allows the instance author to make the public declaration above by referencing the URL of some XML transform, which the author thereby publicly identifies as the "key" for extracting the intended RDF from his/her instance. In this very nice way, GRDDL gives the instance author the freedom to package his/her RDF any way he/she pleases, so long as he/she also provides the "decoder ring" of an XML transform to extract it. Furthermore, the author's statement of semantic inseparability is explicitly entailed by his/her use of the GRDDL standard to render the RDF.
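To make the "reference a transform by URL" step concrete, here is a minimal sketch of the discovery side for a plain (non-XHTML) XML instance. It assumes the `grddl:transformation` root-element attribute in the `http://www.w3.org/2003/g/data-view#` namespace, whose value is a whitespace-separated list of transform URIs. The record vocabulary and the `http://example.org/record2rdf.xsl` URL are hypothetical. The sketch does not resolve relative URIs, handle XHTML `<link rel="transformation">`, or follow namespace/profile documents, and it only finds the transforms; actually running one would need an XSLT processor.

```python
import xml.etree.ElementTree as ET
from typing import List

# GRDDL data-view namespace used for the transformation attribute.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def transformation_uris(xml_text: str) -> List[str]:
    """Return the transform URIs declared on the root element via grddl:transformation.

    Only the root-element attribute is checked; other GRDDL discovery routes
    are out of scope for this sketch.
    """
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    # The attribute value is a whitespace-separated list of URI references.
    return value.split()


# Hypothetical healthcare-style instance that names its own extraction transform.
record = """<record xmlns="http://example.org/ns/record"
        xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/record2rdf.xsl">
  <patient id="p1">Jane Doe</patient>
</record>"""

if __name__ == "__main__":
    print(transformation_uris(record))
```

Running this prints `['http://example.org/record2rdf.xsl']`; an instance without the attribute yields an empty list, meaning the author has made no GRDDL declaration.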

Eric, once again, if I'm getting any of this wrong, correct me...

It's always been my understanding that the primary use case for GRDDL is the one where the instance author explicitly has in mind a "finished" set of RDF triples that he/she wants to embed. He/she "encodes" these triples, packages them into the instance XML, assigns the intended extraction transform a URL, attaches that, and sends the resulting instance document off into the world. Easy peasy.
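The "finished triples" case can be illustrated with a toy round trip: the author packages a known triple set into XML, and the extraction step recovers exactly that set. In this sketch, plain Python functions stand in for what would be an XSLT transform in real GRDDL, and the `record`/`fact` vocabulary and its `s`/`p`/`o` attributes are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]
NS = "http://example.org/ns/record"  # hypothetical instance vocabulary


def encode(triples: List[Triple]) -> str:
    """Author side: package a finished triple set into an XML instance."""
    root = ET.Element(f"{{{NS}}}record")
    for s, p, o in triples:
        ET.SubElement(root, f"{{{NS}}}fact", {"s": s, "p": p, "o": o})
    return ET.tostring(root, encoding="unicode")


def extract(xml_text: str) -> List[Triple]:
    """Stand-in for the author's extraction transform (an XSLT in real GRDDL)."""
    root = ET.fromstring(xml_text)
    return [(f.get("s"), f.get("p"), f.get("o")) for f in root.iter(f"{{{NS}}}fact")]


triples = [("http://example.org/p1", "http://xmlns.com/foaf/0.1/name", "Jane Doe")]
```

Here `extract(encode(triples)) == triples` holds because the author knew the triples up front. Question #2 below is about the different case where the transform produces triples that were never written down in this form.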

But now here's the part that I am not sure about (and I think Xiaoshu may not be either).

Question #1 (which Eric has already answered in the affirmative): Will this work for non-(X)HTML too? Answer: yes. And this is important because most healthcare record documents aren't (X)HTML.

Question #2: Will this work for the case where the instance author **doesn't** explicitly know the actual RDF triple set up front, and the referenced extraction transform is actually acting as a "language processor" to generate triples "that thereby see the light for the first time"?

Question #3: If the answer to #2 is "yes", then is there a conceivable extension to GRDDL where the GRDDL URL is not just an XML transform but, for example, a web service fronting some kind of natural language processor?