XML vs. RDF

William Bug Fri, 07 Jul 2006 21:44:35 -0700

Dear Philip,

Many thanks for this concise and accessible qualification to Chimezie's explanation. I was a little crestfallen when I saw his original answer to Trish, and thought I really had misunderstood an issue that is becoming very important to several projects with which I'm involved.

There have been several debates recently in the neuroinformatics community as to whether an XML-only approach (XML, XSD, XSLT, XLink) will suffice when creating sub-domain knowledge resources - especially if you are just collecting terminologies, as opposed to creating a full-blown, well-founded ontology. The question is whether it is really necessary to go to Semantic Web tech - i.e., the constellation of RDF-associated specs (RDF++ - sorry to add to the acronym soup - this is just a shorthand for this email) and the growing number of utilities for manipulating RDF/OWL and all the other RDF-related formalisms.

The general arguments against moving on to RDF++ seem to be:

1) It's extra work to fashion the assembled terminologies in such a way so as to be able to represent them in RDF++

2) RDF++ is relatively new and unproven on a large scale (i.e., has limited adoption)

3) The RDF++ toolset is consequently small, of questionable robustness, and not ubiquitous (in the sense the Xerces parser is ubiquitous);

4) RDF++ are all XML-based. Whatever you do with them, you could do yourself with a little extra work;

5) OWL isn't perfect for representing formal ontological frameworks - besides we're just representing terminologies, not building an ontology

6) We can leave it to others to create XSLT converters to move the XML-only resources into the RDF++ space

7) XLink can provide typed relations not unlike the predicate in an RDF triplet

8) RDF syntax is more opaque to a human than XML/XSD - e.g., more difficult for a human to read.

9) Proponents of RDF++ argue that XML has limited semantic expressivity, but that's just not true.

I've really only started using RDF++ technologies myself over the last year or so, but my naive answers have typically been:

1) It's extra work to fashion the assembled terminologies in such a way so as to be able to represent them in RDF++

a) The work you'd do in order to correctly represent your knowledge resource in RDF++ doesn't really add a very significant percentage of time to the overall effort, and doing so will force you to be more explicit about the semantic relations between the terms. There would be a moderate amount of work just developing a working knowledge of the technologies associated with RDF++, but you'll be better off for it in the end. I think it's this latter issue that is really at the bottom of most of the concern. Folks have invested heavily in the XML-only array of technologies over the last 10 years. They are somewhat knowledgeable regarding RDF++ technologies but don't yet have a complete working knowledge of that space.

2) RDF++ is relatively new and unproven on a large scale (i.e., has limited adoption)

It is neither new, nor is it unproven on a large scale. The number of applications in biomedical informatics is growing fast ([http://www.w3.org/2005/04/swls/], [http://esw.w3.org/topic/SemanticWebForLifeSciences]), though admittedly it's much less visible in neuroinformatics right now. This too is changing rapidly, where the focus is on semantically-based information processing ([http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Neuroscience_Semantic_Web_Projects], [http://sciencecommons.org/data/neurocommons], and several projects in development by other participants on the HCLSIG [http://esw.w3.org/topic/SemanticWebForLifeSciencesPeople] - e.g., Kei Cheung, Don Doherty, etc.). Still, many of the projects are just in the offing.

3) The RDF++ toolset is consequently small, of questionable robustness, and not ubiquitous (in the sense the Xerces parser is ubiquitous);

This is also not true, as best I can tell. There are certainly CWM, Protégé-OWL, and others [http://esw.w3.org/topic/], as well as tools more specific to the HC/LS space (http://www.w3.org/2001/sw/hcls/#resources).

4) RDF++ are all XML-based. Whatever you do with them, you could do yourself with a little extra work;

This seems to contradict point 1 above. The "little extra work" is not trivial, and an overwhelming amount of it will need to meet general requirements for manipulating semantic information effectively - the driver behind the creation of RDF++. You'll have a lot more code to write and maintain, if you don't take advantage of Semantic Web tech. Also, as Eric N. has mentioned: "An example of a classic SW Myth: RDF is not based at all on XML -- it was defined as a graph-relational model outside of XML, and can be represented TRIPLES, TURTLES, N3, and XML. By definition, it is broader than any XML schema-- it can live as a meta-definition for a RDBM or even a KB."
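
To make Eric's point concrete, here is a minimal Python sketch of a graph held as a plain set of triples and written out in an N-Triples-like line syntax with no XML involved. The URIs, the term names, and both helper functions are made up for illustration, and the parser only handles the URI-only lines the serializer produces; it is not a general N-Triples implementation.

```python
# Hypothetical namespace and terms, invented for this illustration.
EX = "http://example.org/terms#"

# The graph itself is just a set of (subject, predicate, object) triples.
graph = {
    (EX + "termA", EX + "relatedTo", EX + "termB"),
    (EX + "termB", EX + "relatedTo", EX + "termC"),
}

def to_ntriples(triples):
    """Write URI-only triples as N-Triples-style lines, one per triple."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))

def from_ntriples(text):
    """Parse the lines produced by to_ntriples back into a set of triples."""
    triples = set()
    for line in text.splitlines():
        # Drop the trailing " ." and split on whitespace (URIs contain none).
        s, p, o = (part.strip("<>") for part in line.rstrip(" .").split())
        triples.add((s, p, o))
    return triples

# The same graph survives a round trip through the line-based syntax.
assert from_ntriples(to_ntriples(graph)) == graph
```

The set of triples is the data; the serialization is interchangeable, which is why XML is only one of several possible syntaxes.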

5) OWL isn't perfect for representing formal ontological frameworks - besides we're just representing terminologies, not building an ontology

a) Even when assembling a terminology, you will be hard pressed not to represent some implicit semantic relations in your graph. This is even true for some flat lists of terms - e.g., 'driver', 'iron', and 'putter' are all types of 'golf club'.

b) Work is ongoing to expand the semantic expressivity of OWL (see Chris M.'s comment re: including a formalism to accommodate time).
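
As a small illustration of point (a), the Python sketch below contrasts the flat list with the same terms once the implicit relation is written down. The relation name `is_a` and the helper `kinds_of` are assumptions made for this example, not part of any particular vocabulary.

```python
# The flat terminology: nothing here says how the terms relate.
flat_terms = ["driver", "iron", "putter", "golf club"]

# The same terms with the implicit relation stated as explicit triples.
# "is_a" is a placeholder relation name chosen for this sketch.
triples = {
    ("driver", "is_a", "golf club"),
    ("iron", "is_a", "golf club"),
    ("putter", "is_a", "golf club"),
}

def kinds_of(parent, triples):
    """Return the sorted terms declared to be a kind of `parent`."""
    return sorted(s for s, p, o in triples if p == "is_a" and o == parent)

print(kinds_of("golf club", triples))  # prints ['driver', 'iron', 'putter']
```

With the flat list, answering "which terms are golf clubs?" needs knowledge held only in the reader's head; with the triples, it is a simple query.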

6) We can leave it to others to create XSLT converters to move the XML-only resources into the RDF++ space

Philip & Chris M. have both given clear answers to this ill-advised use of XSLT. The other issue Eric N. has described clearly is the N**2 problem - the combinatorial proliferation of XSLTs as more XSDs are added to the mix. Eric: "Data assembly from new sources and modalities is virtually impossible via XML schemas, after the schemas have been defined."
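
A rough way to see the N**2 growth is to count the converters needed. The Python sketch below assumes one directed stylesheet per ordered pair of schemas for the XML-only case, and one mapping per schema into a single shared model for the alternative; both counting rules are simplifying assumptions for illustration.

```python
def pairwise_converters(n):
    """Directed converters needed so each of n schemas maps to every other."""
    return n * (n - 1)

def shared_model_mappings(n):
    """Mappings needed if each of n schemas maps once into a shared model."""
    return n

for n in (2, 5, 10, 20):
    print(n, pairwise_converters(n), shared_model_mappings(n))
```

Under these assumptions, ten schemas need 90 pairwise converters but only 10 mappings into a shared model, and each newly added schema costs one more mapping rather than a converter to and from every existing schema.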

7) XLink can provide typed relations not unlike the predicate in an RDF triplet. There's nothing special about the use of URIs to provide these links in RDF

Yes - but:

a) Using XLink to do this forces you to reference semantic entities in relation to an entire document structure, as opposed to a more direct, simple URI based link;

b) This was not the design intention of XLink, so it's likely to be a problematic mechanism to rely on for representing complex semantic networks.

c) Again, as Eric N. has said: "URI's are a central part of the "node definition" in RDF, but not in XML. You can do what you want with URI's in XML, and that becomes a problem. RDF says all URI's must merge, while XML says "you need to explicitly define that in your parser and tree handler"-- yuck! No guarantee others will do that when they read your XML content. URI meta-semantics are only defined in RDF/OWL."
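
The merging behaviour in (c) can be sketched in a few lines of Python. Two independently produced sets of triples are combined by plain set union, and statements about the same URI end up attached to the same node without any custom parser logic. The URIs and the `describe` helper are hypothetical, and the sketch assumes there are no blank nodes, which would need renaming in a real merge.

```python
# Hypothetical identifiers, invented for this illustration.
GENE = "http://example.org/id/gene42"
VOCAB = "http://example.org/vocab#"

# Two sources that know nothing about each other but use the same URI.
source_a = {(GENE, VOCAB + "label", "gene 42")}
source_b = {(GENE, VOCAB + "expressedIn", "http://example.org/id/cortex")}

# Merging is set union: identical URIs denote the same node.
merged = source_a | source_b

def describe(node, graph):
    """Return the sorted (predicate, object) pairs stated about `node`."""
    return sorted((p, o) for s, p, o in graph if s == node)

for predicate, obj in describe(GENE, merged):
    print(predicate, obj)
```

Doing the same with two XML documents would require each consumer to agree, in code, on which elements and attributes identify the shared entity.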

8) RDF syntax is more opaque to a human than XML/XSD - e.g., more difficult for a human to read.

Not true. There's the N3 formalism and many tools providing a much easier way for humans to review formal, semantically specified data sets than sorting through XML/XSD/XSLT mappings to ontologies, for instance. Eric: "Tim B-L himself says, RDF predicates serve human expressivity first, machines second! It's easy enough to write an RDF to english viewer for those addicted to reading XML."

9) Proponents of RDF++ argue that XML has limited semantic expressivity, but that's just not true.

I think this argument is completely inverted. The problem is XML has nearly unlimited expressivity, but any semantic meaning you want to imbue your XML with must be made explicit in the parsers you write. When you hope to align your semantic content to others, they must also represent equivalent semantic entities and relations according to the logic in your parser - not a very scalable approach. I think the Nature Biotech article by Xiaoshu and his colleagues clearly explains that issue.

A lot of the counter arguments to these statements come down to:

I) if you try to perform semantically-based KE/KR/KD with XML-only, you will have a lot more code to write & maintain YOURSELF - and much of it will reproduce what you'd get automatically using RDF++.

II) You just can't provide the flexibility, guaranteed resolvability of resources, and efficient expression required when representing semantic relations in the rigid, strictly hierarchical document-oriented world of XML-only, so you'll likely fall short on a lot of your requirements.

I'd really appreciate hearing the views both pro & con on these issues from others on this list.

Thanks again, Philip, for your lucid and concise explanation.

Cheers,

Bill

On Jul 7, 2006, at 6:35 AM, Phillip Lord wrote:

"TW" == Trish Whetzel <[EMAIL PROTECTED]> writes:

TW> Hi all,

TW> As a terribly simple question, is it possible to take the actual
TW> FuGE-ML that is generated on a per instance reporting of an
TW> experiment/study/investigation and then convert that to RDF for
TW> use with semantic web technologies?

Converting between one syntax and another is fairly simple, and there
are some reasonable tools for it. XSLT would work for converting XML
into RDF. I wouldn't like to use it for converting the other way
(actually I wouldn't like to use it at all, but this is personal
prejudice!).

This is assuming, however, that the semantics of the two
representations are compatible. To give an example, syntactically it
is possible to convert between the GO DAG and an OWL representation of
GO. However, the GO part-of relationship doesn't distinguish
universal and existential, while OWL forces you to make this
distinction; you can't sit on the fence.

So, the simple answer to a simple question is: it depends. I wouldn't
assume that FuGE-ML will be convertible into a given
ontology or representation in RDF, unless a reasonable amount of care
is taken in the design of FuGE-ML or the ontology to ensure that it
can happen.

Course, you could always hack it with some rules and a bit of human
intervention. That works as well.

Cheers

Phil

Bill Bug

Senior Analyst/Ontological Engineer

Laboratory for Bioimaging & Anatomical Informatics

www.neuroterrain.org

Department of Neurobiology & Anatomy

Drexel University College of Medicine

2900 Queen Lane

Philadelphia, PA 19129

215 991 8430 (ph)

610 457 0443 (mobile)

215 843 9367 (fax)

Please Note: I now have a new email - [EMAIL PROTECTED]

