On Apr 18, 2007, at 1:38 AM, Chris Mungall wrote:
On Apr 17, 2007, at 10:49 AM, William Bug wrote:
I'm with Bijan on this issue.
However complex the current OWL representation may appear, it's
considerably more terse than the expression of this same info in a
relational model.
I'm not sure if this is necessarily the case.
To be clear about what I said: I am not fond of using triple based
syntaxes for representing class expressions (and axioms involving
class expressions, and queries involving class expressions). I also
dislike long names for operators (e.g., intersectionOf) and having to
name (some) pieces of syntax (e.g., owl:Restrictions). Some
non-triple/RDF representations retain the latter features
(e.g., OWL 1.1 functional and XML syntax). I still tend to find them
superior to the triple based ones, so these are separate issues.
For complex class expressions, I prefer an operator syntax such as
standard DL or FOL. Standard variable free DL syntax has considerable
concision and composition advantages, in my experience. (Even in ad
hoc textual variants, it's nice to be able to do something like
(some.P C) instead of (some(y)(Pxy & Cy)). Nesting quantifiers is
really nice in DL syntax.)
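
To make the concision point concrete, here is a minimal sketch that prints the same class expression in an ad hoc variable-free DL style and in a first-order style with explicit variables. The class names (Atom, Some, And) and the functions dl and fol are hypothetical, invented for this illustration; they are not taken from any OWL tool.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Some:
    prop: str
    filler: "Concept"


@dataclass(frozen=True)
class And:
    left: "Concept"
    right: "Concept"


Concept = Union[Atom, Some, And]


def dl(c):
    """Render a concept in an ad hoc variable-free DL style."""
    if isinstance(c, Atom):
        return c.name
    if isinstance(c, Some):
        return f"(some.{c.prop} {dl(c.filler)})"
    return f"(and {dl(c.left)} {dl(c.right)})"


def fol(c, var="x", fresh=None):
    """Render the same concept with explicit first-order variables."""
    fresh = count(1) if fresh is None else fresh
    if isinstance(c, Atom):
        return f"{c.name}({var})"
    if isinstance(c, Some):
        y = f"y{next(fresh)}"  # each nested quantifier needs a fresh variable
        return f"some({y})({c.prop}({var},{y}) & {fol(c.filler, y, fresh)})"
    return f"({fol(c.left, var, fresh)} & {fol(c.right, var, fresh)})"


nested = Some("part_of", Some("part_of", Atom("Cell")))
print(dl(nested))
print(fol(nested))
```

In the DL rendering, nesting is just composition of terms. The first-order rendering has to introduce and thread a new variable at every level.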
If we are talking specifically about the representation of OWL *in
RDF triples* and the corresponding SPARQL queries, then we are
essentially talking about a 3-ary relational model anyway,
I hesitate to follow moves about syntax through "essentially"s to
models. Lisp lists are "essentially" chains of cons cells, but '(1 2
3) doesn't wear that on its sleeve (and could resolve to an array
based internal form).
modulo the usual concerns re open vs closed world and the like.
Such talk *really* worries me when we are talking *syntax*.
And n-ary relations are surely either as terse or more terse than 3-
ary relations.
Since OWL is restricted in the number of distinct variables (and the
combinations thereof), you get some of the advantages of variable
freedom even in the hairier syntaxes.
Compare facts in an imaginary relational model for OWL [1]:
existential_restriction(part_of, CellNucleus, Cell)
This is not far off from current OWL 1.1 functional syntax, see:
http://webont.org/owl/1.1/owl_specification.html#4
But I'd want it to be compositional, i.e., "Cell" to be replaceable with
a complex class expression, e.g., another existential_restriction. If
we are going to talk "relational model" then we've added function
terms, at the very least.
With [2]:
subClassOf(CellNucleus,_r1)
restriction(_r1)
onProperty(_r1,part_of)
someValuesFrom(_r1,Cell)
Yes, this is exactly the "trouble with triples". ewww. hate that
bnode too.
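
The contrast between [1] and [2] can be sketched as a round trip: one n-ary fact expands into four triples that share a blank node, and recovering the fact means joining those four triples back together. This is only an illustration. The function names expand and collapse are hypothetical, and the triples are written (subject, predicate, object) with the usual rdf:, rdfs: and owl: prefixes rather than in the predicate-first notation of [2].

```python
def expand(fact, bnode="_:r1"):
    """Turn (property, subclass, filler) into the four-triple OWL/RDF encoding."""
    prop, sub, filler = fact
    return [
        (sub, "rdfs:subClassOf", bnode),
        (bnode, "rdf:type", "owl:Restriction"),
        (bnode, "owl:onProperty", prop),
        (bnode, "owl:someValuesFrom", filler),
    ]


def collapse(triples):
    """Recover (property, subclass, filler) facts by joining on the restriction node."""
    triples = set(triples)
    facts = []
    for sub, pred, node in triples:
        if pred != "rdfs:subClassOf":
            continue
        if (node, "rdf:type", "owl:Restriction") not in triples:
            continue
        props = [o for s, p, o in triples if s == node and p == "owl:onProperty"]
        fillers = [o for s, p, o in triples if s == node and p == "owl:someValuesFrom"]
        facts.extend((pr, sub, f) for pr in props for f in fillers)
    return sorted(facts)


fact = ("part_of", "CellNucleus", "Cell")
for triple in expand(fact):
    print(triple)
print(collapse(expand(fact)))
```

The join in collapse is the work a hand-written triple query has to spell out each time. Note also that this sketch only handles a named class as the filler, so it does not address the compositionality concern raised above.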
[snip]
And of course SQL and most implementations of the relational model
give you little or no deductive facilities; but then, this is also
true for most SPARQL implementations too. Even with RDFS
entailment, you don't have enough for basic class-level (TBox)
transitivity.
Pellet and KAON2 support SPARQL syntax for *Abox* queries (to some
degree, it varies) and Racer has a similar language.
Anyway, I think I'm being pedantic and straying from the point. The
issue is that queries expressed in SPARQL over class-level
relations (eg part_of in a TBox)
And relatively new. I.e., most conjunctive querying in DL land is purely
over *aboxes*. Querying *TBoxes* is done with special functions a la
DIG. Thus, the triple syntax of SPARQL is a bit misleading.
However, in at least one version of Pellet we had experimental
support for mixed TBox/Abox queries and we've written this up:
http://clarkparsia.com/files/pdf/sparqldl.pdf
with an eye to getting a spec together at OWLED. Intuitively, you
compile out the TBox query parts and turn them into DIG calls, then
perform query expansion on the class or property variables in the
abox atoms.
represented using owl restrictions are verbose, contrasted to
representations that use a single predicate for the class-level
relation. The issue here is not the syntax per se, rather the
additional triples and bNode created when layering the OWL on the
RDF model. I don't know if it's such a huge problem - I have
learned to live with it - but I know that people used to n-ary
relational queries balk at doing a multi-triple-with-bnode query
for simple TBox queries such as the above.
Hence my preference to plug in a better syntax for SPARQL/DL. The
triple syntax is also misleading as it leads users to expect some
queries to be legal (and useful) which just aren't.
This is something which we'll spend some considerable time at OWLED on.
One solution here is Alan's alternate non-OWL layering of class
level relations in the RDF model, possibly controversial. Another
is an additional layer on top of SPARQL - eg some macro language
that provides constructs such as a single predicate for class level
relations - and compiles down to SPARQL - this appears to be what
is suggested below? Manchester syntax is mentioned - a QL based on
Manchester syntax would be nice. For our ABox query we could say
"?X part_of some Cell". I imagine this could trivially compile down
to SPARQL - or it could be an OWL QL that has its own model.
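
As a rough sketch of the "compiles down to SPARQL" idea, the following turns a one-line pattern of the form "?X part_of some Cell" into the multi-triple query it abbreviates. Several things here are assumptions made for illustration: the function name compile_some is hypothetical, the pattern is read as a purely syntactic match on asserted subclass axioms (no entailment), names are placed in a default ":" prefix, and PREFIX declarations are left out. A real Manchester-syntax query language would need a proper parser and a defined semantics.

```python
import re

# Accepts only the single shape "?Var property some Class".
PATTERN = re.compile(r"^\s*(\?\w+)\s+(\w+)\s+some\s+(\w+)\s*$")


def compile_some(query, prefix=":"):
    """Compile '?X p some C' into a SPARQL query over the OWL/RDF encoding."""
    match = PATTERN.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    var, prop, cls = match.groups()
    # A blank node label in a SPARQL pattern behaves like an unnamed variable,
    # so it cannot clash with the user's own variable names.
    return "\n".join([
        f"SELECT {var} WHERE {{",
        f"  {var} rdfs:subClassOf _:r .",
        "  _:r rdf:type owl:Restriction .",
        f"  _:r owl:onProperty {prefix}{prop} .",
        f"  _:r owl:someValuesFrom {prefix}{cls} .",
        "}",
    ])


print(compile_some("?X part_of some Cell"))
```

The one-line input and the six-line output carry the same information, which is the verbosity complaint in a nutshell.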
As I said, Kendall Clark and I made progress on an XML syntax for
SPARQL. You could then leave the algebra parts constant and plug in
OWL 1.1's xml syntax as either a compilation target or source.
This is related to but different from the issue of entailment -
many RDF systems, including most SPARQL implementations - give you
little or no entailment - eg RDFS. This isn't enough to give you a
complete answer for [2] (assuming part_of is transitive). Alan's
transformation does, I believe, give you a correct answer for when
you have RDFS entailment.
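
To illustrate the completeness gap, here is a minimal sketch that compares asserted class-level part_of facts with their closure under transitivity. It assumes part_of is transitive and applies only one rule (from "A part_of some B" and "B part_of some C" conclude "A part_of some C"); it ignores subclass reasoning and everything else a DL reasoner would do. The class name Nucleolus and the function name are hypothetical additions.

```python
def transitive_closure(pairs):
    """Close a set of (part, whole) pairs under transitivity by naive iteration."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Each pair (A, B) stands for the class-level fact "A part_of some B".
asserted = {("Nucleolus", "CellNucleus"), ("CellNucleus", "Cell")}

# A query that only matches asserted facts misses the indirect part.
print(sorted(a for a, b in asserted if b == "Cell"))
print(sorted(a for a, b in transitive_closure(asserted) if b == "Cell"))
```

A store that only matches asserted triples behaves like the first query; getting the second answer requires something that applies the transitivity of part_of.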
You can write some very effective SPARQL queries against it,
after playing with it a bit to get a more complete understanding
of what the ontology is trying to express.
I've certainly been having pretty good luck creating SPARQL
queries - even by hand (i.e., without fancy end-user oriented
tools) - against some of the similarly modeled data in the
NeuroCommons repository.
SPARQL seems adequate in many respects for data oriented queries
(typically, but not always, ABox) - the verbosity manifests in TBox
queries, and possibly other scenarios that dictate the standard n-
ary pattern transform.
And in arbitrary DL systems, you are most likely to only *get* ABox
queries, since traditionally you used an API for your TBox queries.
The paper referenced above is trying to change that.
(Cerebra had a sorta mixed tbox/abox query language based on XQuery,
but it just made the DIGgish calls more or less explicit.)
Cheers,
Bijan.