RE: Mulgara and sameTerm

Seaborne, Andy Tue, 29 Jul 2008 09:35:23 -0700

Hi all,

I hope you don't mind, but this is the long-winded answer, putting everything in
context.

SPARQL is defined for simple entailment, and there is an extension mechanism for
other entailment regimes [A].  An entailment regime can add some level of
D-entailment [C], which allows graphs to be matched with respect to the value
space, not just the lexical space.  D-entailment over the XSD datatype hierarchy
[E] is a very important case, and it is treated specially in FILTERs.

Most of the SPARQL filters require value-space comparison.  The definition of
"=" allows extensibility by raising a type error when two terms might be the
same value but the processor does not know.  (Aside: two literals are definitely
equal if they have the same lexical form and the same datatype, for any
datatype, whether or not anything else is known to the processor about it,
because the lexical-to-value mapping of a datatype is a function.)
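
For illustration only, a minimal Python sketch of how a processor might
evaluate "=" between two typed literals under the rule above.  The names
`Literal`, `KNOWN_INTEGER_TYPES`, `SparqlTypeError` and `value_equal` are
hypothetical, and only a handful of XSD integer types are treated as "known";
every other datatype is opaque to this processor.

```python
import re
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical: the only datatypes this sketch "understands".
KNOWN_INTEGER_TYPES = {XSD + t for t in ("integer", "int", "long", "short", "byte")}
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


class SparqlTypeError(Exception):
    """Stands in for the type error that "=" raises (hypothetical name)."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def value_equal(a: Literal, b: Literal) -> bool:
    # Same lexical form and same datatype: equal for ANY datatype, because
    # the lexical-to-value mapping of a datatype is a function.
    if a == b:
        return True
    # Both datatypes known to this processor: compare in the value space.
    if a.datatype in KNOWN_INTEGER_TYPES and b.datatype in KNOWN_INTEGER_TYPES:
        if not (INTEGER_LEXICAL.fullmatch(a.lexical)
                and INTEGER_LEXICAL.fullmatch(b.lexical)):
            raise SparqlTypeError("ill-formed integer literal")
        return int(a.lexical) == int(b.lexical)
    # The terms might denote the same value, but the processor cannot tell,
    # so it reports a type error instead of guessing.
    raise SparqlTypeError(f"cannot compare {a} and {b}")
```

With this sketch, "+1"^^xsd:integer = "1"^^xsd:int is true, while two different
lexical forms of an unknown datatype raise `SparqlTypeError` rather than
returning false.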

sameTerm works on the definition of equality from RDF Concepts, so there is no
D-entailment [B].  But SPARQL does not prescribe what is "in" the store; there
is a dataset that is queried.  Especially when the dataset comes from the
execution context (no FROM etc., no protocol parameter), SPARQL says nothing
about how that dataset came to be.  It just is.  So if you load RDF that has
"+1"^^xsd:int, whether the store preserves the exact lexical form, or its
datatype, is a feature of the store.  SPARQL does not cover this step.  If you
load "+1"^^xsd:integer and "01"^^xsd:byte, it is a store decision whether there
are two terms or one, or whether what is stored and returned is
"1"^^xsd:integer, which wasn't directly mentioned (or even "1"^^xsd:decimal,
decimal being the primitive XSD type they are all derived from).
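
To make the "store decision" concrete, a rough Python sketch: `same_term`
follows the character-by-character rules of RDF Concepts 6.5.1, while
`normalize_on_load` is a hypothetical load-time policy (one of the options
described above, not what any particular store does) that rewrites
integer-family literals to the canonical xsd:integer form.  sameTerm itself
stays purely syntactic; what changes is which terms the store hands to it.

```python
import re
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical: the integer-derived types this store chooses to normalize.
INTEGER_FAMILY = {XSD + t for t in ("integer", "int", "long", "short", "byte")}


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def same_term(a: Literal, b: Literal) -> bool:
    # RDF Concepts 6.5.1: lexical forms, language tags and datatype URIs
    # must all compare equal; no knowledge of the datatype is used.
    return (a.lexical == b.lexical
            and a.lang == b.lang
            and a.datatype == b.datatype)


def normalize_on_load(lit: Literal) -> Literal:
    # Hypothetical store policy: replace an integer-family literal by the
    # canonical xsd:integer literal for its value ("+1", "01" -> "1").
    if lit.datatype in INTEGER_FAMILY and re.fullmatch(r"[+-]?[0-9]+", lit.lexical):
        return Literal(str(int(lit.lexical)), XSD + "integer")
    return lit
```

Here `same_term` is false for "+1"^^xsd:integer and "01"^^xsd:byte as loaded,
but true once both have passed through `normalize_on_load`, since both become
"1"^^xsd:integer.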

Whether this all happens when the data is actually loaded, at some intermediate
time, or even under the covers at query time is merely an implementation detail.
It just might not be an easy implementation detail for store builders :-)

Different user classes want different things.  If you're doing ontology
editing, you expect what is stored to be exactly the terms you specified.  But
plain literals with no language tag and xsd:strings are the same value (RDF MT
rules XSD 1a and XSD 1b).  Even among ontology-editing users we see both
expectations: preserve the exact form, and equate such plain literals and
xsd:strings.
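
A small Python sketch of the two expectations, under the assumption that
literals are modelled by a hypothetical `Literal` class: `same_term` keeps
"chat" and "chat"^^xsd:string apart, while `same_string_value` treats them as
one value per RDF MT rules XSD 1a/1b.  A language-tagged literal is outside
those rules and never matches here.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def same_term(a: Literal, b: Literal) -> bool:
    # Exact-form expectation: every component must match.
    return a == b


def string_value(lit: Literal) -> Optional[str]:
    # A plain literal without a language tag and an xsd:string literal both
    # denote the string itself; anything else is out of scope (None).
    if lit.lang is None and lit.datatype in (None, XSD_STRING):
        return lit.lexical
    return None


def same_string_value(a: Literal, b: Literal) -> bool:
    # Same-value expectation: equate plain literals and xsd:strings.
    va, vb = string_value(a), string_value(b)
    return va is not None and va == vb
```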

And doubles and floats aren't even related by derivation, but they are
datatypes the FILTER system is required to understand.  Their equality comes
from XSD F&O [D] op:numeric-equal.  So whether basic graph pattern matching
(generative, joining) and FILTER value testing (restrictive) line up exactly
depends on the (D-)entailment provided.
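
A rough Python sketch of the mismatch, over hypothetical data (`ex:a`,
`ex:price`, etc. are made up): a join on a shared variable needs the same RDF
term, whereas a FILTER compares values.  `numeric_equal` is a simplified
stand-in for op:numeric-equal that only handles xsd:float and xsd:double.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"
FLOATING = {XSD + "float", XSD + "double"}


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def numeric_equal(x: Literal, y: Literal) -> bool:
    # Simplified op:numeric-equal: xsd:float is promoted to xsd:double,
    # modelled by Python's float (single-precision rounding is ignored).
    if x.datatype in FLOATING and y.datatype in FLOATING:
        return float(x.lexical) == float(y.lexical)
    # Simplification for this sketch: other literals use term equality.
    return x == y


# Hypothetical data: the same number written as two different RDF terms.
data = [
    ("ex:a", "ex:price", Literal("1.0E0", XSD + "float")),
    ("ex:b", "ex:cost", Literal("1", XSD + "double")),
]


def join_on_term(triples):
    # ?s ex:price ?x . ?t ex:cost ?x
    # The shared variable ?x must bind to one and the same RDF term.
    return [(s, t)
            for s, p, x in triples if p == "ex:price"
            for t, q, y in triples if q == "ex:cost" and x == y]


def join_with_filter(triples):
    # ?s ex:price ?x . ?t ex:cost ?y FILTER (?x = ?y)
    # Separate variables, then a value test restricts the cross product.
    return [(s, t)
            for s, p, x in triples if p == "ex:price"
            for t, q, y in triples if q == "ex:cost" and numeric_equal(x, y)]
```

On this data `join_on_term` finds no solutions, while `join_with_filter`
returns `[("ex:a", "ex:b")]`: the generative and restrictive forms disagree
unless the store or entailment regime equates the two terms.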

The test suite is a slightly different case: it provides tests for a specific
set of choices.  The tests do label what the assumptions are.  Some tests are
labelled as making more than just the basic assumptions (e.g. language tags).

And the short answer: SPARQL does not spec out the whole lifecycle, and that
seems OK to me, which is good, because that's what we do for TDB.

        Andy

[A] http://www.w3.org/TR/rdf-sparql-query/#sparqlBGPExtend
[B] http://www.w3.org/TR/rdf-sparql-query/#func-RDFterm-equal
[C] http://www.w3.org/TR/rdf-mt/#D_entailment
[D] http://www.w3.org/TR/xpath-functions/#func-numeric-equal
[E] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes

> -----Original Message-----
> From: James Leigh [mailto:[EMAIL PROTECTED]
> Sent: 29 July 2008 16:50
> To: [email protected]
> Cc: Seaborne, Andy; Arjohn Kampman; Andrae Muys; Paul Gearon
> Subject: Re: Mulgara and sameTerm
>
> On Tue, 2008-07-29 at 10:44 -0500, Paul Gearon wrote:
> > Because I was being asked to make this "work" with the SPARQL test
> > suite, I presumed that duplication was required. I also presumed that
> > most applications inserting a non-canonical form of data would stick
> > to the same lexical form each time, which would minimize the issue for
> > that application.
> >
> > Of course, it is always possible to take the easy road and rely on
> > RDF-equals. So instead of using:
> >   ns:foo ns:bar ?x . ?x ns:baz ns:boo
> >
> > You'd instead use:
> >   ns:foo ns:bar ?x . ?y ns:baz ns:boo FILTER (?x = ?y)
> >
> > However, this is never going to perform as well, and can potentially
> > take up significantly more storage, so I'm not for it at all.
> >
> If this breaks SPARQL compatibility, would you be against full SPARQL
> compatibility in Mulgara?
>
> > I'm OK to move this thread onto the SPARQL list.
> >
> > Paul
> >
> > On Tue, Jul 29, 2008 at 10:28 AM, Seaborne, Andy <[EMAIL PROTECTED]> wrote:
> > > Does anyone mind if this discussion happens on public-sparql-[EMAIL PROTECTED]
> > >
> > >        Andy
> > >
> > >> -----Original Message-----
> > >> From: James Leigh [mailto:[EMAIL PROTECTED]
> > >> Sent: 29 July 2008 13:21
> > >> To: Arjohn Kampman; [EMAIL PROTECTED]; Seaborne, Andy
> > >> Cc: Paul Gearon; Andrae Muys
> > >> Subject: Re: Mulgara and sameTerm
> > >>
> > >> Hi all,
> > >>
> > >> Including Andy to get his interpretation (read on down the page for
> > >> more information).
> > >>
> > >> I spoke with Andrae (he is having email troubles). He thought this
> > >> was a very serious problem and wanted to take this up with Andy
> > >> Seaborne.
> > >>
> > >> His concerns were:
> > >> The problem is that this would prevent us from ever storing nodes
> > >> inline, forcing a string-pool lookup on *every* resolution.
> > >> What should be the result of joining "1"^^xsd:int and "+1"^^xsd:int?
> > >> Will this mean that they will have different localnodes?
> > >>
> > >> Paul what is your take on these concerns/questions?
> > >>
> > >> I think "1"^^xsd:int should be a different term than "+1"^^xsd:int
> > >> and have different localnodes.
> > >>
> > >> Maybe we could introduce new internal types: instead of just integer,
> > >> we could have integer and integer-with-plus-prefix and others to
> > >> handle all possible numeric formats?

There are more cases than this: anything that is a derived type is in the same
value space as its primitive type.

All these are the same value by XSD (Schema Part 2: Datatypes).
(The XSD decimal derived types are the most extensive.)

Variations on lexical form:

"1"^^xsd:integer
"01"^^xsd:integer
"+1"^^xsd:integer

Derived types:

"1"^^xsd:nonNegativeInteger
"1"^^xsd:positiveInteger
"1"^^xsd:unsignedLong

"1"^^xsd:long
"1"^^xsd:int
"1"^^xsd:short
"1"^^xsd:byte

There would be quite a lot of different internal types.
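
To put a number on it, a rough Python sketch with hypothetical helper names
(`BASE_TYPE`, `primitive`, `value_key`): it walks each datatype listed above up
its derivation chain to xsd:decimal and keys the literal by (primitive type,
value).  Only the integer lexical pattern is checked; facets such as the range
of xsd:byte or the sign restriction of xsd:positiveInteger are ignored.

```python
import re
from decimal import Decimal

# Derivation chain for the integer types listed above (XSD Part 2 built-ins).
BASE_TYPE = {
    "integer": "decimal",
    "nonNegativeInteger": "integer",
    "positiveInteger": "nonNegativeInteger",
    "unsignedLong": "nonNegativeInteger",
    "long": "integer",
    "int": "long",
    "short": "int",
    "byte": "short",
}


def primitive(local_name: str) -> str:
    # Follow base types until reaching one with no base listed here.
    while local_name in BASE_TYPE:
        local_name = BASE_TYPE[local_name]
    return local_name


def value_key(lexical: str, local_name: str):
    # All integer-family types share the lexical pattern [+-]?[0-9]+.
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return (primitive(local_name), Decimal(lexical))


listed = [
    ("1", "integer"), ("01", "integer"), ("+1", "integer"),
    ("1", "nonNegativeInteger"), ("1", "positiveInteger"), ("1", "unsignedLong"),
    ("1", "long"), ("1", "int"), ("1", "short"), ("1", "byte"),
]
keys = {value_key(lex, dt) for lex, dt in listed}
```

The ten distinct RDF terms collapse to the single key ("decimal",
Decimal("1")), so a value-keyed store sees one node where sameTerm expects ten.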

> > >>
> > >> James
> > >>
> > >> On Mon, 2008-07-28 at 13:09 -0400, James Leigh wrote:
> > >> > Hi Arjohn, Paul and Andrae,
> > >> >
> > >> > Mulgara 2.0 was released last week. It includes fixes for some of
> > >> > the bugs that were discovered through the Sesame SPARQL test suite.
> > >> > However, there are a few core issues that will prevent us from
> > >> > releasing a stable SPARQL-compliant RDF store using Mulgara.
> > >> >
> > >> > The biggest problem is that Mulgara stores only the literal _value_
> > >> > for known datatypes. That means that "+1"^^xsd:int is stored
> > >> > identically to "1"^^xsd:int. This has significant consequences for
> > >> > how we implement sameTerm, as these literals originally have
> > >> > different labels but are collapsed into the same label.
> > >> >
> > >> > RDF Concepts states that for two literals to be the same, "The
> > >> > strings of the two lexical forms compare equal, character by
> > >> > character." (see below for more context). Mulgara will have to begin
> > >> > storing the original label with all literals (at least for
> > >> > unreproducible labels) before we can release a stable
> > >> > SPARQL-compliant RDF store.
> > >> >
> > >> >  ** Paul/Andrae, can this change be put into the Mulgara road-map? **
> > >> >
> > >> > Thanks,
> > >> > James
> > >> >
> > >> > ---%<---
> > >> > The SPARQL sameTerm definition states that [1]:
> > >> >         Returns TRUE if term1 and term2 are the same RDF term as
> > >> >         defined in Resource Description Framework (RDF): Concepts
> > >> >         and Abstract Syntax [CONCEPTS]; returns FALSE otherwise.
> > >> >
> > >> > Here is an excerpt from RDF Concepts [2]:
> > >> >         6.5.1 Literal Equality
> > >> >         Two literals are equal if and only if all of the following
> > >> >         hold:
> > >> >
> > >> >               * The strings of the two lexical forms compare equal,
> > >> >                 character by character.
> > >> >               * Either both or neither have language tags.
> > >> >               * The language tags, if any, compare equal.
> > >> >               * Either both or neither have datatype URIs.
> > >> >               * The two datatype URIs, if any, compare equal,
> > >> >                 character by character.
> > >> >
> > >> > [1] http://www.w3.org/TR/rdf-sparql-query/#func-sameTerm
> > >> > [2] http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal
> > >> >
> > >
> > >
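
One way to read James's "at least for unreproducible labels" that also speaks
to Andrae's inlining concern: a rough Python sketch in which a literal is
inlined by value only when its canonical label reproduces the original exactly,
and anything else goes to a string pool.  `LiteralEncoder`, `InlineNode` and
`PooledNode` are hypothetical names, not Mulgara's actual design, and the
xsd:int range facet is not checked.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical: datatypes whose values this store may keep inline.
INLINE_TYPES = {XSD + "integer", XSD + "int"}
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


@dataclass(frozen=True)
class InlineNode:
    value: int
    datatype: str  # the label is regenerated as str(value)


@dataclass(frozen=True)
class PooledNode:
    pool_id: int  # the exact label is kept in the string pool


Node = Union[InlineNode, PooledNode]


class LiteralEncoder:
    """Hypothetical node encoder, for illustration only."""

    def __init__(self) -> None:
        self.pool: List[Tuple[str, str]] = []
        self.ids: Dict[Tuple[str, str], int] = {}

    def encode(self, lexical: str, datatype: str) -> Node:
        if datatype in INLINE_TYPES and INTEGER_LEXICAL.fullmatch(lexical):
            value = int(lexical)
            # Inline only if the canonical label reproduces the original.
            if str(value) == lexical:
                return InlineNode(value, datatype)
        # Non-canonical ("+1", "01") or unknown: keep the exact label.
        key = (lexical, datatype)
        if key not in self.ids:
            self.ids[key] = len(self.pool)
            self.pool.append(key)
        return PooledNode(self.ids[key])

    def decode(self, node: Node) -> Tuple[str, str]:
        if isinstance(node, InlineNode):
            return (str(node.value), node.datatype)
        return self.pool[node.pool_id]
```

Under this scheme "1"^^xsd:int and "+1"^^xsd:int get different nodes, so a
join on them fails as sameTerm requires, while the common canonical case never
touches the string pool.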
