Andy

Thanks for the great overview. I've been looking at supporting this in
dotNetRDF lately as well, so I have been thinking along much the same lines.

I think the "check language first" advice needs to be emphasized in
messaging to users about this change.  dotNetRDF has the same issue, and
I've seen recently that Sesame was also affected by it, so we need to be
clear that this change in usage is required.
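
To make the ordering concrete, here is a minimal sketch in Python (for
brevity only).  The Literal class and classify function are hypothetical
and are not Jena or dotNetRDF API; the only point is the order of the tests.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: lexical form, optional datatype IRI, optional language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def classify(lit: Literal) -> str:
    # Test the language tag first, so rdf:langString never reaches
    # the datatype branch below.
    if lit.lang:
        return "lang"
    # No datatype (RDF-2004 style) and xsd:string (RDF 1.1 style)
    # are treated as the same case.
    if lit.datatype is None or lit.datatype == XSD_STRING:
        return "string"
    return "typed"
```

With the tests in this order, a language-tagged literal is classified the
same way whether or not the API reports rdf:langString as its datatype.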

My feeling is that we should make this a configurable behavior.  The
default going forward should be RDF 1.1, but it would be nice if users
could toggle back to RDF-2004 behavior if they need to produce data for
older systems.
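
As a rough illustration of what such a toggle might look like on the output
side, here is a sketch in Python.  The write_literal function and its
explicit_xsd_string flag are hypothetical names, not an existing option in
Jena or dotNetRDF, and the escaping is deliberately incomplete.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def escape(lexical):
    # Only the escapes needed for this sketch; a real writer handles more.
    return lexical.replace("\\", "\\\\").replace('"', '\\"')


def write_literal(lexical, datatype=None, lang=None, explicit_xsd_string=False):
    """Serialize a literal in N-Triples style.

    explicit_xsd_string is a hypothetical switch: False gives the RDF 1.1
    output form (no datatype shown for strings), True writes ^^xsd:string
    on every string literal for older consumers.
    """
    quoted = '"' + escape(lexical) + '"'
    if lang:
        # Language-tagged literals never show rdf:langString, in either mode.
        return quoted + "@" + lang
    if datatype is None or datatype == XSD_STRING:
        if explicit_xsd_string:
            return quoted + "^^<" + XSD_STRING + ">"
        return quoted
    return quoted + "^^<" + datatype + ">"
```

Note that in the compatibility mode every string literal gets the explicit
datatype, since the original written form is not recorded.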

On the database side, particularly for TDB, would it be feasible to produce
a migration utility that checks whether a database is affected and, if so,
produces a migrated version of it?
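
A sketch of the kind of check I have in mind, in Python over an in-memory
set of triples.  The tuple layout and the migrate function are assumptions
made for illustration; this says nothing about how TDB actually stores
nodes or what a real utility would need to do on disk.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize(obj):
    """Map an RDF-2004 simple literal to its RDF 1.1 form (datatype xsd:string).

    Literals are modelled as (lexical, datatype, lang) tuples; anything
    else (e.g. an IRI string) is passed through unchanged.  Language-tagged
    literals are left alone in this sketch.
    """
    if not isinstance(obj, tuple):
        return obj
    lexical, datatype, lang = obj
    if datatype is None and lang is None:
        return (lexical, XSD_STRING, None)
    return obj


def migrate(triples):
    """Return (migrated, rewritten, collapsed).

    rewritten: triples whose object was a simple literal and had to change.
    collapsed: triples that disappeared because the simple literal and
               xsd:string forms became the same term.
    """
    original = set(triples)
    migrated = {(s, p, normalize(o)) for (s, p, o) in original}
    rewritten = sum(1 for (_, _, o) in original if normalize(o) != o)
    collapsed = len(original) - len(migrated)
    return migrated, rewritten, collapsed
```

A database would count as affected when rewritten is non-zero, and the
triple count would change only when collapsed is non-zero.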

Rob

On 10/13/13 12:29 PM, "Andy Seaborne" <a...@apache.org> wrote:

>== Summary
>
>This email covers the changes in RDF 1.1 around plain literals.  In RDF
>1.1, all literals have a datatype.
>
>* simple literals have datatype xsd:string.
>* literals with a language tag have a datatype rdf:langString.
>
>This change may have some impact on databases.
>
>== RDF 1.1
>
>The current situation for RDF (known as RDF-2004) is that "plain
>literals" are literals which have no datatype.  They are either "simple
>literals" (no datatype, no language tag) or have a language tag.  A
>literal does not have both a language tag and a datatype in RDF-2004.
>
>In RDF 1.1, all literals have a datatype always.
>
>* simple literals have datatype xsd:string.
>   simple literals and xsd:strings are the same RDF term.
>
>* literals with a language tag have datatype rdf:langString.
>
>This is a change but the working group believes it is a small one. Mixed
>data, with both plain literals and xsd:string is assumed to be rare.
>
>The first one, simple literal/xsd:string, is the more significant change.
>
>== Example
>
>Previously:
>
>:s :p "foo" .
>:s :p "foo"^^xsd:string .
>
>was 2 triples.  In RDF 1.1 there is a graph of one triple there because
>a graph is a set of triples; "foo" and "foo"^^xsd:string are different
>ways of writing the same thing much like this shows two ways to write
>the same triple:
>
>---------
>@prefix : <http://example/> .
>
>:x :p 123 .
><http://example/x> :p 123 .
>---------
>
>== Syntax
>
>This change happens because of the treatment of syntax, input and output:
>
>On input, simple literal and xsd:string create the same RDF term, with
>datatype xsd:string. Langtags cause a literal with type rdf:langString,
>and a language tag, to be created.
>
>On output, the plain literal forms are used.  xsd:string and
>rdf:langString do not appear in the output.
>
>(Aside: rdf:PlainLiteral should never appear in RDF data, but we could
>apply the same transforms to the canonical value form.)
>
>== Effects
>(due to xsd:string)
>
>Systems using xsd:string, and sensitive to an explicit type, are
>affected.  At a guess, OWL systems, maybe Protégé (but I have no
>evidence one way or the other.  They seem to have xsd:strings in the data
>and, until converted, may see data without explicit xsd:string and get
>confused).
>
>The number of triples changes IF the same subject/predicate is used
>with simple literals and with xsd:strings.
>
>Generally, I see data that either uses xsd:string or uses simple
>literals.  Mixing seems quite rare.
>
>== Jena
>(xsd:string)
>
>Jena in-memory already equates simple literals and xsd:strings for
>searching (i.e. Graph.find), so while the number of results can change,
>it should not be a case of not finding data.
>
>The worst case is producing data for other systems that are not RDF 1.1
>and do expect an explicit xsd:string datatype on literals.
>
>== RDF API users
>(rdf:langString)
>
>The key is "test language before datatype" - if tested that way round
>the appearance of rdf:langString will not matter.  If the test is
>"datatype first, null meaning plain literal", it will matter.
>
>I doubt much code outside Jena does this sort of thing - it's something
>writers do, so that needs checking thoroughly, but it's just a case of
>finding all the calls of getLiteralLanguage().
>
>This is the most significant rdf:langString related change as far as I
>can see.
>
>== SPARQL
>(xsd:string)
>
>SPARQL already has some adaptation:
>    datatype("x") = xsd:string           (SPARQL 1.0)
>    datatype("x"@en) = rdf:langString    (SPARQL 1.1)
>
>Due to the xsd:string change, matching basic graph patterns may produce
>a result it didn't before:
>
>{ ?x :p "foo"^^xsd:string }  will match data  :x :p "foo"
>{ ?x :p "foo" }              will match data  :x :p "foo"^^xsd:string
>
>It makes it easier to optimize FILTER(?x = "foo")
>
>== Databases
>(xsd:string)
>
>Anything that relies on a hash of a literal in a system that uses
>xsd:string will need to reload.  Currently, if keeping simple literals
>and xsd:strings apart includes hashing them differently, then this
>change is significant.
>
>This does affect TDB and SDB.
>
>== Compatibility
>
>We could provide some compatibility:
>
>1/ The ability to write data with explicit xsd:string
>2/ Hide rdf:langString from Node.getLiteralDatatype()
>
>What does not work is recording whether an RDF term was originally
>written as xsd:string or as a simple literal.  That could end up with
>two different terms (Nodes) that represent the same term, or
>non-determinism depending on which term is seen first.
>
>       Andy



