RDF 1.1 -- changes to plain literals

Andy Seaborne Sun, 13 Oct 2013 04:30:36 -0700

== Summary

This email covers the changes in RDF 1.1 around plain literals. In RDF 1.1, all literals have a datatype.


* simple literals have datatype xsd:string.
* literals with a language tag have a datatype rdf:langString.

This change may have some impact on databases.

== RDF 1.1

The current situation for RDF (known as RDF-2004) is that "plain literals" are literals which have no datatype. They are either "simple literals" (no datatype, no language tag) or have a language tag. A literal does not have both a language tag and a datatype in RDF-2004.


In RDF 1.1, all literals always have a datatype.

* simple literals have datatype xsd:string.
  simple literals and xsd:strings are the same RDF term.

* literals with a language tag have datatype rdf:langString.
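A minimal Python sketch of these two rules, assuming a hypothetical `make_literal` helper and a tuple representation of literals (neither is Jena's API): a literal written without a datatype gets xsd:string, and one written with a language tag gets rdf:langString.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    lexical: str
    datatype: str               # always present in RDF 1.1
    lang: Optional[str] = None  # only set when datatype is rdf:langString


def make_literal(lexical: str, datatype: Optional[str] = None,
                 lang: Optional[str] = None) -> Literal:
    """Hypothetical helper: build an RDF 1.1 literal from its written parts."""
    if lang is not None:
        if datatype is not None:
            raise ValueError("a literal has a language tag or a datatype, not both")
        return Literal(lexical, RDF_LANGSTRING, lang)
    if datatype == RDF_LANGSTRING:
        raise ValueError("rdf:langString requires a language tag")
    # A simple literal ("foo") is the same term as "foo"^^xsd:string.
    return Literal(lexical, datatype if datatype is not None else XSD_STRING)
```

Because the result is a plain value, `make_literal("foo") == make_literal("foo", XSD_STRING)` holds: both calls produce the same term.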

This is a change, but the working group believes it is a small one. Mixed data, with both plain literals and xsd:string, is assumed to be rare.


The first one, simple literal/xsd:string, is the more significant change.

== Example

Previously:

:s :p "foo" .
:s :p "foo"^^xsd:string .

was two triples. In RDF 1.1 it is a graph of one triple, because a graph is a set of triples and "foo" and "foo"^^xsd:string are different ways of writing the same term, much as the following shows two ways to write the same triple:


---------
@prefix : <http://example/> .

:x :p 123 .
<http://example/x> :p 123 .
---------
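To see why the count drops, here is a small Python sketch that models a graph as a set of (subject, predicate, object) tuples. The `term_2004`/`term_rdf11` helpers and the tuple encoding are illustrative assumptions, and the IRIs assume `:` is bound to <http://example/> as in the snippet above.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
S, P = "http://example/s", "http://example/p"


def term_2004(lexical, datatype=None):
    # RDF-2004: a plain literal has no datatype at all (None).
    return ("literal", lexical, datatype)


def term_rdf11(lexical, datatype=None):
    # RDF 1.1: no datatype written means xsd:string.
    return ("literal", lexical, datatype or XSD_STRING)


def build(term):
    # :s :p "foo" .   and   :s :p "foo"^^xsd:string .
    return {(S, P, term("foo")), (S, P, term("foo", XSD_STRING))}


print(len(build(term_2004)))   # 2: two different objects
print(len(build(term_rdf11)))  # 1: one term, so one triple in the set
```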

== Syntax

This change happens because of the treatment of syntax, input and output:

On input, a simple literal and an xsd:string create the same RDF term, with datatype xsd:string. A language tag causes a literal with datatype rdf:langString, and that language tag, to be created.

On output, the plain literal forms are used. xsd:string and rdf:langString do not appear in the output.
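The input and output rules can be sketched in Python as a pair of functions. `read_literal` and `write_literal` are hypothetical names, the escaping is deliberately minimal, and other datatypes are written as full IRIs; this is a sketch of the rule, not any particular parser or writer.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def read_literal(lexical, datatype=None, lang=None):
    """Input side: map the parsed pieces of a literal to an RDF 1.1 term."""
    if lang is not None:
        return (lexical, RDF_LANGSTRING, lang)
    return (lexical, datatype or XSD_STRING, None)


def _quote(s):
    # Minimal Turtle-style string escaping, enough for this sketch.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def write_literal(lit):
    """Output side: use the plain forms, so xsd:string and rdf:langString never appear."""
    lexical, datatype, lang = lit
    if datatype == RDF_LANGSTRING:
        return _quote(lexical) + "@" + lang
    if datatype == XSD_STRING:
        return _quote(lexical)
    return _quote(lexical) + "^^<" + datatype + ">"
```

A round trip of "foo"^^xsd:string therefore comes back out as "foo".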

(Aside: rdf:PlainLiteral should never appear in RDF data, but we could do the same transforms to the canonical value form.)


== Effects
(due to xsd:string)

Systems that use xsd:string and are sensitive to an explicit type are affected. At a guess: OWL systems, maybe Protégé (but I have no evidence one way or the other). They seem to have xsd:strings in the data and, until converted, may see data without an explicit xsd:string and get confused.

The number of triples changes IF the same subject/predicate is used with both a simple literal and an xsd:string of the same lexical form.

Generally, I see data that either uses xsd:string or uses simple literals. Mixing seems quite rare.
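Whether a particular dataset is affected can be checked before converting it. A rough Python sketch, assuming triples are held as (s, p, o) tuples with literal objects written as (lexical, datatype), None marking a simple literal; language-tagged literals are left out for brevity, and `count_merged_triples` is a hypothetical name:

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def count_merged_triples(triples):
    """Return how many distinct RDF-2004 triples collapse into others
    once simple literals are read as xsd:string (RDF 1.1)."""
    def normalise(t):
        s, p, (lexical, datatype) = t
        return (s, p, (lexical, datatype or XSD_STRING))

    before = set(triples)
    after = {normalise(t) for t in before}
    return len(before) - len(after)
```

A result of 0 means the data does not mix the two forms on the same subject/predicate/value, so its triple count is unchanged.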


== Jena
(xsd:string)

Jena in-memory already equates simple literals and xsd:strings for searching (i.e. Graph.find), so while the number of results can change, it should not be a case of not finding data.

The worst case is producing data for other systems that are not RDF 1.1 and do expect an explicit xsd:string datatype on literals.


== RDF API users
(rdf:langString)

The key is "test language before datatype": if code tests that way round, the appearance of rdf:langString will not matter. If the test is "datatype first, null meaning plain literal", it will matter.

I doubt much code outside Jena does this sort of thing. It's something writers do, so that code needs checking completely, but it's just a case of finding all the calls of getLiteralLanguage().

This is the most significant rdf:langString-related change as far as I can see.
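The two test orders can be contrasted in a short Python sketch. `LiteralNode`, the two `kind_*` functions, and the convention that "" means "no language tag" are hypothetical stand-ins loosely modelled on an API with language and datatype accessors; they are not Jena's classes.

```python
from typing import NamedTuple, Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class LiteralNode(NamedTuple):
    lexical: str
    datatype: Optional[str]  # None only for RDF-2004 plain literals
    lang: str = ""           # "" means no language tag (hypothetical convention)


def kind_language_first(lit):
    # Robust: look at the language tag before the datatype.
    if lit.lang:
        return "lang-tagged"
    if lit.datatype is None:
        return "plain"
    return "typed"


def kind_datatype_first(lit):
    # Fragile: assumes a language-tagged literal has no datatype.
    if lit.datatype is None:
        return "lang-tagged" if lit.lang else "plain"
    return "typed"
```

For an RDF-2004 literal `LiteralNode("chat", None, "fr")` both functions agree, but once the same literal carries rdf:langString, the datatype-first version reports it as "typed".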


== SPARQL
(xsd:string)

SPARQL already has some adaptation:
   datatype("x") = xsd:string           (SPARQL 1.0)
   datatype("x"@en) = rdf:langString    (SPARQL 1.1)

Due to the xsd:string change, matching basic graph patterns may produce results where it did not before:


{ ?x :p "foo"^^xsd:string }  will match data  :x :p "foo"
{ ?x :p "foo" }              will match data  :x :p "foo"^^xsd:string
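A toy Python matcher shows why: once both spellings normalise to the same term, plain equality on terms matches either way. The `lit` and `match` helpers, and the "?"-prefix convention for variables, are assumptions for this sketch; it handles a single triple pattern, not full SPARQL.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def lit(lexical, datatype=None):
    # RDF 1.1: no datatype written means xsd:string.
    return (lexical, datatype or XSD_STRING)


def match(pattern, data):
    """Match one triple pattern against a set of triples.
    Strings starting with '?' are variables; returns a list of bindings."""
    results = []
    for triple in data:
        binding = {}
        for p, t in zip(pattern, triple):
            if isinstance(p, str) and p.startswith("?"):
                if binding.setdefault(p, t) != t:
                    break  # same variable bound to two different terms
            elif p != t:
                break      # constant does not match
        else:
            results.append(binding)
    return results
```

With data `{(":x", ":p", lit("foo"))}`, the pattern `("?x", ":p", lit("foo", XSD_STRING))` binds `?x` to `:x`, and the reverse combination matches too.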

It also makes it easier to optimize FILTER(?x = "foo"), since "foo" now denotes a single RDF term.

== Databases
(xsd:string)

Anything that relies on a hash of the literal, in a system that uses xsd:string, will need to reload. Currently, if keeping simple literals and xsd:strings apart includes hashing them differently, then this change is significant.
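A Python sketch of the problem, assuming a hypothetical storage key built by hashing the lexical form together with the datatype (not how TDB or SDB actually compute their ids): the old scheme gives the two spellings different keys, and after normalisation at least one of them gets a key that no longer matches what is already stored.

```python
import hashlib
import json

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def node_id_2004(lexical, datatype=None):
    """Hypothetical storage key that hashes a simple literal (datatype None)
    differently from the same string written with ^^xsd:string."""
    key = json.dumps([lexical, datatype])  # unambiguous encoding of both parts
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def node_id_rdf11(lexical, datatype=None):
    """The same scheme once a missing datatype is normalised to xsd:string."""
    return node_id_2004(lexical, datatype or XSD_STRING)
```

Here simple literals stored under `node_id_2004("foo")` would not be found by a lookup on `node_id_rdf11("foo")`, which is why the data has to be reloaded rather than reinterpreted.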


This does affect TDB and SDB.

== Compatibility

We could provide some compatibility:

1/ The ability to write data with explicit xsd:string
2/ Hide rdf:langString from Node.getLiteralDatatype()
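Both options can be sketched in Python. `write_literal_explicit` and `legacy_datatype` are hypothetical functions, not existing Jena calls, and the escaping is minimal: the first writes every non-language literal with an explicit datatype, the second reports None (the RDF-2004 answer) instead of rdf:langString.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def _quote(s):
    # Minimal Turtle-style string escaping, enough for this sketch.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def write_literal_explicit(lexical, datatype, lang=None):
    """Option 1: always write the datatype, including ^^xsd:string,
    for consumers that are not RDF 1.1."""
    if lang is not None:
        return _quote(lexical) + "@" + lang
    return _quote(lexical) + "^^<" + datatype + ">"


def legacy_datatype(datatype):
    """Option 2: hide rdf:langString from callers expecting RDF-2004 answers."""
    if datatype == RDF_LANGSTRING:
        return None
    return datatype
```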

What does not work is recording whether an RDF term was originally written as xsd:string or as a simple literal. That could end up with two different terms (Nodes) that represent the same term, or non-determinism depending on which term is seen first.


        Andy
