Potential change - \u and \U escape handling in SPARQL

Andy Seaborne Mon, 26 Oct 2020 06:14:48 -0700

https://github.com/apache/jena/pull/820 changes the way \u and \U escapes are handled in SPARQL queries.


\U is like \u but takes 8 hex digits instead of 4.

It applies to Syntax.ARQ ("with extensions"), which is the default in jena-arq and is also used in Fuseki.


Syntax.SPARQL_11 remains untouched.

See discussion:

    w3c/rdf-tests#67
    w3c/sparql-12#77

tl;dr:

SPARQL and Turtle differ in the way they handle \u and \U escapes.

In SPARQL, they can appear anywhere (even for special characters like "{" or "#", or in keywords, as in "SEL\u0045CT") because the replacement happens in the input stream before parsing. This is a JavaCC feature that isn't present in other parser tools.
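
To make the "before parsing" point concrete, here is a minimal sketch of what decoding in the input stream amounts to. It is not the code in the pull request or in ARQ; the class and method names are made up for illustration. Because the decoder knows nothing about SPARQL syntax, an escape can produce part of a keyword or a delimiter just as easily as a character inside a string.

```java
// Illustrative only: UnicodeEscapeSketch and decodeEverywhere are hypothetical names,
// not part of Jena. The sketch decodes escapes with no knowledge of SPARQL syntax.
class UnicodeEscapeSketch {

    // Replace \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits) wherever they occur.
    static String decodeEverywhere(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int digits = 0;
            if (c == '\\' && i + 1 < text.length()) {
                char kind = text.charAt(i + 1);
                if (kind == 'u') digits = 4;
                else if (kind == 'U') digits = 8;
            }
            if (digits == 0) {
                out.append(c);          // not a \u or \U escape: copy through unchanged
                i++;
                continue;
            }
            int end = i + 2 + digits;
            if (end > text.length())
                throw new IllegalArgumentException("Truncated escape at index " + i);
            int codePoint = 0;
            for (int j = i + 2; j < end; j++) {
                int d = Character.digit(text.charAt(j), 16);
                if (d < 0)
                    throw new IllegalArgumentException("Bad hex digit at index " + j);
                codePoint = codePoint * 16 + d;
            }
            // appendCodePoint rejects values that are not valid Unicode code points
            out.appendCodePoint(codePoint);
            i = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The keyword and the opening brace only exist after decoding.
        System.out.println(decodeEverywhere("SEL\\u0045CT * \\u007B ?s ?p ?o }"));
        // prints: SELECT * { ?s ?p ?o }
    }
}
```

A parser that only sees the decoded text cannot tell that "SELECT" or "{" was ever written with an escape.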


ARQ has not supported \U up to now. RIOT does.

In Turtle, \u and \U can appear in strings and URIs, not in prefix names, and nowhere else. This is a better design.

The general community consensus seems to be that the change should happen and that the original behaviour was a mistake in SPARQL 1.0.

Personally, I don't recall having seen a query in the wild that uses \u outside of strings or URIs. It is a confustication vector.


    Andy
