Yes, that sounds like a sensible approach.

Value handling that involves normalizing and canonicalizing what are 
effectively document formats seems like a major DoS vector, in the same way 
we’ve seen with things like XML DTDs in the past.

Rob

From: Andy Seaborne <a...@apache.org>
Date: Sunday, 26 February 2023 at 17:07
To: dev@jena.apache.org <dev@jena.apache.org>
Subject: Datatypes in the rdf: namespace.

(Moral: Never pull on the end of a loose bit of string in a codebase...)

There are 3 datatypes in the RDF namespace which are there for
convenience but not mentioned in the RDF Abstract data model. So they
are not required even if they were normatively defined.

rdf:XMLLiteral, rdf:HTML, rdf:JSON

Jena's XMLLiteralType is compliant with RDF 1.0 but RDF 1.1 changed
rdf:XMLLiteral (no canonicalization; the value space is DOM4 based).

In RDF 1.0, rdf:XMLLiteral is the one and only required datatype. It's
weird because the lexical space has canonicalization and normalization
requirements (the lexical space is the same as the value space, which
puts all the work on the user!).

In RDF 1.1, rdf:XMLLiteral is not required (even if normative, which it
isn't for other reasons) and it has become just a datatype definition.

In RDF 1.1, there is rdf:HTML. The Jena RDF vocabulary has a constant.
There is no value handling.

rdf:JSON exists in http://www.w3.org/1999/02/22-rdf-syntax-ns and was
defined by JSON-LD. The Jena RDF vocabulary has a constant. There is no
value handling.

rdf:JSON is likely to make it into RDF 1.2 Concepts. Its value space is
a canonicalized form of JSON.

All three have complex requirements for the value space (making them a
bit of a DoS vector!).

It might be simpler to do the same for all 3 datatypes - constants but
no value support.
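
As a rough illustration of what "constants but no value support" could
mean, here is a minimal language-neutral sketch in Python. It is not Jena
code: the names Literal, OPAQUE_DATATYPES, same_term and same_value are
hypothetical and exist only for this example. The assumption is that a
literal with one of these three datatypes is compared by lexical form
and datatype IRI only, and its lexical form is never parsed.

```python
from dataclasses import dataclass

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The three datatype IRIs, exposed as plain constants.
RDF_XML_LITERAL = RDF_NS + "XMLLiteral"
RDF_HTML = RDF_NS + "HTML"
RDF_JSON = RDF_NS + "JSON"

# Hypothetical set of datatypes that get no value handling.
OPAQUE_DATATYPES = frozenset({RDF_XML_LITERAL, RDF_HTML, RDF_JSON})


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: a lexical form plus a datatype IRI."""
    lexical: str
    datatype: str


def same_term(a: Literal, b: Literal) -> bool:
    """Term equality: identical lexical form and identical datatype IRI."""
    return a == b


def same_value(a: Literal, b: Literal) -> bool:
    """Value equality for opaque datatypes falls back to term equality.

    The lexical form is never parsed, normalized, or canonicalized, so the
    cost is bounded by a string comparison.
    """
    if a.datatype in OPAQUE_DATATYPES or b.datatype in OPAQUE_DATATYPES:
        return same_term(a, b)
    # Other datatypes are outside the scope of this sketch.
    raise NotImplementedError("value handling for other datatypes not shown")
```

With this sketch, two rdf:JSON literals whose lexical forms differ only
in key order are different terms and are not reported as the same value,
because no canonical form is ever computed.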

     Andy
