Yes, that sounds like a sensible approach.

Value handling that involves normalizing and canonicalizing what are effectively document formats just seems like a major DoS vector in the same way we’ve seen with things like XML DTDs in the past.

Rob

From: Andy Seaborne <a...@apache.org>
Date: Sunday, 26 February 2023 at 17:07
To: dev@jena.apache.org <dev@jena.apache.org>
Subject: Datatypes in the rdf: namespace.

(Moral: Never pull on the end of a loose bit of string in a codebase...)

There are 3 datatypes in the RDF namespace which are there for convenience but not mentioned in the RDF Abstract data model. So they are not required even if they were normatively defined.

rdf:XMLLiteral, rdf:HTML, rdf:JSON

Jena's XMLLiteralType is compliant with RDF 1.0 but RDF 1.1 changed the rdf:XMLLiteral (no canonicalization, the value space is DOM4 based).

In RDF 1.0, rdf:XMLLiteral is the one and only required datatype. It's weird because the lexical space has canonicalization and normalization requirement (the lexical space is the same as value space - puts all the work on the user!).

In RDF 1.1, rdf:XMLLiteral is not required (even if normative, which it isn't for other reasons) and it has become just a datatype definition.

In RDF 1.1, there is rdf:HTML. The Jena RDF vocabulary has a constant. There is no value handling.

rdf:JSON exists in http://www.w3.org/1999/02/22-rdf-syntax-ns, it was defined by JSON-LD. The Jena RDF vocabulary has a constant. There is no value handling.

rdf:JSON is likely to make it into RDF 1.2 Concepts. Its value space is a canonicalized form of JSON.

All three have complex requirements for the value space (making them a bit of a DOS vector!).

It might be simpler to do the same for all 3 datatypes - constants but no value support.

Andy
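
For illustration only, here is a minimal sketch of what "constants but no value support" could look like. It does not use the Jena API: the class `OpaqueRdfDatatypes`, its nested `Literal` type, and the methods `isOpaque` and `sameTerm` are all hypothetical names made up for this example. The only things taken from the thread are the three datatype names and the RDF namespace. The assumption is that a literal with one of these datatypes is kept as a lexical form plus a datatype IRI, and is never parsed, normalized or canonicalized.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch, not Jena code: the three rdf: datatypes are plain
// IRI constants and literals that use them are never parsed into values.
final class OpaqueRdfDatatypes {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String XML_LITERAL = RDF_NS + "XMLLiteral";
    static final String HTML = RDF_NS + "HTML";
    static final String JSON = RDF_NS + "JSON";

    // Datatypes that get a constant but no value handling.
    static final Set<String> OPAQUE = Set.of(XML_LITERAL, HTML, JSON);

    // A literal is just its lexical form and datatype IRI.
    static final class Literal {
        final String lexicalForm;
        final String datatypeIri;

        Literal(String lexicalForm, String datatypeIri) {
            this.lexicalForm = Objects.requireNonNull(lexicalForm);
            this.datatypeIri = Objects.requireNonNull(datatypeIri);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Literal)) {
                return false;
            }
            Literal that = (Literal) other;
            return lexicalForm.equals(that.lexicalForm)
                    && datatypeIri.equals(that.datatypeIri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(lexicalForm, datatypeIri);
        }
    }

    // True if the datatype is one of the three constants-only datatypes.
    static boolean isOpaque(String datatypeIri) {
        return OPAQUE.contains(datatypeIri);
    }

    // Term equality only: same lexical form and same datatype IRI.
    // No XML, HTML or JSON parsing happens here, so the cost is bounded
    // by a string comparison regardless of what the lexical form contains.
    static boolean sameTerm(Literal a, Literal b) {
        return a.equals(b);
    }
}
```

Under this sketch, two rdf:JSON literals whose lexical forms differ only in whitespace are different terms, because nothing is canonicalized. That is the trade-off of dropping value support: no value-based equality, but also no document parsing on untrusted input.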