[
https://issues.apache.org/jira/browse/JENA-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262293#comment-16262293
]
ASF GitHub Bot commented on JENA-1384:
--------------------------------------
Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/308#discussion_r152531097
--- Diff: jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java ---
@@ -73,6 +76,36 @@ public Node apply(Node node) {
return n2 ;
}
+ /** Convert the lexical form to a canonical form if one of the known datatypes,
+ * otherwise return the node argument. (same object :: {@code ==})
+ */
+ public static Node canonicalValue(Node node) {
+ if ( ! node.isLiteral() )
+ return node ;
+ // Fast-track
+ if ( NodeUtils.isLangString(node) )
+ return node;
+ if ( NodeUtils.isSimpleString(node) )
+ return node;
+
+ if ( ! node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
+ // Invalid lexical form for the datatype - do nothing.
+ return node;
+
+ RDFDatatype dt = node.getLiteralDatatype() ;
+ // Datatype, not rdf:langString (RDF 1.1).
+ DatatypeHandler handler = dispatch.get(dt) ;
+ if ( handler == null )
+ return node ;
+ Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
+ if ( n2 == null )
+ return node ;
+ return n2 ;
+ }
+
+ /** Convert the language tag of a lexical form to a canonical form if one of the known datatypes,
+ * otherwise return the node argument. (same object; compare by {@code ==})
+ */
private static Node canonicalLangtag(String lexicalForm, String langTag) {
String langTag2 = LangTag.canonical(langTag);
if ( langTag2.equals(langTag) )
--- End diff --
Here, `node` isn't passed in, so it can't be returned. It's a style thing: the
node is already known to have a language tag, and I don't like passing in a Node
that could be wrong, e.g. through a mis-call from somewhere else. Passing
lex+lang forces the arguments to be the information for a language-tagged literal.
The `null` result is tested at line 74:
```
if ( n2 == null )
return node ;
```
and elsewhere conversion also sometimes returns `null` for "no conversion",
which means no new node is needed; that is more efficient (measurably).
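As a rough, self-contained sketch of that convention (not the Jena code): the helper takes lex+lang rather than a node and returns `null` for "no conversion", and the caller hands back the same object. `LangLiteral` is a hypothetical stand-in for `Node`, and plain lower-casing stands in for `LangTag.canonical`; both are assumptions made only for illustration.

```java
import java.util.Locale;

public class NullMeansUnchanged {
    /** Hypothetical stand-in for a language-tagged literal (not Jena's Node). */
    static final class LangLiteral {
        final String lex;
        final String lang;
        LangLiteral(String lex, String lang) { this.lex = lex; this.lang = lang; }
    }

    /**
     * Takes lex+lang, so it can only be called with the information for a
     * language-tagged literal. Returns null for "no conversion".
     * Lower-casing is an illustrative substitute for LangTag.canonical.
     */
    static LangLiteral canonicalLangtag(String lex, String lang) {
        String lang2 = lang.toLowerCase(Locale.ROOT);
        if (lang2.equals(lang))
            return null;                    // nothing changed: no new object
        return new LangLiteral(lex, lang2);
    }

    /** Caller: maps null back to the original argument (same object, ==). */
    static LangLiteral canonical(LangLiteral lit) {
        LangLiteral lit2 = canonicalLangtag(lit.lex, lit.lang);
        if (lit2 == null)
            return lit;
        return lit2;
    }

    public static void main(String[] args) {
        LangLiteral a = new LangLiteral("chat", "fr");
        LangLiteral b = new LangLiteral("colour", "en-GB");
        System.out.println(canonical(a) == a);   // prints "true"
        System.out.println(canonical(b).lang);   // prints "en-gb"
    }
}
```

An already-canonical literal costs one string comparison and no allocation, and callers can detect "unchanged" with `==`.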
> Make canonical literals lowercase language tags.
> ------------------------------------------------
>
> Key: JENA-1384
> URL: https://issues.apache.org/jira/browse/JENA-1384
> Project: Apache Jena
> Issue Type: Improvement
> Affects Versions: Jena 3.4.0
> Reporter: Elie Roux
> Assignee: Andy Seaborne
> Priority: Minor
> Fix For: Jena 3.6.0
>
>
> Please make an option so that canonicalLiterals follows the RDF 1.1
> definition of a canonical literal instead of the BCP-47 one. Right now for my
> dataset I have:
> - lower-cased value for JSON-LD output (as mandated by the JSON-LD spec
> following a RDF 1.1 option)
> - BCP-47 canonical value for TTL output if I make Jena canonicalize literals
> (which I want to, I want them to be uniform)
> - lower-cased value for TTL output if I choose not to canonicalize them
> So please allow for users just to use lower-case uniformly, so that there can
> be a homogeneous canonicalization among different outputs.