[ https://issues.apache.org/jira/browse/JENA-1384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262293#comment-16262293 ]

ASF GitHub Bot commented on JENA-1384:
--------------------------------------

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/308#discussion_r152531097
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java ---
    @@ -73,6 +76,36 @@ public Node apply(Node node) {
             return n2 ;
         }
         
    +    /** Convert the lexical form to a canonical form if one of the known datatypes,
    +     * otherwise return the node argument. (same object :: {@code ==})  
    +     */
    +    public static Node canonicalValue(Node node) {
    +        if ( ! node.isLiteral() )
    +            return node ;
    +        // Fast-track
    +        if ( NodeUtils.isLangString(node) )
    +            return node;
    +        if ( NodeUtils.isSimpleString(node) )
    +            return node;
    +
    +        if ( ! node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
    +            // Invalid lexical form for the datatype - do nothing.
    +            return node;
    +            
    +        RDFDatatype dt = node.getLiteralDatatype() ;
    +        // Datatype, not rdf:langString (RDF 1.1). 
    +        DatatypeHandler handler = dispatch.get(dt) ;
    +        if ( handler == null )
    +            return node ;
    +        Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
    +        if ( n2 == null )
    +            return node ;
    +        return n2 ;
    +    }
    +    
    +    /** Convert the language tag of a lexical form to a canonical form if one of the known datatypes,
    +     * otherwise return the node argument. (same object; compare by {@code ==})
    +     */
             private static Node canonicalLangtag(String lexicalForm, String langTag) {
             String langTag2 = LangTag.canonical(langTag);
             if ( langTag2.equals(langTag) )
    --- End diff --
    
    Here, the node isn't passed in, so it can't be returned. This is a style choice: the node is already known to have a language tag, so I don't like passing in a Node, which could be the wrong kind of node, e.g. through a mis-call from somewhere else. Passing lex+lang forces the arguments to be the information for a language-tagged literal.
    
    The `null` result is tested at line 74:
    ```
            if ( n2 == null )
                return node ;
    ```
    and elsewhere, conversion also sometimes returns `null` for "no conversion", meaning no new node is needed, which is (measurably) more efficient.
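A minimal standalone sketch of the pattern the comment describes: the helper receives only the lexical form and the language tag, so it cannot be handed a literal of the wrong kind, and it returns `null` for "no conversion" so the caller can hand back the original object without allocating. `LangLiteral` is a hypothetical record standing in for Jena's `Node`, and lowercasing is an assumed canonicalization rule for illustration only (the diff itself calls `LangTag.canonical`).

```java
import java.util.Locale;

class CanonicalLangLiteral {

    // Hypothetical stand-in for a language-tagged literal; NOT Jena's Node API.
    public record LangLiteral(String lexicalForm, String langTag) {}

    // Takes (lexical form, language tag) rather than a node, so it can only
    // ever describe a language-tagged literal.
    // Returns null to mean "already canonical, no conversion needed".
    // Assumption: canonical means lowercase, purely for this sketch.
    static LangLiteral canonicalLangtag(String lexicalForm, String langTag) {
        String langTag2 = langTag.toLowerCase(Locale.ROOT);
        if (langTag2.equals(langTag))
            return null;
        return new LangLiteral(lexicalForm, langTag2);
    }

    // Caller: a null result means "return the argument itself", so literals
    // that are already canonical cause no new object to be created.
    public static LangLiteral canonical(LangLiteral node) {
        LangLiteral n2 = canonicalLangtag(node.lexicalForm(), node.langTag());
        if (n2 == null)
            return node;
        return n2;
    }

    public static void main(String[] args) {
        LangLiteral a = new LangLiteral("chat", "fr");
        LangLiteral b = new LangLiteral("colour", "en-GB");
        System.out.println(canonical(a) == a);        // true: same object returned
        System.out.println(canonical(b).langTag());   // en-gb
    }
}
```

Because the unchanged case returns the argument itself, callers can detect "nothing changed" with `==`, as the Javadoc in the diff suggests.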



> Make canonical literals lowercase language tags.
> ------------------------------------------------
>
>                 Key: JENA-1384
>                 URL: https://issues.apache.org/jira/browse/JENA-1384
>             Project: Apache Jena
>          Issue Type: Improvement
>    Affects Versions: Jena 3.4.0
>            Reporter: Elie Roux
>            Assignee: Andy Seaborne
>            Priority: Minor
>             Fix For: Jena 3.6.0
>
>
> Please add an option so that canonicalLiterals follows the RDF 1.1 
> definition of a canonical literal instead of the BCP-47 one. Right now for my 
> dataset I have:
> - lowercased value for JSON-LD output (as mandated by the JSON-LD spec 
> following an RDF 1.1 option)
> - BCP-47 canonical value for TTL output if I make Jena canonicalize literals 
> (which I want, so that they are uniform)
> - lowercased value for TTL output if I choose not to canonicalize them
> So please allow users to use lowercase uniformly, so that canonicalization 
> is consistent across different outputs.
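The two casings the issue contrasts can be sketched in plain Java. `bcp47Case` applies the case conventions of RFC 5646 section 2.1.1 (everything lowercase, except two-letter subtags, which are uppercased, and four-letter subtags, which are titlecased, when they are neither the first subtag nor after a singleton such as `x`); `lowercase` is the uniform form the reporter asks for. The `LangTagPolicy` switch is hypothetical, a sketch of the requested option, and this is not Jena's `LangTag` implementation.

```java
import java.util.Locale;

class LangTagCase {

    // Hypothetical option mirroring the request in this issue; not a Jena setting.
    enum LangTagPolicy { BCP47_CASE, LOWERCASE }

    static String canonical(String langTag, LangTagPolicy policy) {
        return policy == LangTagPolicy.LOWERCASE ? lowercase(langTag) : bcp47Case(langTag);
    }

    // Uniform lowercase, as used for the JSON-LD output described above.
    static String lowercase(String langTag) {
        return langTag.toLowerCase(Locale.ROOT);
    }

    // RFC 5646 section 2.1.1 casing: lowercase everything, except 2-letter
    // subtags (uppercase) and 4-letter subtags (titlecase) that are neither
    // the first subtag nor after a singleton subtag such as "x" or "u".
    static String bcp47Case(String langTag) {
        String[] subtags = langTag.split("-");
        StringBuilder sb = new StringBuilder();
        boolean afterSingleton = false;
        for (int i = 0; i < subtags.length; i++) {
            String s = subtags[i].toLowerCase(Locale.ROOT);
            if (i > 0 && !afterSingleton) {
                if (s.length() == 2)
                    s = s.toUpperCase(Locale.ROOT);                        // region, e.g. "US"
                else if (s.length() == 4)
                    s = Character.toUpperCase(s.charAt(0)) + s.substring(1); // script, e.g. "Latn"
            }
            if (s.length() == 1)
                afterSingleton = true;   // everything after a singleton stays lowercase
            if (i > 0)
                sb.append('-');
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"EN-us", "zh-hant-tw", "az-LATN-x-LATN"}) {
            System.out.println(tag + " -> " + canonical(tag, LangTagPolicy.BCP47_CASE)
                    + " / " + canonical(tag, LangTagPolicy.LOWERCASE));
        }
    }
}
```

For example, `EN-us` becomes `en-US` under the BCP-47 policy but `en-us` under the lowercase policy, which is exactly the mismatch between the TTL and JSON-LD outputs described above.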



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)