[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824663#comment-15824663
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
------------------------------------------

Github user ansell commented on the issue:

    https://github.com/apache/commons-rdf/pull/30
  
    In the RDF4J/Sesame case, some users have requested, and others have 
complained about, both lowercasing (which was used in the past) and 
canonicalisation. RDF4J will therefore default to leaving case alone, but any 
user is free to switch on canonicalisation. There is currently no 
lowercase-all-tags option, but one may appear in the future.
    
    For reference, the language tag canonicalisation procedure that RDF4J 
optionally uses, which relies on the JDK's copy of the IANA Language Subtag 
Registry, is:
    
    ```
    new Locale.Builder().setLanguageTag(tag).build().toLanguageTag()
    ```
    
    There are other possible methods, but the one above is the only one that 
I could find that throws an error if the original tag is ill-formed.
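    
    As a rough sketch of how that one-liner behaves, the snippet below wraps 
it in a helper and feeds it one well-formed and one ill-formed tag. The class 
name `LanguageTagCanonicaliser` and the method name `canonicalise` are made up 
for illustration and are not part of RDF4J or Commons RDF; only the 
`java.util.Locale.Builder` calls come from the JDK.
    
    ```java
    import java.util.IllformedLocaleException;
    import java.util.Locale;

    // Hypothetical helper class, for illustration only.
    public class LanguageTagCanonicaliser {

        // Returns the JDK's canonical form of a BCP 47 language tag.
        // Throws IllformedLocaleException if the tag is not well-formed.
        static String canonicalise(String tag) {
            return new Locale.Builder().setLanguageTag(tag).build().toLanguageTag();
        }

        public static void main(String[] args) {
            // Case is normalised: language lowercased, region uppercased.
            System.out.println(canonicalise("EN-gb")); // prints "en-GB"

            try {
                // Spaces are not allowed in a language tag.
                canonicalise("not a tag");
            } catch (IllformedLocaleException e) {
                System.out.println("ill-formed tag rejected: " + e.getMessage());
            }
        }
    }
    ```
    
    Because `IllformedLocaleException` is unchecked, a caller that wants to 
keep the original tag on failure would need to catch it explicitly, as the 
`main` method does here.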


> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>
>                 Key: COMMONSRDF-51
>                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
>             Project: Apache Commons RDF
>          Issue Type: Bug
>          Components: api
>    Affects Versions: 0.3.0
>            Reporter: Peter Ansell
>            Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language 
> tags is 
> lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal], which 
> does not conflict with the case-insensitive specification in BCP47. The 
> Literal.equals and Literal.hashCode API contracts should specify that 
> language tags must be compared using lowercase, even if they are otherwise 
> stored and returned as upper-case by getLanguageTag. The API currently has 
> incorrect language by saying "character-by-character" for language tag 
> comparisons, as that implies case-sensitive comparisons are used.
> The lowercasing must also be done using a consistent locale (a known 
> example where lowercase and uppercase do not round-trip as expected for 
> US-ASCII characters is Turkish [1]), so I would recommend explicitly stating 
> that .toLowerCase(Locale.ENGLISH) is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
