[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858077#comment-15858077
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user asfgit closed the pull request at:

https://github.com/apache/commons-rdf/pull/30


> RDF-1.1 specifies that language tags need to be compared using lower-case
> -
>
> Key: COMMONSRDF-51
> URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
> Project: Apache Commons RDF
> Issue Type: Bug
> Components: api
> Affects Versions: 0.3.0
> Reporter: Peter Ansell
> Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language 
> tags is 
> lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal], which 
> does not conflict with the case-insensitive specification in BCP47. The 
> Literal.equals and Literal.hashCode API contracts should specify that 
> language tags must be compared using lowercase, even if they are otherwise 
> stored and returned as upper-case by getLanguageTag. The API currently has 
> incorrect language by saying "character-by-character" for language tag 
> comparisons, as that implies case-sensitive comparisons are used.
> The lowercasing must also be done using a locale that is consistent (known 
> example where lowercase and uppercase do not roundtrip as expected for 
> US-ASCII characters is Turkish [1]), so I would recommend actually stating 
> that .toLowerCase(Locale.ENGLISH) is used.
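
The locale concern in the description can be illustrated with a minimal sketch. The class and method names below are hypothetical and not part of Commons RDF; the sketch only shows why an explicit locale matters when lowercasing a language tag, using `Locale.ENGLISH` as suggested in the issue and `Locale.ROOT` as a locale-neutral alternative.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Commons RDF API.
final class LangTagLowercase {

    /** Lowercases a language tag independently of the JVM default locale. */
    static String lower(String langTag) {
        return langTag.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Under Turkish casing rules 'I' lowercases to dotless 'ı' (U+0131),
        // so the result is not the ASCII string "fi".
        System.out.println("FI".toLowerCase(turkish).equals("fi"));        // false

        // With an explicit, fixed locale the ASCII result is stable.
        System.out.println("FI".toLowerCase(Locale.ENGLISH).equals("fi")); // true
        System.out.println(lower("FI").equals("fi"));                      // true
    }
}
```

Calling `toLowerCase()` without an argument uses the JVM default locale, which is why the comparison could behave differently on a machine configured for Turkish.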



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839778#comment-15839778
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
I now propose to merge this branch, following the COMMONSRDF-55 fix. Thanks 
everyone!




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839771#comment-15839771
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
Thanks @afs , that makes sense, `JenaGraphImpl` was indeed using 
`graph.delete()`. 

I have fixed this in both `JenaGraphImpl` and `JenaDatasetImpl`.  See the comment - 
do you think there is much performance gain from not splitting into a pattern but 
passing the original Jena Triple (safe only when there's no Literal object with 
a langtag) - or shall we always use the pattern?

(e.g. would Jena TDB do things like get an internal Triple row ID out of 
the Jena `Triple` for a faster delete?)

I added tests for Dataset that reveal that statements in the default graph 
come back from Jena in the named graph `` - that's a 
separate bug in `JenaDatasetImpl` and the converters - we should always 
represent that as `Optional.empty()` in Commons RDF land.
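
As a rough illustration of the default-graph mapping described above, here is a minimal sketch. The class, the method, and the idea of passing the backend's default-graph name as a parameter are all assumptions for illustration; the sketch does not use the Jena API and does not claim to be how `JenaDatasetImpl` handles this.

```java
import java.util.Optional;

// Hypothetical converter sketch; graph names are modelled as plain strings.
final class GraphNameSketch {

    /**
     * Maps a backend graph name to the Commons RDF convention, where the
     * default graph is represented as Optional.empty().
     *
     * @param backendGraphName        name reported by the backend, may be null
     * @param backendDefaultGraphName whatever name the backend uses for its
     *                                default graph (supplied by the caller)
     */
    static Optional<String> toCommonsRdfGraphName(String backendGraphName,
                                                  String backendDefaultGraphName) {
        if (backendGraphName == null
                || backendGraphName.equals(backendDefaultGraphName)) {
            return Optional.empty();
        }
        return Optional.of(backendGraphName);
    }
}
```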





[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834333#comment-15834333
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
I added equivalent tests for `Graph` add/contains/remove, and this fails 
for Jena:

```
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at org.apache.commons.rdf.api.AbstractGraphTest.containsLanguageTagsCaseInsensitive(AbstractGraphTest.java:415)
...
```

Basically, if I add `"Hello"@EN-GB` to a Jena Graph and then try to remove 
`"Hello"@en-GB`, the statement remains in the graph as `"Hello"@EN-GB`.

How can we fix this? It would not be enough to just lowercase in all 
Commons RDF methods with Jena Literal language tags, as the Jena graph/model 
could be populated through other means (e.g. parsing a file).
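
One way to picture the problem is to match on the lowercased tag at lookup time instead of relying on how the tag was stored. The following is only a sketch under that assumption: `StoredLiteral` is a hypothetical stand-in for a backend literal, the store is a plain list, and nothing here uses the Jena API or reflects the fix that was eventually applied.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical in-memory sketch of case-insensitive langtag matching.
final class LangTagMatchingSketch {

    /** Stand-in for a backend literal that keeps the tag exactly as written. */
    static final class StoredLiteral {
        final String lexical;
        final String langTag;

        StoredLiteral(String lexical, String langTag) {
            this.lexical = lexical;
            this.langTag = langTag;
        }
    }

    /** Same lexical form and same language tag, ignoring the tag's case. */
    static boolean sameLiteral(StoredLiteral a, StoredLiteral b) {
        return a.lexical.equals(b.lexical)
                && a.langTag.toLowerCase(Locale.ROOT)
                        .equals(b.langTag.toLowerCase(Locale.ROOT));
    }

    /** Removes every stored literal matching the pattern, whatever case it was stored in. */
    static boolean remove(List<StoredLiteral> store, StoredLiteral pattern) {
        return store.removeIf(stored -> sameLiteral(stored, pattern));
    }
}
```

Because the comparison happens on read, it also covers literals that entered the store through another route, such as a parser, with a differently cased tag.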






[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833531#comment-15833531
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
OK, so then it makes sense for the Commons RDF tests to only care about the
value being preserved (whether the case going in or out is upper or
lower), and that our .equals and .hashCode are based on lowercase in the
ROOT Locale.

We don't have equivalent tests for whether datatyped floats etc. preserve
their specific syntactic value (e.g. "-.0"^^xsd:float), so we should not do
that for langtags either.

I'll modify the branch and merge.
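
The contract described here (tag returned as given, comparison done in lowercase using the ROOT locale) can be sketched as follows. `LangLiteralSketch` is a hypothetical class written for illustration; it is not the Commons RDF `Literal` interface or any of its implementations.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical literal: preserves the tag's case, compares in lowercase.
final class LangLiteralSketch {
    private final String lexicalForm;
    private final String languageTag; // kept exactly as supplied

    LangLiteralSketch(String lexicalForm, String languageTag) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.languageTag = Objects.requireNonNull(languageTag);
    }

    /** Returns the tag in whatever case it was created with. */
    String getLanguageTag() {
        return languageTag;
    }

    /** Form of the tag used only for equals and hashCode. */
    private String comparableTag() {
        return languageTag.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof LangLiteralSketch)) {
            return false;
        }
        LangLiteralSketch other = (LangLiteralSketch) obj;
        return lexicalForm.equals(other.lexicalForm)
                && comparableTag().equals(other.comparableTag());
    }

    @Override
    public int hashCode() {
        // Must hash the same lowercased form that equals compares.
        return Objects.hash(lexicalForm, comparableTag());
    }
}
```

Hashing the lowercased tag keeps `hashCode` consistent with `equals`, so `"Hello"@EN-GB` and `"Hello"@en-gb` land in the same bucket of a `HashSet` or `HashMap`.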

On 21 Jan 2017 9:39 pm, "Andy Seaborne"  wrote:

> RDF 1.1 mentions:
>
>1. Turtle parsing - there is a lang tag rule.
>2. The text that conversion to a lowercase lexical is allowed.
>3. Value-comparison is case insensitive.
>
> Which is that test for? Lexical or value?
>
> At least acknowledging that RDF's "lowercase" is not in keeping with BCP
> 47 syntax canonicalization (the registry may change the characters)
> whatever the spec makes sense to me and I suspect domain experts; it's
> following the spec that "owns" language tags. Focus on the value
> comparison.
>





[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833155#comment-15833155
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user afs commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
RDF 1.1 mentions:

1. Turtle parsing - there is a lang tag rule.
2. The text that conversion to a lowercase lexical is allowed.
3. Value-comparison is case insensitive.

Which is that test for? Lexical or value?

At least acknowledging that RDF's "lowercase" is not in keeping with BCP 47 
syntax canonicalization (the registry may change the characters) whatever the 
spec makes sense to me and I suspect domain experts; it's following the spec 
that "owns" language tags. Focus on the value comparison.





[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832359#comment-15832359
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user ansell commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
Don't take what is not explicitly said in the standard to mean that it is 
disallowed.




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832139#comment-15832139
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
Right, BCP47 normalisation would make sense, but sadly that is not directly
permitted by RDF 1.1, only normalisation to lower case :-( - probably to
avoid dependency on the registry.

However I think we can try to make Commons RDF present a consistent RDF
1.1-compliant view, which I would think includes that creating a literal
with en-gb lowercase would return en-gb lowercase, also with RDF4J as back
end. (Would this require our wrapper LiteralImpl to always lowercase for
RDF4J?)

Can we extend the RDF4J test to also cover the other settings? How are they
provided? If the user is explicitly asking to go beyond the RDF standards,
then they should not be surprised if Commons RDF's view goes along with
that (or falls over), so then perhaps we don't need to worry about it here?
(e.g. Jena can be configured to support generalized RDF, which doesn't work
well with the normal TripleImpl).



On 16 Jan 2017 9:52 pm, "Peter Ansell"  wrote:

*@ansell* commented on this pull request.

Looks fairly good to me. I disagree with the test assertion that disallows
normalisation using the BCP47 conventions (e.g., en-GB) in their
constructors, but it is a minor issue.
--

In api/src/test/java/org/apache/commons/rdf/api/AbstractRDFTest.java:

> @@ -194,6 +194,114 @@ public void testCreateLiteralLangISO693_3() throws Exception {
         assertEquals("\"Herbert Van de Sompel\"@vls", vls.ntriplesString());
     }

+public void testCreateLiteralLangCaseInsensitive() throws Exception {

Does this need @Test annotation?
--

In api/src/test/java/org/apache/commons/rdf/api/AbstractRDFTest.java:

> @@ -194,6 +194,114 @@ public void testCreateLiteralLangISO693_3() throws Exception {
         assertEquals("\"Herbert Van de Sompel\"@vls", vls.ntriplesString());
     }

+public void testCreateLiteralLangCaseInsensitive() throws Exception {
+// COMMONSRDF-51: Literal langtag may not be in lowercase, but
+// must be COMPARED (aka .equals and .hashCode()) in lowercase
+// as the language space is lower case.
+final Literal lower = factory.createLiteral("Hello", "en-gb");
+final Literal upper = factory.createLiteral("Hello", "EN-GB");
+final Literal mixed = factory.createLiteral("Hello", "en-GB");
+
+
+assertEquals("en-gb", lower.getLanguageTag().get());

RDF4J may not follow this in some cases. It may use the BCP47 normalisation
conventions to obtain en-GB instead.





[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824663#comment-15824663
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user ansell commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
In the RDF4J/Sesame case, we have had some users request, and some other 
users complain about, both lowercasing (which was used in the past) and 
canonicalisation, so RDF4J will default to leaving case alone, but any user is 
free to switch on the canonicalisation. Currently there isn't a 
lowercase-all-tags option, but that may also appear in the future.

For reference, the language tag canonicalisation procedure that RDF4J 
optionally uses, which relies on the JDK's copy of the IANA Language Subtag 
Registry, is:

```
new Locale.Builder().setLanguageTag(tag).build().toLanguageTag()
```

There are other possible methods, but the method above is the only one that 
I could find which throws an error if the original tag is ill-formed.
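
The one-liner above can be wrapped as shown in this sketch. The wrapper class and the choice to return an empty `Optional` for ill-formed tags are assumptions made for illustration; only the `Locale.Builder` call chain comes from the comment, and this is not RDF4J's actual code.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical wrapper around the JDK canonicalisation call quoted above.
final class LangTagCanonicaliser {

    /**
     * Returns the tag as rendered by the JDK (for example region subtags in
     * upper case), or Optional.empty() if the JDK rejects it as ill-formed.
     */
    static Optional<String> canonicalise(String tag) {
        try {
            return Optional.of(
                    new Locale.Builder().setLanguageTag(tag).build().toLanguageTag());
        } catch (IllformedLocaleException e) {
            // setLanguageTag throws for tags that are not well-formed BCP 47.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalise("EN-gb"));       // Optional[en-GB]
        System.out.println(canonicalise("abcdefghijk")); // Optional.empty
    }
}
```

Note that this form differs from the all-lowercase form that RDF 1.1 describes for the value space, which is the tension discussed in this thread.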




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824660#comment-15824660
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user afs commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
@ansell mentions one of the reasons the wording for RDF 1.1 is not so direct 
- RDF 1.0 did not sanction the common normalization defined in BCP47 
canonicalization, although that actually requires consulting the registry as 
well.

Jena is lax by default, and retains the form as originally written. In 
practice, datasets seem to be internally consistent, all lower case or all 
syntax-canonical. 

Variations of case are different nodes in the general case but are 
`Node.sameValue` (compare) and cause matching in graph.find. Some storage 
layers may differ and canonicalize the form, in order to index.





[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824636#comment-15824636
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user ansell commented on a diff in the pull request:

https://github.com/apache/commons-rdf/pull/30#discussion_r96309546
  
--- Diff: api/src/test/java/org/apache/commons/rdf/api/AbstractRDFTest.java ---
@@ -194,6 +194,114 @@ public void testCreateLiteralLangISO693_3() throws Exception {
         assertEquals("\"Herbert Van de Sompel\"@vls", vls.ntriplesString());
     }
 
+public void testCreateLiteralLangCaseInsensitive() throws Exception {
--- End diff --

Does this need @Test annotation?




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824368#comment-15824368
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
There seems to be consensus on 
http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/thread.html and 
http://lists.w3.org/Archives/Public/semantic-web/2017Jan/thread.html in the 
_Are literal language tags case sensitive?_ threads that it is not meant to be 
a change from RDF 1.0 - that language tags should still be compared case 
insensitively.

That should be in line with what this PR suggests - case insensitive in 
`.equals()` and `.hashCode()`.

Do you agree on that line, @afs and @ansell ..?




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824366#comment-15824366
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

Github user stain commented on the issue:

https://github.com/apache/commons-rdf/pull/30
  
This pull request returns `getLanguageTag()` in whatever case the 
underlying platform does (e.g. I think RDF4J and JSONLD-Java preserve casing, 
while Jena and Simple convert to lowercase).

I think it is only in `.equals()` and `.hashCode()` we need case 
insensitivity.

There are arguments both ways on whether we should provide a consistent view 
across the implementations (e.g. always lowercase), or whether we should be 
consistent with what the underlying implementation does (e.g. if it 
preserves casing for presentation purposes). 

Commons RDF doesn't have any value handling mechanisms now for, say, 
converting `"13.37"^^xsd:float` to a Java float `13.37f` (without going through 
the underlying implementations and related methods), or for determining value 
equality, so I think it is not too weird if Commons RDF doesn't do anything 
clever about language tags either (beyond spec compliance).

But if someone were to add a Commons RDF API for such literal value 
handling, it could be natural to also add "utils" methods for presenting or 
parsing language tags (e.g. `isLanguageTagEqual("en-us", "en-US")`), as well as 
hierarchical comparisons, something like `isSameLanguageTagFamily("en-us", 
"en-GB")`.

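The two utility methods floated above could look roughly like this. This is a sketch only: no such methods exist in Commons RDF according to this thread, the class name is invented, and treating "same family" as "same primary language subtag" is an assumption about what the hierarchical comparison would mean.

```java
import java.util.Locale;

// Hypothetical utilities; the method names are taken from the comment above.
final class LangTagUtilsSketch {

    /** True if the tags are identical once case is ignored. */
    static boolean isLanguageTagEqual(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    /** True if both tags share the same primary language subtag, e.g. "en". */
    static boolean isSameLanguageTagFamily(String a, String b) {
        return primarySubtag(a).equals(primarySubtag(b));
    }

    /** Everything before the first '-', lowercased; the whole tag if there is none. */
    private static String primarySubtag(String tag) {
        int dash = tag.indexOf('-');
        String primary = dash < 0 ? tag : tag.substring(0, dash);
        return primary.toLowerCase(Locale.ROOT);
    }
}
```

A fuller treatment would probably lean on BCP 47 matching rules (for instance `java.util.Locale.LanguageRange`) rather than string splitting; the sketch only shows the shape of the idea.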




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821169#comment-15821169
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
--

GitHub user stain opened a pull request:

https://github.com/apache/commons-rdf/pull/30

COMMONSRDF-51 language tags compared lower case

This fixes 
[COMMONSRDF-51](https://issues.apache.org/jira/browse/COMMONSRDF-51) - at least 
from `Literal.equals()` and `Literal.hashCode()`

Further tests might be needed to verify consistent behaviour in `Graph` and 
`Dataset` if the underlying framework does not correctly do langtag comparison 
in lower case (e.g. the Turkish locale problem).
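For readers skimming the thread, the idea can be sketched roughly as below. This is an illustrative stand-in with a made-up class name, not the code in this PR: the tag is stored as given, while `equals()` and `hashCode()` work on a lowercased copy made with `Locale.ROOT` so the default locale cannot influence the result.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical language-tagged literal; not the Commons RDF implementation. */
final class LangLiteral {
    private final String lexicalForm;
    private final String languageTag; // kept in the case it was created with

    LangLiteral(String lexicalForm, String languageTag) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.languageTag = Objects.requireNonNull(languageTag);
    }

    /** Returns the tag with its original casing preserved. */
    String getLanguageTag() {
        return languageTag;
    }

    /** Locale-independent lowercase form, used only for comparison and hashing. */
    private String lowerTag() {
        return languageTag.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LangLiteral)) return false;
        LangLiteral other = (LangLiteral) o;
        return lexicalForm.equals(other.lexicalForm)
                && lowerTag().equals(other.lowerTag());
    }

    @Override
    public int hashCode() {
        // Must hash the lowercased tag so equal literals share a hash code.
        return Objects.hash(lexicalForm, lowerTag());
    }
}
```

With this shape, `new LangLiteral("Hello", "en-US")` and `new LangLiteral("Hello", "EN-us")` are equal and land in the same hash bucket, while `getLanguageTag()` still returns what was passed in.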

Please comment on the fixes and the suggested updated javadoc:

* 
[Literal.equals(Object)](http://stain.github.io/commons-rdf/COMMONSRDF-51/org/apache/commons/rdf/api/Literal.html#equals-java.lang.Object-)
* 
[Literal.hashCode()](http://stain.github.io/commons-rdf/COMMONSRDF-51/org/apache/commons/rdf/api/Literal.html#hashCode--)
* 
[Literal.getLanguageTag()](http://stain.github.io/commons-rdf/COMMONSRDF-51/org/apache/commons/rdf/api/Literal.html#getLanguageTag--)

For code improvements of this PR, feel free to push to the 
`COMMONSRDF-51-langtag-lcase` branch at 
https://git-wip-us.apache.org/repos/asf/commons-rdf.git or use the "Start 
review" mechanism in GitHub.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/commons-rdf COMMONSRDF-51-langtag-lcase

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/commons-rdf/pull/30.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #30


commit 3064d219606cbe42c0150d81dbf6cdbc74bf7491
Author: Stian Soiland-Reyes 
Date:   2017-01-12T14:51:26Z

COMMONSRDF-51: compare language tags in lower case






[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-12 Thread Stian Soiland-Reyes (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821170#comment-15821170
 ] 

Stian Soiland-Reyes commented on COMMONSRDF-51:
---

Pull request ready for review:

https://github.com/apache/commons-rdf/pull/30



[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820850#comment-15820850
 ] 

Andy Seaborne commented on COMMONSRDF-51:
-

This is confusing the lexical representation and the value space.

See the text:

[[
Two literals are term-equal ...
Thus, two literals can have the same value without being the same RDF term.
]]

I see no confusion in the "character by character" text: it is defining "same 
term", not "same value", which is what a case-insensitive lang tag comparison 
is of use for, as covered by the second part.



[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-12 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820846#comment-15820846
 ] 

Andy Seaborne commented on COMMONSRDF-51:
-

"a-z and A-Z are permitted" -- not just permitted; that is the whole allowed 
range of letters, no more.  Only ASCII characters are in a language tag.  
Turtle does not allow any other characters.



[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-12 Thread Stian Soiland-Reyes (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820820#comment-15820820
 ] 

Stian Soiland-Reyes commented on COMMONSRDF-51:
---

[BCP47 section 2.1.1|https://tools.ietf.org/html/bcp47#section-2.1.1] also 
clearly states case has no meaning, just convention. Therefore we should 
probably try to preserve the casing, but not use it for comparison:

{quote}
   At all times, language tags and their subtags, including private use
   and extensions, are to be treated as case insensitive: there exist
   conventions for the capitalization of some of the subtags, but these
   MUST NOT be taken to carry meaning.

   Thus, the tag "mn-Cyrl-MN" is not distinct from "MN-cYRL-mn" or "mN-
   cYrL-Mn" (or any other combination), and each of these variations
   conveys the same meaning: Mongolian written in the Cyrillic script as
   used in Mongolia.

   The ABNF syntax also does not distinguish between upper- and
   lowercase: the uppercase US-ASCII letters in the range 'A' through
   'Z' are always considered equivalent and mapped directly to their US-
   ASCII lowercase equivalents in the range 'a' through 'z'.  So the tag
   "I-AMI" is considered equivalent to that value "i-ami" in the
   'irregular' production.
{quote}

I'll push the branch as a pull request and file bugs for the Turkish issue 
upstream.
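As a side note, the mapping BCP47 describes ('A'..'Z' directly to 'a'..'z') can be written without any locale at all. A minimal sketch, with a made-up class and method name, that leaves every non-ASCII character untouched:

```java
/** Hypothetical helper illustrating the ASCII-only case mapping described in BCP47. */
final class AsciiLower {
    private AsciiLower() {}

    /** Maps 'A'..'Z' to 'a'..'z'; all other characters are copied unchanged. */
    static String asciiLowerCase(String tag) {
        StringBuilder sb = new StringBuilder(tag.length());
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + ('a' - 'A')) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(asciiLowerCase("MN-cYRL-mn")); // mn-cyrl-mn
        System.out.println(asciiLowerCase("I-AMI"));      // i-ami
    }
}
```

Since there is no `Locale` involved, the Turkish dotless-i behaviour cannot occur here; the trade-off is that invalid (non-ASCII) tags pass through unnormalized rather than being rejected.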




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-12 Thread Stian Soiland-Reyes (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820814#comment-15820814
 ] 

Stian Soiland-Reyes commented on COMMONSRDF-51:
---

Got one reply already on 
[public-rdf-comments|http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/thread.html],
 from [Richard 
Cyganiak|http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/0005.html]:

{quote}
RDF 2004 forced the language tag to be lower-cased in the abstract syntax. 
Implementations of RDF 2004 often did not do that, but retained the case when 
storing or transforming RDF, while still treating @en and @EN as equal. My 
recollection is that we wanted to change the language of the spec to make this 
behaviour legal. Unfortunately it seems the language came out less clear than 
it should be. I do not think that there was any intention to make @en and @EN 
not equal.
{quote}




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-12 Thread Stian Soiland-Reyes (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820811#comment-15820811
 ] 

Stian Soiland-Reyes commented on COMMONSRDF-51:
---

Yes, so both a-z and A-Z are permitted and valid in Turtle etc. However the 
spec also says (and here is the ambiguity against "character by character"):

> Lexical representations of language tags may be converted to lower case. 

and:

> The value space of language tags is always in lower case.

Given that, doing a case-sensitive comparison sounds fragile: impl1 may 
lowercase the tags (e.g. Jena) while impl2 leaves them as-is (e.g. JSON-LD), 
and that would break calls like graph.contains().

So even if my [public-rdf-comments 
question|http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/thread.html]
 concludes with case sensitivity, we would probably want to make Commons RDF do 
its best for lowercase comparisons anyway for consistent interoperability. In 
that case perhaps we should also add tests to the graph and datasets to ensure 
that any "call-through" does not break literal equivalence.



[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-11 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819001#comment-15819001
 ] 

Andy Seaborne commented on COMMONSRDF-51:
-

The language tag grammar in BCP 47 says "ALPHA", which is defined in RFC 5234 as "A-Z, a-z".




[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-11 Thread Stian Soiland-Reyes (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818570#comment-15818570
 ] 

Stian Soiland-Reyes commented on COMMONSRDF-51:
---

TODO: Stian to report to public-rdf-comments@w3



[jira] [Commented] (COMMONSRDF-51) RDF-1.1 specifies that language tags need to be compared using lower-case

2017-01-11 Thread Stian Soiland-Reyes (JIRA)

[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818566#comment-15818566
 ] 

Stian Soiland-Reyes commented on COMMONSRDF-51:
---

I think this needs to be clarified on public-rdf-comme...@w3.org as our 
"character by character" is a [quote from the 
spec|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal]:

{quote}

Literal term equality: Two literals are term-equal (the same RDF literal) if 
and only if the two lexical forms, the two datatype IRIs, and the two language 
tags (if any) compare equal, character by character. Thus, two literals can 
have the same value without being the same RDF term. For example:

  "1"^^xs:integer
  "01"^^xs:integer

denote the same value, but are not the same literal RDF terms and are not 
term-equal because their lexical form differs.
{quote}

Just above that, the spec says the value space is always in lower case, but 
then it says equality is done "character by character" and not by value space.  
(As that example shows, the lexical forms of datatypes like integers are also 
compared by character instead of by value space.)
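To illustrate the distinction with the spec's own integer example, here is a small sketch. The class and method names are made up, and both inputs are assumed to be lexical forms of the same integer datatype:

```java
import java.math.BigInteger;

/** Hypothetical illustration of term equality versus value equality for integer literals. */
final class TermVsValue {
    private TermVsValue() {}

    /** Term equality: the lexical forms are compared character by character. */
    static boolean termEqual(String lexical1, String lexical2) {
        return lexical1.equals(lexical2);
    }

    /** Value equality: the lexical forms are first mapped into the integer value space. */
    static boolean valueEqual(String lexical1, String lexical2) {
        return new BigInteger(lexical1).equals(new BigInteger(lexical2));
    }

    public static void main(String[] args) {
        System.out.println(termEqual("1", "01"));  // false
        System.out.println(valueEqual("1", "01")); // true
    }
}
```

The open question for language tags is which of these two notions `Literal.equals()` should follow.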

I have nevertheless started a branch 
[COMMONSRDF-51-langtag-lcase|https://github.com/apache/commons-rdf/compare/COMMONSRDF-51-langtag-lcase]
 to try this out. This revealed bugs in the bindings for simple (just the 
Turkish case), jsonld-java (which does no validation of language tags), rdf4j 
(fails the Turkish test) and jena (fails the Turkish test).

As both RDF4J and Jena are vulnerable to the Turkish case, that should be 
reported upstream after rdf-comments clarifications.
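To make the Turkish case concrete, here is a small sketch using only the standard `String.toLowerCase(Locale)`. The class and method names are made up for illustration. Under a Turkish locale the ASCII letter `I` lowercases to the dotless `ı` (U+0131), so the lowercased tag no longer matches its plain ASCII form:

```java
import java.util.Locale;

/** Hypothetical demo of locale-sensitive lowercasing of a language tag. */
final class TurkishLocaleDemo {
    private TurkishLocaleDemo() {}

    /** Lowercases a tag using the rules of the given locale. */
    static String lower(String tag, Locale locale) {
        return tag.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        String viaTurkish = lower("EN-ID", turkish);  // "en-\u0131d" (dotless i)
        String viaRoot = lower("EN-ID", Locale.ROOT); // "en-id"
        System.out.println(viaTurkish.equals("en-id")); // false
        System.out.println(viaRoot.equals("en-id"));    // true
    }
}
```

A plain `toLowerCase()` picks up the JVM default locale, so the same code can behave differently depending on where it runs. Passing `Locale.ROOT` (or `Locale.ENGLISH`, as suggested in the issue description) avoids that.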

Would it make sense for Commons RDF to strengthen getLanguageTag() to ALWAYS 
return the language tag in lower case for any RDF implementations (e.g. 
normalize if implementation does not do it correctly internally) - as a kind of 
interoperability/RDF 1.1 measure - or should we strive to keep their current 
case representation as-is? 
