[ 
https://issues.apache.org/jira/browse/JENA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157925#comment-13157925
 ] 

Andy Seaborne commented on JENA-170:
------------------------------------

Henry - 

 Node n1 = SSE.parseNode("'AA'^^xsd:hexBinary") ;
 Node n2 = SSE.parseNode("' AA '^^xsd:hexBinary") ;
        
 System.out.println(n1.equals(n2)) ;             // ==> false
 System.out.println(n1.sameValueAs(n2)) ;  // ==> true

The same would be true for Literal.sameValueAs.

You are right that xsd:hexBinary has the whitespace facet enabled (oddly, so 
does xsd:anyURI).

Jena keeps the lexical form of the literal as given; when creating nodes, it 
does not modify the presented lexical form (one exception: rdf:XMLLiteral, 
because parseType="Literal" requires XC14N to be applied).  RDF/XML makes the 
lexical form of a literal the text of the XML element, which does not apply XSD 
rules.  The RDF abstract syntax is agnostic to value processing (i.e. 
D-entailment).  

There is no XML Schema processing when parsing RDF/XML, so the whitespace 
facet is not applied.

This case is the same as integers 0001 and 1.  Same value but different lexical 
forms so different RDF literals.  I guess Clerezza uses .equals not 
.sameValueAs.
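
The 0001 / 1 case can be sketched without Jena at all.  The class below is a 
hypothetical stand-in for an integer-typed literal (it is not Jena's Node or 
Literal, and the method name sameTermAs is made up for illustration); it only 
shows the difference between comparing lexical forms and comparing values.

```java
import java.math.BigInteger;

// Hypothetical stand-in for an integer-typed literal.
// Not Jena's Node or Literal; it only illustrates term vs. value equality.
final class IntLiteralSketch {
    final String lexical;

    IntLiteralSketch(String lexical) {
        this.lexical = lexical;
    }

    // Term equality: the lexical forms must match character for character.
    boolean sameTermAs(IntLiteralSketch other) {
        return lexical.equals(other.lexical);
    }

    // Value equality: compare the integers that the lexical forms denote.
    boolean sameValueAs(IntLiteralSketch other) {
        return new BigInteger(lexical).equals(new BigInteger(other.lexical));
    }

    public static void main(String[] args) {
        IntLiteralSketch a = new IntLiteralSketch("0001");
        IntLiteralSketch b = new IntLiteralSketch("1");
        System.out.println(a.sameTermAs(b));   // false
        System.out.println(a.sameValueAs(b));  // true
    }
}
```

A graph lookup that relies on the first kind of comparison will miss a literal 
that differs only in its lexical form.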

With the emergence of Turtle, this situation will be messier.

We have talked about canonicalization of all input (see 
org.openjena.riot.pipeline.normalize.CanonicalizeLiteral).  Whitespace 
processing could be included.  Canonicalization is not free, though.

But losing the layout of large xsd:hexBinary/xsd:base64Binary literals might 
mean needing to teach the writers to lay out these literals.
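
As a rough idea of what whitespace processing for xsd:hexBinary could look 
like, here is a minimal sketch.  The class and method names are hypothetical, 
and this is not what CanonicalizeLiteral does today; it assumes the XSD 
"collapse" whitespace rule and the XSD 1.0 canonical form that uses upper-case 
hex digits.

```java
import java.util.Locale;

// Hypothetical helper; not part of Jena or RIOT.
final class HexBinaryCanonSketch {

    // XSD whiteSpace="collapse": turn each run of space, tab, LF or CR into
    // a single space, then drop the space left at either end.
    static String collapse(String lexical) {
        String s = lexical.replaceAll("[ \\t\\n\\r]+", " ");
        return s.replaceAll("^ | $", "");
    }

    // Collapse whitespace, check the hexBinary lexical space (pairs of hex
    // digits), and upper-case the digits.
    static String canonical(String lexical) {
        String s = collapse(lexical);
        if (!s.matches("([0-9a-fA-F]{2})*")) {
            throw new IllegalArgumentException(
                "not a valid xsd:hexBinary lexical form: " + lexical);
        }
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(canonical(" AA "));      // AA
        System.out.println(canonical("\nAAAA\n"));  // AAAA
    }
}
```

Note that the collapse rule only removes leading and trailing whitespace for 
this datatype; whitespace inside the hex digits is still rejected, so any 
line-wrapped layout of a large literal would not survive this check.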

The situation in ARQ is that in basic graph pattern matching, matching is by 
exact node equality (simple entailment unless you are using a reasoner).

Filters, however, do value testing for certain well-known datatypes.  ARQ adds 
various types over the minimum required by SPARQL: the Gregorian dates 
(gYear, gMonthDay etc), xsd:date, and XSD durations.  It does not include 
xsd:hexBinary, though.


                
> hexBinary whitespace issue
> --------------------------
>
>                 Key: JENA-170
>                 URL: https://issues.apache.org/jira/browse/JENA-170
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ, Jena, RDF/XML
>         Environment: 2.6.4
>            Reporter: Henry Story
>
> As I understand, initial and final white spaces in xsd:hexBinary in xml 
> should be ignored
>    http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#hexBinary
>  
> because of the whitespace facet.
> With Jena 2.6.4 this is not the case, as shown by the test below. 
> I found that in Clerezza when using the graph api, so this is a problem even 
> when one does not use SPARQL.
> Removing the white space solves the problem. 
> xsd:hexBinary is already a very fragile encoding. Making it this fragile is 
> bound to lead to issues in communication.
> The same is true with the N3 encoding.
> -----------------------------------------------------------------
> hjs@bblfish[0]$ cat q1.sparql 
> PREFIX : <http://me.example/p#> 
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
> SELECT ?S WHERE {
>   ?S :related "AAAA"^^xsd:hexBinary .
> }
> hjs@bblfish[0]$ cat c1.rdf 
> <rdf:RDF xmlns="http://me.example/p#"
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>     <rdf:Description rdf:about="http://me.example/p#me">
>         <related rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary">
> AAAA
> </related>
>     </rdf:Description>
> </rdf:RDF>
> hjs@bblfish[0]$ arq --query=q1.sparql --data=c1.rdf
> -----
> | S |
> =====
> -----

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
