[jira] [Commented] (JENA-170) hexBinary whitespace issue

Andy Seaborne (Commented) (JIRA) Sun, 27 Nov 2011 13:16:02 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158021#comment-13158021
 ]


Andy Seaborne commented on JENA-170:
------------------------------------

The .sameValueAs() method tests equality of value; .equals() tests identity 
of the lexical form.  You call the one you want.

Presumably Clerezza is calling .equals(), or a persistent storage layer is 
involved.  TDB doesn't handle xsd:hexBinary as a value-based type.  It only 
handles numeric types, dates, dateTimes, Gregorian dates.

And it is an RDF datatype.

RDF datatypes are declared in RDF/XML with rdf:datatype="....." -- an RDF 
mechanism, which is open.  There isn't a fixed set of datatypes like XML Schema 
Datatypes.

XML datatypes use external declaration or xsi:type.  I believe that xsi:type 
can only refer to an XSD datatype.
It only applies to XML.  

RDF isn't the XML document model and isn't necessarily in XML (cf. Turtle).  
There may be other reasons the XML Schema datatype syntax was not applicable - 
I wasn't there at the time.  Timing might be part of it - RDF finished Feb 
2004, XQuery/XPath data model is Jan 2007 with earliest candidate rec Nov 2005.

SPARQL (and RDF by encouragement) uses the data model from XSD datatypes 
(lexical/value mapping), but not the syntax.

Jena memory models do support a lot of value-based matching but this is costly. 
They support matching xsd:hexBinary by value if you call .sameValueAs; if you 
call .equals, you get strict equality.  "001"^^xsd:integer and "1"^^xsd:integer 
are, at the lowest level of the RDF abstract data model, different.  The 
datatype could even be one that has never been met before, e.g. 
"IIII"^^my:roman and "IV"^^my:roman.

Users ask that reading in and writing out data does not change the format; the 
memory model keeps both forms around which is OK for numerics, but 
xsd:hexBinary can be large blobs, which is unfortunate.
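
A literal that keeps its lexical form and derives its value on demand makes 
the two comparisons concrete.  The Python below is only a sketch of the idea, 
not Jena's implementation: Literal, same_value_as, the PARSERS table and the 
my:roman parser are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass

def parse_hex_binary(lexical: str) -> bytes:
    # xsd:hexBinary collapses whitespace, so leading/trailing whitespace is
    # not part of the value. (bytes.fromhex is more lenient than XSD about
    # interior spaces; good enough for a sketch.)
    return bytes.fromhex(lexical.strip(" \t\r\n"))

ROMAN = {"I": 1, "V": 5, "X": 10}

def parse_roman(lexical: str) -> int:
    # Hypothetical parser for the made-up my:roman datatype.
    values = [ROMAN[c] for c in lexical]
    total = 0
    for i, v in enumerate(values):
        # Subtractive notation: a smaller numeral before a larger one.
        if i + 1 < len(values) and v < values[i + 1]:
            total -= v
        else:
            total += v
    return total

# Lexical-to-value mappings, one per datatype this sketch knows about.
PARSERS = {
    "xsd:integer": int,
    "xsd:hexBinary": parse_hex_binary,
    "my:roman": parse_roman,
}

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str
    # The generated __eq__ compares (lexical, datatype): strict equality.

    def value(self):
        return PARSERS[self.datatype](self.lexical)

    def same_value_as(self, other: "Literal") -> bool:
        # Simplification: only literals of the same datatype are compared.
        return self.datatype == other.datatype and self.value() == other.value()

padded = Literal("\nAAAA\n", "xsd:hexBinary")
plain = Literal("AAAA", "xsd:hexBinary")
print(padded == plain)              # strict comparison of lexical forms
print(padded.same_value_as(plain))  # comparison of values
```

A datatype with no entry in PARSERS can still be compared strictly, which is 
all that can be done for a datatype the system has never met.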

Canonicalization is a technique that emphasises the value at the expense of 
losing the different forms used in different places in the data.  A tradeoff.
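
As a rough illustration of that tradeoff, canonicalization maps every lexical 
form of a value to one chosen spelling.  This is a sketch under assumptions, 
not what Jena does: the function names are hypothetical, and the rules used 
are whitespace collapse plus upper-case digits for xsd:hexBinary, and no sign 
or leading zeros for non-negative xsd:integer.

```python
import string

XSD_WHITESPACE = " \t\r\n"

def canonical_hex_binary(lexical: str) -> str:
    """Return one canonical spelling for an xsd:hexBinary lexical form."""
    collapsed = lexical.strip(XSD_WHITESPACE)
    if len(collapsed) % 2 != 0 or any(c not in string.hexdigits for c in collapsed):
        raise ValueError(f"not a valid xsd:hexBinary lexical form: {lexical!r}")
    return collapsed.upper()

def canonical_integer(lexical: str) -> str:
    """Return one canonical spelling for an xsd:integer lexical form."""
    # Python's int() accepts more than XSD does (e.g. underscores), so this
    # is lenient about invalid input.
    return str(int(lexical.strip(XSD_WHITESPACE)))

print(canonical_hex_binary("\nAAAA\n"))
print(canonical_integer("001"))
```

After this step the original layout (padding, letter case, leading zeros) 
cannot be recovered, which is exactly what is given up.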

Jena persistent storage layers don't keep both value and lexical form about.  
Indexing does not work.

Instead, TDB stores the value of numeric types, dates, dateTimes, Gregorian 
dates (in binary).  It rebuilds nodes as their canonical form.
TDB does not do anything special for xsd:hexBinary, typically used as blobs, so 
it does not do value-based matching, only lexical form matching.
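
The effect on lookups can be shown with a toy index.  This is not TDB's 
actual encoding or API; ToyStore and VALUE_TYPES are hypothetical, and only 
xsd:integer stands in for the value-based types.

```python
# Datatypes the toy store handles by value (assumption: just xsd:integer).
VALUE_TYPES = {"xsd:integer": int}

class ToyStore:
    def __init__(self):
        self._objects = {}  # index key -> list of subjects

    @staticmethod
    def _key(lexical, datatype):
        parse = VALUE_TYPES.get(datatype)
        if parse is not None:
            # Value-based: only the parsed value is kept, the spelling is lost.
            return (datatype, parse(lexical))
        # Everything else, including xsd:hexBinary, is keyed by the exact
        # lexical form.
        return (datatype, lexical)

    def add(self, subject, lexical, datatype):
        self._objects.setdefault(self._key(lexical, datatype), []).append(subject)

    def find(self, lexical, datatype):
        return list(self._objects.get(self._key(lexical, datatype), []))

    def literals(self):
        # Rebuild literals from the index; value types come back canonical.
        return [(str(v), dt) for (dt, v) in self._objects]

store = ToyStore()
store.add(":a", "001", "xsd:integer")
store.add(":me", "\nAAAA\n", "xsd:hexBinary")

print(store.find("1", "xsd:integer"))       # matches by value
print(store.find("AAAA", "xsd:hexBinary"))  # no match: lexical forms differ
print(store.literals())
```

The second lookup comes back empty for the same reason the query in the 
report does: the stored lexical form includes the surrounding newlines.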

It could be added - users also want round-trip of layout.

                
> hexBinary whitespace issue
> --------------------------
>
>                 Key: JENA-170
>                 URL: https://issues.apache.org/jira/browse/JENA-170
>             Project: Jena
>          Issue Type: Bug
>          Components: ARQ, Jena, RDF/XML
>         Environment: 2.6.4
>            Reporter: Henry Story
>            Assignee: Andy Seaborne
>            Priority: Minor
>
> As I understand, initial and final white spaces in xsd:hexBinary in xml 
> should be ignored
>    http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#hexBinary
>  
> because of the whitespace facet.
> With Jena 2.6.4 this is not the case, as shown by the test below. 
> I found that in Clerezza when using the graph api, so this is a problem even 
> when one does not use SPARQL.
> Removing the white space solves the problem. 
> xsd:hexBinary is already a very fragile encoding. Making it this fragile is 
> bound to lead to issues in communication.
> The same is true with the N3 encoding.
> -----------------------------------------------------------------
> hjs@bblfish[0]$ cat q1.sparql 
> PREFIX : <http://me.example/p#> 
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
> SELECT ?S WHERE {
>   ?S :related "AAAA"^^xsd:hexBinary .
> }
> hjs@bblfish[0]$ cat c1.rdf 
> <rdf:RDF xmlns="http://me.example/p#"
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>     <rdf:Description rdf:about="http://me.example/p#me">
>         <related rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary">
> AAAA
> </related>
>     </rdf:Description>
> </rdf:RDF>
> hjs@bblfish[0]$ arq --query=q1.sparql --data=c1.rdf
> -----
> | S |
> =====
> -----

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
