Re: Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)

2021-09-10 Thread Andy Seaborne




On 10/09/2021 19:23, Shaw, Ryan wrote:



On 09/09/2021 23:32, Shaw, Ryan wrote:

riot gives me the warning “Lexical form '' not valid for datatype XSD 
gYear”. But according to XSD 1.1 Part 2,  is a permitted value for gYear, 
representing 1 BCE.
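For reference, the lexical rule at issue can be sketched in plain Java. `GYearLexical` is a hypothetical helper, not part of Jena or Xerces. It encodes the xsd:gYear grammar from XSD 1.1 Part 2, where a four-digit year may begin with '0' (so '0000' is legal). It also encodes the XSD 1.0 rule, which prohibits year 0000. XSD 1.1 numbers years astronomically, so 0000 is 1 BCE and -0001 is 2 BCE. The sketch checks lexical form only and is not meant to mirror either library's code.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the xsd:gYear lexical space (yearFrag timezoneFrag?). */
final class GYearLexical {
    // yearFrag: optional '-', then either 5+ digits with no leading zero,
    // or exactly four digits that may start with '0' ("0000" is allowed in XSD 1.1).
    // timezoneFrag: 'Z' or +/-hh:mm, limited to the range -14:00..+14:00.
    private static final Pattern XSD11_GYEAR = Pattern.compile(
        "-?(?:[1-9][0-9]{3,}|0[0-9]{3})"
        + "(?:Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?");

    /** True if the string is in the XSD 1.1 lexical space of xsd:gYear. */
    static boolean isValidXsd11(String lex) {
        return XSD11_GYEAR.matcher(lex).matches();
    }

    /** XSD 1.0 uses the same shape but prohibits year 0000 (signed or not). */
    static boolean isValidXsd10(String lex) {
        return isValidXsd11(lex) && !lex.matches("-?0000.*");
    }

    /** XSD 1.1 year numbering: 0 is 1 BCE, -1 is 2 BCE, and so on. */
    static String describe(int xsd11Year) {
        return xsd11Year > 0 ? xsd11Year + " CE" : (1 - xsd11Year) + " BCE";
    }

    public static void main(String[] args) {
        for (String lex : new String[] {"2021", "0000", "-0001", "0000Z", "12345", "021"}) {
            System.out.printf("%-6s XSD 1.0: %-5b XSD 1.1: %b%n",
                lex, isValidXsd10(lex), isValidXsd11(lex));
        }
        System.out.println("XSD 1.1 year 0 = " + describe(0));
    }
}
```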



On Sep 10, 2021, at 6:25 AM, Andy Seaborne  wrote:

Command line riot?


I am using riot --validate as a final QA check on some RDF generated by other 
(non-Jena) code.


It is just a warning; the triple and its object literal are still output from the parser. 
From the command line, "--nocheck" turns off the checking.


The output is fine, but since I’m using riot specifically for validation I 
don’t want to turn off checking.


Workaround:

"grep -v" on stderr will remove the warning.
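If the validation run is driven from Java rather than the command line, the same filtering can be done without grep by wrapping RIOT's error handler, so all other checks stay on. This is a sketch, not a Jena feature: the class name `FilteringValidation` is hypothetical, and the message test assumes the warning text has the shape quoted in this thread ("Lexical form '...' not valid for datatype XSD gYear"). It uses `RDFParser.create()`, `errorHandler(...)` and `ErrorHandlerFactory.getDefaultErrorHandler()` from Jena's RIOT API; check them against the Jena version in use.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.ErrorHandler;
import org.apache.jena.riot.system.ErrorHandlerFactory;

/** Hypothetical wrapper: keep checking on, but drop the xsd:gYear lexical-form warning. */
final class FilteringValidation {
    /** Assumed message shape, based on the warning text quoted in this thread. */
    static boolean isGYearLexicalWarning(String message) {
        return message.startsWith("Lexical form")
            && message.contains("not valid for datatype XSD gYear");
    }

    /** Passes every report to the delegate except the gYear lexical-form warning. */
    static ErrorHandler filtering(ErrorHandler delegate) {
        return new ErrorHandler() {
            @Override public void warning(String message, long line, long col) {
                if (!isGYearLexicalWarning(message)) delegate.warning(message, line, col);
            }
            @Override public void error(String message, long line, long col) {
                delegate.error(message, line, col);
            }
            @Override public void fatal(String message, long line, long col) {
                delegate.fatal(message, line, col);
            }
        };
    }

    public static void main(String[] args) {
        String data = "<http://example/s> <http://example/p> "
            + "\"0000\"^^<http://www.w3.org/2001/XMLSchema#gYear> .";
        Graph graph = ModelFactory.createDefaultModel().getGraph();
        RDFParser.create()
            .fromString(data)
            .lang(Lang.NTRIPLES)
            .errorHandler(filtering(ErrorHandlerFactory.getDefaultErrorHandler()))
            .parse(graph);
        System.out.println("Triples parsed: " + graph.size());
    }
}
```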


Now logged as
https://issues.apache.org/jira/browse/JENA-2158


Thanks, I will watch this issue.


There is a constant to turn on XSD 1.1 schema mode for checking. It affects 
year , including the value of negative years, and some duration detection.


Where is this constant? Does this mean I could write my own CLI tool to do 
validation with this flag set? (Or submit a PR for setting this constant via a 
riot command line option)?


The constant is in org.apache.jena.ext.xerces.impl.Constants

(also need to change 
org.apache.jena.ext.xerces.jaxp.datatype/XMLGregorianCalendarImpl.java)


There's a PR (#1069) in progress.

It does not mean that all arithmetic involving  (and indeed BCE dates in 
general) will work. Xerces does not support the XSD 1.1 "" year in its 
arithmetic, and nor does the JDK in my testing.


(And to everyone who points to java.time.*: it is useful for parsing to 
TemporalAccessors, but it has a different concept of duration.)
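One concrete instance of that mismatch, as a small illustration (one difference among possibly several, not necessarily the one meant above): a single xsd:duration such as "P1Y2M3DT4H" mixes calendar fields (years, months) with time fields (hours), whereas java.time splits these across `Period` and `Duration`, and neither class accepts the mixed form. `DurationContrast` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Period;
import java.time.format.DateTimeParseException;

/** Hypothetical demo: which java.time type accepts a given duration string. */
final class DurationContrast {
    static boolean parsesAsDuration(String s) {
        try { Duration.parse(s); return true; } catch (DateTimeParseException e) { return false; }
    }

    static boolean parsesAsPeriod(String s) {
        try { Period.parse(s); return true; } catch (DateTimeParseException e) { return false; }
    }

    public static void main(String[] args) {
        // "P1Y2M3DT4H" is a legal xsd:duration; Period has no time part and
        // Duration has no year/month fields, so both parses fail for it.
        for (String s : new String[] {"P1Y2M3DT4H", "P1Y2M3D", "P3DT4H"}) {
            System.out.printf("%-11s Duration: %-5b Period: %b%n",
                s, parsesAsDuration(s), parsesAsPeriod(s));
        }
    }
}
```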




I wonder if this flag should be on by default, since RDF 1.1 Concepts and 
Abstract Syntax says [1]:


IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of 
a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: 
Datatypes.


That text is saying that the XSD IRIs can't be redefined. The section above 
says "Any datatype definition that conforms to this abstraction MAY be 
used in RDF", so it is not a requirement.


Andy



Thanks,
Ryan

[1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes




Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)

2021-09-10 Thread Shaw, Ryan

>> On 09/09/2021 23:32, Shaw, Ryan wrote:
>> 
>> riot gives me the warning “Lexical form '' not valid for datatype XSD 
>> gYear”. But according to XSD 1.1 Part 2,  is a permitted value for 
>> gYear, representing 1 BCE.

> On Sep 10, 2021, at 6:25 AM, Andy Seaborne  wrote:
> 
> Command line riot?

I am using riot --validate as a final QA check on some RDF generated by other 
(non-Jena) code.

> It is just a warning; the triple and its object literal are still output from 
> the parser. From the command line, "--nocheck" turns off the checking.

The output is fine, but since I’m using riot specifically for validation I 
don’t want to turn off checking.

> Now logged as
> https://issues.apache.org/jira/browse/JENA-2158

Thanks, I will watch this issue.

> There is a constant to turn on XSD 1.1 schema mode for checking. It affects 
> year , including the value of negative years, and some duration detection.

Where is this constant? Does this mean I could write my own CLI tool to do 
validation with this flag set? (Or submit a PR for setting this constant via a 
riot command line option)?

I wonder if this flag should be on by default, since RDF 1.1 Concepts and 
Abstract Syntax says [1]:

> IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name 
> of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 
> 2: Datatypes. 

Thanks,
Ryan

[1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes




Re: Using Xerces2 2.12.1 with Jena

2021-09-10 Thread Andy Seaborne

Hi Ryan,



On 09/09/2021 23:32, Shaw, Ryan wrote:



On Sep 9, 2021, at 4:00 PM, Andy Seaborne  wrote:

What is your usage scenario?


riot gives me the warning “Lexical form '' not valid for datatype XSD 
gYear”. But according to XSD 1.1 Part 2,  is a permitted value for gYear, 
representing 1 BCE.


Command line riot?




Jena supports the XSD datatypes relevant to RDF independently of XML.


I had thought that the reason for this warning was that Jena was relying on the 
default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I 
guess I was wrong, in which case using the Xerces2 parser would not resolve 
this?


I'm afraid it won't.

This warning happens for any input, not just RDF/XML. The checking 
happens after parsing.


In Java code: RDFParser.checking(false).

It is just a warning; the triple and its object literal are still output 
from the parser. From the command line, "--nocheck" turns off the checking.
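A minimal sketch of the Java-side switch, using the `RDFParser` builder (`RDFParser.create()`, `fromString`, `lang`, `checking`, `parse`). The class name `NoCheckParse` and the sample data are illustrative only. With checking off, the literal is kept as-is and no lexical-form warning is reported; with checking on, the triple is still produced, only with a warning.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;

/** Hypothetical example: parse N-Triples with value checking turned off. */
final class NoCheckParse {
    static Graph parseWithoutChecking(String ntriples) {
        Graph graph = ModelFactory.createDefaultModel().getGraph();
        RDFParser.create()
            .fromString(ntriples)
            .lang(Lang.NTRIPLES)
            .checking(false)   // no literal/IRI checks, so no "Lexical form ... not valid" warning
            .parse(graph);
        return graph;
    }

    public static void main(String[] args) {
        String data = "<http://example/s> <http://example/p> "
            + "\"abc\"^^<http://www.w3.org/2001/XMLSchema#gYear> .";
        System.out.println("Triples parsed: " + parseWithoutChecking(data).size());
    }
}
```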


Now logged as
https://issues.apache.org/jira/browse/JENA-2158

Jena has its own XSD parsing code, which was copied from the internal 
implementation in Xerces 2.11.0 rather than going through public APIs. It was 
repackaged into the Jena code base so that any XML parser can be used, 
normally the JDK one.
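Because this datatype code is built into Jena, the lexical check can be exercised directly with no XML parser involved. A minimal sketch using `XSDDatatype.XSDgYear.isValid(...)`; `JenaXsdCheck` is a hypothetical class name. The result for "0000" depends on the Jena version and on the XSD 1.1 setting discussed in this thread, so no particular output is claimed for it.

```java
import org.apache.jena.datatypes.xsd.XSDDatatype;

/** Hypothetical example: call Jena's built-in xsd:gYear lexical check directly. */
final class JenaXsdCheck {
    public static void main(String[] args) {
        for (String lex : new String[] {"2021", "-0044", "0000", "abc"}) {
            // isValid tests the lexical form against Jena's own XSD datatype code.
            System.out.println(lex + " -> " + XSDDatatype.XSDgYear.isValid(lex));
        }
    }
}
```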


There is a constant to turn on XSD 1.1 schema mode for checking. It 
affects year , including the value of negative years, and some 
duration detection. It does not seem to affect other parts of the Xerces 
subsystem though.


There are slight differences between the JDK and Xerces code (the JDK 
does not, or at least did not, handle "T24:00:00", which is a legal time 
in XSD), so there is some checking to do.
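To make the "T24:00:00" case concrete, here is a minimal sketch of the xsd:time lexical grammar from XSD 1.1 Part 2, in which hour 24 is allowed only as the exact end-of-day form 24:00:00 (optionally with an all-zero fraction). `XsdTimeLexical` is a hypothetical helper, not code from Jena, Xerces or the JDK. It checks lexical form only and does not implement the value mapping.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of ((hour ':' minute ':' second) | endOfDay) timezone? */
final class XsdTimeLexical {
    // timezoneFrag: 'Z' or +/-hh:mm within -14:00..+14:00 (optional).
    private static final String TZ =
        "(?:Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?";

    // Hours 00-23 with any minutes/seconds, or exactly 24:00:00 with an all-zero fraction.
    private static final Pattern TIME = Pattern.compile(
        "(?:(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](?:\\.[0-9]+)?"
        + "|24:00:00(?:\\.0+)?)" + TZ);

    static boolean isValid(String lex) {
        return TIME.matcher(lex).matches();
    }

    public static void main(String[] args) {
        for (String lex : new String[] {"24:00:00", "24:00:00.000Z", "24:00:01", "23:59:59.5+05:30"}) {
            System.out.println(lex + " -> " + isValid(lex));
        }
    }
}
```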


Andy



Thanks,
Ryan



Re: Using Xerces2 2.12.1 with Jena

2021-09-09 Thread Shaw, Ryan

> On Sep 9, 2021, at 4:00 PM, Andy Seaborne  wrote:
> 
> What is your usage scenario?

riot gives me the warning “Lexical form '' not valid for datatype XSD 
gYear”. But according to XSD 1.1 Part 2,  is a permitted value for gYear, 
representing 1 BCE.

> Jena supports the XSD datatypes relevant to RDF independently of XML.

I had thought that the reason for this warning was that Jena was relying on the 
default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I 
guess I was wrong, in which case using the Xerces2 parser would not resolve 
this?

Thanks,
Ryan



Re: Using Xerces2 2.12.1 with Jena

2021-09-09 Thread Andy Seaborne

Hi Ryan,

On 09/09/2021 17:08, Shaw, Ryan wrote:

I would like to use Xerces2 2.12.1 (with support for XML Schema 1.1) as the 
Jena XML parser, specifically to gain support for XSD 1.1 datatypes.


XML 1.1 != XSD 1.1

Jena supports the XSD datatypes relevant to RDF independently of XML.

What is your usage scenario?


I see at [1] that “any XML parser can be used with Jena … through the usual 
mechanism for adding to the application.”

But I don’t know what that usual mechanism is.


For Java:

https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/parsers/DocumentBuilderFactory.html#newInstance()
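A minimal sketch of the two standard JAXP routes the linked page describes. The no-argument `newInstance()` consults, in order, the system property `javax.xml.parsers.DocumentBuilderFactory`, the `jaxp.properties` file, the ServiceLoader, and finally the JDK default. `newInstance(String, ClassLoader)` names a class explicitly. The Xerces2 class name is only found if the Xerces jar is on the classpath, and `ParserSelection` is a hypothetical name. As the later replies in this thread explain, choosing Xerces this way changes the XML parser only, not Jena's own XSD datatype checking.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

/** Hypothetical demo of selecting a DOM parser implementation through JAXP. */
final class ParserSelection {
    static final String XERCES_DBF = "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl";

    /** Explicit request by class name; returns null if the class cannot be loaded. */
    static DocumentBuilderFactory tryExplicit(String className) {
        try {
            // A null ClassLoader means the current thread's context class loader is used.
            return DocumentBuilderFactory.newInstance(className, null);
        } catch (FactoryConfigurationError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // 1. Standard lookup; e.g. run with
        //    -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
        System.out.println("Lookup result: "
            + DocumentBuilderFactory.newInstance().getClass().getName());

        // 2. Explicit class name, bypassing the lookup.
        DocumentBuilderFactory xerces = tryExplicit(XERCES_DBF);
        System.out.println(xerces == null
            ? "Xerces not on classpath"
            : "Using " + xerces.getClass().getName());
    }
}
```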



The Xerces docs say something about the Java Endorsed Standards Override 
Mechanism, but elsewhere I see that this has been deprecated.

What’s the recommended way to do this for Jena?

Thanks,
Ryan

[1] 
https://issues.apache.org/jira/browse/JENA-341?focusedCommentId=16463591&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16463591