Re: Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)
On 10/09/2021 19:23, Shaw, Ryan wrote:
>>> On 09/09/2021 23:32, Shaw, Ryan wrote:
>>>
>>> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.
>>
>> On Sep 10, 2021, at 6:25 AM, Andy Seaborne wrote:
>>
>> Command line riot?
>
> I am using riot --validate as a final QA check on some RDF generated by other (non-Jena) code.
>
>> It is just a warning; the triple and its object literal are still output from the parser. From the command line "--nocheck" turns off the checking.
>
> The output is fine, but since I’m using riot specifically for validation I don’t want to turn off checking.

Workaround: "grep -v" of stderr will remove it.

>> Now logged as
>> https://issues.apache.org/jira/browse/JENA-2158
>
> Thanks, I will watch this issue.
>
>> There is a constant to turn on XSD 1.1 schema mode for checking. It affects year 0000, including the value of negative years, and some duration detection.
>
> Where is this constant? Does this mean I could write my own CLI tool to do validation with this flag set? (Or submit a PR for setting this constant via a riot command line option)?

The constant is in org.apache.jena.ext.xerces.impl.Constants (also need to change org.apache.jena.ext.xerces.jaxp.datatype/XMLGregorianCalendarImpl.java).

There's a PR#1069 in progress.

It does not mean all arithmetic involving 0000, and indeed BCE dates, will work. Xerces does not support the XSD 1.1 "0000" year in its arithmetic support, nor does the JDK in my testing.

(And to everyone that points to java.time.*: useful for parsing to TemporalAccessors, but it has a different concept of duration.)

> I wonder if this flag should be on by default, since RDF 1.1 Concepts and Abstract Syntax says [1]:
>
>> IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes.

That text is saying that XSD IRIs can't be redefined.

The section above says "Any datatype definition that conforms to this abstraction MAY be used in RDF" -- so not a requirement.

    Andy

> Thanks,
> Ryan
>
> [1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes
Validation of XSD 1.1 datatypes (was Re: Using Xerces2 2.12.1 with Jena)
>> On 09/09/2021 23:32, Shaw, Ryan wrote:
>>
>> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.

> On Sep 10, 2021, at 6:25 AM, Andy Seaborne wrote:
>
> Command line riot?

I am using riot --validate as a final QA check on some RDF generated by other (non-Jena) code.

> It is just a warning; the triple and its object literal are still output from the parser. From the command line "--nocheck" turns off the checking.

The output is fine, but since I’m using riot specifically for validation I don’t want to turn off checking.

> Now logged as
> https://issues.apache.org/jira/browse/JENA-2158

Thanks, I will watch this issue.

> There is a constant to turn on XSD 1.1 schema mode for checking. It affects year 0000, including the value of negative years, and some duration detection.

Where is this constant? Does this mean I could write my own CLI tool to do validation with this flag set? (Or submit a PR for setting this constant via a riot command line option)?

I wonder if this flag should be on by default, since RDF 1.1 Concepts and Abstract Syntax says [1]:

> IRIs of the form http://www.w3.org/2001/XMLSchema#xxx, where xxx is the name of a datatype, denote the built-in datatypes defined in XML Schema 1.1 Part 2: Datatypes.

Thanks,
Ryan

[1] https://www.w3.org/TR/rdf11-concepts/#xsd-datatypes
Re: Using Xerces2 2.12.1 with Jena
Hi Ryan,

On 09/09/2021 23:32, Shaw, Ryan wrote:
>> On Sep 9, 2021, at 4:00 PM, Andy Seaborne wrote:
>>
>> What is your usage scenario?
>
> riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.

Command line riot?

>> Jena supports the XSD datatypes relevant to RDF independently of XML.
>
> I had thought that the reason for this warning was that Jena was relying on the default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I guess I was wrong, in which case using the Xerces2 parser would not resolve this?

I'm afraid it won't.

This warning happens for any input, not just RDF/XML. The checking is happening after parsing.

In Java code: RDFParser.checking(false).

It is just a warning; the triple and its object literal are still output from the parser. From the command line "--nocheck" turns off the checking.

Now logged as
https://issues.apache.org/jira/browse/JENA-2158

Jena has its own XSD parsing code, which is copied from the internal implementation in Xerces 2.11.0, not going through public APIs. It was repackaged into the Jena code base so that any XML parser can be used, normally the JDK one.

There is a constant to turn on XSD 1.1 schema mode for checking. It affects year 0000, including the value of negative years, and some duration detection. It does not seem to affect other parts of the Xerces subsystem though.

Between the JDK and Xerces code there are slight differences (the JDK does not, or at least did not, handle "T24:00:00", which is a legal time by XSD). So some checking to do.

    Andy

> Thanks,
> Ryan
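The differences between the JDK and Xerces mentioned above can be probed from plain Java. The following is only a sketch: it goes through the JDK's own javax.xml.datatype implementation, not Jena's repackaged Xerces code, so it says nothing directly about what riot will warn on. The class and method names (JdkLexicalProbe, jdkAccepts) are made up for illustration, and no outcome is claimed here for the year-zero or 24:00:00 forms since that may vary by JDK version.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

public class JdkLexicalProbe {
    /**
     * Returns true if the JDK's XMLGregorianCalendar parser accepts the
     * lexical form, false if it rejects it with IllegalArgumentException.
     */
    static boolean jdkAccepts(String lexical) {
        try {
            DatatypeFactory.newInstance().newXMLGregorianCalendar(lexical);
            return true;
        } catch (IllegalArgumentException e) {
            // Documented failure mode for an invalid lexical representation.
            return false;
        } catch (DatatypeConfigurationException e) {
            // No DatatypeFactory implementation available at all.
            throw new IllegalStateException("No DatatypeFactory available", e);
        }
    }

    public static void main(String[] args) {
        // Forms discussed in the thread, plus one ordinary year for comparison.
        String[] probes = { "2021", "0000", "-0001", "24:00:00" };
        for (String lexical : probes) {
            System.out.println("'" + lexical + "' accepted by JDK: " + jdkAccepts(lexical));
        }
    }
}
```

Running it on the JDK in question shows which of these lexical forms that JDK's parser accepts.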
Re: Using Xerces2 2.12.1 with Jena
> On Sep 9, 2021, at 4:00 PM, Andy Seaborne wrote:
>
> What is your usage scenario?

riot gives me the warning “Lexical form '0000' not valid for datatype XSD gYear”. But according to XSD 1.1 Part 2, 0000 is a permitted value for gYear, representing 1 BCE.

> Jena supports the XSD datatypes relevant to RDF independently of XML.

I had thought that the reason for this warning was that Jena was relying on the default Java implementation of XSD datatypes (which is 1.0 not 1.1). But I guess I was wrong, in which case using the Xerces2 parser would not resolve this?

Thanks,
Ryan
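For readers unfamiliar with the year numbering at issue: XSD 1.1 counts years astronomically, so year 0 is 1 BCE and year -1 is 2 BCE, whereas XSD 1.0 prohibits year 0000 altogether. A minimal sketch of that mapping follows. The class GYearEra and its describe method are hypothetical helpers, not part of Jena or Xerces, and the input is assumed to be just the year digits with an optional leading minus and no timezone suffix.

```java
public class GYearEra {
    /**
     * Describes an XSD 1.1 year using astronomical numbering:
     * 0 is 1 BCE, -1 is 2 BCE, and positive years are CE.
     * Assumes the input is only the year part, without a timezone.
     */
    static String describe(String lexicalYear) {
        long year = Long.parseLong(lexicalYear);
        return year > 0 ? year + " CE" : (1 - year) + " BCE";
    }

    public static void main(String[] args) {
        for (String lexical : new String[] { "2021", "0000", "-0001" }) {
            System.out.println(lexical + " -> " + describe(lexical));
        }
    }
}
```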
Re: Using Xerces2 2.12.1 with Jena
Hi Ryan,

On 09/09/2021 17:08, Shaw, Ryan wrote:
> I would like to use Xerces2 2.12.1 (with support for XML Schema 1.1) as the Jena XML parser, specifically to gain support for XSD 1.1 datatypes.

XML 1.1 != XSD 1.1

Jena supports the XSD datatypes relevant to RDF independently of XML.

What is your usage scenario?

> I see at [1] that “any XML parser can be used with Jena … through the usual mechanism for adding to the application.” But I don’t know what that usual mechanism is.

For Java:
https://docs.oracle.com/en/java/javase/11/docs/api/java.xml/javax/xml/parsers/DocumentBuilderFactory.html#newInstance()

> The Xerces docs say something about the Java Endorsed Standards Override Mechanism, but elsewhere I see that this has been deprecated. What’s the recommended way to do this for Jena?
>
> Thanks,
> Ryan
>
> [1] https://issues.apache.org/jira/browse/JENA-341?focusedCommentId=16463591&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16463591
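As a rough illustration of the mechanism behind that link: DocumentBuilderFactory.newInstance() looks for an implementation named by the javax.xml.parsers.DocumentBuilderFactory system property, then for a service provider on the classpath, and otherwise falls back to the JDK's built-in parser. The sketch below only reports which factory was selected. The class name WhichXmlParser is made up for illustration, and whether a Xerces2 jar on the classpath is actually picked up depends on that jar registering itself as a provider, which is assumed here rather than verified.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class WhichXmlParser {
    /** Returns the class name of the DocumentBuilderFactory that JAXP selects. */
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Run with and without the alternative parser on the classpath
        // (or with -Djavax.xml.parsers.DocumentBuilderFactory=...) to compare.
        System.out.println("DocumentBuilderFactory in use: " + factoryClassName());
    }
}
```

This only changes which XML parser reads RDF/XML; it does not change how XSD datatype lexical forms are checked.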