[jira] [Commented] (XERCESJ-1726) Possible Bug: Xerces 2.12.1 for XML Validation with XSD 1.1 Schema under Java

J Morris (Jira) Fri, 24 Dec 2021 17:03:06 -0800


    [ https://issues.apache.org/jira/browse/XERCESJ-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465116#comment-17465116 ]


J Morris commented on XERCESJ-1726:
-----------------------------------

{{Hi,

This all happened a long time ago now, but that does not mean that I have lost 
interest.

My first observation is that the assert that you quoted in your comment does 
not appear to correspond to the one in the *test1.xsd* file in the *testX.zip* 
attached to the bug report.

The one that you quoted was:

<xs:assert test="./text()[matches(.,'^([a-z]{2}[a-z]?-[A-Z]{2}\.((((com)|(lib)|(mod([a-z][a-z0-9])?)|(plg[a-z][a-z0-9](-[a-z0-9][a-z0-9])*)|(tpl))_[a-z][a-z0-9](\.sys)?(\.ini))|(ini)|(css)|(localise\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\.xml)|(install\.xml)$')]"/>

whereas the one in my *test1.xsd* file was:

<xs:assert test="./text()[matches(.,'^([a-z]{2}[a-z]?-[A-Z]{2}\.((((com)|(lib)|(mod(_[a-z][a-z0-9]+)?)|(plg_[a-z][a-z0-9]+(\-[a-z0-9][a-z0-9]+)*)|(tpl))_[a-z][a-z0-9]+(\.sys)?(\.ini))|(ini)|(css)|(localise\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\.xml)|(install\.xml)$')]"/>
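To see how much the missing quantifiers and underscores matter, here is a minimal Java sketch (not Xerces and not an XPath engine) that runs both regexes over one file name. It assumes that java.util.regex behaves like the XPath regex flavour for the constructs used here, and it uses Matcher.find() because XPath's fn:matches() succeeds when any substring matches. The class name and the sample file name en-GB.com_content.ini are hypothetical.

```java
import java.util.regex.Pattern;

class AssertRegexComparison {
    // Regex as quoted in the earlier comment (some underscores and '+' quantifiers missing).
    static final String QUOTED =
        "^([a-z]{2}[a-z]?-[A-Z]{2}\\.((((com)|(lib)|(mod([a-z][a-z0-9])?)|(plg[a-z][a-z0-9](-[a-z0-9][a-z0-9])*)|(tpl))_[a-z][a-z0-9](\\.sys)?(\\.ini))|(ini)|(css)|(localise\\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\\.xml)|(install\\.xml)$";

    // Regex from the assert in test1.xsd.
    static final String INTENDED =
        "^([a-z]{2}[a-z]?-[A-Z]{2}\\.((((com)|(lib)|(mod(_[a-z][a-z0-9]+)?)|(plg_[a-z][a-z0-9]+(\\-[a-z0-9][a-z0-9]+)*)|(tpl))_[a-z][a-z0-9]+(\\.sys)?(\\.ini))|(ini)|(css)|(localise\\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\\.xml)|(install\\.xml)$";

    // fn:matches() is a substring search unless anchored, so find() mirrors it (matches() would not).
    static boolean xpathStyleMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String sample = "en-GB.com_content.ini"; // hypothetical file name
        System.out.println("quoted:   " + xpathStyleMatches(QUOTED, sample));
        System.out.println("intended: " + xpathStyleMatches(INTENDED, sample));
    }
}
```

Under these assumptions the quoted regex rejects the sample, because "_[a-z][a-z0-9]" without "+" allows only two characters after the underscore before ".ini", while the regex from *test1.xsd* accepts it.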

You will notice that some of the key "+" repetition quantifiers in the regex 
are missing from your version, as are some of the underscores ("_").

The intention of my version was to match lines whose *text()* has any of the 
following formats:

1)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.com_[a-z][a-z0-9]+(\.sys)?\.ini$
2)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.lib_[a-z][a-z0-9]+(\.sys)?\.ini$
3)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.mod(_[a-z][a-z0-9]+)?_[a-z][a-z0-9]+(\.sys)?\.ini$
4)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.plg_[a-z][a-z0-9]+(\-[a-z0-9][a-z0-9]+)*_[a-z][a-z0-9]+(\.sys)?\.ini$
5)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.tpl_[a-z][a-z0-9]+(\.sys)?\.ini$
6)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.css$
7)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.ini$
8)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.localise\.php$
9)  ^[a-z]{2}[a-z]?-[A-Z]{2}\.xml$
10) ^install\.xml$
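The ten formats can also be checked one at a time outside the schema. Below is a minimal sketch that stays compatible with the Java 1.8 environment in the report. It uses java.util.regex rather than the XPath regex engine used by Xerces, and the class name, the helper formatOf and the sample file names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class IntendedFileNamePatterns {
    // Shared language prefix, e.g. "en-GB.".
    private static final String PREFIX = "^[a-z]{2}[a-z]?-[A-Z]{2}\\.";

    // The ten intended formats, transcribed in order from the list above.
    static final List<Pattern> PATTERNS = Arrays.asList(
        Pattern.compile(PREFIX + "com_[a-z][a-z0-9]+(\\.sys)?\\.ini$"),
        Pattern.compile(PREFIX + "lib_[a-z][a-z0-9]+(\\.sys)?\\.ini$"),
        Pattern.compile(PREFIX + "mod(_[a-z][a-z0-9]+)?_[a-z][a-z0-9]+(\\.sys)?\\.ini$"),
        Pattern.compile(PREFIX + "plg_[a-z][a-z0-9]+(\\-[a-z0-9][a-z0-9]+)*_[a-z][a-z0-9]+(\\.sys)?\\.ini$"),
        Pattern.compile(PREFIX + "tpl_[a-z][a-z0-9]+(\\.sys)?\\.ini$"),
        Pattern.compile(PREFIX + "css$"),
        Pattern.compile(PREFIX + "ini$"),
        Pattern.compile(PREFIX + "localise\\.php$"),
        Pattern.compile(PREFIX + "xml$"),
        Pattern.compile("^install\\.xml$"));

    // Returns the 1-based number of the first matching format, or 0 if none match.
    // Every pattern is anchored with ^ and $, so find() acts as a whole-string match here.
    static int formatOf(String name) {
        for (int i = 0; i < PATTERNS.size(); i++) {
            if (PATTERNS.get(i).matcher(name).find()) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        String[] names = {
            "en-GB.com_content.ini", "fr-FR.mod_login.sys.ini",
            "de-DE.plg_system-x1_cache.ini", "en-GB.localise.php",
            "install.xml", "en-gb.ini"
        };
        for (String name : names) {
            System.out.println(name + " -> format " + formatOf(name));
        }
    }
}
```

Checking names against each format separately makes it easier to see which format a rejected line was expected to satisfy than a single combined regex does.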

Note: The pattern *[a-z]{2}[a-z]?-[A-Z]{2}* is a language prefix (two or three 
lowercase letters, followed by a hyphen, followed by exactly two uppercase 
letters) in the same spirit as locale settings in operating systems.
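A quick way to see exactly what the prefix pattern accepts is a whole-string test with java.util.regex. This is a hypothetical sketch (class and method names are illustrative), using Matcher.matches() rather than the substring semantics of XPath's matches().

```java
import java.util.regex.Pattern;

class LanguagePrefix {
    // Two or three lowercase letters, a hyphen, then exactly two uppercase letters.
    static final Pattern PREFIX = Pattern.compile("[a-z]{2}[a-z]?-[A-Z]{2}");

    // Whole-string check via Matcher.matches().
    static boolean isLanguagePrefix(String s) {
        return PREFIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"en-GB", "ast-ES", "e-GB", "abcd-EF", "en-gb"}) {
            System.out.println(s + " -> " + isLanguagePrefix(s));
        }
    }
}
```

With these inputs, en-GB and ast-ES match, while e-GB (one letter), abcd-EF (four letters) and en-gb (lowercase region) do not.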

This set of 10 patterns was supposed to match the *text()* entries of ALL of 
the lines in the *test1.xml* file, so it was a surprise to me that any errors 
were reported.

Thank you for your continuing interest.}}

> Possible Bug: Xerces 2.12.1 for XML Validation with XSD 1.1 Schema under Java
> -----------------------------------------------------------------------------
>
>                 Key: XERCESJ-1726
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1726
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: Samples
>    Affects Versions: 2.12.1
>         Environment: Windows 7
> Java 1.8.0_261
> Xerces-J 2.12.1
>            Reporter: J Morris
>            Priority: Major
>              Labels: test
>         Attachments: testX.zip, test_cases_ mukul.zip
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I have recently been trying to validate the XML file *test1.xml* with a 
> schema *test.xsd* containing *assert*/*assertion* constructs, using the 
> sample program *jaxp.SourceValidator*.
> Unexpectedly, several errors were reported for what appeared to be 
> syntactically correct and valid XML lines (*test1.xml*: 9 errors).
> After significant experimentation, it appeared that the errors were tied to 
> particular line numbers that the validation found troublesome, rather than 
> to the lines themselves. Inserting an extra line at one of the troublesome 
> line numbers made the previously erroneous line (now *not* at a troublesome 
> line number) pass validation, while the newly inserted line (now occupying 
> the troublesome line number) failed validation.
> I tentatively interpreted this as meaning that *the validation errors were 
> not real* and began developing a test case, as similar as possible to 
> *test1.xml*, that would pass validation. The result was *test2.xml*, which 
> was generated from *test1.xml* by inserting XML comment lines at each of the 
> troublesome line numbers, thereby displacing the previously erroneous lines 
> to non-troublesome line numbers. Since XML comment lines are not validated, 
> this file passed validation for me (*test2.xml*: 0 errors).
> I then contacted Mukul Gandhi and he re-ran my validations *but came to a 
> different result*. He saw errors in both XML files (*test1.xml*: 9 errors; 
> *test2.xml*: 18 errors). Despite our joint efforts to achieve convergence 
> between our respective validation runs, we have not so far succeeded.
> Mukul did point out a couple of things:
> 1) The way that I was using the "matches" function in the *assert* 
> constructs. His experience suggested that this was unreliable. However, I was 
> not certain whether this would have led to the type of behaviour that I was 
> seeing (apparent troublesome line numbers).
> 2) He found that certain characters (probably the two accented French 
> characters) in my XML files were not valid in the default XML encoding, 
> UTF-8. However, for me, the validation program *jaxp.SourceValidator* 
> reported no errors for them.
> I would be very grateful for some help in getting to the bottom of this 
> (both the original behaviour and the discrepancies with Mukul's validation 
> runs).
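
On the encoding point, one way to check whether a file's bytes really are UTF-8, independently of any validator, is a strict decode with java.nio. This is a minimal sketch: the class name Utf8Check is hypothetical, and it only tests the raw bytes, not the encoding declared in the XML prolog.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // Returns true if the bytes decode cleanly as UTF-8; malformed input is reported, not replaced.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java Utf8Check <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + (isValidUtf8(bytes) ? " is valid UTF-8" : " is NOT valid UTF-8"));
    }
}
```

A file saved as ISO-8859-1 or Windows-1252 that contains a character such as é would typically fail this check, because there é is a single byte that does not form a valid UTF-8 sequence.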



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
