[
https://issues.apache.org/jira/browse/XERCESJ-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17465116#comment-17465116
]
J Morris commented on XERCESJ-1726:
-----------------------------------
{{Hi,
This was all a long time ago now, though that is not to say that I have lost
interest.
My first observation is that the assert that you quoted in your comment does
not appear to correspond to the one in the *test1.xsd* file in the *testX.zip*
attached to the bug report.
The one that you quoted was:
<xs:assert
test="./text()[matches(.,'^([a-z]{2}[a-z]?-[A-Z]{2}\.((((com)|(lib)|(mod([a-z][a-z0-9])?)|(plg[a-z][a-z0-9](-[a-z0-9][a-z0-9])*)|(tpl))_[a-z][a-z0-9](\.sys)?(\.ini))|(ini)|(css)|(localise\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\.xml)|(install\.xml)$')]"/>
whereas my one in the *test1.xsd* file was:
<xs:assert
test="./text()[matches(.,'^([a-z]{2}[a-z]?-[A-Z]{2}\.((((com)|(lib)|(mod(_[a-z][a-z0-9]+)?)|(plg_[a-z][a-z0-9]+(\-[a-z0-9][a-z0-9]+)*)|(tpl))_[a-z][a-z0-9]+(\.sys)?(\.ini))|(ini)|(css)|(localise\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\.xml)|(install\.xml)$')]"/>
You will notice that your version is missing some of the key repeat
quantifiers ("+") in the regex, as well as some of the underscores ("_").
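To make the difference concrete, here is a minimal Java sketch that runs both versions of the regex against a sample name. It rests on some assumptions: it uses java.util.regex rather than the XSD/XPath regex engine inside Xerces (the two flavours are similar but not identical), it uses find() to approximate the substring semantics of the XPath matches() function, and the class name and the sample file names are hypothetical, not taken from *test1.xml*.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class AssertRegexComparison {
    // Regex as written in test1.xsd (backslashes doubled for Java string syntax).
    static final Pattern FROM_XSD = Pattern.compile(
        "^([a-z]{2}[a-z]?-[A-Z]{2}\\.((((com)|(lib)|(mod(_[a-z][a-z0-9]+)?)"
        + "|(plg_[a-z][a-z0-9]+(\\-[a-z0-9][a-z0-9]+)*)|(tpl))_[a-z][a-z0-9]+(\\.sys)?(\\.ini))"
        + "|(ini)|(css)|(localise\\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\\.xml)|(install\\.xml)$");

    // Regex as quoted in the earlier comment ("+" and some "_" missing).
    static final Pattern AS_QUOTED = Pattern.compile(
        "^([a-z]{2}[a-z]?-[A-Z]{2}\\.((((com)|(lib)|(mod([a-z][a-z0-9])?)"
        + "|(plg[a-z][a-z0-9](-[a-z0-9][a-z0-9])*)|(tpl))_[a-z][a-z0-9](\\.sys)?(\\.ini))"
        + "|(ini)|(css)|(localise\\.php)))|([a-z]{2}[a-z]?-[A-Z]{2}\\.xml)|(install\\.xml)$");

    // find() is used because XPath matches() succeeds on any matching substring.
    static boolean matchesXsdVersion(String text) {
        return FROM_XSD.matcher(text).find();
    }

    static boolean matchesQuotedVersion(String text) {
        return AS_QUOTED.matcher(text).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample names, chosen only to exercise the "+" quantifier.
        String[] samples = {"en-GB.com_example.ini", "en-GB.com_ab.ini", "en-GB.ini"};
        for (String s : samples) {
            System.out.println(s + "  xsd=" + matchesXsdVersion(s)
                + "  quoted=" + matchesQuotedVersion(s));
        }
    }
}
```

Without the "+" quantifiers the quoted version only accepts names that are exactly two characters long after the underscore, so a longer name such as the first sample is accepted by the *test1.xsd* version but not by the quoted one.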
The intention of my version was to match lines with *text()* in any of the
following formats:
1) ^[a-z]{2}[a-z]?-[A-Z]{2}\.com_[a-z][a-z0-9]+(\.sys)?\.ini$
2) ^[a-z]{2}[a-z]?-[A-Z]{2}\.lib_[a-z][a-z0-9]+(\.sys)?\.ini$
3) ^[a-z]{2}[a-z]?-[A-Z]{2}\.mod(_[a-z][a-z0-9]+)?_[a-z][a-z0-9]+(\.sys)?\.ini$
4) ^[a-z]{2}[a-z]?-[A-Z]{2}\.plg_[a-z][a-z0-9]+(\-[a-z0-9][a-z0-9]+)*_[a-z][a-z0-9]+(\.sys)?\.ini$
5) ^[a-z]{2}[a-z]?-[A-Z]{2}\.tpl_[a-z][a-z0-9]+(\.sys)?\.ini$
6) ^[a-z]{2}[a-z]?-[A-Z]{2}\.css$
7) ^[a-z]{2}[a-z]?-[A-Z]{2}\.ini$
8) ^[a-z]{2}[a-z]?-[A-Z]{2}\.localise\.php$
9) ^[a-z]{2}[a-z]?-[A-Z]{2}\.xml$
10) ^install\.xml$
Note: The pattern *[a-z]{2}[a-z]?-[A-Z]{2}* is a language prefix (two or three
lowercase alphabetic characters, followed by a hyphen, followed by exactly two
uppercase alphabetic characters) in the same spirit as locale settings in
operating systems.
This set of 10 patterns was supposed to match the *text()* entries for ALL of
the lines in the *test1.xml* file, so it was a surprise to me to get any errors
reported.
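As a rough cross-check outside Xerces, the ten intended formats can be tried one by one with a small Java sketch like the one below. The assumptions are that java.util.regex behaves closely enough to the XSD/XPath regex flavour for these particular patterns, and that the class name, helper names and sample file names are hypothetical; none of them are taken from *test1.xml*.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class LanguageFilePatterns {
    // Language prefix shared by formats 1 to 9.
    private static final String PREFIX = "[a-z]{2}[a-z]?-[A-Z]{2}";
    // A lowercase letter followed by one or more lowercase letters or digits.
    private static final String NAME = "[a-z][a-z0-9]+";

    // The ten formats, in the same order as the numbered list above.
    static final List<Pattern> FORMATS = Arrays.asList(
        Pattern.compile("^" + PREFIX + "\\.com_" + NAME + "(\\.sys)?\\.ini$"),
        Pattern.compile("^" + PREFIX + "\\.lib_" + NAME + "(\\.sys)?\\.ini$"),
        Pattern.compile("^" + PREFIX + "\\.mod(_" + NAME + ")?_" + NAME + "(\\.sys)?\\.ini$"),
        Pattern.compile("^" + PREFIX + "\\.plg_" + NAME + "(\\-[a-z0-9][a-z0-9]+)*_" + NAME
            + "(\\.sys)?\\.ini$"),
        Pattern.compile("^" + PREFIX + "\\.tpl_" + NAME + "(\\.sys)?\\.ini$"),
        Pattern.compile("^" + PREFIX + "\\.css$"),
        Pattern.compile("^" + PREFIX + "\\.ini$"),
        Pattern.compile("^" + PREFIX + "\\.localise\\.php$"),
        Pattern.compile("^" + PREFIX + "\\.xml$"),
        Pattern.compile("^install\\.xml$"));

    // Returns the 1-based number of the first format that matches, or 0 if none does.
    static int firstMatchingFormat(String text) {
        for (int i = 0; i < FORMATS.size(); i++) {
            if (FORMATS.get(i).matcher(text).find()) {
                return i + 1;
            }
        }
        return 0;
    }

    static boolean matchesAny(String text) {
        return firstMatchingFormat(text) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical sample entries; replace with the text() values under test.
        String[] samples = {
            "en-GB.com_example.ini", "fr-FR.plg_system_example.sys.ini",
            "en-GB.localise.php", "install.xml", "EN-gb.ini"
        };
        for (String s : samples) {
            System.out.println(s + " -> format " + firstMatchingFormat(s));
        }
    }
}
```

Feeding each *text()* value from the XML file through firstMatchingFormat would show which entries, if any, fall outside all ten formats independently of the validator.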
Thank you for your continuing interest.}}
> Possible Bug: Xerces 2.12.1 for XML Validation with XSD 1.1 Schema under Java
> -----------------------------------------------------------------------------
>
> Key: XERCESJ-1726
> URL: https://issues.apache.org/jira/browse/XERCESJ-1726
> Project: Xerces2-J
> Issue Type: Bug
> Components: Samples
> Affects Versions: 2.12.1
> Environment: Windows 7
> Java 1.8.0_261
> Xerces-J 2.12.1
> Reporter: J Morris
> Priority: Major
> Labels: test
> Attachments: testX.zip, test_cases_ mukul.zip
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> I have recently been trying to validate the XML file *test1.xml* with a
> schema *test.xsd* containing *assert*/*assertion* constructs, using the
> sample program *jaxp.SourceValidator*.
> Unexpectedly, the result was several reported errors in what appeared to be
> syntactically correct and valid XML lines (*test1.xml*: 9 errors).
> After significant experimentation, it appeared that these errors were
> occurring at line numbers which the validation found troublesome. Inserting
> an extra line at one of the troublesome line numbers made the previously
> erroneous line (now *not* appearing at a troublesome line number) pass
> validation. On the other hand, the newly inserted line (occupying the
> troublesome line number) would fail validation.
> I tentatively interpreted this as meaning that *the validation errors were
> not real* and began to try to develop a test-case, as similar as possible to
> *test1.xml*, but which passed validation. The result was *test2.xml*, which
> was generated from *test1.xml* by inserting XML comment lines at each of the
> troublesome line numbers, thereby displacing the previously erroneous lines
> to non-troublesome line numbers. Since XML comment lines do not require
> validation, this file passed validation for me (*test2.xml*: 0 errors).
> I then contacted Mukul Gandhi and he re-ran my validations *but came to a
> different result*. He saw errors in both XML files (*test1.xml*: 9 errors;
> *test2.xml*: 18 errors). Despite our joint efforts to achieve convergence
> between our respective validation runs, we have not so far succeeded.
> Mukul did point out a couple of things:
> 1) The way that I was using the "matches" function in the *assert*
> constructs. His experience suggested that this was unreliable. However, I was
> not certain whether this would have led to the type of behaviour that I was
> seeing (apparent troublesome line numbers).
> 2) He found that certain characters (probably the two accented French
> characters) in my XML files were not supported in the default XML encoding
> scheme, UTF-8. However, for me, no errors were reported for those by the
> validation program *jaxp.SourceValidator*.
> I would be very grateful for some help in getting to the bottom of this
> (both the original behaviour and the discrepancies with Mukul's validation
> runs).