Hi Udayanga,
Here's a quick overview of how schemas are processed in Xerces-J.
1. We preprocess all include, import and redefine elements and create all
the necessary grammar objects. This way we have all the schema information
handy (constructTrees).
2. We then go through all global declarations in each schema document we
preprocessed. This is where we do some redefine preprocessing, changing
the names of redefined components. This step also stores the first
occurrence of each global component, whether it's a type, an element, an
attribute, etc. (the key is a concatenation of the component name and its
namespace), as well as the corresponding schema document for that
component. This way you can easily check for duplicates and know which
global component to process when it's referred to by another component
during processing of the actual components (e.g. <element name="a"
type="ns:type1"/>). In that case, when processing element "a", we can get
access to the representation of "type1" if it was not yet processed. This
will of course cause some problems with xs:override, since xs:override now
has precedence. So the logic will need to change to take xs:override into
consideration.
3. We then go and process all global components in each schema document we
have preprocessed. A global component is any schema component (excluding
<include>, <import>, <redefine>, and <override>) that's a child of the
<schema> element, e.g.
<xs:schema .....>
  ...
  <xs:element name="child1" type="xs:string"/>
  <xs:complexType name="ctype1">
    <xs:sequence>
      <xs:element name="gchild1" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
So, we will process element "child1" and type "ctype1".
4. We then process any local element components. A local element is
usually a child of a component such as a complex type or group. So, in the
above example "gchild1" is a local element. We would process that element
after we have processed all global components.
When implementing xs:override, a lot of considerations have to be taken
care of during preprocessing. We also need to make sure that, when
processing a global or local component that refers to a component that is
being overridden, we use the overriding component.
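
To make step 2 and the override concern a little more concrete, here is a
minimal sketch of a first-occurrence registry. It is not the actual
XSDHandler code: the class name, the method names, and the "," separator in
the key are all assumptions made for illustration, and the sketch ignores
the fact that a real registry would be kept per component kind (type,
element, attribute, etc.).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; these names do not exist in Xerces-J.
class GlobalRegistrySketch {
    // Maps "namespace,localName" to the schema document that declares it.
    private final Map<String, String> declaredIn = new HashMap<>();

    // The key is a concatenation of the namespace and the component name.
    // The "," separator is an assumption of this sketch.
    static String key(String namespace, String localName) {
        return (namespace == null ? "" : namespace) + "," + localName;
    }

    // Records a global declaration. Only the first occurrence is stored;
    // returns false when the component was already registered (a duplicate).
    boolean register(String namespace, String localName, String schemaDoc) {
        return declaredIn.putIfAbsent(key(namespace, localName), schemaDoc) == null;
    }

    // Records an overriding declaration. Unlike register(), this replaces
    // any earlier entry, because the overriding component has precedence.
    void registerOverride(String namespace, String localName, String overridingDoc) {
        declaredIn.put(key(namespace, localName), overridingDoc);
    }

    // Returns the document to consult when a reference such as
    // type="ns:type1" is resolved, or null if the component is unknown.
    String documentFor(String namespace, String localName) {
        return declaredIn.get(key(namespace, localName));
    }
}
```

With something along these lines, a lookup made while processing element "a"
would return the overriding document for "type1" rather than the document
where "type1" first appeared.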
Regards,
Khaled
From: udayanga wickramasinghe <[email protected]>
To: [email protected]
Date: 03/27/2010 07:08 PM
Subject: Re: About Xerces projects for GSoc 2010
Hi Khaled,
I went through quite a few of the Xerces interfaces/implementations (and
samples, e.g. xs.QueryXS, jaxp.SourceValidator, xni.parser, etc.) related
to my project, including the ones you mentioned. I now have a fair
understanding of how the Xerces parsers work: configurations, pipelines,
core interfaces, etc. From what I have gathered so far, I'm trying to
outline the Xerces process of schema processing/loading and instance
validation as follows. Please correct me if my understanding is wrong.
XMLSchemaValidator --> instance documents are parsed and then validated
later in the pipeline (e.g. of the XML11Configuration parser) against the
loaded schemas (e.g. loaded by XMLSchemaLoader).
XMLSchemaFactory, XMLSchemaLoader --> load XML schemas from various .xsd
sources and initiate a Grammar/pool from the provided schema documents.
To do this, I suppose the schema loader wraps an XSDHandler class, which is
responsible for parsing each schema source (e.g. using SchemaDOMParser),
resolving and loading the grammar (in #parseSchema(...)), etc. for each
schema document source.
I see several stages in XSDHandler's primary method, #parseSchema(...)
(e.g. constructTrees, buildGlobalNameRegistries, traverseSchemas,
traverseLocalElements, etc.).
It seems #constructTrees is a very important method for the xs:override
implementation (although I don't have a complete understanding of its
exact semantics), since it tries to resolve included and redefined schema
components and build a dependency map. (Having said that, could we build
custom dependency map(s) here for override components as well, to handle
the necessary schema semantics?) I see Hiranya's patch focused mostly on
this method.
It would be very helpful if you could explain these schema parsing
stages (in #parseSchema(...)) in a little detail and how they relate to
the xs:override implementation, so that I could have a better
understanding and put together the pieces of the puzzle. I also want to
clarify what is meant by "globally" and "locally" declared components, as
mentioned in #buildGlobalNameRegistries(), #traverseLocalElements(), etc.
Thanks in advance.
I am currently in the process of writing a GSoc proposal for the
xs:override implementation. I'll make it available to you as a draft
proposal asap, so I can discuss any necessary modifications with you and
put the final submission in place. (Also, the deadline for GSoc proposals
is April 9.) Thanks again.
Regards,
Udayanga
On Thu, Mar 25, 2010 at 2:37 AM, Khaled Noaman <[email protected]> wrote:
Hi Udayanga,
See my comments below (<kn>).
Regards,
Khaled
From: udayanga wickramasinghe <[email protected]>
To: [email protected]
Date: 03/24/2010 04:46 PM
Subject: Re: About Xerces projects for GSoc 2010
Hi Khaled,
Thanks for your feedback. Please see some of my comments below.
On Wed, Mar 24, 2010 at 9:13 PM, Khaled Noaman <[email protected]> wrote:
Hi Udayanga,
According to your example, B and B' would be considered 2 different
documents and we would end up with conflicting components, not just e2
(assuming B.xsd had other global components). The reason we consider B and
B' as different documents is the fact that B' now contains a different
declaration for e2.
If xs:override did not apply from C->B, then in that case we can consider
B and B' the same and there would be no duplicate components.
Yes. Just to verify: what you mean is that, if B.xsd has the following
format,
Schema B
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:override schemaLocation="schemaC.xsd">
    <xs:element name="e1" type="xs:int"/>
    <xs:element name="e2" type="xs:date"/>
  </xs:override>
  <xs:element name="e3" type="xs:string"/>
</xs:schema>
then the C->B override won't occur, since the overridden schema B.xsd
doesn't have either element e2 or e1 to override. Hence there is no
difference between the schemas [B] and [B'], and the schema inclusion for
both A.xsd and C.xsd would be identical. I suppose this is what you meant.
<kn>Yes. That's what I meant</kn>
You would need to apply override to check for cyclical dependencies. As I
mentioned above, if override does not modify the overridden schema, then 2
similar schema documents (B and B') would be treated as the same (in other
words, no duplicates).
From the above example, I now see that only after applying override can we
say for sure whether cyclic dependency conflicts exist.
Consider the following case:
A includes B and C, and B and C each override D. Now you end up with 2
versions of D (D' included by B and D'' included by C). If neither B nor C
changes D, then both D' and D'' are considered the same.
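
The check described above can be sketched roughly as follows. This is only
an illustration under simplifying assumptions, not Xerces-J code: the class
and method names are hypothetical, and each global component is reduced to a
plain string such as "element:e2" (kind plus name), whereas a real
implementation would compare actual schema components.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch only; these names do not exist in Xerces-J.
class OverrideEffectSketch {
    // An override changes the target document only if at least one of the
    // overriding components matches a global component in the target.
    static boolean overrideChangesTarget(Set<String> overriding, Set<String> target) {
        return !Collections.disjoint(overriding, target);
    }

    // Implements only the rule stated above: when neither override changes
    // D, the two resulting versions (D' and D'') are treated as the same.
    static boolean bothLeaveTargetUnchanged(Set<String> fromB, Set<String> fromC,
                                            Set<String> d) {
        return !overrideChangesTarget(fromB, d) && !overrideChangesTarget(fromC, d);
    }
}
```

For the earlier example, an override carrying e1 and e2 applied to a
document that declares only e3 would leave that document unchanged.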
It would be great if you can start by looking at the following packages in
Xerces-J:
* org.apache.xerces.impl.xs.traversers - schema processing classes
(XSDHandler is a starting point)
* org.apache.xerces.impl.xs - classes representing the different schema
components as well as the main class for schema validation
(XMLSchemaValidator)
* org.apache.xerces.impl.xs.models - content model classes (e.g. DFA, all,
empty)
Sure, I'll go through the above implementations and interfaces and get
back to you in case I want to clarify some finer points. Thanks for the
details. Btw, are there any architecture docs/articles on Xerces XML Schema
processing? (I found several docs related to the Xerces2 parsers, XNI and
validators, but not a lot on XML Schema.) Thanks again.
<kn>Check the documentation for Xerces-J (
http://xerces.apache.org/xerces2-j/xml-schema.html). You can also take a
look at the samples that are included as part of Xerces-J source
code.</kn>
Regards,
Udayanga
--
http://www.udayangawiki.blogspot.com