In particular, could you compare in detail this RelaxNG method and
the similar XML Schema approach using substitution groups and
abstract schema types? (Note that the substitution group part of
this is unnecessary but results in easier-to-read XML.) After
studying your blog post I don't see any major differences between
the RelaxNG approach and the XML Schema approach beyond the
differences between RelaxNG and XML Schema themselves. On the other
hand, this is the first time I've seen RelaxNG.
I haven't written any XSD in a long time, so bear with me a little
bit... and please correct me if there's a better, more concise way to
state things in XSD than I have here, or if there IS a way to do the
things that I claim XSD can't do!
To start, I reimplemented the final example from my blog post using
XSD instead of RelaxNG. First, I have a schema for project
files:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://freedomandbeer.com/project/1.0"
elementFormDefault="qualified">
<xs:element name="project">
<xs:complexType>
<xs:all>
<xs:element name="name" type="xs:string"/>
<xs:element name="developers">
<xs:complexType>
<xs:sequence>
<xs:element name="developer" type="xs:string"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="extensions">
<xs:complexType>
<xs:sequence>
<xs:element name="extension">
<xs:complexType>
<xs:sequence>
<xs:any minOccurs="0"
maxOccurs="unbounded" namespace="##other"/>
</xs:sequence>
<xs:attribute name="id"
type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:all>
<xs:attribute name="id" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:schema>
and, I have another schema for my "svn" extension:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://freedomandbeer.com/project/ext/svn/1.0"
elementFormDefault="qualified">
<xs:element name="svnRepository">
<xs:complexType>
<xs:sequence>
<xs:element name="url" type="xs:string" minOccurs="1"
maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
which does correctly validate this XML:
<project
xmlns="http://freedomandbeer.com/project/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://freedomandbeer.com/project/1.0
project.xsd
http://freedomandbeer.com/project/ext/svn/1.0 svnext.xsd"
id="myproject">
<name>My Project</name>
<developers>
<developer>Bryon</developer>
<developer>Jasmine</developer>
</developers>
<extensions>
<extension id="source_control">
<svnRepository xmlns="http://freedomandbeer.com/project/ext/svn/1.0">
<url>http://svn.freedomandbeer.com/myproject/trunk</url>
</svnRepository>
</extension>
</extensions>
</project>
so, apart from the fact that I think the RelaxNG is MUCH nicer to
read, the basic flavor is implementable fairly easily in XSD.
One place where we've found a need to do something that is just not
possible in XSD is where our "core" product uses multiple namespaces,
and we want to allow "mixing-in" of things that are OUTSIDE of those
namespaces -- in XSD, the rule is "One Schema, One Namespace". We
define a new namespace every time we version our schema, and add the
new schema elements into the new namespace.
If you look at the line in the project XSD where we use an <xs:any>
element to allow for elements outside the schema's namespace to be
included, you'll see that we declared the namespace to be ##other - in
XSD, that means "any element in a namespace other than this schema's
target namespace". Let's call this schema version 1. The problem is
that if we were to add another schema that defined a new "version" of
this schema, with its own corresponding "version 2" namespace, now the
##other from version 1 can include version 2 elements, and any ##other
references in version 2 can include version 1 elements, which is not
what we intend. XSD 1.0 does let an <xs:any> list specific namespaces
to allow, but that closes the set of extensions; there is not (to the
best of my knowledge) any way to "group" namespaces and say that an
element must NOT come from any of them...
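
To make the version 1 / version 2 problem concrete, here is a toy Python model of the namespace rule. It is only a sketch of how I understand ##other to match in XSD 1.0, not a real validator. The "2.0" namespace URI and both function names are made up for illustration; the other two URIs come from the schemas above.

```python
# Toy model of wildcard namespace matching; not a real XSD validator.
V1 = "http://freedomandbeer.com/project/1.0"
V2 = "http://freedomandbeer.com/project/2.0"  # hypothetical "version 2" namespace
SVN = "http://freedomandbeer.com/project/ext/svn/1.0"


def other_allows(target_namespace, element_namespace):
    """##other: element must be in a namespace, and not the schema's own target."""
    return bool(element_namespace) and element_namespace != target_namespace


def outside_core(core_namespaces, element_namespace):
    """What we actually want: a namespace outside the whole group of core ones."""
    return bool(element_namespace) and element_namespace not in core_namespaces


# The svn extension is accepted by version 1's wildcard, as intended...
print("v1 ##other accepts svn:", other_allows(V1, SVN))
# ...but so is a version 2 core element, which is the unwanted case.
print("v1 ##other accepts v2: ", other_allows(V1, V2))
# A grouped exclusion would reject v2 while still accepting svn.
print("grouped rule accepts v2: ", outside_core({V1, V2}, V2))
print("grouped rule accepts svn:", outside_core({V1, V2}, SVN))
```

The second function is the rule I would like to be able to write down in the schema itself.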
Another thing that is very easy to do in RelaxNG, and hard to do in
XSD, is deal with order and multiplicity in XML docs -- consider a
piece of XML like this:
<car>
<make>Ford</make>
<model>Mustang</model>
<year>2007</year>
<option>Chrome Wheels</option>
<option>CD Changer</option>
<option>Sunroof</option>
</car>
but, maybe you'd like to be able to accept this document as well...
<car>
<make>Ford</make>
<model>Mustang</model>
<option>Chrome Wheels</option>
<option>CD Changer</option>
<option>Sunroof</option>
<year>2007</year>
</car>
in RelaxNG, you can say:
start = CAR
MAKE = element make {text}
MODEL = element model {text}
YEAR = element year {text}
OPTION = element option {text}
CAR = element car { MAKE & MODEL & YEAR & OPTION* }
Which means "a car has exactly one each of make, model, and year
elements, and zero or more option elements - all of which can appear
in any order". Generally, when your XML document represents an object
graph, you don't really care what order child elements appear in - just
whether they are there or not. There's no reason why the second
version of the document should be less valid than the first, and
adding this openness makes it easier to interoperate with systems that
speak in the language of your schema.
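
A rough Python sketch of what that CAR pattern accepts, using only the standard library. This is a hand-rolled check, not a RelaxNG validator, and the names (is_valid_car and the two sets) are mine. It just shows that only the counts matter, never the order.

```python
import xml.etree.ElementTree as ET
from collections import Counter

REQUIRED_ONCE = {"make", "model", "year"}  # MAKE & MODEL & YEAR
REPEATABLE = {"option"}                    # OPTION*


def is_valid_car(xml_text):
    """Check the CAR rule by counting children; child order is ignored."""
    root = ET.fromstring(xml_text)
    if root.tag != "car":
        return False
    # Each child is declared as {text}, so it may not contain elements itself.
    if any(len(child) for child in root):
        return False
    counts = Counter(child.tag for child in root)
    # Reject any child element the pattern does not mention.
    if set(counts) - REQUIRED_ONCE - REPEATABLE:
        return False
    # Exactly one of each required element; options may appear 0..n times.
    return all(counts[name] == 1 for name in REQUIRED_ONCE)


year_first = ("<car><make>Ford</make><model>Mustang</model>"
              "<year>2007</year><option>Sunroof</option></car>")
year_last = ("<car><make>Ford</make><model>Mustang</model>"
             "<option>Sunroof</option><year>2007</year></car>")
no_year = "<car><make>Ford</make><model>Mustang</model></car>"

for doc in (year_first, year_last, no_year):
    print(is_valid_car(doc))
```

Both orderings of the Mustang document pass, and the one with no year element fails.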
Trying to get this exact behavior in XSD is hard - there are several
options in XSD that almost do what we want, but not quite...
<xs:all> - this allows for all of the sub-elements to appear in any
order, but (in XSD 1.0) each element may only occur 0 or 1 times - not
0 or more like our option elements.
<xs:sequence> - this is the most commonly used - it allows for each
element to occur any number of times, but the sequence is set, so we
lose the flexibility to move things around.
The best I can do to truly re-create the RelaxNG schema above in XSD is:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://freedomandbeer.com/cars">
<xs:element name="car">
<xs:complexType>
<xs:sequence>
<xs:element name="option" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:choice>
<xs:sequence>
<xs:element name="make" type="xs:string"/>
<xs:element name="option" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:choice>
<xs:sequence>
<xs:element name="model"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="year"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:sequence>
<xs:element name="year"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="model"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:choice>
</xs:sequence>
<xs:sequence>
<xs:element name="model" type="xs:string"/>
<xs:element name="option" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:choice>
<xs:sequence>
<xs:element name="make"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="year"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:sequence>
<xs:element name="year"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="make"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:choice>
</xs:sequence>
<xs:sequence>
<xs:element name="year" type="xs:string"/>
<xs:element name="option" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:choice>
<xs:sequence>
<xs:element name="make"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="model"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:sequence>
<xs:element name="model"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="make"
type="xs:string"/>
<xs:element name="option"
type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:choice>
</xs:sequence>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Notice that what we are basically doing is building a DFA that walks
valid documents by hand - because XSD requires content models to be
deterministic (the "Unique Particle Attribution" rule), so the
validator must always know which "path" it is on without looking
ahead. This means that, written out this way, the size of the schema
grows factorially (worse than exponentially) in the number of
elements in your "arbitrary choice group".
Now, to be fair to XSD, if you simply put a "wrapper" element around
all of your options, which is a reasonable thing to do in most cases,
the schema gets MUCH simpler:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://freedomandbeer.com/cars">
<xs:element name="car">
<xs:complexType>
<xs:all>
<xs:element name="make" type="xs:string"/>
<xs:element name="model" type="xs:string"/>
<xs:element name="year" type="xs:string"/>
<xs:element name="options" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element name="option" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:all>
</xs:complexType>
</xs:element>
</xs:schema>
But, I don't really like my choice of schema language telling me that
I can't do something that is perfectly valid XML... Still, it's a
reasonable tradeoff to make if you like the tool support that XSD
offers (which is undeniably better than what exists for RelaxNG...)
Actually, the reason that I came to RelaxNG in the first place was
because I had to retroactively design a schema for a legacy system
that had none - and it did something isomorphic to the "car" example
here -- except that it had much more than the make/model/year/option
mix to deal with -- the XSD would have had thousands of paths and
would have been totally impossible to maintain.
Anyways - sorry if this has devolved into a generic "why RelaxNG is
better than XSD" argument... I do think that's very true - but that's
not the intention here. I think that RelaxNG provides a much more
robust way than XSD of dealing with documents that need a high degree
of extensibility -- and it gives you the freedom to define your schema
in the most natural way for document authors and consumers to
understand, without your schema language getting in the way.
Additionally, RelaxNG schemas are VERY readable, and provide excellent
documentation that virtually anyone can quickly grok - I don't find
the same to be true with XSD...