On May 21, 2008, at 9:56 AM, Jason van Zyl wrote:
Bryon,
How would you compare this with a similar method using XML Schema?
I'm not a huge fan of XSD, but there is a lot of tooling for XSD
especially in IDEs and that would be something we have to consider.
Do you actually use this method you describe in a production system,
or just a thought experiment?
We use this method to define XML schemas for all of our web services
at our very web-service oriented company...
We chose to move to RelaxNG to solve a particular problem with our XML
service interfaces - we wanted to have a guarantee that clients and
servers on all "minor versions" of a service within a "major version"
were completely interoperable. So, if your client is using service
Foo, and you have version 1.x of the schema, you can access a server
running any version 1.y of the service, and you will be able to
validate the correctness of any elements up to and including 1.x,
while quietly ignoring things outside of that scope.
We also wanted to allow for arbitrary XML that was outside the scope
of the "core" elements to be "mixed-in" freely within the document.
It turns out that solving these two problems really amounts to the
same thing. First, we declare a brand new RelaxNG schema when we move
"major versions", and within those docs, a new namespace for each
"minor version".
For example, if you had the 2.2 version of one of our service schemas,
it would look like:
default namespace = "http://freedomandbeer.com/service/v2/rev0"
namespace v2_0 = "http://freedomandbeer.com/service/v2/rev0"
namespace v2_1 = "http://freedomandbeer.com/service/v2/rev1"
namespace v2_2 = "http://freedomandbeer.com/service/v2/rev2"
Then, we create an ANY_OTHER_ELEMENT declaration, similar to the one
in my blog post, that excludes anything in any of our declared
namespaces:
ANY_OTHER_ELEMENT =
  element * - (v2_0:* | v2_1:* | v2_2:*) {
    attribute * { text }*,
    ( text* & ANY_OTHER_ELEMENT* )*
  }
And finally, we declare ALL of our elements to "mix-in" any number of
ANY_OTHER_ELEMENTs into their body:
element person {
  attribute id { text },
  (
    element firstName { text } &
    element lastName { text } &
    element v2_2:ssn { text } &
    ANY_OTHER_ELEMENT*
  )*
}
Note that because of the "default" namespace declaration, the "person"
element, and its "firstName" and "lastName" children are in the v2_0
namespace, whereas the "ssn" element is in the v2_2 namespace.
It's easy to see how this allows arbitrary "mix-ins" from unrelated
namespaces. How this handles the version compatibility problem is a
little more subtle, but closely related to the mix-ins. Let's look at the
case of someone who has the 2.1 version of our schema - their document
doesn't declare the v2_2 namespace, and their ANY_OTHER_ELEMENT looks
like:
ANY_OTHER_ELEMENT =
  element * - (v2_0:* | v2_1:*) {
    attribute * { text }*,
    ( text* & ANY_OTHER_ELEMENT* )*
  }
Their "person" declaration looks like:
element person {
  attribute id { text },
  (
    element firstName { text } &
    element lastName { text } &
    ANY_OTHER_ELEMENT*
  )*
}
If we had a person, defined as:
<person xmlns="http://freedomandbeer.com/service/v2/rev0">
  <firstName>Monty</firstName>
  <lastName>Burns</lastName>
  <ssn xmlns="http://freedomandbeer.com/service/v2/rev2">000-00-0002</ssn>
</person>
It would validate fine against both schemas -- in the 2.2 version, the
"ssn" element is validated to be correct. In the 2.1 version, the 2.2
version of the namespace is just another ANY_OTHER_ELEMENT, so it is
ignored. Note that this is only BACKWARDS compatible -- if you are
validating against the 2.2 version of the schema, the "ssn" element is
required, so it must be there. Achieving bi-directional compatibility
is easy enough by just following the rule that any elements that are
ADDED to the schema are "optional". If you truly need to add
something that absolutely must be "required", then (in our world) you
must make a new major version of your API...
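A minimal sketch of that rule, assuming the 2.2 "person" declaration
shown earlier with only the "ssn" line changed (the "?" suffix is
RelaxNG compact syntax for "optional"). This is an illustration, not a
copy of one of our production schemas:

```rnc
element person {
  attribute id { text },
  (
    element firstName { text } &
    element lastName { text } &
    # assumption: the element added in 2.2 is marked optional with "?"
    element v2_2:ssn { text }? &
    ANY_OTHER_ELEMENT*
  )*
}
```

With this change, a document that omits "ssn" validates against both
the 2.1 and 2.2 schemas, while a document that includes it is still
checked by 2.2 and skipped by 2.1.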
I think this technique could be useful in defining schemas for things
like Maven's POM -- it allows you to version your schema regularly (as
often as every new Maven release). It allows projects built with
Maven to use the very newest elements without worrying that they
will break compatibility with people on a previous version of Maven.
Of course, this is also dictated by the semantics of the system: if
my build requires a feature from the 2.1.2 version of Maven, then
someone with 2.1.1 is just dead in the water. But if the things added
to the schema are "optional" in some sense, then systems that are
aware of them can validate them, and systems that aren't can safely
skip them.
The real application of this technique to something like Maven,
though, is in doing something like the example in my blog post --
adding extensions as "mix-ins" that can be defined in their own
schema, and put where the core application can find them.
As to your other question -- the biggest con to using RelaxNG is the
lack of tool support. The JING library
(http://www.thaiopensource.com/relaxng/jing.html), written by longtime
XML guru James Clark, is a good library for RelaxNG validation in
Java. The oXygen editor (http://www.oxygenxml.com/) has great support
for RelaxNG - but it's not free, and it's not all that cheap either :(.