Hi Rob,
You wrote:
>> A format should be described with a schema (XML Schema, OWL etc.) or at
>> least a standard. Mostly this schema already has a namespace or similar
>> identifier that can be used for the whole format.
> This is unfortunately not the case.
It is mostly the case - but people like to misinterpret schemas and
tailor them to their needs.
>> For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.4) has the XML
>> Namespace http://www.loc.gov/mods/v3 so this is the best identifier to
>> identify MODS.
> And this is a perfect example of why this is not the case.
> The same mods schema (let alone namespace) defines TWO formats, mods and
> modsCollection.
That's your interpretation. According to the schema, the MODS format
*is* either a single mods-element or a modsCollection-element. That's
exactly what you can refer to with the namespace identifier
http://www.loc.gov/mods/v3.
If you need to identify only the specific element 'mods' of the format,
then you need another identifier. Up to now there is no default way to
create an identifier for a specific element in an XML format, see
http://www.w3.org/TR/webarch/#xml-fragids
But if the MODS specification defines that you can refer to any element
with a URI fragment identifier, then the right identifier would be
http://www.loc.gov/mods/v3#mods
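To make the distinction concrete, here is a minimal Python sketch (standard library only). It reads the root element of a document and reports the namespace, which identifies the format, separately from the local element name. The helper names `split_qname` and `describe_root` are my own, and the `element_id` value is purely hypothetical: it is only meaningful if the MODS specification actually defined fragment identifiers that way. The sketch checks the namespace only; it does not validate against the schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def split_qname(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return None, tag

def describe_root(xml_text):
    """Report format-level and element-level identifiers for a document."""
    root = ET.fromstring(xml_text)
    ns, local = split_qname(root.tag)
    return {
        "namespace": ns,                   # identifies the format
        "root": local,                     # identifies the element within it
        "is_mods_format": ns == MODS_NS,   # namespace check only, no validation
        # Hypothetical element identifier (assumes fragment ids are defined)
        "element_id": f"{ns}#{local}" if ns else None,
    }

single = '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo><title>Example</title></titleInfo></mods>'
collection = '<modsCollection xmlns="http://www.loc.gov/mods/v3"><mods/></modsCollection>'

for doc in (single, collection):
    print(describe_root(doc))
```

Both documents report the same namespace, so both count as the MODS format under this reading; only the root element differs.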
You wrote:
> I totally agree that it's an awful design choice. However it's a
> demonstration that XML namespaces _do not identify format_. And
> hence, we need another identifier which is not the namespace of
> the top level element.
The namespace http://www.loc.gov/mods/v3 of the top level element 'mods'
does not identify the top level element but the MODS *format* (in any of
the versions 3.0-3.4) itself. This format *includes* the top level
element 'mods'.
> Also consider the following more hypothetical, but perfectly feasible
> situations:
> * One namespace is used to define two _totally_ separate sets of
> elements. There's no reason why this can't be done.
OK, let A and B be two formats with two totally separate sets of elements (and
rules for how to use them). If you put them into one namespace, then you get
a new format C that is the union of A and B.
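As a rough illustration of that union, here is a small Python sketch that treats a format as a membership test over documents. Everything in it is a toy assumption: the namespace http://example.org/shared, the root elements 'invoice' and 'recipe', and the idea of deciding membership by root element alone (a real format would also have rules for the content).

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/shared"  # hypothetical shared namespace

def root_in(names):
    """Build a toy format: accepts documents whose root is one of `names` in NS."""
    allowed = {f"{{{NS}}}{name}" for name in names}
    def accepts(xml_text):
        return ET.fromstring(xml_text).tag in allowed
    return accepts

def union(a, b):
    """Format C = A union B: a document belongs to C if it belongs to A or to B."""
    return lambda xml_text: a(xml_text) or b(xml_text)

format_a = root_in({"invoice"})   # hypothetical format A
format_b = root_in({"recipe"})    # hypothetical format B
format_c = union(format_a, format_b)

invoice = f'<invoice xmlns="{NS}"/>'
recipe = f'<recipe xmlns="{NS}"/>'

print(format_a(invoice), format_a(recipe))
print(format_c(invoice), format_c(recipe))
```

The namespace alone then identifies C, not A or B; telling A from B needs a second identifier.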
> * One namespace defines so many elements that it's meaningless to call
> it a format at all. Even though the top level tag might be the same,
> the contents are so varied that you're unable to realistically process
> it.
Sad but true: The word "format" in the context of library applications
does not make sense anyway in most cases. Technically a format is just a
set of possible instances, defined as a formal language or with any
other type of specification. The problem of library formats is that many
people refer to them without providing a proper specification.
Coming back to the mods example: If the SRU Schema registry lists
"info:srw/schema/1/mods-v3.3" as the identifier for "MODS Schema Version
3.3" with a pointer to the XML Schema
"http://www.loc.gov/standards/mods/v3/mods-3-3.xsd" then *any* XML
document that validates against this schema must be considered to be a
MODS 3.3 document - either with 'mods' or with 'modsCollection' as root
element.
Greetings
Jakob
--
Jakob Voß <jakob.v...@gbv.de>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de