Hi,
I summarized my thoughts about identifiers for data formats in a blog
posting: http://jakoblog.de/2009/05/10/who-identifies-the-identifiers/
In short, it’s not a technology issue but a commitment issue, and the
problem of identifying the right identifiers for data formats can be
reduced to two fundamental rules of thumb:
1. reuse: don’t create new identifiers for things that already have one.
2. document: if you have to create an identifier, describe its referent
as openly, clearly, and in as much detail as possible to make it reusable.
A format should be described with a schema (XML Schema, OWL etc.) or at
least a standard. In most cases this schema already has a namespace or
similar identifier that can be used for the whole format.
For instance MODS Version 3 (currently 3.0, 3.1, 3.2, 3.3) has the XML
Namespace http://www.loc.gov/mods/v3 so this is the best identifier to
identify MODS. If you need to identify a specific version, then you
should *first* check whether such identifiers already exist, *second* push
the publisher (LOC) to assign official URIs for MODS versions if these do
not already exist, or *third* create and document specific URIs and make
sure that everyone knows about these identifiers. At the moment there are:
MODS Version 3     http://www.loc.gov/mods/v3
MODS Version 3.0   info:srw/schema/1/mods-v3.0
MODS Version 3.1   info:srw/schema/1/mods-v3.1
MODS Version 3.2   info:srw/schema/1/mods-v3.2
                   info:ofi/fmt:xml:xsd:mods
MODS Version 3.3   info:srw/schema/1/mods-v3.3
The SRU Schemas registry links the "info:srw/schema/1/mods-v3*"
identifiers to their XML Schemas, which is very little documentation, but
at least it links to http://www.loc.gov/mods/v3 in some way.
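A minimal sketch of the "reuse" rule applied to the identifiers listed
above: a small lookup table that maps every known MODS identifier back to
the best-documented one, the XML namespace. The names
KNOWN_MODS_IDENTIFIERS and canonical_identifier are made up for this
illustration, and treating all of these identifiers as aliases of the
namespace is an assumption, not something any registry provides in this
form.

```python
# Hypothetical alias table built from the identifiers listed in this mail.
MODS_NAMESPACE = "http://www.loc.gov/mods/v3"

KNOWN_MODS_IDENTIFIERS = {
    "http://www.loc.gov/mods/v3": MODS_NAMESPACE,
    "info:srw/schema/1/mods-v3.0": MODS_NAMESPACE,
    "info:srw/schema/1/mods-v3.1": MODS_NAMESPACE,
    "info:srw/schema/1/mods-v3.2": MODS_NAMESPACE,
    "info:srw/schema/1/mods-v3.3": MODS_NAMESPACE,
    "info:ofi/fmt:xml:xsd:mods": MODS_NAMESPACE,
}


def canonical_identifier(identifier):
    """Return the best-documented identifier for a known alias, or None."""
    return KNOWN_MODS_IDENTIFIERS.get(identifier)
```

For example, canonical_identifier("info:srw/schema/1/mods-v3.2") gives
back the MODS namespace, while an identifier that is not in the table
gives None. Note that the version information is lost in this mapping, so
it only helps when identifying MODS 3 as a whole is enough.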
Ross wrote:
> First, and most importantly, how do we reconcile these different
> identifiers for the same thing? Can we come up with some agreement on
> which ones we should really use?
Use the one that is documented best.
> Secondly, and this gets to the reason why any of this was brought up
> in the first place, how can we coordinate these identifiers more
> effectively and efficiently to reuse among various specs and
> protocols, but not:
>
> 1) be tied to a particular community
> 2) require some laborious and lengthy submission and review process to
> just say "hey, here's my FOAF available via UnAPI"
The identifier for FOAF is http://xmlns.com/foaf/0.1/. Forget about
identifiers that are not URIs. OAI-PMH at least includes a mechanism to
map metadataPrefixes to official URIs, but this mechanism is not always
used. If unAPI lacks a way to map a local name to a global URI, we
should fix unAPI so that it tells us:
<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
<format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
</formats>
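A client could then resolve local format names to global URIs with very
little code. The following is only a sketch: it assumes the response looks
exactly like the example above, including the "uri" attribute and the
namespace, both of which are a suggestion in this mail and not part of the
current unAPI specification. The function name format_uris is made up.

```python
import xml.etree.ElementTree as ET

# Namespace as used in the example response above (an assumption).
UNAPI_NS = "http://unapi.info/"

EXAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
<format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
</formats>"""


def format_uris(xml_bytes):
    """Map each local format name to its global URI.

    Formats that lack a name or a uri attribute are skipped.
    """
    root = ET.fromstring(xml_bytes)
    mapping = {}
    for fmt in root.iter("{%s}format" % UNAPI_NS):
        name = fmt.get("name")
        uri = fmt.get("uri")  # hypothetical attribute proposed above
        if name and uri:
            mapping[name] = uri
    return mapping
```

Calling format_uris(EXAMPLE) yields a dictionary that maps "foaf" to
http://xmlns.com/foaf/0.1/, so two services that use different local
names can still be recognized as offering the same format.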
unAPI should be revised and specified more strictly to become an RFC
anyway. Yes, this requires a laborious and lengthy submission and review
process, but there is no such thing as a free lunch.
> 3) be so lax that it throws all hope of authority out the window
Reuse existing authorities and document better to create authority.
> I would expect the various communities to still maintain their own
> registries of "approved" data formats (well, OpenURL and SRU, anyway
> -- it's not as appropriate to UnAPI or Jangle).
There should be a distinction between descriptive registries, which only
list identifiers and formats that are defined elsewhere, and
authoritative registries, which define new identifiers and formats. The
number of authoritatively defined identifiers should be small for a
given API, because an identifier should be defined by the creator
of the format rather than by a user of the format. If the creator does not
support usable identifiers, it is better to talk to them than to create
something in parallel.
Greetings,
Jakob
--
Jakob Voß <jakob.v...@gbv.de>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de