Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

Jakob Voss Mon, 11 May 2009 03:33:02 -0700

Hi,

I summarized my thoughts about identifiers for data formats in a blog posting: http://jakoblog.de/2009/05/10/who-identifies-the-identifiers/

In short, it’s not a technology issue but a commitment issue, and the problem of identifying the right identifiers for data formats can be reduced to two fundamental rules of thumb:


1. reuse: don’t create new identifiers for things that already have one.

2. document: if you have to create an identifier, describe its referent as openly, clearly, and in as much detail as possible to make it reusable.

A format should be described with a schema (XML Schema, OWL, etc.) or at least a standard. Usually this schema already has a namespace or similar identifier that can be used for the whole format.

For instance, MODS Version 3 (currently 3.0, 3.1, 3.2, 3.3) has the XML namespace http://www.loc.gov/mods/v3, so this is the best identifier to identify MODS. If you need to identify a specific version, then you should *first* check whether such identifiers already exist, *second* push the publisher (LOC) to assign official URIs for MODS versions if these do not already exist, or *third* create and document specific URIs and make sure that everyone knows about these identifiers. At the moment there are:


MODS Version 3     http://www.loc.gov/mods/v3
MODS Version 3.0   info:srw/schema/1/mods-v3.0
MODS Version 3.1   info:srw/schema/1/mods-v3.1
MODS Version 3.2   info:srw/schema/1/mods-v3.2
                   info:ofi/fmt:xml:xsd:mods
MODS Version 3.3   info:srw/schema/1/mods-v3.3

The SRU Schemas registry links the "info:srw/schema/1/mods-v3*" identifiers to their XML Schemas, which is very little documentation, but it at least links to http://www.loc.gov/mods/v3 in some way.
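Read as data, the list above is an alias table: several identifiers that all resolve to the one MODS namespace, some of them with a version attached. A minimal Python sketch of such a table follows. The function names are hypothetical, and `info:ofi/fmt:xml:xsd:mods` is assigned to version 3.2 only because the list above puts it in the 3.2 row.

```python
from __future__ import annotations

# Family namespace shared by all MODS 3.x versions.
MODS_NS = "http://www.loc.gov/mods/v3"

# Alias table: each identifier from the list above -> (namespace, version).
# A version of None means the identifier names the whole MODS 3 family.
MODS_IDENTIFIERS: dict[str, tuple[str, str | None]] = {
    "http://www.loc.gov/mods/v3": (MODS_NS, None),
    "info:srw/schema/1/mods-v3.0": (MODS_NS, "3.0"),
    "info:srw/schema/1/mods-v3.1": (MODS_NS, "3.1"),
    "info:srw/schema/1/mods-v3.2": (MODS_NS, "3.2"),
    "info:ofi/fmt:xml:xsd:mods": (MODS_NS, "3.2"),  # listed in the 3.2 row
    "info:srw/schema/1/mods-v3.3": (MODS_NS, "3.3"),
}


def canonical_format(identifier: str) -> tuple[str, str | None]:
    """Return (namespace, version); raises KeyError for unknown identifiers."""
    return MODS_IDENTIFIERS[identifier]


def same_format(a: str, b: str) -> bool:
    """True if both identifiers resolve to the same namespace, ignoring version."""
    return canonical_format(a)[0] == canonical_format(b)[0]
```

Keeping the version separate from the namespace lets a client that only cares about "MODS 3" compare namespaces, while a client that needs a specific version can still ask for it.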


Ross wrote:

> First, and most importantly, how do we reconcile these different
> identifiers for the same thing?  Can we come up with some agreement on
> which ones we should really use?


Use the one that is documented best.

> Secondly, and this gets to the reason why any of this was brought up
> in the first place, how can we coordinate these identifiers more
> effectively and efficiently to reuse among various specs and
> protocols, but not:
>
> 1) be tied to a particular community
> 2) require some laborious and lengthy submission and review process to
> just say "hey, here's my FOAF available via UnAPI"

The identifier for FOAF is http://xmlns.com/foaf/0.1/. Forget about identifiers that are not URIs. OAI-PMH at least includes a mechanism to map metadataPrefixes to official URIs, but this mechanism is not always used. If unAPI lacks a way to map a local name to a global URI, we should fix unAPI so that it tells us:


<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
</formats>
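Both mechanisms amount to the same thing: a table from a local name (an unAPI format name or an OAI-PMH metadataPrefix) to a global URI. The Python sketch below reads such a table from either kind of response. It assumes the `uri` attribute and the `http://unapi.info/` namespace exactly as in the example above; the OAI-PMH sample is a trimmed ListMetadataFormats response (a real one also carries `responseDate` and `request`), and the function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

UNAPI_NS = "http://unapi.info/"  # namespace as written in the example above
OAI_NS = "http://www.openarchives.org/OAI/2.0/"

UNAPI_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<formats xmlns="http://unapi.info/">
  <format name="foaf" uri="http://xmlns.com/foaf/0.1/"/>
  <format name="local-thing"/>
</formats>"""

OAI_SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def unapi_format_uris(xml_text: str) -> dict[str, str | None]:
    """Map unAPI format names to the proposed `uri` attribute (None if absent)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    mapping: dict[str, str | None] = {}
    for fmt in root.iter(f"{{{UNAPI_NS}}}format"):
        name = fmt.get("name")
        if name:
            mapping[name] = fmt.get("uri")
    return mapping


def oai_format_uris(xml_text: str) -> dict[str, str | None]:
    """Map OAI-PMH metadataPrefixes to their metadataNamespace URIs."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    mapping: dict[str, str | None] = {}
    for fmt in root.iter(f"{{{OAI_NS}}}metadataFormat"):
        prefix = fmt.findtext(f"{{{OAI_NS}}}metadataPrefix")
        namespace = fmt.findtext(f"{{{OAI_NS}}}metadataNamespace")
        if prefix:
            mapping[prefix.strip()] = (namespace or "").strip() or None
    return mapping
```

A `None` value marks exactly the case described above: a local name with no global URI behind it.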

unAPI should be revised and specified more strictly in order to become an RFC anyway. Yes, this requires a laborious and lengthy submission and review process, but there is no such thing as a free lunch.

> 3) be so lax that it throws all hope of authority out the window


Reuse existing authorities and document better to create authority.

> I would expect the various communities to still maintain their own
> registries of "approved" data formats (well, OpenURL and SRU, anyway
> -- it's not as appropriate to UnAPI or Jangle).

There should be a distinction between descriptive registries, which only list identifiers and formats that are defined elsewhere, and authoritative registries, which define new identifiers and formats. The number of authoritatively defined identifiers for a given API should be small, because an identifier is better defined by the creator of the format than by a user of the format. If the creator does not provide usable identifiers, it is better to talk to them than to create something in parallel.
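To make the distinction concrete, here is a minimal Python sketch of a registry that holds both kinds of entries and enforces the two rules of thumb: it refuses to mint an identifier for a format that already has one, and it refuses to mint one without documentation. The class and method names and the `example.org` URI are hypothetical, and matching formats by name is a deliberate simplification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    uri: str
    name: str
    authoritative: bool  # True only if this registry defines the identifier
    documentation: str   # pointer to, or text of, the format description


class FormatRegistry:
    """Hypothetical registry mixing descriptive and authoritative entries."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def describe(self, uri: str, name: str, defined_at: str) -> Entry:
        """Record an identifier that someone else defined (descriptive entry)."""
        entry = Entry(uri, name, authoritative=False, documentation=defined_at)
        self._entries[uri] = entry
        return entry

    def define(self, uri: str, name: str, documentation: str) -> Entry:
        """Mint a new identifier, enforcing 'reuse' and 'document'."""
        if any(e.name == name for e in self._entries.values()):
            raise ValueError(f"{name!r} already has an identifier; reuse it")
        if not documentation.strip():
            raise ValueError("a new identifier must come with documentation")
        entry = Entry(uri, name, authoritative=True, documentation=documentation)
        self._entries[uri] = entry
        return entry

    def authoritative_count(self) -> int:
        """How many identifiers this registry itself defines (should stay small)."""
        return sum(e.authoritative for e in self._entries.values())
```

For example, after `describe("http://xmlns.com/foaf/0.1/", "foaf", "http://xmlns.com/foaf/0.1/")`, a later `define(..., "foaf", ...)` raises instead of creating a parallel FOAF identifier.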


Greetings,
Jakob

--
Jakob Voß <jakob.v...@gbv.de>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de
