Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

Ross Singer Thu, 30 Apr 2009 18:55:33 -0700

Technically it's 4 communities, but, yes, only two currently have
"credible" registries in place.


-Ross.

On Thu, Apr 30, 2009 at 9:28 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> Crosswalk is exactly the wrong answer for this. Two very small overlapping 
> communities of mostly library developers can surely agree on using the same 
> identifiers, and then we make things easier for US.  We don't need to solve 
> the entire universe of problems. Solve the simple problem in front of you in 
> the simplest way that could possibly work and still leave room for future 
> expansion and improvement. From that, we learn how to solve the big problems, 
> when we're ready. Overreach and try to solve the huge problem including every 
> possible use case, many of which don't apply to you but SOMEDAY MIGHT... and 
> you end up with the kind of over-abstracted over-engineered 
> too-complicated-to-actually-catch-on solutions that... we in the library 
> community normally end up with.
> ________________________________________
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Peter Noerr 
> [pno...@museglobal.com]
> Sent: Thursday, April 30, 2009 6:37 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule 
> Them All
>
> Some further observations. So far this threadling has mentioned only trying 
> to unify two different sets of identifiers. However there are a much larger 
> number of them out there (and even larger numbers of schemas and other 
> "standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about")
> and the problem exists for any of these things (identifiers, etc.) where 
> there is more than one of them. So really unifying two sets of identifiers, 
> while very useful, is not actually going to solve much.
>
> Is there any broader methodology we could adopt that potentially allows 
> multiple unifications or (my favourite) cross-walks? (Complete unification 
> requires that everybody agree and stick to it, and human history is sort of 
> not on that track...) And who (people and organizations) would undertake this?
>
> Ross' point about a lightweight approach is necessary for any sort of 
> adoption, but this is a problem (which plagues all we do in federated search) 
> which cannot just be solved by another registry. Somebody/organisation has to 
> look at the identifiers or whatever and decide that two of them are identical 
> or, worse, only partially overlap and hence scope has to be defined. In a 
> syntax that all understand of course. Already in this thread we have the 
> sub/super case question from Karen (in a post on the openurl (or Z39.88 
> <sigh> - identifiers!) listserv). And the various identifiers for MARC 
> (below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now 
> explain in words of one (computer understandable) syllable what the 
> differences are.
>
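One way to read the request for "computer understandable" differences is to describe each format along separate axes instead of giving it a single opaque name. The following is only a minimal sketch of that idea: the class, the field names, and the axis values are hypothetical, and treating "MARC-XML" as MARC21 content in an XML serialization is an assumption made for illustration, not something the thread settles.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FormatDescription:
    """Hypothetical two-axis description of a record format."""
    content_standard: str  # which element set, e.g. "MARC21" or "UKMARC"
    serialization: str     # how it is encoded, e.g. "ISO2709" or "XML"


def relation(a: FormatDescription, b: FormatDescription) -> str:
    """Report whether two descriptions are identical, partly overlap, or differ."""
    if a == b:
        return "identical"
    # Collect the axes on which both descriptions carry the same value.
    shared = [f.name for f in fields(a) if getattr(a, f.name) == getattr(b, f.name)]
    if shared:
        return "overlap on " + ", ".join(shared)
    return "disjoint"


# Illustrative descriptions of the three names mentioned above (assumed values).
MARC_XML = FormatDescription("MARC21", "XML")
MARC21_ISO2709 = FormatDescription("MARC21", "ISO2709")
MARCUK_ISO2709 = FormatDescription("UKMARC", "ISO2709")

print(relation(MARC21_ISO2709, MARCUK_ISO2709))
print(relation(MARC_XML, MARC21_ISO2709))
print(relation(MARC_XML, MARCUK_ISO2709))
```

Two axes are certainly too few for real use (versions and profiles would need their own), but even this much makes "only partially overlap" something a program can state.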
> I'm not trying to make problems. There are problems and this is only a small 
> subset of them, and they confound us every day. I would love to adopt 
> standard definitions for these things, but which Standard? Because anyone can 
> produce any identifier they like, we have decided that the unification of 
> them has to be kept internal where we at least have control of the 
> unifications, even if they change pretty frequently.
>
> Peter
>
>
> Dr Peter Noerr
> CTO, MuseGlobal, Inc.
>
> +1 415 896 6873 (office)
> +1 415 793 6547 (mobile)
> www.museglobal.com
>
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
>> Ross Singer
>> Sent: Thursday, April 30, 2009 12:00
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
>> All
>>
>> Hello everybody.  I apologize for the crossposting, but this is an
>> area that could (potentially) affect every one of these groups.  I
>> realize that not everybody will be able to respond to all lists,
>> but...
>>
>> First of all, some back story (Code4Lib subscribers can probably skip
>> ahead):
>>
>> Jangle [1] requires URIs to explicitly declare the format of the data
>> it is transporting (binary marc, marcxml, vcard, DLF
>> simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
>> own URI structure for this (http://jangle.org/vocab/formats#...) but
>> this has always been with the intention of moving out of
>> jangle.org into a more "generic" space so it could be used by other
>> initiatives.
>>
>> This same concept came up in UnAPI [2] (I think this thread:
>> http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
>> discusses it a bit - there is a reference there that it maybe had come
>> up before) although it was rejected ultimately in favor of an (optional)
>> approach more in line with how OAI-PMH disambiguates metadata formats.
>>  That being said, this page used to try to set a sort of convention
>> around the UnAPI formats:
>> http://unapi.stikipad.com/unapi/show/existing+formats
>> But it's now just a squatter page.
>>
>> Jakob Voss pointed out that SRU has a schema registry and that it
>> would make sense to coordinate with this rather than mint new URIs for
>> things that have already been defined there:
>> http://www.loc.gov/standards/sru/resources/schemas.html
>>
>> This, of course, made a lot of sense.  It also made me realize that
>> OpenURL *also* has a registry of metadata formats:
>> http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats
>>
>> The problem here is that OpenURL and SRW are using different info URIs
>> to describe the same things:
>>
>> info:srw/schema/1/marcxml-v1.1
>>
>> info:ofi/fmt:xml:xsd:MARC21
>>
>> or
>>
>> info:srw/schema/1/onix-v2.0
>>
>> info:ofi/fmt:xml:xsd:onix
>>
>> The latter technically isn't the same thing since the OpenURL one
>> claims it's an identifier for ONIX 2.1, but if I wasn't sending this
>> email now, eventually SRU would have registered
>> info:srw/schema/1/onix-v2.1
>>
>> There are several other examples, as well (MODS, ISO20775, etc.) and
>> it's not a stretch to envision more in the future.
>>
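To make the reconciliation question concrete, the identifier pairs above could be held in a small equivalence table that any of the specs consults. This is only a sketch under assumptions: the table layout and the function names are hypothetical, and the only equivalence recorded is the marcxml pair given above. The ONIX pair is deliberately left out, since the two identifiers name different versions.

```python
# Hypothetical equivalence table: each set groups identifiers that
# different registries use for the same data format.
EQUIVALENCE_SETS = [
    frozenset({
        "info:srw/schema/1/marcxml-v1.1",  # SRU schema registry
        "info:ofi/fmt:xml:xsd:MARC21",     # OpenURL registry
    }),
]


def build_index(sets):
    """Map every known identifier to the set of its equivalents."""
    index = {}
    for group in sets:
        for uri in group:
            index[uri] = group
    return index


INDEX = build_index(EQUIVALENCE_SETS)


def same_format(a: str, b: str) -> bool:
    """True if both identifiers are known to name the same format."""
    if a == b:
        return True
    group = INDEX.get(a)
    return group is not None and b in group


def equivalents(uri: str) -> set:
    """All identifiers recorded as naming the same format, including uri itself."""
    return set(INDEX.get(uri, {uri}))


print(same_format("info:srw/schema/1/marcxml-v1.1", "info:ofi/fmt:xml:xsd:MARC21"))
print(same_format("info:srw/schema/1/onix-v2.0", "info:ofi/fmt:xml:xsd:onix"))
```

A table like this does not pick a winner among the identifiers. It only records which ones agree, so choosing a preferred form remains an open question.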
>> So there are a couple of questions here.
>>
>> First, and most importantly, how do we reconcile these different
>> identifiers for the same thing?  Can we come up with some agreement on
>> which ones we should really use?
>>
>> Secondly, and this gets to the reason why any of this was brought up
>> in the first place, how can we coordinate these identifiers more
>> effectively and efficiently to reuse among various specs and
>> protocols, but not:
>> 1) be tied to a particular community
>> 2) require some laborious and lengthy submission and review process to
>> just say "hey, here's my FOAF available via UnAPI"
>> 3) be so lax that it throws all hope of authority out the window
>> ?
>>
>> I would expect the various communities to still maintain their own
>> registries of "approved" data formats (well, OpenURL and SRU, anyway
>> -- it's not as appropriate to UnAPI or Jangle).
>>
>> Does something like this interest any of you?  Is there value in such
>> an initiative?
>>
>> Thanks,
>> -Ross.
>
