On Wed, 5 Sep 2001, Declan Butler wrote:

> As metadata are expensive to create - it is estimated that tagging
> papers with even minimal metadata can add as much as 40% to costs
For what purpose is the metadata? Minimal retrieval metadata (title, author, date) is different from minimal bibliographic metadata (journal, volume, issue, page range), which is certainly different from minimal 'ontological' metadata (effective, community-agreed vocabularies of subject descriptors). The 40% is estimated by whom? And 40% of which costs, precisely?

Self-archivers are used to adding their own metadata at minimal inconvenience. Automatic extraction and analysis tools allow more bibliographic and reference metadata to be extracted, as we can all see from ResearchIndex http://citeseer.nj.nec.com/cs as well as our own OpCit project http://opcit.eprints.org/ . There are issues concerning quality and maintenance, but these apply to the literature as well as to the metadata, and they have well-rehearsed solutions.

> OAI is developing its core metadata as a lowest common denominator to
> avoid putting an excessive burden on those who wish to take part.

My memory of the OAI minimalist decision http://www.openarchives.org/meetings/SantaFe1999/sfc_entry.htm was that a "lowest common denominator" was necessary to allow realistic interoperability: i.e. it was all we could reasonably expect people to agree on at that stage of the game! "Don't make things more complicated than they need to be to get something simple working NOW." This is in the spirit of the Los Alamos Lemma: http://oaisrv.nsdl.cornell.edu/pipermail/ups/1999-November/000048.html Of course this can be seen in an economic context (little funding and little time), but not the economic context invoked in the Nature essay.

> Not all papers will warrant the costs of marking up with metadata, nor
> will much of the grey literature, such as conference proceedings or the
> large internal documentation of government agencies.

Of course there is a metadata trade-off between what you are willing to put in and what you expect to get out.
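For concreteness: the "lowest common denominator" at issue is, as I understand the current protocol, unqualified Dublin Core (the oai_dc format), in which a minimal retrieval-level record needs little more than title, creator and date. A sketch, using only Python's standard library; the record values and identifier below are invented for illustration:

```python
# Sketch: build a minimal unqualified Dublin Core (oai_dc) record of the
# kind an OAI-compliant archive exposes. Record content is invented.
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/1.1/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

def minimal_record(title, creator, date, identifier):
    """Build a retrieval-level record (title, author, date, identifier)."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for tag, value in [("title", title), ("creator", creator),
                       ("date", date), ("identifier", identifier)]:
        ET.SubElement(root, f"{{{DC}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")

record = minimal_record("An Example Preprint", "Researcher, A.",
                        "2001-09-05", "http://example.org/00000123/")
print(record)
```

The point of the sketch is how little is actually required: four elements is already enough for cross-archive search, which is why self-archivers can supply it themselves at minimal inconvenience.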
However, it is precisely the grey literature (so-called) which needs effective retrieval mechanisms, for much of it forms the cutting edge of research communication. Our own studies of arXiv.org indicate that the majority of unpublished preprints go directly on to become journal articles, and that the majority of the remainder are presentations and reports that are reworked as subsequent journal articles. http://opcit.eprints.org/opcitresearch.shtml

We hope that our ongoing analyses of what is really happening in Open Archives will help inform us (and funding agencies) about what is truly valuable, and therefore what materials are worth the effort (and cost) of "marking up with metadata". As to the documentation of government agencies, I leave that for another day (but I believe a similar argument will apply).

--------------------------------------------------------------------
Les Carr                                l...@ecs.soton.ac.uk
Department of Electronics and           phone: +44 23-80 594-479
  Computer Science                      fax:   +44 23-80 592-865
University of Southampton               http://www.ecs.soton.ac.uk/~lac/