On Wed, 5 Sep 2001, Declan Butler wrote:

> As metadata are expensive to create - it is estimated that tagging
> papers with even minimal metadata can add as much as 40% to costs
For what purpose is the metadata? Minimal retrieval metadata (title, author, date) is different from minimal bibliographic metadata (journal, volume, issue, page range), which is certainly different from minimal 'ontological' metadata (effective, community-agreed vocabularies of subject descriptors). The 40% is estimated by whom? And 40% of which costs, precisely?

Self-archivers are used to adding their own metadata at minimal inconvenience. Automatic extraction and analysis tools allow more bibliographic and reference metadata to be extracted, as we can all see from ResearchIndex http://citeseer.nj.nec.com/cs as well as our own OpCit project http://opcit.eprints.org/ . There are issues concerning quality and maintenance, but these apply to the literature as well as to the metadata, and they have well-rehearsed solutions.

> OAI is developing its core metadata as a lowest common denominator to
> avoid putting an excessive burden on those who wish to take part.

My memory of the OAI minimalist decision http://www.openarchives.org/meetings/SantaFe1999/sfc_entry.htm was that a "lowest common denominator" was necessary to allow realistic interoperability: i.e. it was all we could reasonably expect people to agree on at that stage of the game! "Don't make things more complicated than they need to be to get something simple working NOW." This is in the spirit of the Los Alamos Lemma: http://oaisrv.nsdl.cornell.edu/pipermail/ups/1999-November/000048.html Of course this can be seen in an economic context (little funding and little time), but not the economic context invoked in the Nature essay.

> Not all papers will warrant the costs of marking up with metadata, nor
> will much of the grey literature, such as conference proceedings or the
> large internal documentation of government agencies.

Of course there is a metadata trade-off between what you are willing to put in and what you expect to get out.
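For concreteness: the "lowest common denominator" at issue is, as I understand the current protocol, unqualified Dublin Core (the oai_dc format), in which a minimal retrieval-level record needs little more than title, creator and date. A sketch, using only Python's standard library; the record values and identifier below are invented for illustration:

```python
# Sketch: build a minimal unqualified Dublin Core (oai_dc) record of the
# kind an OAI-compliant archive exposes. Record content is invented.
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/1.1/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

def minimal_record(title, creator, date, identifier):
    """Build a retrieval-level record (title, author, date, identifier)."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for tag, value in [("title", title), ("creator", creator),
                       ("date", date), ("identifier", identifier)]:
        ET.SubElement(root, f"{{{DC}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")

record = minimal_record("An Example Preprint", "Researcher, A.",
                        "2001-09-05", "http://example.org/00000123/")
print(record)
```

The point of the sketch is how little is actually required: four elements is already enough for cross-archive search, which is why self-archivers can supply it themselves at minimal inconvenience.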
However, it is precisely the grey literature (so-called) which needs effective retrieval mechanisms, for much of it forms the cutting edge of research communication. Our own studies of arXiv.org indicate that the majority of unpublished preprints go directly on to become journal articles, and that the majority of the remainder are presentations and reports that are reworked as subsequent journal articles. http://opcit.eprints.org/opcitresearch.shtml

We hope that our ongoing analyses of what is really happening in Open Archives will help inform us (and funding agencies) about what is truly valuable, and therefore what materials are worth the effort (and cost) of "marking up with metadata". As to the documentation of government agencies, I leave that for another day (but I believe a similar argument will apply).

--------------------------------------------------------------------
Les Carr                                l...@ecs.soton.ac.uk
Department of Electronics and           phone: +44 23-80 594-479
  Computer Science                      fax:   +44 23-80 592-865
University of Southampton               http://www.ecs.soton.ac.uk/~lac/