On Thu, 27 Mar 2003, Hussein Suleman wrote:

> ...why not use sets for the separate
> disciplines, aimed at particular service providers?...
> some disciplines are not well-defined (namely, computer science)
> so such archives may want to play ball with multiple service providers
> and hence may need different sets.

The question of taxonomic classification sets and version-control for
Open Archives is a technical one, so I will not presume to comment on
it except from the point of view of the potential *users* of one
particular kind of Archive Content, namely, unrefereed preprints and
refereed postprints of research papers from one or many or all
disciplines. Such content -- in the Google age of Boolean inverted
full-text searchability -- does not require a detailed a priori
taxonomy, as book metadata or the metadata for other kinds of material
might. A fairly general sorting by discipline should suffice.

    http://www.eprints.org/self-faq/#26.Classification
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2385.html

> ...the service provider can provide an
> interface for potential data providers to self-register.

I hope that once the number and contents of Open-Access Eprint Archives
for research preprints and postprints have scaled up toward something
closer to universality, the simple metadata descriptors "pre-refereeing
preprint" and "refereed journal article" plus perhaps "discipline name"
will be enough to guide relevant service-providers in automatically
harvesting their relevant metadata. Multiple self-registration seems a
tedious and unnecessary constraint. (Possibly a master-registry of valid
institutions and disciplinary archives will also help, but may not be
necessary unless commercial spamming invades this sector too.)

> what remains a difficult problem, however, is how to recreate the
> metadata used by the service provider as its native format. so, for a
> typical example, if arXiv classifies items using a specific set
> structure, this is certainly not going to be the default for an
> institutional archive. does the service provider automatically or
> manually reclassify? or does it not allow browsing by categories?

Worrying about "recreating the categories" in this Boolean full-text
age is, I believe, a waste of time (for research preprints/postprints).
Just harness Google's harvested full-text to your engine's search
capability, if it is incapable of contending with Boolean full-text
search on its own. (Manual reclassification! Heaven forfend! Don't
bother classifying this material in the first place, beyond the
simplest of first-cuts, such as discipline. Any further classification
should be algorithmic and text-data-driven, not manual.)

> in either event, the quality of the metadata from the perspective of the
> service provider may be an impetus for potential users to want to
> replicate their effort rather than rely on the automated submission from
> their own institutions ... this needs more thought ...

Again, I speak only for research preprints/postprints, but please let's
not inject any further credibility into the notion that self-archiving
authors/institutions will also have to self-advertise by multiple
self-archiving of the same paper. Surely that is one headache that
OAI-interoperability should eradicate from the planet! Self-archiving
itself is self-advertising (and effort) enough. Please let us not now --
when the momentum is still not big enough -- saddle would-be
self-archivers with needless extra worries and tasks!

    http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm

Stevan Harnad