[2 Postings: (1) L. Waaijers; (2) T. Brody]

(1) Leo Waaijers (SURF, Netherlands)

Stevan Harnad wrote:

By the way, the real OAI google is OAIster, and it
contains over 3 million pearls from nearly 300 institutions
http://oaister.umdl.umich.edu/o/oaister/ but many are not journal articles
(and even if they all were, that still wouldn't be nearly enough yet!):
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0023.gif

And -- as of March 10 -- Yahoo searches OAIster! See
http://www.umich.edu/~urecord/0304/Mar08_04/07.shtml

Leo Waaijers

----

(2) Tim Brody (ECS, Southampton):

Henk Ellermann, Google Searches Repositories: So What Does Google
Search For?, http://eepi.ubib.eur.nl/iliit/archives/000479.html

But it is not only the quantity. Even when documents are available, that
does not mean they are available to everyone. And even if a document is
available to everyone, you still can't be sure that the system is up and
running...

What we badly need is a continuous and authoritative review of existing
Institutional Repositories. The criteria to "judge" the repositories would
have to include:

   * number of documents, (with breakdown per document type)
   * percentage of freely accessible documents
   * up-time

It is great that Google is becoming part of the Institutional Repositories
effort, but we should learn to give fair and honest [data] about what we
have to offer. There is actually not that much at the moment. We can only
hope that what Google will expose is more than just the message "amateurs
at work".
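
The first two of Henk's criteria can be tallied straight from a
repository's OAI-PMH interface. A rough sketch (Python standard library
only; the endpoint URL is a placeholder, an http dc:identifier is only a
crude proxy for an accessible full-text, and up-time would have to be
measured separately by polling the endpoint over time):

    # Tally records, a per-type breakdown, and a free-full-text proxy
    # from an OAI-PMH endpoint (oai_dc metadata assumed).
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET
    from collections import Counter

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    def harvest_stats(base_url):
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        total, with_fulltext = 0, 0
        by_type = Counter()
        while True:
            url = base_url + "?" + urllib.parse.urlencode(params)
            root = ET.parse(urllib.request.urlopen(url)).getroot()
            for rec in root.iter(OAI + "record"):
                total += 1
                types = [t.text for t in rec.iter(DC + "type") if t.text]
                by_type.update(types or ["(untyped)"])
                # crude proxy: an http identifier suggests a full text
                if any((i.text or "").startswith("http")
                       for i in rec.iter(DC + "identifier")):
                    with_fulltext += 1
            token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
            if token is None or not (token.text or "").strip():
                break
            params = {"verb": "ListRecords",
                      "resumptionToken": token.text.strip()}
        return total, by_type, with_fulltext

    total, by_type, ft = harvest_stats("http://archive.example.org/oai")
    print(f"records: {total}")
    for doc_type, n in by_type.most_common():
        print(f"  {doc_type}: {n}")
    print(f"with http identifier: {ft} ({100.0 * ft / max(total, 1):.1f}%)")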

I would agree with Henk that the current -- early -- state of
'Institutional Repositories' (aka Eprint Archives) is not yet the promised
land of open access to research material.

Institutional research archives (and hence the services built on them)
will succeed or fail depending on whether there is the drive within the
institution to enhance its visibility and impact by mandating that its
author-employees deposit all their refereed-research output. Then,
once the archive achieves critical mass, it can sustain itself as part
of the culture of the institution.

The archive is the public record of "the best the institution
has done". So those archives that Henk refers to, with their patchy,
minimal contents, need to look at what is going into this public record
of their research output, and must decide whether it reflects the
institution's achievements.

As a technical aside, DP9 was developed some time ago to expose OAI-PMH
metadata records to Web crawlers: http://arc.cs.odu.edu:8080/dp9/about.jsp
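
In essence, DP9 sits in front of a repository and renders each OAI
record as a plain HTML page that a crawler can fetch and index. A toy
sketch of the idea (not DP9's actual code; oai_dc metadata assumed,
error handling omitted):

    # Render one OAI-PMH record as a crawler-indexable HTML page.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET
    from html import escape

    DC = "{http://purl.org/dc/elements/1.1/}"

    def record_to_html(base_url, oai_identifier):
        query = urllib.parse.urlencode({
            "verb": "GetRecord",
            "metadataPrefix": "oai_dc",
            "identifier": oai_identifier,
        })
        root = ET.parse(
            urllib.request.urlopen(base_url + "?" + query)).getroot()
        def field(tag):
            return [e.text for e in root.iter(DC + tag) if e.text]
        title = escape("; ".join(field("title")) or oai_identifier)
        rows = "".join(
            f"<p><b>{tag}:</b> {escape(value)}</p>"
            for tag in ("creator", "date", "description", "identifier")
            for value in field(tag))
        return (f"<html><head><title>{title}</title></head>"
                f"<body><h1>{title}</h1>{rows}</body></html>")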

I would be surprised if Google were to base any long-term service on
only an archive's contents. Without the linking structure of the Web a
search engine is left with only keyword-frequency techniques, which the
Web has shown do not scale to very large data sets. For my money,
Google-over-Citebase/Citeseer-over-Institutional Archives is much more
interesting (the Archive gives editorial management, Citebase/Citeseer
the linking structure, and Google the search wizardry).
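
To caricature that layering (the scoring below is an assumption for
illustration, not how any of these systems actually rank): the archive's
full text supplies a keyword-frequency score, and a PageRank-style walk
over the citation graph supplies the link-structure weight.

    # Toy ranking: keyword frequency weighted by citation-graph rank.
    from collections import Counter

    def keyword_score(text, query_terms):
        counts = Counter(text.lower().split())
        return sum(counts[term] for term in query_terms)

    def citation_rank(cites, damping=0.85, iters=20):
        # Simplified PageRank over {paper: [papers it cites]};
        # mass from papers that cite nothing is dropped in this toy.
        papers = set(cites) | {q for refs in cites.values() for q in refs}
        rank = {p: 1.0 / len(papers) for p in papers}
        for _ in range(iters):
            new = {p: (1 - damping) / len(papers) for p in papers}
            for p, refs in cites.items():
                share = damping * rank[p] / max(len(refs), 1)
                for q in refs:
                    new[q] += share
            rank = new
        return rank

    def search(docs, cites, query_terms):
        # docs: {paper: full text}; cites: {paper: cited papers}
        rank = citation_rank(cites)
        scored = {d: keyword_score(t, query_terms) * rank.get(d, 0.0)
                  for d, t in docs.items()}
        return sorted(scored.items(), key=lambda kv: -kv[1])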

Stevan Harnad:

Eprints, for example, has over 120 archives worldwide of exactly the same kind,
with over 40,000 papers in them:
http://archives.eprints.org/eprints.php?action=analysis

I have revised the description on that page to say that a *record*
is not necessarily a full-text. And of course a full-text is not
necessarily a peer-reviewed postprint. It would help bean-counters like
myself if repository/archive administrators would tag, in an obvious
place, what their content types are and how the number of metadata
records corresponds to the number of publicly accessible full-texts.
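
That correspondence can at least be probed mechanically: take each
record's http identifier and check whether it serves a document rather
than a metadata splash page. A sketch (the Content-Type heuristic is an
assumption; many repositories link to HTML jump-off pages, and some
servers refuse HEAD requests):

    # Probe whether an identifier URL serves an actual full-text.
    import urllib.error
    import urllib.request

    def fulltext_reachable(url, timeout=10):
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                ctype = resp.headers.get("Content-Type", "")
                # A PDF/PostScript response is taken as a full text;
                # an HTML page may be only a metadata splash page.
                return resp.status == 200 and (
                    "pdf" in ctype or "postscript" in ctype)
        except (urllib.error.URLError, TimeoutError):
            return False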

Tim Brody
Southampton University
http://citebase.eprints.org/
