Re: EPrints, DSpace or ESpace?

2004-04-13 Thread Tim Brody

   [2 Postings: (1) L. Waaijers; (2) T. Brody]

(1) Leo Waaijers (SURF, Netherlands)

Stevan Harnad wrote:


By the way, the real OAI google is OAIster, and it
contains over 3 million pearls from nearly 300 institutions
(http://oaister.umdl.umich.edu/o/oaister/), but many are not journal articles
(and even if they all were, that still wouldn't be nearly enough yet!):
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0023.gif


And -- as of March 10 -- Yahoo searches OAIster! See
http://www.umich.edu/~urecord/0304/Mar08_04/07.shtml

Leo Waaijers



(2) Tim Brody (ECS, Southampton):


Henk Ellermann, Google Searches Repositories: So What Does Google
Search For?, http://eepi.ubib.eur.nl/iliit/archives/000479.html

But it is not only the quantity. Even when documents are available, it does not
mean that they are available to everyone. And even if they are available to
everyone, you still can't be sure that the system is running...

What we badly need is a continuous and authoritative review of existing
Institutional Repositories. The criteria to "judge" the repositories would
have to include:

   * number of documents (with breakdown per document type)
   * percentage of freely accessible documents
   * up-time

It is great that Google becomes part of the Institutional Repositories effort, but
we should learn to give fair and honest [data] about what we have to offer. There
is actually not that much at the moment. We can only hope that what Google will
expose is more than just the message "amateurs at work".


I would agree with Henk that the current -- early -- state of
'Institutional Repositories' (aka Eprint Archives) is not yet the promised
land of open access to research material.

Institutional research archives (and hence the services built on them)
will succeed or fail depending on whether there is the drive within the
institution to enhance its visibility and impact by mandating that its
author-employees deposit all their refereed-research output. Then,
once it achieves critical mass, the archive can support itself as part
of the culture of the institution.

The archive is the public record of "the best the institution
has done". So those archives that Henk refers to, with their patchy,
minimal contents, need to look at what is going into this public record
of their research output, and must decide whether it reflects the
institution's achievements.

As a technical aside, DP9 was developed some time ago for exposing OAI-PMH
records to Web crawlers: http://arc.cs.odu.edu:8080/dp9/about.jsp
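
For readers unfamiliar with DP9, the idea is roughly the following -- a minimal
sketch in Python, not DP9's actual implementation, with a placeholder OAI-PMH
endpoint and output directory: harvest the Dublin Core records and emit static
HTML pages that an ordinary Web crawler can follow and index.

    # Minimal sketch of the DP9 idea (not DP9's actual implementation): harvest
    # an archive over OAI-PMH and emit static HTML pages that an ordinary Web
    # crawler can follow and index. Endpoint and output directory are placeholders.
    import html
    import os
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"
    BASE_URL = "http://archive.example.org/oai"   # hypothetical OAI-PMH endpoint
    OUT_DIR = "crawler_pages"

    def harvest(base_url):
        """Fetch one page of ListRecords (resumption-token paging omitted)."""
        url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        return tree.findall(f".//{OAI}record")

    def write_pages(records, out_dir):
        """Write one plain HTML page per record so crawlers can index it."""
        os.makedirs(out_dir, exist_ok=True)
        for i, record in enumerate(records):
            title = html.escape(record.findtext(f".//{DC}title", default="(untitled)"))
            link = html.escape(record.findtext(f".//{DC}identifier", default=""))
            page = ("<html><head><title>%s</title></head>"
                    "<body><h1>%s</h1><p><a href=\"%s\">%s</a></p></body></html>"
                    % (title, title, link, link))
            with open(os.path.join(out_dir, "record%d.html" % i), "w",
                      encoding="utf-8") as f:
                f.write(page)

    if __name__ == "__main__":
        write_pages(harvest(BASE_URL), OUT_DIR)

(DP9 itself serves such pages as a live gateway rather than writing them to
disk; the static version above just makes the principle plain.)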

I would be surprised if Google were to base any long-term service on
an archive's contents alone. Without the linking structure of the Web a
search engine is left with only keyword-frequency techniques, which the
Web has shown fail to scale to very large data sets. For my money,
Google-over-Citebase/Citeseer-over-Institutional Archives is much more
interesting (the Archive gives editorial management, Citebase/Citeseer
the linking structure, and Google the search wizardry).
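
To make that division of labour concrete, here is a toy illustration in Python
-- made-up papers and citation links, not Citebase's or Google's actual
algorithms: keyword matching supplies the candidate set, and a crude
PageRank-style score over the citation graph supplies the ordering.

    # Toy illustration (made-up data) of why citation links help ranking:
    # keyword matching finds the candidates, a PageRank-style score over the
    # citation graph orders them. Not Citebase's or Google's actual algorithm.

    def citation_rank(cites, damping=0.85, iterations=50):
        """Crude PageRank over a citation graph: paper -> list of papers it cites."""
        papers = set(cites) | {p for targets in cites.values() for p in targets}
        rank = {p: 1.0 / len(papers) for p in papers}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / len(papers) for p in papers}
            for source, targets in cites.items():
                if targets:
                    share = damping * rank[source] / len(targets)
                    for target in targets:
                        new[target] += share
            rank = new
        return rank

    # Hypothetical corpus: which papers cite which, and a tiny keyword index.
    citations = {"paperA": ["paperC"], "paperB": ["paperC", "paperA"], "paperC": []}
    keywords = {"paperA": {"oai", "harvesting"},
                "paperB": {"oai", "metadata"},
                "paperC": {"oai", "harvesting", "citation"}}

    def search(term):
        """Keyword match gives the hits; citation rank orders them."""
        scores = citation_rank(citations)
        hits = [p for p, kw in keywords.items() if term in kw]
        return sorted(hits, key=lambda p: scores[p], reverse=True)

    print(search("oai"))   # the most-cited paper ("paperC") comes out on top

On keywords alone the three papers are indistinguishable for the query "oai";
it is the citation links that provide an ordering.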


Stevan Harnad:

Eprints, for example, has over 120 archives worldwide of exactly the same kind,
with over 40,000 papers in them:
http://archives.eprints.org/eprints.php?action=analysis


I have revised the description on that page to say that a *record*
is not necessarily a full-text. And of course a full-text is not
necessarily a peer-reviewed postprint. It would help bean-counters like
myself if repository/archive administrators would tag in an obvious place
what their content types are (i.e. what type of material is in the
system), and how the number of metadata records corresponds to publicly
accessible full-texts.
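
For what it is worth, the sort of bean-counting I mean could be sketched as
follows -- a rough Python sketch against a hypothetical OAI-PMH endpoint, which
tallies dc:type values and probes each dc:identifier URL; an answering URL is
at best an upper bound on accessible full-texts, since it may only be a splash
page.

    # Rough sketch of the bean-counting described above: tally record types and
    # test which records resolve to something publicly retrievable.
    # The OAI-PMH endpoint is a placeholder; resumption-token paging is omitted.
    import urllib.error
    import urllib.request
    import xml.etree.ElementTree as ET
    from collections import Counter

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"
    BASE_URL = "http://archive.example.org/oai"   # hypothetical endpoint

    def is_public(url):
        """True if the URL answers an anonymous HEAD request without error."""
        try:
            request = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(request, timeout=10) as response:
                return response.status == 200
        except (urllib.error.URLError, ValueError):
            return False

    def survey(base_url):
        url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
        with urllib.request.urlopen(url) as response:
            records = ET.parse(response).findall(f".//{OAI}record")
        types = Counter(r.findtext(f".//{DC}type", default="unknown") for r in records)
        links = [r.findtext(f".//{DC}identifier", default="") for r in records]
        public = sum(1 for link in links if link.startswith("http") and is_public(link))
        print("metadata records:", len(records))
        print("records by type: ", dict(types))
        print("answering URLs (upper bound on full-texts):", public)

    if __name__ == "__main__":
        survey(BASE_URL)

The awkward part is exactly the one flagged above: without an agreed tag, such a
script cannot tell a peer-reviewed postprint from a picture or a metadata-only
record, which is why administrators stating this explicitly would help.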

Tim Brody
Southampton University
http://citebase.eprints.org/


Re: EPrints, DSpace or ESpace?

2004-04-13 Thread Stevan Harnad
Prior Topic Thread:
"EPrints, DSpace or ESpace?"
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2670.html

> 2 Open Access News Postings by Garrett Eastman:
>
> http://www.earlham.edu/~peters/fos/2004_04_11_fosblogarchive.html#a108179305530712543
>
> Skeptical eye on Google repository searching
>
> Henk Ellermann, Google Searches Repositories: So What Does Google
> Search For?, http://eepi.ubib.eur.nl/iliit/archives/000479.html,
> -=(In Between)=-, April 12, 2004. Ellermann puts the brakes on enthusiasm
> for Google's proposed federated repository searching, reported in the
> Chronicle of Higher Education on Friday, April 9 (see earlier OAN posting:
> http://www.earlham.edu/~peters/fos/2004_04_04_fosblogarchive.html#a108152131781448637
> .) His questions relate to the actual number of documents concerned;
> press accounts have said the 17 repositories hold an average of 1,000
> documents, but Ellermann's calculations show a number considerably
> smaller. He maintains that the repository movement has a long way to go
> to attract and index content and provide reliable access, so that there
> will be something for Google users to search and find.
>
> Google partners with universities to mine invisible academic literature
> http://www.earlham.edu/~peters/fos/2004_04_04_fosblogarchive.html#a108152131781448637
>
> Jeffrey R. Young, Google Teams Up with 17 Colleges to Test Searches of
> Scholarly Materials, Chronicle of Higher Education Daily Update, April 9, 2004.
> http://chronicle.com/free/2004/04/2004040901n.htm
> MIT and 16 other institutions are collaborating with Google, which, pending
> the success of the test project, will activate a feature that enables
> searching of online repositories such as DSpace. MacKenzie Smith of MIT
> is quoted. "A lot of times the richest scholarly literature is buried"
> in search-engine results, said Ms. Smith. "As more and more content
> is on the Web, it's harder and harder to find the high-quality stuff
> that you need." The universities' extensive use of metadata and OCLC's
> involvement in developing a search configuration for the test promise
> a highly useful search tool across multiple collections.

---

> Google searches repositories: so what does Google search for?
> http://eepi.ubib.eur.nl/iliit/archives/000479.html
>
> Henk Ellermann
>
> The Chronicle of Higher Education reports that Google has 'teamed up' with a
> number of DSpace-using universities to develop an add-on to Google's advanced
> search option. The add-on will consist of a search through the contents of
> institutional repositories.
>
> Although it is not stated in the article, rumor has it that the search will be
> on the full text as well as on the metadata. Within a few months Google therefore
> will offer their users an option to restrict searches to an "intellectual zone".
> That is the official message and it sounds good.
>
> The only problem is that the official message is based on a - how to put it
> nicely? - distorted view of reality. It is stated for instance that the
> participants in this pilot have repositories containing on average 1,000
> documents. Is that so? Let's count.
>
> The following list shows how many documents there are (currently) in the
> repositories of the participating institutions.
>
> MIT: 3565 (but not all are available to all)
> Australian National University: 34050 (but 0 texts)
> Cornell University: 41
> Cranfield University: 49
> European University Institute: - internal error -
> Hong Kong University of Science and Technology: 986
> Indiana University-Purdue University at Indianapolis: 27
> Minho University: 311
> Ohio State University: - cannot be reached -
> Parma University: 29
> University of Arizona: 1
> University of Calgary: 135
> University of Oregon: 106
> University of Rochester: 138
> University of Toronto: 819
> University of Washington: 1772 (of which at least 962 pictures, and most
> documents not accessible outside UW)
> University of Wisconsin: 21
>
> Now, 1,000 documents on average? I don't think so.
>
> But it is not only the quantity. Even when documents are available, it does not
> mean that they are available to everyone. And even if they are available to
> everyone, you still can't be sure that the system is running...
>
> What we badly need is a continuous and authoritative review of existing
> Institutional Repositories. The criteria to "judge" the repositories would have
> to include:
>
> * number of documents (with breakdown per document type)
> * percentage of freely accessible documents
> * up-time
>
> It is great that Google becomes part of the Institutional Repositories effort, but
> we should learn to give fair and honest [data] about what we have to offer. There
> is actually not that much at the moment. We can only hope that what Google will
> expose is more than just the message "amateurs at work".
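
[As an aside on reproducing counts like the ones above: per-repository record
totals can be gathered mechanically over OAI-PMH by paging through
ListIdentifiers. A minimal sketch in Python, with a placeholder endpoint -- and
note that a record total is not a full-text total, per the caveats earlier in
this thread:]

    # Minimal sketch of counting a repository's records over OAI-PMH by paging
    # through ListIdentifiers with resumption tokens. The endpoint below is a
    # placeholder, not one of the repositories listed above.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"

    def count_records(base_url):
        """Count every header from ListIdentifiers, following resumption tokens."""
        params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
        total = 0
        while True:
            url = base_url + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as response:
                root = ET.parse(response).getroot()
            total += len(root.findall(f".//{OAI}header"))
            token = root.findtext(f".//{OAI}resumptionToken")
            if not token:
                return total
            # Subsequent requests carry only the verb and the resumption token.
            params = {"verb": "ListIdentifiers", "resumptionToken": token}

    print(count_records("http://archive.example.org/oai"))   # hypothetical endpoint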

---