Re: EPrints, DSpace or ESpace?
[2 Postings: (1) L. Waaijers; (2) T. Brody]

(1) Leo Waaijers (SURF, Netherlands)

Stevan Harnad wrote:
By the way, the real OAI Google is OAIster, and it contains over 3 million pearls from nearly 300 institutions http://oaister.umdl.umich.edu/o/oaister/ but many are not journal articles (and even if they all were, that still wouldn't be nearly enough yet!): http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0023.gif

And -- as of March 10 -- Yahoo searches OAIster! See http://www.umich.edu/~urecord/0304/Mar08_04/07.shtml

Leo Waaijers

(2) Tim Brody (ECS, Southampton):

Henk Ellermann, Google Searches Repositories: So What Does Google Search For?, http://eepi.ubib.eur.nl/iliit/archives/000479.html

> But it is not only the quantity. Even when documents are available it does not mean that they are available to everyone. And even if they are available to everyone, you still can't be sure that the system is running...
>
> What we badly need is a continuous and authoritative review of existing Institutional Repositories. The criteria to "judge" the repositories would have to include:
>
> * number of documents (with breakdown per document type)
> * percentage of freely accessible documents
> * up-time
>
> It is great that Google is becoming part of the Institutional Repositories effort, but we should learn to give fair and honest [data] about what we have to offer. There is actually not that much at the moment. We can only hope that what Google will expose is more than just the message "amateurs at work".

I would agree with Henk that the current -- early -- state of 'Institutional Repositories' (aka Eprint Archives) is not yet the promised land of open access to research material. Institutional research archives (and hence the services built on them) will succeed or fail depending on whether there is the drive within the institution to enhance its visibility and impact by mandating that its author-employees deposit all their refereed-research output.
Then, once it achieves critical mass, the archive can support itself as part of the culture of the institution. The archive is the public record of "the best the institution has done". So those archives that Henk refers to, with their patchy, minimal contents, need to look at what is going into this public record of their research output, and must decide whether it reflects the institution's achievements.

As a technical aside, DP9 was developed some time ago for exposing OAI records to Web crawlers: http://arc.cs.odu.edu:8080/dp9/about.jsp

I would be surprised if Google were to base any long-term service on an archive's contents alone. Without the linking structure of the Web, a search engine is left with only keyword-frequency techniques, which the Web has shown fail to scale to very large data sets. For my money, Google-over-Citebase/Citeseer-over-Institutional Archives is much more interesting (the Archive gives editorial management, Citebase/Citeseer the linking structure, and Google the search wizardry).

Stevan Harnad:
> Eprints, for example, has over 120 archives worldwide of exactly the same kind, with over 40,000 papers in them: http://archives.eprints.org/eprints.php?action=analysis

I have revised the description on that page to say that a *record* is not necessarily a full text. And of course a full text is not necessarily a peer-reviewed postprint. It would help bean-counters like myself if repository/archive administrators would indicate in an obvious place what their content types are (i.e. what type of material is in the system), and how the number of metadata records corresponds to the number of publicly accessible full texts.

Tim Brody
Southampton University
http://citebase.eprints.org/
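The review criteria Henk proposes, combined with the distinction between metadata records and publicly accessible full texts, could be turned into a small per-repository scorecard. What follows is only a minimal Python sketch, not an existing tool: the `Record` structure, the `review` function and the idea of measuring up-time from periodic availability probes are illustrative assumptions, and "percentage of freely accessible documents" is read here as open full texts divided by all metadata records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """One metadata record in a repository (hypothetical structure for illustration)."""
    doc_type: str              # e.g. "article", "thesis", "image"
    has_full_text: bool        # a full text is attached, not just metadata
    publicly_accessible: bool  # anyone can retrieve the full text


def review(records, probes):
    """Score one repository on the criteria discussed in the thread.

    records: list of Record objects harvested from the repository.
    probes:  list of bools, one per periodic availability check
             (True means the archive responded). Assumed measurement method.
    """
    # Criterion 1: number of documents, broken down per document type.
    by_type = Counter(r.doc_type for r in records)
    # Criterion 2: a record only counts as open if it has a full text AND anyone can fetch it.
    open_full_texts = sum(1 for r in records if r.has_full_text and r.publicly_accessible)
    pct_open = 100.0 * open_full_texts / len(records) if records else 0.0
    # Criterion 3: up-time as the share of successful availability probes.
    uptime_pct = 100.0 * sum(probes) / len(probes) if probes else 0.0
    return {
        "records": len(records),
        "by_type": dict(by_type),
        "open_full_texts": open_full_texts,
        "pct_open_full_text": pct_open,
        "uptime_pct": uptime_pct,
    }


if __name__ == "__main__":
    sample = [
        Record("article", True, True),
        Record("article", True, False),   # full text exists but is restricted
        Record("image", True, True),
        Record("article", False, False),  # metadata-only record
    ]
    print(review(sample, [True, True, True, False]))
```

Keeping the record count and the open-full-text count as separate fields is the point of the design: a repository with many metadata-only or restricted records would score high on the first and low on the second, which is exactly the gap the bean-counting above is meant to expose.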
Re: EPrints, DSpace or ESpace?
Prior Topic Thread: "EPrints, DSpace or ESpace?" http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2670.html

> 2 Open Access News Postings by Garrett Eastman:
>
> http://www.earlham.edu/~peters/fos/2004_04_11_fosblogarchive.html#a108179305530712543
>
> Skeptical eye on Google repository searching
>
> Henk Ellermann, Google Searches Repositories: So What Does Google Search For?, http://eepi.ubib.eur.nl/iliit/archives/000479.html -=(In Between)=-:, April 12, 2004. Ellermann puts the brakes on enthusiasm for Google's proposed federated repository searching, reported in the Chronicle of Higher Education on Friday, April 9 (see earlier OAN posting: http://www.earlham.edu/~peters/fos/2004_04_04_fosblogarchive.html#a108152131781448637). His questions relate to the actual number of documents concerned; press accounts have said the 17 repositories hold an average of 1000 documents, but Ellermann's calculations show a considerably smaller number. He maintains that the repository movement has a long way to go to attract and index content and to provide reliable access, so that there is something for Google users to search and find.
>
> Google partners with universities to mine invisible academic literature
> http://www.earlham.edu/~peters/fos/2004_04_04_fosblogarchive.html#a108152131781448637
>
> Jeffrey R. Young, Google Teams Up with 17 Colleges to Test Searches of Scholarly Materials, Chronicle of Higher Education Daily Update, April 9, 2004.
> http://chronicle.com/free/2004/04/2004040901n.htm
> MIT and 16 other institutions are collaborating with Google, who, pending the success of the test project, will activate a feature that enables searching of online repositories such as DSpace. MacKenzie Smith of MIT is quoted. "A lot of times the richest scholarly literature is buried" in search-engine results, said Ms. Smith. "As more and more content is on the Web, it's harder and harder to find the high-quality stuff that you need."
> The universities' extensive use of metadata and OCLC's involvement in developing a search configuration for the test promise a highly useful search tool across multiple collections.

---

> Google searches repositories: so what does Google search for?
> http://eepi.ubib.eur.nl/iliit/archives/000479.html
>
> Henk Ellermann
>
> The Chronicle of Higher Education reports that Google has 'teamed up' with a number of DSpace-using universities to develop an add-on to Google's advanced search option. The add-on will consist of a search through the contents of institutional repositories.
>
> Although it is not stated in the article, rumor has it that the search will be on the full text as well as on the metadata. Within a few months Google will therefore offer its users an option to restrict searches to an "intellectual zone". That is the official message and it sounds good.
>
> The only problem is that the official message is based on a (how to put it nicely?) distorted view of reality. It is stated, for instance, that the participants in this pilot have repositories containing on average 1000 documents. Is that so? Let's count.
>
> The following list shows how many documents there are (currently) in the repositories of the participating institutions.
>
> MIT 3565 (but not all are available to all)
> Australian National University 34050 (but 0 texts)
> Cornell University 41
> Cranfield University 49
> European University Institute (internal error)
> Hong Kong University of Science and Technology 986
> Indiana University-Purdue University at Indianapolis 27
> Minho University 311
> Ohio State University (cannot be reached)
> Parma University 29
> University of Arizona 1
> University of Calgary 135
> University of Oregon 106
> University of Rochester 138
> University of Toronto 819
> University of Washington 1772 (of which at least 962 pictures, and most documents not accessible outside UW)
> University of Wisconsin 21
>
> Now, 1000 documents on average? I don't think so.
>
> But it is not only the quantity. Even when documents are available it does not mean that they are available to everyone. And even if they are available to everyone, you still can't be sure that the system is running...
>
> What we badly need is a continuous and authoritative review of existing Institutional Repositories. The criteria to "judge" the repositories would have to include:
>
> * number of documents (with breakdown per document type)
> * percentage of freely accessible documents
> * up-time
>
> It is great that Google is becoming part of the Institutional Repositories effort, but we should learn to give fair and honest [data] about what we have to offer. There is actually not that much at the moment. We can only hope that what Google will expose is more than just the message "amateurs at work".
---
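Ellermann's "1000 on average?" check can be reproduced from the list above. This is a minimal Python sketch under two stated assumptions: the Australian National University entry is left out because the post reports 0 texts for it (counting its listed record total instead would change the result considerably), and the two repositories that could not be counted are skipped. The MIT and University of Washington figures are used as listed, even though the post notes that not all of those items are accessible documents.

```python
from statistics import mean, median

# Counts as listed in Ellermann's post (April 2004). None marks entries
# with no usable full-text count:
#   - ANU: listed with "0 texts" (assumption: excluded rather than counted)
#   - EUI and Ohio State: internal error / could not be reached
REPORTED_COUNTS = {
    "MIT": 3565,
    "Australian National University": None,
    "Cornell University": 41,
    "Cranfield University": 49,
    "European University Institute": None,
    "Hong Kong University of Science and Technology": 986,
    "Indiana University-Purdue University at Indianapolis": 27,
    "Minho University": 311,
    "Ohio State University": None,
    "Parma University": 29,
    "University of Arizona": 1,
    "University of Calgary": 135,
    "University of Oregon": 106,
    "University of Rochester": 138,
    "University of Toronto": 819,
    "University of Washington": 1772,
    "University of Wisconsin": 21,
}


def summarize(counts, threshold=1000):
    """Return (number counted, mean, median, number at or above threshold)."""
    usable = [c for c in counts.values() if c is not None]
    return (
        len(usable),
        mean(usable),
        median(usable),
        sum(1 for c in usable if c >= threshold),
    )


if __name__ == "__main__":
    n, avg, mid, big = summarize(REPORTED_COUNTS)
    print(f"{n} repositories counted: mean {avg:.1f}, median {mid}, {big} with 1000+ documents")
```

Over the 14 counted repositories this gives a mean of about 571 and a median of 120.5, with only MIT and the University of Washington at 1000 or more. The median is arguably the more telling figure here, since two large repositories pull the mean well above what a typical participant holds.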