Re: OA Archives: Full-texts vs. metadata-only and other digital objects
Stevan,

Thank you for your full and illuminating reply to my query about how much material in OA archives is available as full text. I am surprised at how low you estimate the figure to be and that it is not, yet, possible to produce a definitive number.

I am wondering if the Open DOAR (Directory of Open Access Repositories - the 'sister project' to the Directory of Open Access Journals, DOAJ) will set strictly 'full text only' rules for inclusion in its directory? And how will it relate to the archives.eprints directory you are involved with? It gets confusing to me because there are so many lists of repositories around on the web. How does the Celestial harvesting list you mention relate to the archives.eprints list (are they the same list?) or the large list kept by the University of Illinois at Urbana-Champaign (UIUC) at http://gita.grainger.uiuc.edu/registry/?

I take the archives.eprints to be the closest to a definitive list of the OA Institutional Repositories which we are concerned with here - although I notice that our 'DSpace@Cambridge' repository http://www.lib.cam.ac.uk/dspace/index.htm is not included.

I see the distinction between OA Archives and the Open Access Initiative. Maybe this is not strictly relevant to this forum and a basic misunderstanding of the purposes of archiving, but I still cannot understand why people are archiving *just* the metadata and not the full text. It makes OA search engines like OAIster more like any other standard bibliographic database with mostly subscription-only access.

I am interested in the whole area of Open Access and keeping up with developments. This forum is excellent for that purpose. Thank you.
Re: OA Archives: Full-texts vs. metadata-only and other digital objects
On Mon, 13 Jun 2005, Tim Gray wrote:

> Thank you for your full and illuminating reply to my query about how much material in OA archives is available as full text. I am surprised at how low you estimate the figure to be and that it is not, yet, possible to produce a definitive number.

The number of full texts in OA archives is so low because the number of institutions with OA self-archiving mandates (as opposed to the number of institutions with OA Archives) is so low. Cf.:

http://archives.eprints.org/eprints.php?action=browse
vs.
http://www.eprints.org/signup/fulllist.php

The remedy is quite obvious (and will come, but is taking longer than it might).

Swan, Alma and Brown, Sheridan (2005) Open access self-archiving: An author study. Technical Report, Joint Information Systems Committee (JISC), UK FE and HE funding councils.
http://cogprints.org/4385/
http://www.ecs.soton.ac.uk/~harnad/Temp/alma-amst.pdf

> I am wondering if the Open DOAR (Directory of Open Access Repositories - the 'sister project' to the Directory of Open Access Journals, DOAJ) will set strictly 'full text only' rules for inclusion in its directory?

Archives with mixed contents, some of it other than OA full-texts, should not be excluded, but an algorithm must be devised to recognise and record the number of full-texts separately. Tim Brody and co-workers at Southampton are working on this now for the Southampton OA Archives Registry.

See: Newly enhanced Registry of Open Access Repositories (ROAR)
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4585.html

> how will it relate to the archives.eprints directory you are involved with?

That remains to be clarified, but my understanding is that there will be a collaboration and DOAR will be built on the Southampton OA Archives Registry. (Others will have to confirm whether that is indeed the case.)

> It gets confusing to me because there are so many lists of repositories around on the web.
That was why the Southampton OA Archives Registry was created, two years ago. Moreover, because all the other registries rely only on voluntary self-registration, and archives have not been rigorous about self-registering, the Southampton OA Archives Registry has been hand-trawling the Web and other registries to find and register new OA Archives as they are created.

Perhaps a recognizable, consistent self-identifier tag will evolve, so OA Archives can be automatically harvested and registered, but this has not yet happened. Indeed, some of the ostensibly OAI-compliant OA Archives may not even be OAI-compliant! This too will improve, as more institutions adopt institutional self-archiving policies. Germany's DINI certificate will help.

Goettingen/DINI/SPARC-Europe Open Access Meeting
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4563.html

> How does the Celestial harvesting list you mention relate to the archives.eprints list (are they the same list?)

Celestial, written by Tim Brody, from the University of Southampton, is an OAI aggregator/cache application that imports OAI metadata from OAI-compliant repositories (versions 1.0, 1.1 and 2.0), and re-exposes that metadata through either an aggregated or a per-repository OAI 2.0-compliant interface.

Tim is also the creator and maintainer of the Southampton OA Archives Registry, archives.eprints.org, where it is explained that:

http://archives.eprints.org/eprints.php?action=about

    What does Not in Celestial mean? This means the archive has not been listed/harvested by Celestial (celestial.eprints.org). This may be because the archive doesn't have a functioning OAI-PMH interface.

    What does OAI Interface Unknown mean? Either the archive doesn't have a functioning Open Archives interface, or we couldn't track down where it is. Site admins should say on their 'about' or 'help' page where their OAI interface is and use a common URL for it (e.g. /perl/oai or /cgi-bin/oai).
    Submitting your site to the OAI registry/Hussein Suleman's Repository Explorer will also help to get your site noticed.

> or the large list kept by the University of Illinois at Urbana-Champaign (UIUC) at http://gita.grainger.uiuc.edu/registry/?

That is one of the registries from which the Southampton OA Archives Registry hand-harvests. The Registry also regularly harvests from OAIster:
http://oaister.umdl.umich.edu/o/oaister/

It can also import lists from OAI list-friends automatically:
http://archives.eprints.org/eprints.php?action=import

> I take the archives.eprints to be the closest to a definitive list of the OA Institutional Repositories which we are concerned with here - although I notice that our 'DSpace@Cambridge' repository http://www.lib.cam.ac.uk/dspace/index.htm is not included.

DSpace@Cambridge is in the Registry. See:
http://archives.eprints.org/eprints.php?url=http%3A%2F%2Fwww.dspace.cam.ac.uk%2F

But it is not in Celestial
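
As a rough illustration of the Registry's advice about exposing the OAI interface at a common URL, the following Python sketch builds OAI-PMH `Identify` requests for the two paths named in the quoted text (/perl/oai and /cgi-bin/oai) and parses an `Identify` response. It is only a sketch under stated assumptions: the function names, the example host and the sample response are hypothetical, no network request is made, and it is not how Celestial or the Registry actually locate interfaces.

```python
from urllib.parse import urlencode, urljoin
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# The two common paths mentioned in the Registry's 'about' text.
COMMON_OAI_PATHS = ["/perl/oai", "/cgi-bin/oai"]


def candidate_identify_urls(site_url, paths=COMMON_OAI_PATHS):
    """Return Identify request URLs to try for a site (hypothetical helper)."""
    query = urlencode({"verb": "Identify"})
    return [urljoin(site_url, path) + "?" + query for path in paths]


def parse_identify(xml_text):
    """Extract name and protocol version from an Identify response.

    Returns None when the document has no Identify element, e.g. when
    the URL did not point at a working OAI interface.
    """
    root = ET.fromstring(xml_text)
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        return None
    return {
        "name": ident.findtext("oai:repositoryName", default="", namespaces=OAI_NS),
        "version": ident.findtext("oai:protocolVersion", default="", namespaces=OAI_NS),
    }


# Hypothetical, heavily trimmed response for illustration only.
SAMPLE_IDENTIFY = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<Identify>"
    "<repositoryName>Example Archive</repositoryName>"
    "<protocolVersion>2.0</protocolVersion>"
    "</Identify>"
    "</OAI-PMH>"
)

if __name__ == "__main__":
    for url in candidate_identify_urls("http://example.org/"):
        print(url)
    print(parse_identify(SAMPLE_IDENTIFY))
```

A registry could fetch each candidate URL in turn and treat the first one that parses to a non-empty result as the archive's OAI interface; anything else would fall into the "OAI Interface Unknown" case described above.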
Re: OA Archives: Full-texts vs. metadata-only and other digital objects
Tim Gray wrote:

> Stevan, thank you for your full and illuminating reply to my query about how much material in OA archives is available as full text. I am surprised at how low you estimate the figure to be and that it is not, yet, possible to produce a definitive number.

Knowing whether a record is a full-text (and also whether it's scholarly/published/peer-reviewed) is something in the realm of Google Scholar, Citeseer etc. Without wishing to recreate one of those services, I don't know of a method for producing a definitive number. I suspect simple approaches (e.g. does the record have a PDF link) will be undermined by (sorry for picking on you!) sites like:
http://library.isibang.ac.in:8080/dspace/

No prizes for spotting why that wouldn't work :-)

> I am wondering if the Open DOAR (Directory of Open Access Repositories - the 'sister project' to the Directory of Open Access Journals, DOAJ) will set strictly 'full text only' rules for inclusion in its directory? And how will it relate to the archives.eprints directory you are involved with? It gets confusing to me because there are so many lists of repositories around on the web. How does the Celestial harvesting list you mention relate to the archives.eprints list (are they the same list?) or the large list kept by the University of Illinois at Urbana-Champaign (UIUC) at http://gita.grainger.uiuc.edu/registry/?

Celestial is an OAI cache - it retrieves every metadata record from those archives I've added to it. To make archives.eprints (IAR) I stapled together the GNU EPrints listing with Celestial's record counts (as an aside, anyone can use the records graphs from Celestial). I keep a firmer technical control of Celestial than I do of the IAR.

UIUC is the point of entry to get added to OAIster, but provides analyses of all *OAI* repositories registered with it. The IAR includes many archives with no or broken OAI interfaces, as well as aggregates (e.g. single entry with multiple OAI interfaces).
We also collect additional metadata in the IAR that isn't exposed by OAI (type, software, etc.).

(Not forgetting the registry at www.openarchives.org and Hussein Suleman's OAI explorer.)

My hope and expectation is that OpenDOAR will include some metric of full-textness. There was also an effort for the recent Amsterdam SURF/JISC/CNI meeting to ascertain some figures (by survey) for the content of IRs - I believe that report will be published in the next month or so.

> I take the archives.eprints to be the closest to a definitive list of the OA Institutional Repositories which we are concerned with here - although I notice that our 'DSpace@Cambridge' repository http://www.lib.cam.ac.uk/dspace/index.htm is not included.

Here?
http://archives.eprints.org/index.php?url=http%3A%2F%2Fwww.dspace.cam.ac.uk%2F

> I see the distinction between OA Archives and the Open Access Initiative. Maybe this is not strictly relevant to this forum and a basic misunderstanding of the purposes of archiving, but I still cannot understand why people are archiving *just* the metadata and not the full text. It makes OA search engines like OAIster more like any other standard bibliographic database with mostly subscription-only access.

I'm glad to see you're an archivangelist rather than a repologist ('sorry, the full-text isn't available here')!

It's the IR vs Open archives paradigm. The IR serves an institutional need to *track* as well as to *expose* research output. Tracking research output does not require making that research available for free on the Web. The purpose of Open archives is to make research more efficient by maximising access to research, hence maximising research impact.

If a high quality body of freely accessible literature is available through IRs, then the services that build on them will be more useful. There are a lot of records appearing out there, but the full-texts available from ad hoc Web pages still dwarf those in IRs.
There is also no clear distinction between prestigious research and the capture-all philosophy - administrators and authors need to realise that what they put into the IR may very well turn up on automated CVs, and they probably don't want to have their high-impact peer-reviewed articles hidden amongst 1000s of PowerPoint slides!

Sincerely,
Tim Brody
tdb...@ecs.soton.ac.uk
Administrator, Institutional Archives Registry
http://archives.eprints.org/
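
The "simple approach" mentioned earlier in this message (does the record have a PDF link) could be sketched roughly as follows in Python. This is an illustration only, with several assumptions: the records are taken to be unqualified Dublin Core (oai_dc) inside an OAI-PMH ListRecords response, the helper names and the sample data are hypothetical, and the test for a PDF link (a dc:format of application/pdf, or a dc:identifier or dc:relation ending in .pdf) is one guess among many. As noted above, a count obtained this way is not a definitive number of full-texts: a PDF link says nothing about whether the item is scholarly, published or peer-reviewed.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefixes for the two namespaces involved.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def has_pdf_link(record):
    """Naive heuristic: does this OAI record mention a PDF anywhere?"""
    for el in record.iter():
        text = (el.text or "").strip().lower()
        if el.tag == DC + "format" and text == "application/pdf":
            return True
        if el.tag in (DC + "identifier", DC + "relation") and text.endswith(".pdf"):
            return True
    return False


def count_pdf_records(xml_text):
    """Return (records with a PDF link, total records) for one response."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//" + OAI + "record")
    with_pdf = sum(1 for record in records if has_pdf_link(record))
    return with_pdf, len(records)


# Hypothetical two-record response: one item with a file, one metadata-only.
SAMPLE_LIST_RECORDS = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" '
    'xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/"><ListRecords>'
    "<record><metadata><oai_dc:dc>"
    "<dc:title>Paper with a deposited file</dc:title>"
    "<dc:identifier>http://repo.example.org/1/paper.pdf</dc:identifier>"
    "</oai_dc:dc></metadata></record>"
    "<record><metadata><oai_dc:dc>"
    "<dc:title>Metadata-only entry</dc:title>"
    "<dc:identifier>http://repo.example.org/2/</dc:identifier>"
    "</oai_dc:dc></metadata></record>"
    "</ListRecords></OAI-PMH>"
)

if __name__ == "__main__":
    with_pdf, total = count_pdf_records(SAMPLE_LIST_RECORDS)
    print(f"{with_pdf} of {total} records have a PDF link")
```

Run over a harvested cache, this would give at best a crude "metric of full-textness" per archive, and it would miscount any archive whose metadata does not expose file links in the fields checked here.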