Re: OA Archives: Full-texts vs. metadata-only and other digital objects

2005-06-13 Thread Tim Gray
Stevan

Thank you for your full and illuminating reply to my query about how much
material in OA archives is available as full text. I am surprised at how
low you estimate the figure to be and that it is not, yet, possible to
produce a definitive number.

I am wondering if the Open DOAR (Directory of Open Access Repositories -
the 'sister project' to the Directory of Open Access Journals, DOAJ) will
set strictly 'full text only' rules for inclusion in its directory? And how
will it relate to the archives.eprints directory you are involved with? It
gets confusing to me because there are so many lists of repositories around
on the web. How does the celestial harvesting list you mention relate to
the archives.eprints list (are they the same list?) or the large list kept
by the University of Illinois at Urbana-Champaign (UIUC) at
http://gita.grainger.uiuc.edu/registry/?

I take the archives.eprints to be the closest to a definitive list of the
OA Institutional Repositories which we are concerned with here - although I
notice that our 'DSpace@Cambridge' repository
http://www.lib.cam.ac.uk/dspace/index.htm is not included.

I see the distinction between OA Archives and the Open Access Initiative.
Maybe this is not strictly relevant to this forum and a basic
misunderstanding of the purposes of archiving, but I still cannot
understand why people are archiving *just* the metadata and not the full
text. It makes OA search engines like OAIster more like any other
standard bibliographic database with mostly subscription-only access.

I am interested in the whole area of Open Access and keeping up with
developments. This forum is excellent for that purpose.

Thank you.


Re: OA Archives: Full-texts vs. metadata-only and other digital objects

2005-06-13 Thread Stevan Harnad
On Mon, 13 Jun 2005, Tim Gray wrote:

 Thank you for your full and illuminating reply to my query about how much
 material in OA archives is available as full text. I am surprised at how
 low you estimate the figure to be and that it is not, yet, possible to
 produce a definitive number.

The number of full texts in OA archives is so low because the
number of institutions with OA self-archiving mandates (as opposed to
the number of institutions with OA Archives) is so low. Cf.:

http://archives.eprints.org/eprints.php?action=browse
vs.
http://www.eprints.org/signup/fulllist.php

The remedy is quite obvious (and will come, but is taking rather longer
than it might).

Swan, Alma and Brown, Sheridan (2005) Open access self-archiving:
An author study. Technical Report, Joint Information
Systems Committee (JISC), UK FE and HE funding councils.
http://cogprints.org/4385/
http://www.ecs.soton.ac.uk/~harnad/Temp/alma-amst.pdf

 I am wondering if the Open DOAR (Directory of Open Access Repositories -
 the 'sister project' to the Directory of Open Access Journals, DOAJ) will
 set strictly 'full text only' rules for inclusion in its directory?

Archives with mixed contents, some of it other than OA full-texts, should
not be excluded, but an algorithm must be devised to recognise and record
the number of full-texts separately. Tim Brody and co-workers at Southampton
are working on this now for the Southampton OA Archives Registry.

See:
Newly enhanced Registry of Open Access Repositories (ROAR)
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4585.html

 how will it relate to the archives.eprints directory you are involved with?

That remains to be clarified, but my understanding is that there will be a
collaboration and DOAR will be built on the Southampton OA Archives Registry.
(Others will have to confirm whether that is indeed the case.)

 It gets confusing to me because there are so many lists of repositories around
 on the web.

That was why the Southampton OA Archives Registry was created, two years ago.
Moreover, because all the other registries rely only on voluntary
self-registration, and archives have not been rigorous about self-registering,
the Southampton OA Archives Registry has been hand-trawling the Web and other
registries to find and register new OA Archives as they are created.

Perhaps a recognizable, consistent self-identifier tag will evolve, so
OA Archives can be automatically harvested and registered, but so far
this has not yet happened. Indeed, some of the ostensibly OAI-compliant
OA Archives may not even be OAI-compliant!

This too will improve, as more institutions adopt institutional self-archiving
policies. Germany's DINI certificate will help.

Goettingen/DINI/SPARC-Europe Open Access Meeting
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4563.html

 How does the celestial harvesting list you mention relate to
 the archives.eprints list (are they the same list?)

Celestial, written by Tim Brody, from the University of Southampton,
is an OAI aggregator/cache application that imports OAI metadata from
version 1.0, 1.1 and 2.0 OAI-compliant repositories, and re-exposes that metadata
through either an aggregated or per-repository OAI-compliant 2.0 interface.
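
To make the harvesting step concrete, here is a minimal sketch (not Celestial's actual code) of what an OAI-PMH 2.0 harvester does: it issues a ListRecords request with metadataPrefix=oai_dc and reads Dublin Core fields out of the XML that comes back. The base URL and the sample response are invented for illustration, and the sketch ignores resumption tokens and the older 1.x protocol versions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    return records


# Hypothetical response body, shaped like an OAI-PMH 2.0 ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("http://example.org/perl/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

An aggregator in this style would store the parsed records per repository and serve them again through its own OAI interface.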

Tim is also the creator and maintainer of the Southampton OA Archives Registry
archives.eprints.org where it is explained that:

http://archives.eprints.org/eprints.php?action=about

What does "Not in Celestial" mean?

This means the archive has not been listed/harvested by Celestial
(celestial.eprints.org). This may be because the archive doesn't
have a functioning OAI-PMH interface.

What does "OAI Interface Unknown" mean?

Either the archive doesn't have a functioning Open Archives interface,
or we couldn't track down where it is. Site admins should say on
their 'about' or 'help' page where their OAI interface is and use a
common URL for it (e.g. /perl/oai or /cgi-bin/oai). Submitting your
site to the OAI registry/Hussein Suleman's Repository Explorer will
also help to get your site noticed.
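
A rough sketch of how one might look for an OAI interface at the common locations mentioned above: build an Identify request for each candidate path and check whether the reply parses as an OAI-PMH 2.0 Identify response. The function names are hypothetical, only the two example paths from the text are tried, and the check recognises the 2.0 namespace only. Fetching the URLs is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
# The two example locations named in the registry's help text.
COMMON_PATHS = ["/perl/oai", "/cgi-bin/oai"]


def candidate_identify_urls(site_url):
    """Return Identify request URLs for the common OAI paths on a site."""
    base = site_url.rstrip("/")
    return [f"{base}{path}?verb=Identify" for path in COMMON_PATHS]


def looks_like_identify_response(xml_text):
    """True if the text parses as an OAI-PMH 2.0 Identify response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not XML at all, e.g. an HTML error page with unclosed tags.
        return False
    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        return False
    return root.find(f"{{{OAI_NS}}}Identify") is not None


if __name__ == "__main__":
    for url in candidate_identify_urls("http://example.org/"):
        print(url)
```

A site that answers at neither path would, on this approach, stay listed as "OAI Interface Unknown" until someone supplies the right URL.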

 or the large list kept
 by the University of Illinois at Urbana-Champaign (UIUC) at
 http://gita.grainger.uiuc.edu/registry/?

That is one of the registries from which the Southampton OA Archives Registry
hand-harvests. The Registry also regularly harvests from OAIster
http://oaister.umdl.umich.edu/o/oaister/

It can also import lists from OAI list-friends automatically:
http://archives.eprints.org/eprints.php?action=import

 I take the archives.eprints to be the closest to a definitive list of the
 OA Institutional Repositories which we are concerned with here - although I
 notice that our 'DSpace@Cambridge' repository
 http://www.lib.cam.ac.uk/dspace/index.htm is not included.

DSpace@Cambridge is in the Registry: See
http://archives.eprints.org/eprints.php?url=http%3A%2F%2Fwww.dspace.cam.ac.uk%2F

But it is not in Celestial 

Re: OA Archives: Full-texts vs. metadata-only and other digital objects

2005-06-13 Thread Tim Brody

Tim Gray wrote:

 Stevan

 Thank you for your full and illuminating reply to my query about how much
 material in OA archives is available as full text. I am surprised at how
 low you estimate the figure to be and that it is not, yet, possible to
 produce a definitive number.


Telling a full-text apart from a metadata-only record (and also whether
it's scholarly/published/peer-reviewed) is something in the realm of Google
Scholar, Citeseer etc.

Without wishing to recreate one of those services I don't know of a
method for producing a definitive number. I suspect simple approaches
(e.g. does record have PDF link) will be undermined by (sorry for
picking on you!) sites like:
http://library.isibang.ac.in:8080/dspace/
No prizes for spotting why that wouldn't work :-)
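
For concreteness, the "simple approach" might look like the sketch below. This is only an illustration of the naive heuristic, not a method anyone in this thread endorses: the record layout (a dict of Dublin Core field lists) and the sample data are hypothetical. It miscounts in both directions, since a full-text served as HTML is missed and a PDF link that leads to a login page is counted.

```python
def has_pdf_link(record):
    """Naive test: does any identifier/relation URL end in .pdf,
    or does any format value mention PDF?"""
    urls = record.get("identifier", []) + record.get("relation", [])
    # Strip any query string before checking the file extension.
    if any(u.lower().split("?")[0].endswith(".pdf") for u in urls):
        return True
    return any("pdf" in f.lower() for f in record.get("format", []))


def count_apparent_full_texts(records):
    """Count records that pass the naive PDF-link test."""
    return sum(1 for r in records if has_pdf_link(r))


# Hypothetical harvested records (Dublin Core fields as lists of strings).
SAMPLE_RECORDS = [
    {"identifier": ["http://example.org/1/paper.pdf"], "format": []},
    {"identifier": ["http://example.org/2/"], "format": ["application/pdf"]},
    {"identifier": ["http://example.org/3/"], "format": ["text/html"]},
]

if __name__ == "__main__":
    print(count_apparent_full_texts(SAMPLE_RECORDS), "of", len(SAMPLE_RECORDS))
```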


 I am wondering if the Open DOAR (Directory of Open Access Repositories -
 the 'sister project' to the Directory of Open Access Journals, DOAJ) will
 set strictly 'full text only' rules for inclusion in its directory? And how
 will it relate to the archives.eprints directory you are involved with? It
 gets confusing to me because there are so many lists of repositories around
 on the web. How does the celestial harvesting list you mention relate to
 the archives.eprints list (are they the same list?) or the large list kept
 by the University of Illinois at Urbana-Champaign (UIUC) at
 http://gita.grainger.uiuc.edu/registry/?


Celestial is an OAI cache - it retrieves every metadata record from
those archives I've added to it. To make archives.eprints (IAR) I
stapled together the GNU EPrints listing with Celestial's record counts
(as an aside, anyone can use the records graphs from Celestial). I keep
a firmer technical control of Celestial than I do the IAR.
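
As a toy illustration of that "stapling" step (the field names and data are invented, and this is not the IAR's actual code), joining a listing of archives with per-archive record counts could be as simple as a lookup keyed on the archive's base URL, with archives missing from the counts left as None:

```python
def staple(listing, record_counts):
    """Attach a record count to each listed archive, keyed by base URL.

    Archives absent from record_counts get None for their count.
    """
    merged = []
    for entry in listing:
        merged.append({**entry, "records": record_counts.get(entry["url"])})
    return merged


# Hypothetical inputs: a software listing and harvested record counts.
LISTING = [
    {"name": "Archive A", "url": "http://a.example.org/"},
    {"name": "Archive B", "url": "http://b.example.org/"},
]
RECORD_COUNTS = {"http://a.example.org/": 1234}

if __name__ == "__main__":
    for row in staple(LISTING, RECORD_COUNTS):
        print(row["name"], row["records"])
```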

UIUC is the point of entry to get added to OAIster, but provides
analyses of all *OAI* repositories registered with it. The IAR includes
many archives with no or broken OAI interfaces, as well as aggregates
(e.g. single entry with multiple OAI interfaces). We also collect
additional metadata in the IAR that isn't exposed by OAI (type,
software, etc.). (Not forgetting the registry at www.openarchives.org and
Hussein Suleman's OAI explorer.)

My hope and expectation is that OpenDOAR will include some metric of
full-textness. There was also an effort for the recent Amsterdam
SURF/JISC/CNI meeting to ascertain some figures (by survey) for the
content of IRs - I believe that report will be published in the next
month or so.


 I take the archives.eprints to be the closest to a definitive list of the
 OA Institutional Repositories which we are concerned with here - although I
 notice that our 'DSpace@Cambridge' repository
 http://www.lib.cam.ac.uk/dspace/index.htm is not included.


Here?
http://archives.eprints.org/index.php?url=http%3A%2F%2Fwww.dspace.cam.ac.uk%2F


 I see the distinction between OA Archives and the Open Access Initiative.
 Maybe this is not strictly relevant to this forum and a basic
 misunderstanding of the purposes of archiving, but I still cannot
 understand why people are archiving *just* the metadata and not the full
 text. It makes OA search engines like OAIster more like any other
 standard bibliographic database with mostly subscription-only access.


I'm glad to see you're an archivangelist rather than a repologist
('sorry, the full-text isn't available here')!

It's the IR vs Open archives paradigm. The IR serves an institutional
need to *track* as well as to *expose* research output. Tracking
research output does not require making that research available for free
on the Web. The purpose of Open archives is to make research more
efficient by maximising access to research, hence maximising research
impact.

If a high-quality body of freely accessible literature is available
through IRs, then the services that build on them will be more useful.
There are a lot of records appearing out there, but the full-texts
available from ad hoc Web pages still dwarf those in IRs. There is also no
clear distinction between prestigious research and the capture-all
philosophy - administrators and authors need to realise that what they
put into the IR may very well turn up on automated CVs, and they
probably don't want to have their high-impact peer-reviewed articles
hidden amongst 1000s of PowerPoint slides!

Sincerely,
Tim Brody tdb...@ecs.soton.ac.uk
Administrator, Institutional Archives Registry
http://archives.eprints.org/