Re: Scientometric OAI Search Engines

2004-05-27 Thread Stevan Harnad
Subject Thread:
Scientometric OAI Search Engines
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2237.html

On Wed, 26 May 2004, Michael Leach wrote:

 As we build institutional repositories (IR) and begin the process of
 linking these repositories, we could have the ability to create our own
 impact factors, linking the articles and citations among repositories all
 over the world.

This is not only already possible, but already happening. See:

OpCit: The Open Citation Project providing
Reference Linking and Citation Analysis for Open Archives
http://opcit.eprints.org/

Citebase: The Cross-OAI-Archive Citation and Download Ranking Search
Engine:
http://citebase.eprints.org/

Citeseer: The oldest citation engine of them all, operating on harvested
non-OAI articles in computer science archived on arbitrary websites:
http://citeseer.ist.psu.edu/cs

and the
Usage/Citation Correlator, which can be used to predict eventual
citations from current downloads:
http://citebase.eprints.org/analysis/correlation.php
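
A minimal sketch of the idea behind such a correlator, assuming hypothetical
per-paper counts and a plain Pearson correlation (the method the Citebase
tool actually uses is not described in this thread):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: early downloads and later citations for five papers.
downloads = [120, 45, 300, 80, 10]
citations = [12, 3, 25, 9, 1]

r = pearson(downloads, citations)
print(f"download/citation correlation: r = {r:.3f}")
```

A high r on real data would suggest that early download counts carry some
information about later citation counts; it would not by itself show cause.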

Many other new forms of digitometric analyses and performance indicators
will emerge as the Open Access Corpus grows.

  Similarly, as IR administrators work with publishers
 (including open access as well as more traditional publishers) to directly
 deposit postprint copies of articles and other digital objects in IRs, the
 new IR-Impact Factors could gain a similar weight to the Thomson/ISI
 Impact Factor.  It is likely that the IR-Impact Factor could cover
 literature not currently covered by Thomson/ISI, so while the two Impact
 Factors overlap, they would provide some independent means of assessing a
 journal's or article's impact in a given community.

They can, and already do. Their only constraint is the still-limited size
of the OA corpus.

 However, there may be another way to create an Impact Factor-like  
 statistic to analyze open access materials and other published works.  
 With the COUNTER standard and similar e-journal statistical tools, it is
 possible for a variety of libraries to merge their user access statistics
 and produce lists of most accessed papers or most accessed ejournals  
 for given fields.

These are the download statistics that Tim Brody's Citebase and
usage/citation correlator already gather. As the OA corpus grows, there
will no doubt be cross-archive arrangements for monitoring, storing and
harvesting download statistics along with citation statistics.
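
A rough sketch of what pooling download statistics across institutions
could look like. The report format (article identifier mapped to a download
count) and the identifiers are hypothetical; real COUNTER reports are more
detailed:

```python
from collections import Counter

def pool_downloads(reports):
    """Sum per-article download counts across several institutions' reports."""
    pooled = Counter()
    for report in reports:
        pooled.update(report)  # Counter.update adds counts rather than replacing
    return pooled

# Hypothetical reports: article identifier -> downloads at one institution.
library_a = {"paper-1": 40, "paper-2": 5}
library_b = {"paper-1": 10, "paper-3": 22}
library_c = {"paper-2": 7, "paper-3": 1}

pooled = pool_downloads([library_a, library_b, library_c])

# List the most-accessed papers across the pooled institutions.
for article, count in pooled.most_common(2):
    print(article, count)
```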

 For instance, the NERL (NorthEast Research Library) Consortium could pool
 their statistics to produce such lists, or perhaps the top research
 institutes in a given field (e.g. MIT, Harvard, Stanford, CalTech, etc. in
 physics) could produce the lists.  Granted, this ranking would be less
 scientific than the current Thomson/ISI Impact Factor, but it may still
 serve the purpose our users and readers want, which is defining quality
 and relevance.

The only handicap OAI digitometrics has relative to ISI measures is the
size and scope of the OA corpus. There is nothing less scientific about it.

 License agreements would have to be adjusted with publishers to include a
 provision for publishing and pooling the statistical data.  Open access
 publishers would have to be willing and able to supply such data as well.

If we wait for OA journals to prevail in order to approach 100% OA
coverage, we will wait till doomsday. OA self-archiving will prevail far
earlier. I doubt that non-OA publishers will mind pooling usage data
once OA prevails, perhaps even earlier.

 The debate surrounding open access, in part, resides with quality and
 relevance issues.  Waiting five years for an Impact Factor, as IOP's New
 Journal of Physics did, could hinder the process of open access
 acceptance.  Creating other measures of quality, such as the pooled
 statistics/ranking or IR-Impact Factor model above could provide another
 measure, and an earlier one, for many new publications.  With many such
 quality models available, individual readers and authors could pick what
 works best for them in determining quality and relevance.

OA Eprint archives will not only provide early-days metrics and predictors
in the form of download and citation counts for the published final
drafts (postprints), but also for the even earlier-days pre-refereeing
preprints.

And other, richer digitometric measures will develop too, such as
co-citation statistics (already available with Citebase), Google
PageRank-like weightings (using citations rather than links),
Hub/Authority analysis, co-text semantic analysis, correlation and
prediction, time-series analysis, and much more. All this awaits is the
growth of the Open Access Corpus.
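
To make the PageRank-like weighting concrete, here is a minimal sketch using
plain power iteration over a citation graph. The graph, the damping value
and the function name are illustrative assumptions, not a description of any
existing service:

```python
def citation_rank(cites, damping=0.85, iterations=50):
    """PageRank-style scores where a citation plays the role of a link.

    cites maps each paper to the list of papers it cites.
    """
    papers = set(cites)
    for targets in cites.values():
        papers.update(targets)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            targets = cites.get(p, [])
            if targets:
                # A paper passes its weight on to the papers it cites.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A paper that cites nothing spreads its weight evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical graph: A and B both cite C; C cites D; D cites nothing.
graph = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = citation_rank(graph)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

Unlike a raw citation count, this weighting lets a citation from a
highly ranked paper count for more than one from an uncited paper.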

Stevan Harnad

Re: Scientometric OAI Search Engines

2004-05-05 Thread Robert Kiley
It is recognised that there are two ways to provide OA:

(1) publishing articles in OA journals and

(2) publishing them in conventional journals but self-archiving them
publicly on the web as well.

One problem with route 2 that doesn't seem to have been fully addressed
is how should the PubMed or Web of Knowledge user find these open access
articles.  By way of example let us assume I stumble across the
following PubMed article:

Harnad S. Ingelfinger over-ruled...
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11191471
[published in the Lancet]  [Had this article appeared in a more recent
issue, PubMed would have linked directly to ScienceDirect and
access would be limited to subscribers.]

Of course the author has self-archived this article:

http://cogprints.ecs.soton.ac.uk/archive/1703/

...but how would the PubMed user know this?  Do we honestly expect users
to search PubMed and then go and search the OAIster service in the hope
that an open access version may be available?

I agree that route 2 is a way to provide open access - but at the same
time we must ensure that the major bibliographic services (PubMed, Web
of Knowledge etc) provide links to the open access version - as well as
the publisher version.  Is there any strategy for addressing this?

Robert Kiley
Head of Systems Strategy - Wellcome Library.
183, Euston Road, London. NW1 2BE
Tel: 020 7611 8338; Fax: 020 7611 8726; mailto:r.ki...@wellcome.ac.uk
Library Web site: http://library.wellcome.ac.uk

Relevant prior threads:

Re: proposed collaboration: google + open citation linking
http://www.openarchives.org/pipermail/oai-general/2001-June/35.html

Economic effects of link-based search engines on e-journals
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0894.html

A Search Engine for Searching Across Distributed Eprint Archives
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0927.html

Testing the citation-ranking search engine: Citebase
 http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2121.html

Scientometric OAI Search Engines
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2237.html

Need for systematic scientometric analyses of open-access data
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2521.html

How to compare research impact of toll- vs. open-access research
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2858.html


Re: Scientometric OAI Search Engines

2004-05-05 Thread Stevan Harnad

Re: Scientometric OAI Search Engines

2004-05-05 Thread Bob Parks
Robert Kiley writes:

 It is recognised that there are two ways to provide OA:
 (1) publishing articles in OA journals and
 (2) publishing them in conventional journals but self-archiving them
 publicly on the web as well.

 One problem with route 2 that doesn't seem to have been fully addressed
 is how should the PubMed or Web of Knowledge user find these open access
 articles.  By way of example let us assume I stumble across the
 following PubMed article:

 Harnad S. Ingelfinger over-ruled...
 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11191471
 [published in the Lancet]  [Had this article appeared in a more recent
 issue, PubMed would have linked directly to ScienceDirect and
 access would be limited to subscribers.]

 Of course the author has self-archived this article:

 http://cogprints.ecs.soton.ac.uk/archive/1703/

A Google search for 'Ingelfinger over-ruled' returns the CogPrints copy as
the first item.

Google is (intensively) indexing the academic literature (at least the OA
literature).

However, 'Ingelfinger' is too distinctive and hence an easy case. But my
experience so far has been that if it is accessible via OAI methods, Google
finds it.

 ...but how would the PubMed user know this?  Do we honestly expect users
 to search PubMed and then go and search the OAIster service in the hope
 that an open access version may be available?

Either that or they have to subscribe to everything, right?

 I agree that route 2 is a way to provide open access - but at the same
 time we must ensure that the major bibliographic services (PubMed, Web
 of Knowledge etc) provide links to the open access version - as well as
 the publisher version.  Is there any strategy for addressing this?

My point is that Google probably will do it as long as the suppliers
let Google index them.

Bob

Bob Parks
Department of Economics, Campus Box 1208
Washington University
One Brookings Drive
St. Louis, Missouri 63130-4899
Voice: (314) 935-5665; Fax: (314) 935-4156
b...@parks.wustl.edu


Re: Scientometric OAI Search Engines

2004-05-05 Thread Tim Brody
The likelihood is that the user searched Google before they tried PubMed or
ScienceDirect: a search for "Ingelfinger Over-Ruled harnad" comes up with an
OA version as the top match.

With OAI and OpenURL the OA version could be linked in as easily as the 
aggregators currently linked to by PubMed (although perhaps not as 
reliably, but then if you get a hit at least you know the version is 
accessible).
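
As a rough illustration of the OpenURL side of this, the sketch below builds
a resolver query from the citation details mentioned in this thread. The
resolver address is a made-up placeholder, and the key names (genre, aulast,
atitle, title) are assumed to follow the early OpenURL key/value convention:

```python
from urllib.parse import urlencode

def build_openurl(resolver_base, **fields):
    """Build an OpenURL-style query string from citation metadata."""
    return resolver_base + "?" + urlencode(fields)

# Hypothetical resolver address; a real service would supply its own.
RESOLVER = "http://resolver.example.org/openurl"

url = build_openurl(
    RESOLVER,
    genre="article",
    aulast="Harnad",
    atitle="Ingelfinger over-ruled",  # title as cited in this thread
    title="Lancet",
)
print(url)
```

A resolver receiving such a query could look the citation up in harvested
OAI metadata and redirect the user to an accessible copy if one is found.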

While it would be nice for services to link to OA versions, it doesn't 
take more than 30 seconds to copy/paste some appropriate keywords into 
Google, which seems to do a good job of discovering an accessible version.

Tim Brody
Citebase Search: http://citebase.eprints.org/