Re: Scientometric OAI Search Engines
Subject Thread: Scientometric OAI Search Engines
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2237.html

On Wed, 26 May 2004, Michael Leach wrote:

> As we build institutional repositories (IR) and begin the process of linking these repositories, we could have the ability to create our own impact factors, linking the articles and citations among repositories all over the world.

This is not only already possible, but already happening. See:

OpCit: The Open Citation Project, providing Reference Linking and Citation Analysis for Open Archives
http://opcit.eprints.org/

Citebase: The Cross-OAI-Archive Citation and Download Ranking Search Engine
http://citebase.eprints.org/

Citeseer: The oldest citation engine of them all, operating on harvested non-OAI articles in computer science archived on arbitrary websites
http://citeseer.ist.psu.edu/cs

and the Usage/Citation Correlator, which can be used to predict eventual citations from current downloads
http://citebase.eprints.org/analysis/correlation.php

Many other new forms of digitometric analyses and performance indicators will emerge as the Open Access Corpus grows.

> Similarly, as IR administrators work with publishers (including open access as well as more traditional publishers) to directly deposit postprint copies of articles and other digital objects in IRs, the new IR-Impact Factors could gain a similar weight to the Thomson/ISI Impact Factor. It is likely that the IR-Impact Factor could cover literature not currently covered by Thomson/ISI, so while the two Impact Factors overlap, they would provide some independent means of assessing a journal's or article's impact in a given community.

They can, and already do. Their only limit is the limited size of the OA corpus so far.

> However, there may be another way to create an Impact Factor-like statistic to analyze open access materials and other published works.
> With the COUNTER standard and similar e-journal statistical tools, it is possible for a variety of libraries to merge their user access statistics and produce lists of most accessed papers or most accessed ejournals for given fields.

These are the download statistics that Tim Brody's Citebase and Usage/Citation Correlator already gather. As the OA corpus grows, there will no doubt be cross-archive arrangements for monitoring, storing and harvesting download statistics along with citation statistics.

> For instance, the NERL (NorthEast Research Library) Consortium could pool their statistics to produce such lists, or perhaps the top research institutes in a given field (e.g. MIT, Harvard, Stanford, CalTech, etc. in physics) could produce the lists. Granted, this ranking would be less scientific than the current Thomson/ISI Impact Factor, but it may still serve the purpose our users and readers want, which is defining quality and relevance.

The only handicap OAI digitometrics has relative to ISI measures is the size and scope of the OA corpus. There is nothing less scientific about it.

> License agreements would have to be adjusted with publishers to include a provision for publishing and pooling the statistical data. Open access publishers would have to be willing and able to supply such data as well.

If we wait for OA journals to prevail in order to approach 100% OA coverage, we will wait till doomsday. OA self-archiving will prevail far earlier. I doubt that non-OAI publishers will mind pooling usage data once OA prevails, perhaps even earlier.

> The debate surrounding open access, in part, resides with quality and relevance issues. Waiting five years for an Impact Factor, as IOP's New Journal of Physics did, could hinder the process of open access acceptance. Creating other measures of quality, such as the pooled statistics/ranking or IR-Impact Factor model above could provide another measure, and an earlier one, for many new publications.
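The pooling idea discussed above (several libraries merging their access statistics into one "most accessed" list) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-library reports are plain dictionaries mapping an article identifier to a download count, and the identifiers and counts are invented for the example. It does not reproduce the COUNTER report format or anything Citebase actually does.

```python
from collections import Counter


def pool_downloads(reports):
    """Sum per-article download counts across several libraries' reports.

    Each report is assumed to be a dict {article_id: download_count}.
    """
    pooled = Counter()
    for report in reports:
        pooled.update(report)  # Counter.update adds counts, it does not overwrite
    return pooled


def most_accessed(pooled, n=10):
    """Return the n most downloaded articles, ties broken by identifier."""
    return sorted(pooled.items(), key=lambda item: (-item[1], item[0]))[:n]


# Invented example data for three hypothetical libraries.
library_a = {"paper-1": 120, "paper-2": 45}
library_b = {"paper-1": 30, "paper-3": 80}
library_c = {"paper-2": 60, "paper-3": 10}

pooled = pool_downloads([library_a, library_b, library_c])
ranking = most_accessed(pooled, n=3)
print(ranking)
```

A real pooling arrangement would also have to agree on how to identify the same article across sites and how to filter robot traffic, which this sketch leaves out.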
> With many such quality models available, individual readers and authors could pick what works best for them in determining quality and relevance.

OA Eprint archives will not only provide early-days metrics and predictors in the form of download and citation counts for the published final drafts (postprints), but also for the even earlier-days pre-refereeing preprints. And other, richer digitometric measures will develop too, such as co-citation statistics (already available with Citebase), Google PageRank-like weightings (but using citations rather than links), Hub/Authority analysis, co-text semantic analysis, correlation and prediction, time-series analysis, and much more.

All it awaits is the growth of the Open Access Corpus.

Stevan Harnad

REFERENCES

Hitchcock, S., Carr, L., Jiao, Z., Bergmark, D., Hall, W., Lagoze, C., Harnad, S. (2000) Developing services for open eprint archives: globalisation, integration and the impact of links. Proceedings of the 5th ACM Conference on Digital Libraries. San Antonio Texas June 2000. http
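As a rough illustration of the "PageRank-like weightings, but using citations rather than links" mentioned in the message above, here is a minimal Python sketch. It assumes the citation data is a dictionary mapping each paper to the list of papers it cites; the paper labels, the damping value of 0.85 and the fixed iteration count are assumptions made for the example, not anything taken from Citebase or Google.

```python
def citation_rank(citations, damping=0.85, iterations=50):
    """PageRank-style weighting over a citation graph.

    citations maps each paper to the list of papers it cites.
    A paper cited by highly ranked papers ends up ranked highly itself.
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A citing paper splits its weight among the papers it cites.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                # A paper with no references spreads its weight evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Invented example: A and B cite C, D cites both A and C.
example = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
rank = citation_rank(example)
for paper in sorted(rank, key=rank.get, reverse=True):
    print(paper, round(rank[paper], 3))
```

In this toy graph, C collects weight from three citing papers and comes out on top, while A outranks B and D because it is cited once.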
Re: Scientometric OAI Search Engines
It is recognised that there are two ways to provide OA: (1) publishing articles in OA journals and (2) publishing them in conventional journals but self-archiving them publicly on the web as well.

One problem with route 2 that doesn't seem to have been fully addressed is how the PubMed or Web of Knowledge user should find these open access articles.

By way of example, let us assume I stumble across the following PubMed article:

Harnad S. Ingelfinger over-ruled...
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11191471
[published in the Lancet]

[Had this article appeared in a more recent issue, PubMed would have linked directly to ScienceDirect and access would be limited to subscribers.]

Of course the author has self-archived this article:
http://cogprints.ecs.soton.ac.uk/archive/1703/

...but how would the PubMed user know this? Do we honestly expect users to search PubMed and then go and search the OAIster service in the hope that an open access version may be available?

I agree that route 2 is a way to provide open access - but at the same time we must ensure that the major bibliographic services (PubMed, Web of Knowledge etc) provide links to the open access version as well as the publisher version. Is there any strategy for addressing this?

Robert Kiley
Head of Systems Strategy - Wellcome Library
183 Euston Road, London NW1 2BE
Tel: 020 7611 8338; Fax: 020 7611 8726
mailto:r.ki...@wellcome.ac.uk
Library Web site: http://library.wellcome.ac.uk

The Wellcome Trust is a registered charity, no. 210183. Its sole Trustee is the Wellcome Trust Limited, a company registered in England, no 2711000, whose registered office is 183 Euston Road, London, NW1 2BE.
Relevant prior threads:

Re: proposed collaboration: google + open citation linking
http://www.openarchives.org/pipermail/oai-general/2001-June/35.html

Economic effects of link-based search engines on e-journals
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0894.html

A Search Engine for Searching Across Distributed Eprint Archives
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0927.html

Testing the citation-ranking search engine: Citebase
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2121.html

Scientometric OAI Search Engines
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2237.html

Need for systematic scientometric analyses of open-access data
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2521.html

How to compare research impact of toll- vs. open-access research
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2858.html
Re: Scientometric OAI Search Engines
Robert Kiley writes:

> It is recognised that there are two ways to provide OA: (1) publishing articles in OA journals and (2) publishing them in conventional journals but self-archiving them publicly on the web as well.
>
> One problem with route 2 that doesn't seem to have been fully addressed is how the PubMed or Web of Knowledge user should find these open access articles.
>
> By way of example, let us assume I stumble across the following PubMed article:
>
> Harnad S. Ingelfinger over-ruled...
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11191471
> [published in the Lancet]
>
> [Had this article appeared in a more recent issue, PubMed would have linked directly to ScienceDirect and access would be limited to subscribers.]
>
> Of course the author has self-archived this article:
> http://cogprints.ecs.soton.ac.uk/archive/1703/

A Google search for 'Ingelfinger over-ruled' returns the CogPrints copy as the first item. Google is (intensively) indexing the academic literature (at least the OA literature). Admittedly 'Ingelfinger' is an unusual word and hence an easy case, but my experience so far has been that if something is accessible via OAI methods, Google finds it.

> ...but how would the PubMed user know this? Do we honestly expect users to search PubMed and then go and search the OAIster service in the hope that an open access version may be available?

Either that or they have to subscribe to everything, right?

> I agree that route 2 is a way to provide open access - but at the same time we must ensure that the major bibliographic services (PubMed, Web of Knowledge etc) provide links to the open access version as well as the publisher version. Is there any strategy for addressing this?

My point is that Google probably will do it as long as the suppliers let Google index them.

Bob

Bob Parks
Department of Economics, Campus Box 1208
Washington University
One Brookings Drive
St. Louis, Missouri 63130-4899
Voice: (314) 935-5665; Fax: (314) 935-4156
b...@parks.wustl.edu
Re: Scientometric OAI Search Engines
The likelihood is that the user searched Google before they tried PubMed or ScienceDirect: a search for "Ingelfinger Over-Ruled harnad" comes up with an OA version as the top match.

With OAI and OpenURL, the OA version could be linked in as easily as the aggregators currently linked to by PubMed (although perhaps not as reliably, but then if you get a hit at least you know the version is accessible).

While it would be nice for services to link to OA versions, it doesn't take more than 30 seconds to copy/paste some appropriate keywords into Google, which seems to do a good job of discovering an accessible version.

Tim Brody
Citebase Search: http://citebase.eprints.org/
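To make the OpenURL suggestion above a little more concrete, here is a minimal Python sketch of how a bibliographic service might build a link from a record's metadata to a resolver that looks for an accessible copy. The resolver address is hypothetical, and the key names (genre, aulast, atitle, title) follow OpenURL 0.1-style conventions as I understand them; only metadata that appears in this thread is used, and nothing here describes how PubMed or any existing resolver actually works.

```python
from urllib.parse import urlencode

# Hypothetical resolver address, used only for illustration.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(resolver_base, **metadata):
    """Build an OpenURL-style link from bibliographic metadata.

    Empty values are skipped so that partial records still give a valid link.
    """
    query = urlencode({key: value for key, value in metadata.items() if value})
    return f"{resolver_base}?{query}"


link = build_openurl(
    RESOLVER_BASE,
    genre="article",
    aulast="Harnad",
    atitle="Ingelfinger over-ruled",
    title="Lancet",  # journal title
    volume="",       # not given in this thread, so left out of the link
)
print(link)
```

The resolver on the receiving end would still need some index of self-archived copies (for instance one harvested via OAI) to turn such a query into a link to the OA version.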