On Sat, 23 Nov 2002, Subbiah Arunachalam wrote:

> Why is it that Open Archives/E-prints works well in
> some fields (physics, astronomy, computer science) and
> not in other fields (say, agriculture)? I would like
> to hear from members of the list.

Others are invited to reply too. Here is my own candidate explanation:

(1) It is not that physics or astronomy or computer science are different from other fields with regard to the benefits or feasibility of self-archiving and open access. All fields can benefit from it and it is feasible in all fields. There are reasons, however, why self-archiving *began* in physics/astronomy, and why it came early in computer science too.

(2) Self-archiving began in physics (and soon generalized to astronomy) because physics already had, in paper days, a "preprint culture." Physicists had already learned, well before the online era, that they could accelerate the pace and interactivity of research if they did not wait till published versions of papers appeared in print. Especially in high-energy physics, they adopted the practice of mailing preprints of their work to one another, to routing lists, and to a number of central depositories.

(3) This practice simply generalized, in the beginning of the '90s, quite naturally, as the technology became available, first to email routing lists, and then to a web depository. Given the existing preprint culture, this subsequent development requires no special explanation. The physicists were smarter than the rest of us in having already discovered the benefits to research progress of sharing preprints as early as possible. They would have had to be rather thick to just keep doing that in paper once email and the web were available!

(4) The practice of self-archiving immediately began to spread to other areas of physics and allied fields (astronomy, mathematics), but it is important to note that from the very beginning in August 1991 to the present day, over a decade later, that growth has been merely linear (which means, currently, 3500 deposits per month): http://arxiv.org/show_monthly_submissions

(5) At that linear growth rate, it would take 10 years before everything being published in physics (in that year, 2012) was being self-archived. Physics/astronomy/maths are still ahead of all disciplines, but their lead is not dramatic enough, and another decade would be far, far too long a wait. What is needed is something that will not only (i) accelerate self-archiving in those head-start fields to a curvilinear upward growth-rate that will capture their total current research output much sooner, but also (ii) universalize the practice of self-archiving to all the other late-comer disciplines, and capture their full research output too (currently about 2,000,000 articles per year, appearing in the approximately 20,000 peer-reviewed journals that exist today in all disciplines and languages worldwide).

(6) My own hypothesis is that distributed, institutional self-archiving will be the critical factor that will induce this acceleration and universalization of self-archiving, as centralized, discipline-based self-archiving alone has so far failed to do.

(7) The reason is that the rationale for institutional self-archiving makes the benefits of open access explicit for all researchers. Researchers and their own institutions (not their disciplines) are the co-beneficiaries of the maximized research visibility, accessibility, usage, citation and impact that are provided by maximizing research access (i.e., universal, open access) through self-archiving.
It is researchers and their institutions whose research output and research impact, and the indirect rewards that they bring -- in the form of research funding, income and standing, prizes and prestige -- benefit from open access.

(8) In addition, research institutions have the further motivation to try to relieve their serials subscription/license crises by doing whatever they can to promote open access through self-archiving: distributed self-archiving is reciprocal.

(9) And the motivation for institutional reciprocity in self-archiving is not just based on (a) the potential to maximize the impact of institutional research output, nor on the possibility of eventually (b) relieving institutional serials budget burdens. Access itself -- (c) access to the peer-reviewed research output of all other universities -- can only enhance the quality and productivity of their own researchers' work, for in the current toll-access system no institution, not even the biggest or wealthiest, can afford to provide its researchers with access to anywhere near the total peer-reviewed research literature (in any field).

(10) The fourth reason that distributed institutional self-archiving may well prove to be the way to accelerate and universalize open access is that (d) internal and external research assessment (to reward researchers for their past contributions and to fund their future contributions -- http://www.hero.ac.uk/rae/ ) also promises to be greatly strengthened through the creation of a global, open-access digital database of total institutional research output, accessible to the many new scientometric assessment tools that are being and will be created to analyze and monitor research productivity and impact (e.g., http://citebase.eprints.org) when applied to this rich new resource. This cause/effect loop, and the means to monitor, measure, and display it, will not long be lost on either university administrations or research funders.

(11) I have still to reply about computer science: This is another sort of special case. The content of computer science, as a discipline, is by its nature closest to the medium of self-archiving itself, namely, computers, digital data, and distributed networks. It was only natural that computer scientists should create and store their digital research output on the Net, and they did so, in huge numbers -- greater even than those of physics and the other head-start disciplines. But they stored them on their home websites or departmental tech-report pages rather than in a centralized computer science archive as the physicists had done in ArXiv. (There is a computer-science sector in ArXiv too, but it is still one of the smaller sectors and growing no faster than the others.)

(12) The brilliant (but also quite natural) strategy of NEC's Steve Lawrence, Lee Giles and Kurt Bollacker was then to try to *harvest* all of the anarchically self-archived computer science papers distributed all over the web (and this was before the days of OAI-interoperability -- http://www.openarchives.org -- and OAI-compliant institutional Eprints Archives -- http://www.eprints.org -- which have since made harvesting so much easier). The result, ResearchIndex -- http://citeseer.nj.nec.com/cs -- was (and still is!) the biggest open-access archive of them all, having harvested in computer science over twice as many papers (currently 500,000) as all the papers (currently 200,000) in all the fields in the Physics ArXiv put together. But ResearchIndex is a "virtual" archive, not a centralized one at all; it is a Google-style selective harvest from distributed websites all over the Web. Lawrence et al. have also demonstrated the power of such a virtual database to generate rich new citation-based scientometric indicators of research and researcher productivity and impact.

(13) All these currents are now converging.
The Physics ArXiv is OAI-compliant, as are all the distributed institutional Eprint Archives, so they can all be harvested and navigated seamlessly as if they were all one global archive. The computer science archive (ResearchIndex) has announced that it too will shortly become OAI-compliant. So there is no longer any difference between central and distributed archiving. Universities worldwide are becoming increasingly aware of the causal connections between research access and research impact, and their implications for research productivity and funding, and are moving towards self-archiving their institutional research output and the reciprocal benefits it confers on the entire worldwide research community.

(14) But it is all still happening far too slowly! We need not, and should not, wait another decade to reap the immense benefits of open access to the planet's research output.

(15) For ideas about what researchers, their institutions, and their research funders can do to hasten us all along the road to the optimal and inevitable, see:

http://www.eprints.org/self-faq/#researcher/authors-do
http://www.eprints.org/self-faq/#institution-facilitate-filling
http://www.eprints.org/self-faq/#research-funders-do

Replies to Arun's question are invited from others too!

Stevan Harnad