> Okay, maybe you didn't say literally that nobody subscribes to all
> journals; I understood you as saying that lots of institutions don't
> subscribe to lots of journals.
The distinction is critical, and I chose my words (and meanings) quite consciously: The relevant question is about *articles*, and the size of the potential readership/usership to which they are currently inaccessible because those users' institutions can't afford access to the journal in which the articles happen to be published.

> My point is that it is not possible to infer from this data that
> *every single paper* is inaccessible to many/most of its potential
> readers.

Although the inference does sound rather shocking and is probably stronger than necessary, I think it *is* possible to infer from the existing data that every single one of the annual 2.5 million articles is inaccessible to *many* of its potential users. (For that to be true, all you need is a few institutional nonsubscribers with several relevant researchers each.) That is why I chose my words as I did: What I said was "inaccessible to many or even most". (For clarification, perhaps I should say "inaccessible to many, perhaps even most.") Whether it is many or most will be indirectly revealed by our citation data.

We know that *most* published articles (c. 60%) are not cited at all, and only 10% are cited more than 5 times. (There is substantial self-archiving in every citation-bracket, though more for the higher-cited articles.)
    http://www.crsc.uqam.ca/lab/chawki/classement_citations.htm
    http://citeseer.ist.psu.edu/online-nature01/

We also know that self-archiving increases citations by 50-300+%:

    http://www.crsc.uqam.ca/lab/chawki/ch.htm
    http://citebase.eprints.org/isi_study/

We also know that downloads correlate significantly with -- hence predict -- citations:

    http://citebase.eprints.org/analysis/correlation.php
    http://eprints.ecs.soton.ac.uk/10206/01/BMJ1.html

We also know that the number of readings per article averages under (and sometimes well under) 1000, which means that -- if we take the (conservative) upper limit of 5 citations per article -- 200 readings generate one citation (no doubt varying by field: in astrophysics Michael Kurtz reported a 17/1 reads/cites ratio).

    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/0899.html
    http://psycprints.ecs.soton.ac.uk/archive/00000084/
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/3696.html

Although it is not yet possible to make direct comparisons between download counts for OA vs. non-OA articles in the same journal issue (as it is already possible to do with citations), unilateral download counts for OA articles (along with logic), corroborated by the observed download/citation correlation and by the a-posteriori evidence that OA increases citations by 50-300%, suggest that OA substantially increases downloads too (by at least the amount implied by the download-to-citation ratio -- 200/1 to 17/1, take your pick). Kurtz (2004; "Restrictive access policies cut readership of electronic research journal articles by a factor of two" http://opcit.eprints.org/feb19oa/kurtz.pdf ) estimates that open access triples the number of downloads per article.

In short, I think we can safely infer that self-archiving increases accessibility substantially.
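To make the arithmetic behind the 200/1 figure explicit, here is a minimal sketch in Python. It assumes that readings scale linearly with citations, which the download/citation correlation suggests but does not establish. The function names are hypothetical, and the inputs are only the round figures quoted in this post (under 1000 readings, at most 5 citations, Kurtz's 17/1 ratio, and a gain of 0.5 to 3 citations per article).

```python
def readings_per_citation(readings: float, citations: float) -> float:
    """Average number of readings behind each citation."""
    return readings / citations


def implied_extra_readings(extra_citations: float, reads_per_cite: float) -> float:
    """Readings implied by a citation gain, ASSUMING the ratio stays fixed."""
    return extra_citations * reads_per_cite


# Round figures from the post: ~1000 readings, upper limit of 5 citations.
general_ratio = readings_per_citation(1000, 5)
# Reads/cites ratio reported for astrophysics (Kurtz), taken as given.
astro_ratio = 17

# Illustrative citation gains of 0.5 and 3 per article.
for label, ratio in (("200/1", general_ratio), ("17/1", astro_ratio)):
    low = implied_extra_readings(0.5, ratio)
    high = implied_extra_readings(3, ratio)
    print(f"{label}: {low:g} to {high:g} extra readings per article")
```

Under these assumptions the implied gain in readings differs by more than an order of magnitude between the two ratios, which is why the ratio is left as "take your pick" rather than treated as a point estimate.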
If self-archiving adds from 0.5 to over 3 citations per article when most articles receive 0.0 citations, this does imply that articles are today missing many (perhaps even most) of their potential users if they are not being self-archived.

> To see that, let's imagine an extremely obscure topic, with no
> connection to anything else, that is only studied in one place in the
> world. A journal on that topic that is subscribed to by that one
> institution would achieve 100% coverage!

Conceded. Now, how representative do you think that kind of hypothetical special case is, for the 2.5 million articles published annually? And please don't interpret the 60% of articles that receive zero citations as prima facie evidence for your hypothesis that no one is interested in them, as those zero counts are just as readily (and more optimistically!) interpretable in exactly the opposite way: that articles have been losing users and citers because they were inaccessible rather than because no one was interested in using and citing them.

> A specialist academic journal (and many of the world's journals are
> very specialized!) doesn't have to be on such an obscure topic for a
> similar effect to be relevant.

It is a foregone conclusion that peer-reviewed research journal articles will never be best-sellers! But the question that the OA advantage data are answering is whether they have been maximizing their usage and impact until now. And the answer is that they have not: They have substantially more potential impact than they have actually exhibited to date. And the most parsimonious interpretation is that this is because they have been substantially less accessible than one might have hoped.

> I think there are other very important factors at work and
> maybe they even dominate when it comes to the behaviour of individual
> researchers. For instance:
> 1. Publisher X doesn't allow Google and other search engines to index its
> journals. So people typing keywords into Google won't see articles
> in Publisher X journals, even if they have access to them. They will
> see articles in open archives.
> 2. Publishers try to make their websites user-friendly but Google
> etc. are just so good that getting access to a paper via [a publisher's
> or aggregator's website ] or whatever can be more work than typing stuff
> into Google, especially since each publisher lays out its website
> differently.
> 3. Of course, nobody physically goes to libraries anymore.

Excellent points, and they will need to be tested by comparing the OA impact advantage for toll-access journals that (1) do and do not have full-text indexing by Google and (2) do and do not have online versions (though virtually all journals now do).

> Here's another factor, but I don't know how it affects anything:
> 4. I think many, perhaps most, citations are to papers that the
> authors haven't actually read, as background material. All an
> author needs to make a reference is an accurate citation that they
> can cut and paste, and maybe a skim through a few paragraphs from a
> preliminary version.

No doubt there is some of that (indeed there is some published evidence for it, based on propagated typos, as you note), but how much? And did unread citations not occur in on-paper days too? It will take a much more sophisticated kind of text-analysis to partition citations into read and unread ones, and then to compare the size of the OA advantage for each. I suspect 100% OA self-archiving will have prevailed before we can do that; indeed we probably need the full text corpus as a database to do that sort of analysis thoroughly in the first place!
Stevan Harnad

AMERICAN SCIENTIST OPEN ACCESS FORUM:
A complete Hypermail archive of the ongoing discussion of providing open access to the peer-reviewed research literature online (1998-2005) is available at:
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/
To join or leave the Forum or change your subscription address:
    http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
Post discussion to:
    american-scientist-open-access-fo...@amsci.org

UNIVERSITIES: If you have adopted or plan to adopt an institutional policy of providing Open Access to your own research article output, please describe your policy at:
    http://www.eprints.org/signup/sign.php

UNIFIED DUAL OPEN-ACCESS-PROVISION POLICY:
BOAI-1 ("green"): Publish your article in a suitable toll-access journal
    http://romeo.eprints.org/
OR
BOAI-2 ("gold"): Publish your article in an open-access journal if/when a suitable one exists.
    http://www.doaj.org/
AND
in BOTH cases self-archive a supplementary version of your article in your institutional repository.
    http://www.eprints.org/self-faq/
    http://archives.eprints.org/