See article by Arif Jinha in Learned Publishing from 2010 - 50 million: an estimate of the number of scholarly articles in existence
Jinha, Arif E
http://dx.doi.org/10.1087/20100308

Sally

Sally Morris
South House, The Street, Clapham, Worthing, West Sussex, UK BN13 3UU
Tel: +44 (0)1903 871286
Email: sa...@morris-assocs.demon.co.uk

_____

From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of Stevan Harnad
Sent: 16 October 2014 13:10
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Re: 114 million scholarly documents on the web;27 million toll-free

On Oct 15, 2014, at 8:46 PM, Andrew A. Adams <a...@meiji.ac.jp> wrote:

How many scholarly papers are on the Web? At least 114 million, professor finds
https://tinyurl.com/kogygol

The Number of Scholarly Documents on the Public Web
Madian Khabsa, C. Lee Giles
Published: May 09, 2014
DOI: 10.1371/journal.pone.0093949
PLOS One
Paper: https://tinyurl.com/pwefk88

THE SOUND OF ONE HAND CLAPPING

Extremely interesting finding, but the question it raises can be expressed by the old Maine (sexist) joke, which I will here present in a gender-neutral way:

Old-Timer #1: "How's yir spouse?"
Old-Timer #2: "Compayured to wot?"

27M articles are OA out of how many articles published? (Not out of how many on the web, but out of how many published? And published when?)

27M is a "dangling numerator." We need to know the denominator. (And also what the ratio was last year, and the year before, so we know how fast it's growing, and whether it's nearer to 10% or 100%.)

114M articles on the web is not the right denominator.

According to Ulrich's Global Serials Directory http://ulrichsweb.com there are 105,000 peer-reviewed journals. (I don't know what proportion are English-language, nor what proportion are uncited, but never mind.)

Let us (under)estimate extremely conservatively that on average they publish at least 15 articles each per year.
That makes at least 1.5M articles published per year (close to the Bjork et al. estimate made in 2009 http://files.eric.ed.gov/fulltext/EJ837278.pdf ).

Now we need to know the date of publication of K & G's 27M OA articles. And we need to estimate what proportion of the Ulrich's annual 1.5M articles is among the total 114M articles found on the web, per year of publication. And then we need to calculate what yearly proportion of that yearly subset of Ulrich's is among those 27M articles that are OA.

The K & G ratio of 27M/114M = 24% is unfortunately not the ratio we need, neither for the total ratio nor for the yearly ratio.

The total ratio would be almost meaningless without dates: The total ratio of all journal articles ever published? So only annual ratios make sense. But if 1.5M were the annual denominator, we would then need to know the corresponding annual OA numerator.

In other words, we need an actual Ulrich's sample of the denominator for, say, each of the last 10 years of publication, and then we need to know what proportion of those articles are OA, for each year (the numerator).

Unfortunately, Ulrich's indexes only journals, not journal articles. For annual journal articles one needs to use Thomson-Reuters Web of Science or SCOPUS (and they only cover about 12% of Ulrich's -- but never mind, it's certainly a high-priority subset, and perhaps we can estimate the rest from further sampling, the way Bjork et al. did).

An extremely crude estimate might be derived from K & G's 27M, using 1.5M as the annual denominator, if we had the publication dates for those 27M. (Do K & G have those data?) I don't think 114M is a suitable proxy for that denominator.

I am sure that K & G's ingenious method can be used to make estimates of OA/published ratios by year (and by field). I hope that K & G will go on to do so. It will be a great help in tracking the growth of OA.
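To make the arithmetic above concrete, here is a rough sketch in Python. The journal count, the 15-articles-per-journal average and the 27M/114M totals are the figures quoted in this message. The function names and the per-year OA counts are hypothetical placeholders (K & G's publication dates are not available here), so the yearly percentages it prints mean nothing in themselves.

```python
# Figures quoted in this message.
PEER_REVIEWED_JOURNALS = 105_000    # Ulrich's count of peer-reviewed journals
ARTICLES_PER_JOURNAL_PER_YEAR = 15  # deliberately conservative average
KG_OA_TOTAL = 27_000_000            # K & G: toll-free documents on the web
KG_WEB_TOTAL = 114_000_000          # K & G: scholarly documents on the web


def annual_output(journals, per_journal):
    """Estimated number of articles published per year."""
    return journals * per_journal


def oa_ratio_by_year(oa_counts, published_per_year):
    """Map each publication year to (OA articles from that year) / (articles published that year).

    oa_counts: dict of publication year -> number of OA articles from that year.
    published_per_year: the annual denominator, assumed constant across years.
    """
    return {year: count / published_per_year
            for year, count in sorted(oa_counts.items())}


if __name__ == "__main__":
    published = annual_output(PEER_REVIEWED_JOURNALS, ARTICLES_PER_JOURNAL_PER_YEAR)
    print(published)                             # 1575000, i.e. "at least 1.5M"
    print(round(KG_OA_TOTAL / KG_WEB_TOTAL, 3))  # 0.237, the 24% that is not the ratio we need

    # HYPOTHETICAL per-year OA counts, for illustration only (not real data).
    hypothetical_oa = {2011: 300_000, 2012: 360_000, 2013: 420_000}
    for year, ratio in oa_ratio_by_year(hypothetical_oa, published).items():
        print(year, f"{ratio:.1%}")
```

Only a series of per-year ratios of this kind, computed from real publication dates, would show how fast OA is growing.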
Without at least that, it still sounds to my ears like just the sound of one hand clapping - rather like the download stats that individuals proudly post in their CVs these days, without providing any norms, reference points or baselines for comparison. Rather like a pharmaceutical company that tells you how many patients who took their drug survived (without telling you how many didn't, nor how many patients didn't take their drug, nor what happened to those patients!).

Stevan Harnad