See the 2010 article by Arif Jinha in Learned Publishing:

50 million: an estimate of the number of scholarly articles in existence 


Jinha, Arif E

http://dx.doi.org/10.1087/20100308

Sally

Sally Morris
South House, The Street, Clapham, Worthing, West Sussex, UK  BN13 3UU
Tel:  +44 (0)1903 871286
Email:  sa...@morris-assocs.demon.co.uk
 

  _____  

From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf
Of Stevan Harnad
Sent: 16 October 2014 13:10
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Re: 114 million scholarly documents on the web;27 million
toll-free


On Oct 15, 2014, at 8:46 PM, Andrew A. Adams <a...@meiji.ac.jp> wrote:


How many scholarly papers are on the Web? At least 114 million, professor finds

https://tinyurl.com/kogygol
The Number of Scholarly Documents on the Public Web
  Madian Khabsa, C. Lee Giles

   Published: May 09, 2014
   DOI: 10.1371/journal.pone.0093949
PLOS ONE paper: https://tinyurl.com/pwefk88


THE SOUND OF ONE HAND CLAPPING


Extremely interesting finding, but the question it raises can be 
expressed by the old Maine (sexist) joke, which I will here
present in a gender-neutral way:

Old-Timer #1: "How's yir spouse?"
Old-Timer #2: "Compayured to wot?"

27M articles are OA out of how many articles published?

(Not out of how many on the web, but out of how many
published? And published when?)

27M is a "dangling numerator." We need to know the
denominator. (And also what the ratio was last year,
and the year before, so we know how fast it's growing,
and whether it's nearer to 10% or 100%.) 

114M articles on the web is not the right denominator.

According to Ulrich's Global Serials Directory http://ulrichsweb.com
there are 105,000 peer-reviewed journals. (I don't know what
proportion are English-language, nor what proportion are
uncited, but never mind.)

Let us (under)estimate extremely conservatively that on average
they publish at least 15 articles each per year.

That makes at least 1.5M articles published per year (close to the
Bjork et al. estimate made in 2009,
http://files.eric.ed.gov/fulltext/EJ837278.pdf )

Now we need to know the date of publication of K & G's 27M OA articles.

And we need to estimate what proportion of the 1.5M articles published
annually in Ulrich's journals is among the total 114M articles found on the
web, per year of publication.

And then we need to calculate, for each year, what proportion of that
subset of Ulrich's articles is among the 27M articles that are OA.

The K & G ratio of 27M/114M = 24% is unfortunately not the
ratio we need, either for the total ratio or for the yearly ratio.
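A quick Python check of the K & G ratio, using only the two headline numbers;
the comment at the end spells out why this denominator is the wrong one for
tracking OA growth.

```python
# Khabsa & Giles headline figures: scholarly documents found on the public web.
oa_on_web = 27_000_000     # freely available (toll-free) documents
all_on_web = 114_000_000   # all scholarly documents found on the web

kg_ratio = oa_on_web / all_on_web
print(f"K & G ratio: {kg_ratio:.1%}")  # K & G ratio: 23.7%

# This divides by "documents on the web", not by "articles published",
# and it mixes all publication years together, so it is neither the
# total nor the yearly OA/published ratio.
```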

The total ratio would be almost meaningless without dates: the total ratio
of all journal articles ever published?

So only annual ratios make sense. But if 1.5M were the annual denominator, 
we would then need to know the corresponding annual OA numerator.

In other words, we need an actual Ulrich's sample of the denominator for,
say, each of the last 10 years of publication, and then we need to know
what proportion of those articles are OA, for each year (the numerator).
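One way to get that per-year numerator without checking every article is to
sample, much as Bjork et al. did: draw a random sample of one year's articles,
check each for a free full-text copy, and report the OA proportion with a
confidence interval. The Python sketch below assumes the OA check has already
been done and arrives as a list of True/False flags; the function name
oa_share_with_ci is hypothetical, and the Wilson score interval is a choice
made here for illustration, not a method named in this thread.

```python
import math
from typing import Sequence, Tuple


def oa_share_with_ci(is_oa: Sequence[bool], z: float = 1.96) -> Tuple[float, float, float]:
    """Estimate the OA share of one publication year from a random sample.

    is_oa: one flag per sampled article (True if a free copy was found).
    Returns (point estimate, lower bound, upper bound) using the Wilson
    score interval; z = 1.96 gives an approximate 95% interval.
    """
    n = len(is_oa)
    if n == 0:
        raise ValueError("sample is empty")
    p_hat = sum(is_oa) / n  # True counts as 1, False as 0
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return p_hat, max(0.0, center - half), min(1.0, center + half)


# Hypothetical sample, for illustration only: 400 articles checked, 100 found OA.
sample = [True] * 100 + [False] * 300
share, low, high = oa_share_with_ci(sample)
print(f"OA share {share:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Repeating this for each of the 10 years gives the year-by-year series the
argument calls for, each with its own margin of error.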

Unfortunately, Ulrich's indexes only journals, not journal articles. For
annual counts of journal articles one needs to use Thomson Reuters' Web of
Science or Scopus (and they only cover about 12% of Ulrich's -- but never
mind, it's certainly a high-priority subset, and perhaps we can estimate
the rest from further sampling, the way Bjork et al. did).

An extremely crude estimate might be derived from K & G's 27M, using 1.5M
as the annual denominator, if we had the publication dates for those 27M.
(Do K & G have those data?) I don't think 114M is a suitable proxy for that
denominator.
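If K & G's 27M OA articles could be tallied by publication year, the crude
estimate would be one division per year, and the change in that ratio from
year to year would show how fast OA is growing. A minimal Python sketch,
assuming a hypothetical mapping oa_counts_by_year (no such per-year counts are
given in this thread) and the flat 1.5M annual denominator used above. Since
1.5M is a deliberate underestimate, these ratios would, if anything, overstate
the OA share.

```python
from typing import Dict

ANNUAL_DENOMINATOR = 1_500_000  # conservative articles-per-year figure from above


def crude_annual_ratios(oa_counts_by_year: Dict[int, int],
                        denominator: int = ANNUAL_DENOMINATOR) -> Dict[int, float]:
    """OA share per publication year: each year's OA count over a flat denominator."""
    return {year: count / denominator
            for year, count in sorted(oa_counts_by_year.items())}


def yearly_growth(ratios: Dict[int, float]) -> Dict[int, float]:
    """Change in OA share (as a fraction) from each year to the next."""
    years = sorted(ratios)
    return {later: ratios[later] - ratios[earlier]
            for earlier, later in zip(years, years[1:])}


# Hypothetical counts, for illustration only (not K & G data).
example_counts = {2012: 300_000, 2013: 375_000}
example_ratios = crude_annual_ratios(example_counts)
print(example_ratios)
print(yearly_growth(example_ratios))
```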

I am sure that K & G's ingenious method can be used to make estimates
of OA/published ratios by year (and by field). I hope that K & G will
go on to do so. It will be a great help in tracking the growth of OA.

Without at least that, it still sounds to my ears like just the sound of
one hand clapping: rather like the download stats that individuals proudly
post in their CVs these days, without providing any norms, reference
points or baselines for comparison. Or rather like a pharmaceutical company
that tells you how many patients who took its drug survived (without
telling you how many didn't, nor how many patients didn't take the drug,
nor what happened to those patients!).

Stevan Harnad






_______________________________________________
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal
