Re: OA advantage = EA + AA + QB + OA + UA

2004-10-20 Thread Heather Morrison

On 20-Oct-04, at 10:28 AM, Stevan Harnad wrote:



> I think the interpretation of this is fairly clear: Once there is 100% OA,
> research is used far more, and although the overall number of references
> per article may not increase, their *selectivity* does, because authors can
> cite what is most important and relevant, rather than just what their
> institutions happen to be able to afford to access (as is the case before
> there is 100% OA, which is the prevailing condition in all fields other
> than astro currently).


As Stevan points out, the citation advantage for OA as opposed to
non-OA materials is only demonstrable in traditional numeric terms for
fields that are only partially OA.  Once 100% OA is achieved, then all
articles and authors have an equal research impact advantage.

Here is my perspective, as someone who is no expert on bibliometrics:

There are other obvious impact advantages of OA in a 100% OA field,
which may need different types of measurements.  One example is the
increased quality of research that can proceed once researchers have
ready access to all the scholarly knowledge in their fields.  Another
advantage that might not be picked up by traditional measurements is
increased citations in journals that are not covered by western-based
citation indexes.  That is, researchers in developing countries will
have access, and are likely citing articles, but citations in
publications based in their home countries are not covered by current
indexes.  Another clue to increasing impact in terms of usage is the
increase in downloads or readership that Stevan refers to.  This may be
the beginnings of evidence of impact beyond academe, that is, usage by
professionals, teachers, students, etc.

Thoughts?

Heather G. Morrison
Project Coordinator
BC Electronic Library Network

Phone: 604-268-7001
Fax: 604-291-3023
Email:  heath...@eln.bc.ca
Web: http://www.eln.bc.ca


Re: Do Open-Access Articles Have a Greater Research Impact?

2004-10-20 Thread Chawki Hajjem
Below is the latest evidence that the Open Access Impact Advantage is
neither unique to the Physical Sciences and Mathematics:

http://citebase.eprints.org/isi_study/

nor to the Biological Sciences:

http://www.crsc.uqam.ca/lab/chawki/OA_NOA_biologie.gif

The Impact advantage is there in the Social Sciences too:

http://www.crsc.uqam.ca/lab/chawki/sociologie.htm

The explanation for http://www.crsc.uqam.ca/lab/chawki/sociologie.htm
is so far only in French (it will be translated shortly)
but the English explanation for http://citebase.eprints.org/isi_study/
applies to the Social Science data too.

Note that one significant difference between the Physical Sciences and
the Social Sciences is that the rate of self-archiving is not increasing
in the Social Sciences yet (correlation between number of OA
articles and Year is positive for Physics/Mathematics, negative for
Sociology/Anthropology). The OA impact effect is always positive except
in the most recent year (2003), probably because the ISI citation counts
are not yet up to date for 2003.
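
For readers who want to run the same kind of check on their own counts, here
is a minimal Python sketch of a Pearson correlation between year and number of
OA articles. The function name and the yearly counts are placeholders invented
for illustration; they are not the data behind the figures linked above, and
the study itself may have computed its correlations differently.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


YEARS = [1999, 2000, 2001, 2002, 2003]
# Placeholder counts, invented for illustration; NOT the study's data.
RISING_FIELD = [10, 14, 19, 25, 30]
FALLING_FIELD = [30, 26, 21, 15, 12]

if __name__ == "__main__":
    print("rising field: ", pearson_r(YEARS, RISING_FIELD))
    print("falling field:", pearson_r(YEARS, FALLING_FIELD))
```

A positive coefficient means the yearly number of OA articles is growing, a
negative one that it is shrinking; the sign says nothing about the size of the
citation advantage itself.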

Chawki Hajjem
Doctoral Candidate
Informatique cognitive
Centre de neuroscience de la cognition (CNC)
Université du Québec à Montréal
Montréal, Québec,  Canada  H3C 3P8
tel: 1-514-987-3000 2297#
fax: 1-514-987-8952


Re: OA advantage = EA + AA + QB + OA + UA

2004-10-20 Thread Stevan Harnad
Prior AmSci Topic Thread:

"OA advantage = EA + AA + QB + OA + UA"
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/3978.html

A forthcoming article by Michael Kurtz (Harvard-Smithsonian Center for
Astrophysics) and co-workers reports that in astrophysics -- which (with
its small, closed circle of journals and with all active astrophysicists
worldwide being at institutions that can afford toll-access to all of
them) has had de-facto 100% OA for several years now -- the total number
of citations (hence the average number per article) has not risen; in
fact it may even have diminished a little. There is instead a threefold
increase in usage (readership, downloads).

Kurtz et al. (2004) "The Effect of Use and Access on Citations"
Information Processing and Management (submitted)
http://cfa-www.harvard.edu/~kurtz/IPM-abstract.html

I think the interpretation of this is fairly clear: Once there is 100% OA,
research is used far more, and although the overall number of references
per article may not increase, their *selectivity* does, because authors can
cite what is most important and relevant, rather than just what their
institutions happen to be able to afford to access (as is the case before
there is 100% OA, which is the prevailing condition in all fields other
than astro currently).

One certainly cannot take the absence of an overall increase of citations
in a field that already has 100% OA, as evidence against the need for 100%
OA in other fields, where OA is far less than 100%!

Michael's interesting finding is probably unique to astro, which was even 100%
OA before the online era (i.e., 100% of astrophysicists were at institutions
that could afford 100% of the astro journals in paper), but his pattern of
findings has suggested that there are several components contributing to the
OA Advantage:

(1) What Michael calls the "EA" or "Early Access" advantage: Papers that are
self-archived as preprints, even in astro, get more citations than those that
are not.

If I understand Michael's data correctly, however, the EA is in fact
a permanent increment in a paper's total cumulative citation count and
not just a phase shift that reaches its peak earlier, without increasing
the cumulative total of citations. This is probably because of a paper's
autocatalytic usage/citation/usage/citation cycle, which Tim Brody has
also detected, and is illustrated in Tim's forthcoming usage/citation
correlation paper:

Brody, T. and Harnad, S. (2004) Using Web Statistics as a Predictor
of Citation Impact.
http://www.ecs.soton.ac.uk/~harnad/Temp/timcorr.doc

(2) The "AA" or "Arxiv advantage," which applies to both preprints and
postprints: Even though they are all already 100% OA through institutional
subscriptions/licenses, papers that are also self-archived in ArXiv get
more citations. (In fields with distributed institutional self-archiving,
AA would of course not be an ArXiv effect but an OAIster effect.) This
advantage would no doubt vanish if toll-access and open-access were fully
integrated, but it is interesting that it is present, even in a 100%
OA field.

http://www.arxiv.org/
http://oaister.umdl.umich.edu/o/oaister/

(3) The Quality Bias, "QB," which is the fact that the higher-quality,
higher-impact authors tend to self-archive more overall, and that it is
particularly their higher-quality (hence higher-impact) papers that
authors tend selectively to self-archive more. This self-selection bias
is definitely one of the factors underlying the positive correlation
between OA and citation counts, but it is certainly not the only
factor. It will be interesting to estimate the size of QB, relative to
the other 3 factors, especially as OA grows from 0% to 100%. (The QB
component obviously has to shrink as the proportion of self-archiving
authors grows, since QB is based on self-selective differential
self-archiving of only the higher-quality work.)

(4) The true OA Advantage, OAA, which is probably by far the strongest in
fields that are nearer to 0% OA than to 100% OA because OAA is a *relative*
advantage (and a *competitive* one): In a non-OA field (unlike astro,
which is 100% OA), *all* factors give the advantage to the self-archived
article over the non-self-archived one (e.g., even postprints have the
"Early Advantage"). So even if the pure OAA is destined to shrink to
zero once 100% OA is reached, it is a *huge* advantage today, when OA
is far from 100%. It means that authors have a great deal of competitive
incentive to make their own articles OA now, before their competitors do.

In other words, it's really a Prisoner's Dilemma, hence a horse race,
once the odds and the causality are clearly understood! That is why we are
so busily generating the OA advantage data across all disciplines in
our collaborative ISI study in Southampton, Quebec and Oldenburg:

http://citebase.eprints.org/isi_study/
http://www.crsc.uqam.ca/lab/chawki/sociologie.htm
http://www.crsc.u

Re: A Search Engine for Searching Across Distributed Eprint Archives

2004-10-20 Thread Stevan Harnad
On Wed, 20 Oct 2004, Donat Agosti wrote:

> Something, which bothers me and doesn't show up in most of the
> discussion of open access, is the construction of search tools across
> digital publications (and potentially millions of pages of legacy
> information). In the end, this will be the real issue, not just reading
> another publication face to face.

The real issue -- and the 1st, 2nd, 3rd and Nth priority today -- is
Open Access (OA) *content*: The full-texts of the 2.5 million annual
articles published in the world's 24,000 peer-reviewed journals are
still not openly accessible online (only about 20% of them are).

It is merely distraction and dreaming to worry about search tools when the OA
content is not yet there for them to search!

Having said that, cross-archive search tools (for the little OA content
we have so far) already *do* exist (and they are already far more powerful
than their sparse content yet deserves!):

http://oaister.umdl.umich.edu/o/oaister/
http://citebase.eprints.org/
http://www.scirus.com/srsapp/

And (I promise you), providing more OA content is guaranteed to inspire
the creation of more and more such tools, with more and more powerful
capacities.

So please, don't worry about more powerful search tools when the cupboards are
still bare: Fill the cupboards and the search tools will come, hungrily!

> What do you think about that? It seems, that the big publishing houses
> are already thinking about that, and that they developed such facilities.

The big publishing houses' cupboards are *not* bare: They have the 100% Toll
Access content on which to provide ever more powerful search tools. Let's
provide 100% Open Access content and then watch what happens!

> This of course is one of the most important tools, for data
> mining, extraction, or just finding the right piece of information. It
> also means, that we look beyond self-archived pdf documents to searchable
> documents with some mark up of their logic content included. Any ideas?

Two ideas:

(1) Provide the full-text Open Access content, and the tools for finding, mining
and extracting from it will come with the territory.

(2) The primary target is journal articles, which consist primarily of text. The
most powerful means of text-processing today is full-text inversion. (This is
part of the magic that Google does.) Enhance this with citation-linking (in
place of Google's ordinary linking), plus some hub/authority analysis, citation
and download ranking, co-citation analysis, co-text (semantic/similarity)
analysis, and full-text boolean search, and I think you will have search
capabilities to surpass your wildest dreams.
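
To make "full-text inversion" concrete, here is a minimal Python sketch of an
inverted index with a boolean AND search. It is only an illustration of the
general technique: the function names and the three toy documents are invented
for the example, and real services layer tokenisation rules, ranking and
citation-linking on top of something like this.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    # One posting set per query word; a missing word yields an empty set.
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))


# Toy corpus, invented for illustration only.
DOCS = {
    "a1": "Open access increases the citation impact of research articles",
    "a2": "Self-archiving provides open access to journal articles",
    "a3": "Citation counts lag behind download counts",
}

if __name__ == "__main__":
    idx = build_inverted_index(DOCS)
    print(boolean_and(idx, "open access"))
    print(boolean_and(idx, "citation counts"))
```

The point of the inversion is that a query touches only the posting sets of
its own words rather than scanning every full text, which is why the approach
scales once the content is there to index.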

The only missing element is the content. Please let's not forget that, and
lapse into Oneirology instead of Open Access Provision!

Stevan Harnad


Re: A Search Engine for Searching Across Distributed Eprint Archives

2004-10-20 Thread Donat Agosti
Dear Stevan

Attached is a short report that appeared in today's Science section of the
Neue Zuercher Zeitung:
http://www.nzz.ch/2004/10/20/ft/page-article9XKLV.html
about
http://www.oai.unizh.ch/symposium/program.html

I am sorry I couldn't make it. There was a second meeting in Bern on
Biodiversity Issues, which in fact has a lot to do with the open access
initiative. That meeting, though, was organized by the life science branch
of the Swiss Academy of Sciences, not the medical science branch.

Something, which bothers me and doesn't show up in most of the
discussion of open access, is the construction of search tools across
digital publications (and potentially millions of pages of legacy
information). In the end, this will be the real issue, not just reading
another publication face to face.

What do you think about that? It seems, that the big publishing houses
are already thinking about that, and that they developed such
facilities. This of course is one of the most important tools, for data
mining, extraction, or just finding the right piece of information. It
also means, that we look beyond self-archived pdf documents to searchable
documents with some mark up of their logic content included. Any ideas?

All the best, and thanks for all your efforts re open access

Donat

Dr. Donat Agosti
Research Associate, American Museum of Natural History and Smithsonian
Institution

Email: ago...@amnh.org
Web: http://anbase.org
CV: http://research.amnh.org/entomology/social_insects/agosticv_2003.html

Dalmaziquai 45
3005 Bern
Switzerland
+41-31-351 7152


Re: Eprints, Dspace, or Espace?

2004-10-20 Thread Stevan Harnad
On Wed, 20 Oct 2004, Philip Hunter wrote:

> The focus of each of the OAI-compliant archive-creating softwares is
> different, as you acknowledge, since some are designed to archive digital
> objects in general, not just eprints. The functionality of the different
> softwares differs on this account, and therefore there is a choice to
> be made between softwares.

There is indeed. But Philip seems to have missed the point: This is an
Open Access Forum, not an "Institutional Digital Asset Management Forum."

Institutional Digital Asset Management is indeed an important and worthy
issue. So are Research Funding, Public Health and World Hunger. But
those are not what the Open Access Initiative is about! The Open Access
Initiative is about providing toll-free, online, full-text access to
the 2.5 million articles that appear annually in the world's 24,000
peer-reviewed journals in order to make them accessible to all their
would-be users worldwide -- irrespective of whether their institutions
can afford to subscribe to the journal in which each article appears --
and thereby maximising the research impact of each article, its author,
its author's institution, and its author's research funder. It is not
about Institutional Digital Asset Management.

Budapest Open Access Initiative
http://www.soros.org/openaccess/read.shtml

"The literature that should be freely accessible online is that which
scholars give to the world without expectation of payment. Primarily,
this category encompasses their peer-reviewed journal articles...

"An old tradition and a new technology have converged to make possible
an unprecedented public good. The old tradition is the willingness
of scientists and scholars to publish the fruits of their research
in scholarly journals without payment, for the sake of inquiry
and knowledge. The new technology is the internet. The public good
they make possible is the world-wide electronic distribution of the
peer-reviewed journal literature and completely free and unrestricted
access to it by all scientists, scholars, teachers, students, and
other curious minds. Removing access barriers to this literature
will accelerate research, enrich education, share the learning of the
rich with the poor and the poor with the rich, make this literature
as useful as it can be...

My reply to the student's inquiry about which OAI archive-creating
software to use was based entirely on the fact that the inquiry was
addressed to me (and in the context of the American Scientist Open Access
Forum). I am not, and never have been, a spokesman for Institutional
Digital Asset Management (though I of course have nothing against that
project, only the highest admiration for it).

Nor was the GNU Eprints OAI-archive-creating software -- the first and
most widely used of the OAI archive-creating softwares -- written for the
sake of institutional digital asset management (although it can certainly
be used for that purpose too). It was written for the sake of
institutional Open Access self-archiving. And it was with respect to
that objective that I told the student that all the softwares he listed
were equivalent, and that what really mattered was the institution's
adopting an effective policy for the self-archiving of all of its authors'
journal articles, so as to provide Open Access to them.

http://www.arl.org/sparc/pubs/enews/aug01.html#6

I would add only -- though it is but a hypothesis -- that an institutional
self-archiving policy that successfully generates Open Access to 100% of
institutional journal article output is probably the single most important
step an institution can take toward an eventual successful Institutional
Digital Asset Management policy too, but I make no strong claims about
this, as it is not my area of expertise, experience or interest.

http://software.eprints.org/handbook/departments.php

So, to repeat, although any of the OAI archive-creating softwares can
indeed also be used for Institutional Digital Asset Management, it
is not their functional equivalence with respect to that application on
which I was commenting, particularly, but their functional equivalence
with respect to institutional Open Access content-provision, which is
the theme of this Forum, and the goal of the Open Access Initiative.

> All deposited papers have the same metadata tags? Your definition of an
> eprint is not up to speed. The Open Archives site FAQ reminds us that
> "the metadata harvesting protocol supports the notion of multiple
> metadata sets, allowing communities to expose metadata in formats that
> are specific to their applications and domains. The technical framework
> places no limitations on the nature of such parallel sets, other than
> that the metadata records be structured as XML data, which have a
> corresponding XML schema for validation."
>
> http://www.openarchives.org/documents/FAQ.html

The Open *Archives* In

Re: Eprints, Dspace, or Espace?

2004-10-20 Thread Philip Hunter
Stevan, you wrote:

> All the main OAI-compliant archive-creating softwares are functionally
> equivalent, because after all, what they do is quite simple: They make
> sure that all deposited papers have the same metadata tags, the obvious
> ones: author-name, article-title, date, journal-name, etc., so that they
> are interoperable as well as harvestable by OAI service providers:

The focus of each of the OAI-compliant archive-creating softwares is
different, as you acknowledge, since some are designed to archive digital
objects in general, not just eprints. The functionality of the different
softwares differs on this account, and therefore there is a choice to
be made between softwares.

All deposited papers have the same metadata tags? Your definition of an
eprint is not up to speed. The Open Archives site FAQ reminds us that
"the metadata harvesting protocol supports the notion of multiple
metadata sets, allowing communities to expose metadata in formats that
are specific to their applications and domains. The technical framework
places no limitations on the nature of such parallel sets, other than
that the metadata records be structured as XML data, which have a
corresponding XML schema for validation."

http://www.openarchives.org/documents/FAQ.html
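
For readers unfamiliar with how these metadata sets are requested in practice:
a harvester names the format it wants in the metadataPrefix argument of an
OAI-PMH request, and oai_dc (unqualified Dublin Core) is the one format every
repository must support. The Python sketch below builds such a request URL and
pulls the Dublin Core fields out of a record. The base URL, the helper names
and the sample record are made up for illustration and do not come from any
real archive.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace for unqualified Dublin Core elements.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def dublin_core_fields(xml_text):
    """Collect every Dublin Core element in the XML as {name: [values]}."""
    fields = {}
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(DC_NS):
            name = elem.tag[len(DC_NS):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


# Hand-written sample metadata, not output from any real archive.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Eprint</dc:title>
  <dc:creator>Author, A.</dc:creator>
  <dc:creator>Author, B.</dc:creator>
  <dc:date>2004-10-20</dc:date>
</oai_dc:dc>"""

if __name__ == "__main__":
    # Hypothetical base URL; substitute a real repository's OAI endpoint.
    print(list_records_url("http://archive.example.org/oai"))
    print(dublin_core_fields(SAMPLE))
```

A community-specific format would be requested the same way, by passing a
different metadataPrefix, provided the repository advertises it.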

> With DSpace (and SPARC) grew the "institutional repository" movement, and
> many more archive softwares, most of which have only loose ties with the
> OA movement, and are really intended for the showcasing and management
> of all of a university's digital holdings, not only, or especially,
> research journal articles and OA. As a consequence, "institutional
> repositories" (IRs) are (slowly) filling today with all kinds of material,
> very little of it being OA articles! And IRs tend to be focused more on
> the preservation and curation of university digital holdings than on
> providing immediate OA to all university research output so as to maximise its
> research impact, which is what OA is for.

Well perhaps the range of available softwares reflects what the user
community actually wants. Always a valid point to consider. :-)

Philip


Philip Hunter, UKOLN Research Officer.
UKOLN, University of Bath, Bath, BA2 7AY
Tel: +44 (0) 1225 323 668  Fax: +44 (0) 1225 826838
Email: p.j.hun...@ukoln.ac.uk  UKOLN: http://www.ukoln.ac.uk/
http://www.rdn.ac.uk/projects/eprints-uk/