Re: New Ranking of Central and Institutional Repositories

2008-02-12 Thread Isidro F. Aguillo
Dear all:

Thank you for the useful input. I agree with Steve about the importance of
backlinks, especially if we are able to convince people to do "deep
linking". We are going to apply some of the advice in a new edition (a
second beta will be ready before the scheduled date of July), but we need
advice about methodological issues. For example, it is not possible to
extract information filtered by domain from OAIster, and the same applies to
other services. For visits or downloads there is no single source (Alexa is,
of course, discarded).

Best regards,

Steve Hitchcock wrote:
> I agree with most of Arthur's points, especially with regard to activity
> and download measures, but I'm puzzled by his comments about link-based
> visibility. He may be criticising the method of calculation or its use in
> the overall factoring, but in principle links seem a relevant measure for
> repositories and one that should be factored in.
> 
> At 23:13 11/02/2008, Arthur Sale wrote:
> >  Isidro
> > 
> > As one of those that contributed to that discussion, may I be more
> > specific?
> > 
> > The impact of a repository should be measured by things other than some
> > of the measures that you use. PageRank and Size are both very weak
> > indicators. I give examples below.
> > 
> > VISIBILITY
> > Visibility in the way you measure is nothing to do with the purpose of
> > repositories, and only a minor factor in their impact. Let me give
> > examples:
> 
> The emphasis here is curious. Repositories don't seek links? If links-in
> are a minor factor now, perhaps not in the future. Links will grow with
> more and better content, but this effect won't be uniform of course. It
> will tell us something about repositories.
> 
> >* Inward links to the repository itself are relatively rare, and
> > probably negligible in the total. Almost no-one really goes to a
> > repository to search its content except locally - its value is in
> > federation. The exceptions are (1) central repositories such as CERN,
> > RepEc, ArXiv, etc, and (2) exemplar repositories such as Southampton and
> > QUT. The component is hugely biased towards these repositories.
> If the measure highlights exemplary repositories, isn't that what it's
> meant to do, so long as the measure is not predicated on these
> repositories? It reveals repositories demonstrating the effect being
> sought.
> 
> >* The majority of links to institutional repositories on the Web are
> > probably from depositors' home pages, linking to their research records.
> > In UTas we will gain 600-1000 such links once it is in the standard
> > staff member template. Is this visibility? Or does it measure university
> > size?
> This effect could be eliminated.
> 
> >* In a few cases, viewers may link to a paper. However to do this
> > they have to value the paper significantly, then copy the URL, and then
> > post it to a public website or blog. I expect this is a minority in the
> > total of links. Any data otherwise? In any case it is dependent on an
> > author's importance in the field, not the repository value.
> I guess some papers on Webmetrics could tell us something about this
> distinction between what have been called formal and informal links, e.g.
> I came across this recently:
> 
> Kousha, K. and Thelwall, M. (2007) The Web impact of open access social
> science research
> http://dx.doi.org/10.1016/j.lisr.2007.05.003
> preprint
> http://www.scit.wlv.ac.uk/~cm1993/papers/OpenAccessSocialSciencePreprint.doc
> 
> Blogs are a growing element of scholarly discourse and are a valid effect.
> If these links are not pointing towards repositories then it's the content
> problem again, and the content isn't always finding its way into IRs even
> when it is OA, e.g. above.
> 
> Link-based visibility should be a factor in evaluating repositories.
> 
> Steve Hitchcock
> IAM Group, School of Electronics and Computer Science
> University of Southampton, SO17 1BJ, UK
> Email: sh...@ecs.soton.ac.uk
> 
> > REAL VISIBILITY
> > Real visibility in the case of a repository consists in (a) whether it
> > provides a compliant OAI-PMH interface, and (b) whether that interface
> > is harvested by federated services, such as ROAR, OAIster, etc. One
> > might also add whether the repository is actively harvested as a flat
> > file or via OAI by Google and Google Scholar, Scopus, or Thomson.
> > Nothing else really matters in respect of visibility. All these are
> > measurable. PageRank is irrelevant, sorry.
> > 
> > SIZE
> > Size is a terrible measure. Australia is full of examples where the
> > repository has been populated by uploading zillions of old stub records
> > going back to the 1930s or before. The full text is mostly missing,
> > though sometimes a grant has funded image scanning of the document. This
> > is fullness for the sake of fullness. To give one example in your list,
> > the Australasian Digital Thesis Program has 110,000 records of this type
> > of old PhD theses. T

Harvard Faculty Vote on Open Access Self-Archiving Mandate Today

2008-02-12 Thread Stevan Harnad
 Fully Hyperlinked Version of this Posting:
 http://openaccess.eprints.org/index.php?/archives/361-guid.html

Optimizing Harvard's Proposed Open Access Self-Archiving Mandate

Harvard faculty are voting today on an Open Access (OA) Self-Archiving
Mandate Proposal.
http://www.thecrimson.com/article.aspx?ref=521835

The Harvard proposal is to try the copyright-retention strategy: Retain
copyright so faculty can (among other things) deposit their writings in
Harvard's OA Institutional Repository.

Let me try to say why I think this is the wrong strategy, whereas
something not so different from it would not only have much greater
probability of success, but would serve as a model that would generalize
much more readily to the worldwide academic community.

(1) Articles vs. Books. The objective is to make peer-reviewed research
journal articles OA. That is OA's primary target content. The policy has
to make a clear distinction between journal articles and books,
otherwise it is doomed to fuzziness and failure. The time is ripe for
making journal articles -- which are all, without exception, author
give-aways, written only for scholarly usage and impact, not for sales
royalty income -- Open Access, but it is not yet ripe for books in
general (although there are already some exceptions, ready to do the
same). Hence it would be a great and gratuitous handicap to try to apply
OA policy today in a blanket way to articles and books alike, covering
exceptions with an "opt-out" option instead of directly targeting the
exception-free journal article literature exclusively.

(2) Unrefereed Preprints vs. Peer-Reviewed Postprints. Again, the
objective is to make published, peer-reviewed research journal articles
("postprints") OA. Papers are only peer-reviewed after they have been
submitted, refereed, revised, and accepted for publication. Yet
Harvard's proposed copyright retention policy targets the draft that has
not yet been accepted for publication (the "preprint"): That means the
unrefereed raw manuscript. Not only does this risk enshrining
unrefereed, unpublished results in Harvard's OA IR, but it risks missing
OA's target altogether, which is refereed postprints, not unrefereed
preprints.

(3) Copyright Retention is Unnecessary for OA and Needlessly Handicaps
Both the Probability of Adoption of the Policy and the Probability of
Success If Adopted. There is no need to require retention of copyright
in order to provide OA. 62% of journals already officially endorse
authors making their postprints OA immediately upon acceptance for
publication by depositing them in their Institutional Repository, and a
further 30% already endorse making preprints OA. That already covers 92%
of Harvard's intended target. For the remaining 8% (and indeed for 38%,
because OA's primary target is postprints, not just preprints), they too
can be deposited immediately upon acceptance for publication, with
access set as "Closed Access" instead of Open Access. To provide for
worldwide research usage needs for such embargoed papers, both the
EPrints and the DSpace IR software now have an "email eprint request"
button that allows any would-be user who reaches a Closed Access
postprint to paste in his email address and click, which sends an
immediate email to the author, containing a URL on which the author need
merely click to have an eprint automatically emailed to the requester.
(Mailing article reprints to requesters has been standard academic
practice for decades and is merely made more powerful and effective with
the help of email, an IR, and the semi-automatic button; it likewise
does not require permission or copyright retention.)
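
The percentages in point (3) can be tallied in a few lines. This is only a sketch of the arithmetic, taking the 62% and 30% figures from the posting as given; the function name and the dictionary keys are illustrative labels, not terms from any policy text.

```python
def deposit_coverage(postprint_pct, preprint_only_pct):
    """Split journals by what they endorse, using the posting's two figures.

    postprint_pct: journals endorsing immediate OA for postprints.
    preprint_only_pct: further journals endorsing OA for preprints only.
    """
    endorsed = postprint_pct + preprint_only_pct
    return {
        # postprints that can be made OA immediately upon acceptance
        "immediate_oa_postprints": postprint_pct,
        # journals endorsing some form of OA self-archiving
        "some_form_endorsed": endorsed,
        # journals endorsing neither
        "no_endorsement": 100 - endorsed,
        # postprints deposited as Closed Access, served via the request button
        "closed_access_with_request_button": 100 - postprint_pct,
    }


coverage = deposit_coverage(62, 30)
```

With the posting's numbers, `coverage` holds 62, 92, 8 and 38 respectively, which is where the "remaining 8% (and indeed 38%)" comes from.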

This means that it is already possible to adopt a universal,
exception-free mandate to deposit all postprints immediately upon
acceptance for publication, without the author's having to decide
whether or not to deposit the unrefereed preprint and whether or not to
retain copyright (hence whether or not to opt out).

This blanket mandate provides immediate OA to at least 62% of OA's
target content, and almost-immediate, almost-OA to the rest. This not
only provides for all immediate usage needs for 100% of research output,
worldwide, but it will soon usher in the natural and well-deserved death
of the remaining minority of access embargoes under the growing global
pressure from OA's and almost-OA's increasingly palpable benefits to
research and researchers. (With it will come copyright retention too, as
a matter of course.) It is also a policy with no legal problems and no
author risk.

Needlessly requiring authors instead to deposit their unrefereed
preprints and to commit themselves to retaining copyright today puts
both the consensus for adoption and, if adopted, the efficacy of the
Harvard policy itself at risk, because of author resistance either to
exposing unrefereed work publicly or to putting their work's acceptance
and publication by their journal of choice at risk. It also opens up an
opt-out loo

Re: Harvard Faculty Vote on Open Access Self-Archiving Mandate Today

2008-02-12 Thread Jean-Claude Guédon

Let me comment briefly on this.

1. The issue of books regularly recurs. Let us remember that we are
talking about research results, especially those financed by public
funds, and let us remember that, in the humanities and social
sciences, books remain the primary research currency and let us
finally remember that in many countries, the publication of research
monographs is subsidized ($1.5M/yr in Canada, for example). As a
result, they belong to the OA debate.

2. The issue of the proportion of journals that allow some form of
deposit: the figure given by Stevan deals with the percentage of
Romeo-surveyed journals, not with the total number of peer-reviewed
journals in the world - we do not even know this number accurately.
It also seems that many SSH journals, perhaps because of the
fragility of many of their publishers, have more restrictive
attitudes than large commercial publishers.

3. As Peter Suber and I have commented in the past, permissions to
deposit are informal agreements, not formal contracts. They could be
rescinded given the right circumstances. Acting on the hypothesis
that this will never happen is not realistic.

On the other hand, the Harvard debate will have a deep educational
impact on the SSH faculty. It will, if passed, bring about a whole
series of similar debates in other universities. In so doing, more
and more faculty members will begin to understand better the
publishing environment within which they are forced to operate. And
it does not threaten the self-archiving mandate in any way.

Finally, while I agree with Stevan that holding the copyright is not
essential (although it can be useful), it is a good way to start
grabbing the attention of many faculty members.

Jean-Claude Guédon



On Tuesday 12 February 2008 at 13:42 +, Stevan Harnad wrote:

 ** Apologies for Cross-Posting **

  Fully Hyperlinked Version of this Posting:
  http://openaccess.eprints.org/index.php?/archives/361-guid.html

Optimizing Harvard's Proposed Open Access Self-Archiving Mandate

Harvard faculty are voting today on an Open Access (OA) Self-Archiving
Mandate Proposal.
http://www.thecrimson.com/article.aspx?ref=521835

The Harvard proposal is to try the copyright-retention strategy: Retain
copyright so faculty can (among other things) deposit their writings in
Harvard's OA Institutional Repository.

Let me try to say why I think this is the wrong strategy, whereas
something not so different from it would not only have much greater
probability of success, but would serve as a model that would generalize
much more readily to the worldwide academic community.

(1) Articles vs. Books. The objective is to make peer-reviewed research
journal articles OA. That is OA's primary target content. The policy has
to make a clear distinction between journal articles and books,
otherwise it is doomed to fuzziness and failure. The time is ripe for
making journal articles -- which are all, without exception, author
give-aways, written only for scholarly usage and impact, not for sales
royalty income -- Open Access, but it is not yet ripe for books in
general (although there are already some exceptions, ready to do the
same). Hence it would be a great and gratuitous handicap to try to apply
OA policy today in a blanket way to articles and books alike, covering
exceptions with an "opt-out" option instead of directly targeting the
exception-free journal article literature exclusively.

(2) Unrefereed Preprints vs. Peer-Reviewed Postprints. Again, the
objective is to make published, peer-reviewed research journal articles
("postprints") OA. Papers are only peer-reviewed after they have been
submitted, refereed, revised, and accepted for publication. Yet
Harvard's proposed copyright retention policy targets the draft that has
not yet been accepted for publication (the "preprint"): That means the
unrefereed raw manuscript. Not only does this risk enshrining
unrefereed, unpublished results in Harvard's OA IR, but it risks missing
OA's target altogether, which is refereed postprints, not unrefereed
preprints.

(3) Copyright Retention is Unnecessary for OA and Needlessly Handicaps
Both the Probability of Adoption of the Policy and the Probability of
Success If Adopted. There is no need to require retention of copyright
in order to provide OA. 62% of journals already officially endorse
authors making their postprints OA immediately upon acceptance for
publication by depositing them in their Institutional Repository, and a
further 30% already endorse making preprints OA. That already covers 92%
of Harvard's intended target. For the remaining 8% (and indeed for 38%,
because OA's primary target is postprints, not just preprints), they too
can be deposited immediately upon acceptance for publication, with
access set as "Closed Access" ins

Re: New Ranking of Central and Institutional Repositories

2008-02-12 Thread Steve Hitchcock
I agree with most of Arthur's points, especially with regard to activity and
download measures, but I'm puzzled by his comments about link-based
visibility. He may be criticising the method of calculation or its use in
the overall factoring, but in principle links seem a relevant measure for
repositories and one that should be factored in.

At 23:13 11/02/2008, Arthur Sale wrote:
>  Isidro
> 
> As one of those that contributed to that discussion, may I be more
> specific?
> 
> The impact of a repository should be measured by things other than some of
> the measures that you use. PageRank and Size are both very weak
> indicators. I give examples below.
> 
> VISIBILITY
> Visibility in the way you measure is nothing to do with the purpose of
> repositories, and only a minor factor in their impact. Let me give
> examples:

The emphasis here is curious. Repositories don't seek links? If links-in are
a minor factor now, perhaps not in the future. Links will grow with more and
better content, but this effect won't be uniform of course. It will tell us
something about repositories.

>* Inward links to the repository itself are relatively rare, and
> probably negligible in the total. Almost no-one really goes to a
> repository to search its content except locally - its value is in
> federation. The exceptions are (1) central repositories such as CERN,
> RepEc, ArXiv, etc, and (2) exemplar repositories such as Southampton and
> QUT. The component is hugely biased towards these repositories.
If the measure highlights exemplary repositories, isn't that what it's meant
to do, so long as the measure is not predicated on these repositories? It
reveals repositories demonstrating the effect being sought.

>* The majority of links to institutional repositories on the Web are
> probably from depositors' home pages, linking to their research records.
> In UTas we will gain 600-1000 such links once it is in the standard staff
> member template. Is this visibility? Or does it measure university size?
This effect could be eliminated.
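
One way the home-page effect might be removed is to drop inlinks that originate from the repository's own institution before counting. The sketch below assumes the backlinks are already available as a list of source URLs; the function name and the example domains are hypothetical.

```python
from urllib.parse import urlparse


def external_backlinks(backlinks, institution_domain):
    """Keep only backlinks whose source host lies outside the institution.

    backlinks: iterable of source-page URLs linking into the repository.
    institution_domain: e.g. "utas.example" (hypothetical domain).
    """
    domain = institution_domain.lower().lstrip(".")
    kept = []
    for url in backlinks:
        host = (urlparse(url).hostname or "").lower()
        # Skip the institution's own domain and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            continue
        kept.append(url)
    return kept


links = [
    "http://www.utas.example/staff/jones/",   # staff home page: dropped
    "http://eprints.utas.example/123/",       # repository itself: dropped
    "http://blog.example.org/2008/02/post",   # external blog: kept
    "http://notutas.example/page",            # different domain: kept
]
visible = external_backlinks(links, "utas.example")
```

Matching on the full host suffix rather than a plain substring keeps unrelated domains that merely contain the institution's name.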

>* In a few cases, viewers may link to a paper. However to do this they
> have to value the paper significantly, then copy the URL, and then post it
> to a public website or blog. I expect this is a minority in the total of
> links. Any data otherwise? In any case it is dependent on an author's
> importance in the field, not the repository value.
I guess some papers on Webmetrics could tell us something about this
distinction between what have been called formal and informal links, e.g. I
came across this recently:

Kousha, K. and Thelwall, M. (2007) The Web impact of open access social
science research
http://dx.doi.org/10.1016/j.lisr.2007.05.003
preprint
http://www.scit.wlv.ac.uk/~cm1993/papers/OpenAccessSocialSciencePreprint.doc

Blogs are a growing element of scholarly discourse and are a valid effect.
If these links are not pointing towards repositories then it's the content
problem again, and the content isn't always finding its way into IRs even
when it is OA, e.g. above.

Link-based visibility should be a factor in evaluating repositories.

Steve Hitchcock
IAM Group, School of Electronics and Computer Science
University of Southampton, SO17 1BJ, UK
Email: sh...@ecs.soton.ac.uk

> REAL VISIBILITY
> Real visibility in the case of a repository consists in (a) whether it
> provides a compliant OAI-PMH interface, and (b) whether that interface is
> harvested by federated services, such as ROAR, OAIster, etc. One might
> also add whether the repository is actively harvested as a flat file or
> via OAI by Google and Google Scholar, Scopus, or Thomson. Nothing else
> really matters in respect of visibility. All these are measurable.
> PageRank is irrelevant, sorry.
> 
> SIZE
> Size is a terrible measure. Australia is full of examples where the
> repository has been populated by uploading zillions of old stub records
> going back to the 1930s or before. The full text is mostly missing, though
> sometimes a grant has funded image scanning of the document. This is
> fullness for the sake of fullness. To give one example in your list, the
> Australasian Digital Thesis Program has 110,000 records of this type of
> old PhD theses. The full-text simply says: contact the university for a
> photocopy. That's OK, but the weighting of size ought to be low - less
> than 20%.
> 
> If it is necessary to measure size, and it probably is, then I suggest a
> measure that counts the number of records with a publication date within
> the last five years. Choose 10 years if you want, but ancient
> record-keeping does not translate into impact.
> 
> ACTIVITY
> It is quite clear from ROAR that deposit activity is a major measure of
> impact. There are three easy measures to derive.
>* The number of acquisitions in the last 12 months. Easily discovered
> from the OAI interface.
>* The number of acquisitions with a publication date in the last 12
> months. Easily discovered from the OAI interface. Th

Re: New Ranking of Central and Institutional Repositories

2008-02-12 Thread Arthur Sale
 Isidro

As one of those that contributed to that discussion, may I be more
specific?

The impact of a repository should be measured by things other than
some of the measures that you use. PageRank and Size are both very
weak indicators. I give examples below.

VISIBILITY
Visibility in the way you measure is nothing to do with the purpose
of repositories, and only a minor factor in their impact. Let me give
examples:
 *  Inward links to the repository itself are relatively rare, and
probably negligible in the total. Almost no-one really goes to a
repository to search its content except locally - its value is in
federation. The exceptions are (1) central repositories such as
CERN, RepEc, ArXiv, etc, and (2) exemplar repositories such as
Southampton and QUT. The component is hugely biased towards these
repositories.
 *  The majority of links to institutional repositories on the Web
are probably from depositors' home pages, linking to their
research records. In UTas we will gain 600-1000 such links once
it is in the standard staff member template. Is this visibility?
Or does it measure university size?
 *  In a few cases, viewers may link to a paper. However to do this
they have to value the paper significantly, then copy the URL,
and then post it to a public website or blog. I expect this is a
minority in the total of links. Any data otherwise? In any case
it is dependent on an author's importance in the field, not the
repository value.


REAL VISIBILITY
Real visibility in the case of a repository consists in (a) whether
it provides a compliant OAI-PMH interface, and (b) whether that
interface is harvested by federated services, such as ROAR, OAIster,
etc. One might also add whether the repository is actively harvested
as a flat file or via OAI by Google and Google Scholar, Scopus, or
Thomson. Nothing else really matters in respect of visibility. All
these are measurable. PageRank is irrelevant, sorry.
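
Criterion (a) can be probed mechanically. A minimal sketch, assuming the repository's OAI-PMH base URL is known: it builds the standard `Identify` request and reads a few fields from the reply. The sample response and the example.org address are made up for illustration, and a real check would fetch the URL over HTTP. Criterion (b) is not covered here, since it needs the harvest lists of the federated services themselves.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """Build the OAI-PMH Identify request for a repository base URL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def parse_identify(xml_text):
    """Pull a few fields out of an Identify response.

    Raises ValueError when the reply carries no Identify element,
    which suggests the interface is not answering as OAI-PMH.
    """
    root = ET.fromstring(xml_text)
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        raise ValueError("no Identify element in response")

    def text(tag):
        node = ident.find("oai:" + tag, OAI_NS)
        return node.text if node is not None else None

    return {
        "repositoryName": text("repositoryName"),
        "protocolVersion": text("protocolVersion"),
        "earliestDatestamp": text("earliestDatestamp"),
    }


# Hypothetical response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-02-12T00:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
  </Identify>
</OAI-PMH>"""

info = parse_identify(SAMPLE)
```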

SIZE
Size is a terrible measure. Australia is full of examples where the
repository has been populated by uploading zillions of old stub
records going back to the 1930s or before. The full text is mostly
missing, though sometimes a grant has funded image scanning of the
document. This is fullness for the sake of fullness. To give one
example in your list, the Australasian Digital Thesis Program has
110,000 records of this type of old PhD theses. The full-text simply
says: contact the university for a photocopy. That's OK, but the
weighting of size ought to be low - less than 20%.

If it is necessary to measure size, and it probably is, then I
suggest a measure that counts the number of records with a
publication date within the last five years. Choose 10 years if you
want, but ancient record-keeping does not translate into impact.
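
The suggested size measure is easy to compute once publication years have been extracted from the records. A sketch under that assumption; the function name is illustrative, and "within the last five years" is read here as the current calendar year plus the four before it, which is one interpretation among several.

```python
from datetime import date


def recent_record_count(publication_years, today, window_years=5):
    """Count records published within the last `window_years` calendar years.

    publication_years: iterable of ints, or None where the year is unknown.
    Records with no year, or a year in the future, are not counted.
    """
    cutoff = today.year - window_years
    return sum(
        1
        for year in publication_years
        if year is not None and cutoff < year <= today.year
    )


years = [1935, 1978, 2002, 2003, 2004, 2007, 2008, None]
five_year_size = recent_record_count(years, date(2008, 2, 12))
ten_year_size = recent_record_count(years, date(2008, 2, 12), window_years=10)
```

Stub records from the 1930s drop out of either count, so bulk uploads of old catalogue entries no longer inflate the measure.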

ACTIVITY
It is quite clear from ROAR that deposit activity is a major measure
of impact. There are three easy measures to derive.
 *  The number of acquisitions in the last 12 months. Easily
discovered from the OAI interface.
 *  The number of acquisitions with a publication date in the last 12
months. Easily discovered from the OAI interface. This measures
currency as well as activity.
 *  Some repositories are sporadic, some are continuous, the latter
reflecting a deep-seated integration within the university's
activity. A simple measure would be to derive a statistic from
the traffic (see ROAR), such as
 +  number of days in last 12 months with a deposit event
 +  the Fourier spectrum of the last 12 months deposit events
having no component with a period longer than 7 days above
10% (I guess at what is significant and perhaps this can be
turned into a score).
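
The activity measures above can be sketched from a list of deposit dates, such as the datestamps returned by an OAI-PMH `ListIdentifiers` request with a `from` argument. Several choices below are assumptions rather than anything stated in the proposal: a 364-day window (52 whole weeks, so that a weekly rhythm falls exactly on a 7-day period), the "10%" read as amplitude relative to the zero-frequency component, and all function names.

```python
import cmath
from datetime import date, timedelta


def daily_counts(deposit_dates, today, days=364):
    """Deposits per day over the window ending today (assumed: 52 whole weeks)."""
    start = today - timedelta(days=days - 1)
    counts = [0] * days
    for d in deposit_dates:
        offset = (d - start).days
        if 0 <= offset < days:  # ignore deposits outside the window
            counts[offset] += 1
    return counts


def active_days(counts):
    """Number of days in the window with at least one deposit event."""
    return sum(1 for c in counts if c > 0)


def max_long_period_ratio(counts, min_period_days=7):
    """Largest Fourier amplitude among components with period > min_period_days,
    relative to the zero-frequency component (the total number of deposits).

    Returns None when there are no deposits at all.
    """
    n = len(counts)
    total = sum(counts)
    if total == 0:
        return None
    worst = 0.0
    for k in range(1, n // 2 + 1):
        if n / k <= min_period_days:  # remaining components are too short
            break
        # Plain discrete Fourier transform coefficient for frequency index k.
        coeff = sum(
            c * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, c in enumerate(counts)
        )
        worst = max(worst, abs(coeff) / total)
    return worst


today = date(2008, 2, 12)
steady = daily_counts([today - timedelta(days=i) for i in range(364)], today)
burst = daily_counts([today] * 50 + [date(2001, 1, 1)], today)
```

Under these assumptions a repository would count as continuous when `max_long_period_ratio` stays below 0.10. The `steady` series (one deposit every day) passes, while the `burst` series (fifty deposits on a single day) does not. The number of acquisitions in the window is simply `sum(counts)`.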

RICH TEXT
This is a reasonable measure, though subject to error. For example we
sometimes put a full-text that gives instructions on how to ask for
access to the item concerned, or a bio of the creator of an artwork.


DOWNLOADS
I'd love to promote downloads as a measure of impact, but there is as
yet no federated way to access this data.

I'm happy to continue this dialogue.

Arthur Sale
Professor of Computer Science
University of Tasmania

> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM@LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Isidro F. Aguillo
> Sent: Monday, 11 February 2008 6:53 PM
> To: american-scientist-open-access-fo...@listserver.sigmaxi.org
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] New
> Ranking of Central and Institutional Repositories
>
> Dear all:
>
> Thanks for your interest in the Ranking of repositories, part
> of our larger effort for ranking the web presence of universities
> and research centers. A few comments to your messages:
>
> - Currently the Ranking of repositories is a beta version. We
> would appreciate comments, suggestions and criticisms. Information
> about missing repositories is warmly welcomed. After f