Re: New Ranking of Central and Institutional Repositories
Dear all:

Thank you for the useful input. I agree with Steve about the importance of backlinks, especially if we are able to convince people to do "deep linking".

We are going to apply some of the advice in a new edition (a second beta will be ready before the scheduled date of July), but we need advice about methodological issues. For example, it is not possible to extract information filtered by domain from OAIster, and the same applies to other services. For visits or downloads there is no single source (Alexa is, of course, discarded).

Best regards,
Harvard Faculty Vote on Open Access Self-Archiving Mandate Today
Fully Hyperlinked Version of this Posting:
http://openaccess.eprints.org/index.php?/archives/361-guid.html

Optimizing Harvard's Proposed Open Access Self-Archiving Mandate

Harvard faculty are voting today on an Open Access (OA) Self-Archiving Mandate Proposal.
http://www.thecrimson.com/article.aspx?ref=521835

The Harvard proposal is to try the copyright-retention strategy: Retain copyright so faculty can (among other things) deposit their writings in Harvard's OA Institutional Repository.

Let me try to say why I think this is the wrong strategy, whereas something not so different from it would not only have much greater probability of success, but would serve as a model that would generalize much more readily to the worldwide academic community.

(1) Articles vs. Books. The objective is to make peer-reviewed research journal articles OA. That is OA's primary target content. The policy has to make a clear distinction between journal articles and books, otherwise it is doomed to fuzziness and failure. The time is ripe for making journal articles -- which are all, without exception, author give-aways, written only for scholarly usage and impact, not for sales royalty income -- Open Access, but it is not yet ripe for books in general (although there are already some exceptions, ready to do the same). Hence it would be a great and gratuitous handicap to try to apply OA policy today in a blanket way to articles and books alike, covering exceptions with an "opt-out" option instead of directly targeting the exception-free journal article literature exclusively.

(2) Unrefereed Preprints vs. Peer-Reviewed Postprints. Again, the objective is to make published, peer-reviewed research journal articles ("postprints") OA. Papers are only peer-reviewed after they have been submitted, refereed, revised, and accepted for publication. Yet Harvard's proposed copyright retention policy targets the draft that has not yet been accepted for publication (the "preprint"): That means the unrefereed raw manuscript. Not only does this risk enshrining unrefereed, unpublished results in Harvard's OA IR, but it risks missing OA's target altogether, which is refereed postprints, not unrefereed preprints.

(3) Copyright Retention is Unnecessary for OA and Needlessly Handicaps Both the Probability of Adoption of the Policy and the Probability of Success If Adopted. There is no need to require retention of copyright in order to provide OA. 62% of journals already officially endorse authors making their postprints OA immediately upon acceptance for publication by depositing them in their Institutional Repository, and a further 30% already endorse making preprints OA. That already covers 92% of Harvard's intended target.

For the remaining 8% (and indeed for 38%, because OA's primary target is postprints, not just preprints), they too can be deposited immediately upon acceptance for publication, with access set as "Closed Access" instead of Open Access.

To provide for worldwide research usage needs for such embargoed papers, both the EPrints and the DSpace IR software now have an "email eprint request" button that allows any would-be user who reaches a Closed Access postprint to paste in his email address and click, which sends an immediate email to the author, containing a URL on which the author need merely click to have an eprint automatically emailed to the requester. (Mailing article reprints to requesters has been standard academic practice for decades and is merely made more powerful and effective with the help of email, an IR, and the semi-automatic button; it likewise does not require permission or copyright retention.)

This means that it is already possible to adopt a universal, exception-free mandate to deposit all postprints immediately upon acceptance for publication, without the author's having to decide whether or not to deposit the unrefereed preprint and whether or not to retain copyright (hence whether or not to opt out). This blanket mandate provides immediate OA to at least 62% of OA's target content, and almost-immediate, almost-OA to the rest. This not only provides for all immediate usage needs for 100% of research output, worldwide, but it will soon usher in the natural and well-deserved death of the remaining minority of access embargoes under the growing global pressure from OA's and almost-OA's increasingly palpable benefits to research and researchers. (With it will come copyright retention too, as a matter of course.)

It is also a policy with no legal problems and no author risk.

Needlessly requiring authors instead to deposit their unrefereed preprints and to commit themselves to retaining copyright today puts both the consensus for adoption and, if adopted, the efficacy of the Harvard policy itself at risk, because of author resistance either to exposing unrefereed work publicly or to putting their work's acceptance and publication by their journal of choice at risk. It also opens up an opt-out loo
Re: Harvard Faculty Vote on Open Access Self-Archiving Mandate Today
Let me comment briefly on this.

1. The issue of books regularly recurs. Let us remember that we are talking about research results, especially those financed by public funds; let us remember that, in the humanities and social sciences, books remain the primary research currency; and let us finally remember that in many countries the publication of research monographs is subsidized ($1.5M/yr in Canada, for example). As a result, they belong to the OA debate.

2. The issue of the proportion of journals that allow some form of deposit: the figure given by Stevan deals with the percentage of Romeo-surveyed journals, not with the total number of peer-reviewed journals in the world; we do not even know this number accurately. It also seems that many SSH journals, perhaps because of the fragility of many of their publishers, have more restrictive attitudes than large commercial publishers.

3. As Peter Suber and I have commented in the past, permissions to deposit are informal agreements, not formal contracts. They could be rescinded given the right circumstances. Acting on the hypothesis that this will never happen is not realistic.

On the other hand, the Harvard debate will have a deep educational impact on the SSH faculty. It will, if passed, bring about a whole series of similar debates in other universities. In so doing, more and more faculty members will begin to understand better the publishing environment within which they are forced to operate. And it does not threaten the self-archiving mandate in any way.

Finally, while I agree with Stevan that holding the copyright is not essential (although it can be useful), it is a good way to start grabbing the attention of many faculty members.

Jean-Claude Guédon

On Tuesday 12 February 2008 at 13:42 +, Stevan Harnad wrote:

> ** Apologies for Cross-Posting **
>
> Fully Hyperlinked Version of this Posting:
> http://openaccess.eprints.org/index.php?/archives/361-guid.html
>
> Optimizing Harvard's Proposed Open Access Self-Archiving Mandate
>
> Harvard faculty are voting today on an Open Access (OA) Self-Archiving
> Mandate Proposal.
> http://www.thecrimson.com/article.aspx?ref=521835
>
> The Harvard proposal is to try the copyright-retention strategy: Retain
> copyright so faculty can (among other things) deposit their writings in
> Harvard's OA Institutional Repository.
>
> Let me try to say why I think this is the wrong strategy, whereas
> something not so different from it would not only have much greater
> probability of success, but would serve as a model that would generalize
> much more readily to the worldwide academic community.
>
> (1) Articles vs. Books. The objective is to make peer-reviewed research
> journal articles OA. That is OA's primary target content. The policy has
> to make a clear distinction between journal articles and books, otherwise
> it is doomed to fuzziness and failure. The time is ripe for making journal
> articles -- which are all, without exception, author give-aways, written
> only for scholarly usage and impact, not for sales royalty income -- Open
> Access, but it is not yet ripe for books in general (although there are
> already some exceptions, ready to do the same). Hence it would be a great
> and gratuitous handicap to try to apply OA policy today in a blanket way
> to articles and books alike, covering exceptions with an "opt-out" option
> instead of directly targeting the exception-free journal article
> literature exclusively.
>
> (2) Unrefereed Preprints vs. Peer-Reviewed Postprints. Again, the
> objective is to make published, peer-reviewed research journal articles
> ("postprints") OA. Papers are only peer-reviewed after they have been
> submitted, refereed, revised, and accepted for publication. Yet Harvard's
> proposed copyright retention policy targets the draft that has not yet
> been accepted for publication (the "preprint"): That means the unrefereed
> raw manuscript. Not only does this risk enshrining unrefereed, unpublished
> results in Harvard's OA IR, but it risks missing OA's target altogether,
> which is refereed postprints, not unrefereed preprints.
>
> (3) Copyright Retention is Unnecessary for OA and Needlessly Handicaps
> Both the Probability of Adoption of the Policy and the Probability of
> Success If Adopted. There is no need to require retention of copyright in
> order to provide OA. 62% of journals already officially endorse authors
> making their postprints OA immediately upon acceptance for publication by
> depositing them in their Institutional Repository, and a further 30%
> already endorse making preprints OA. That already covers 92% of Harvard's
> intended target.
>
> For the remaining 8% (and indeed for 38%, because OA's primary target is
> postprints, not just preprints), they too can be deposited immediately
> upon acceptance for publication, with access set as "Closed Access" ins
Re: New Ranking of Central and Institutional Repositories
I agree with most of Arthur's points, especially with regard to activity and download measures, but I'm puzzled by his comments about link-based visibility. He may be criticising the method of calculation or its use in the overall factoring, but in principle links seem a relevant measure for repositories and one that should be factored in.

At 23:13 11/02/2008, Arthur Sale wrote:
> Isidro
>
> As one of those that contributed to that discussion, may I be more
> specific?
>
> The impact of a repository should be measured by things other than some of
> the measures that you use. PageRank and Size are both very weak
> indicators. I give examples below.
>
> VISIBILITY
> Visibility in the way you measure is nothing to do with the purpose of
> repositories, and only a minor factor in their impact. Let me give
> examples:

The emphasis here is curious. Repositories don't seek links? If links-in are a minor factor now, perhaps they will not be in the future. Links will grow with more and better content, but this effect won't be uniform of course. It will tell us something about repositories.

>* Inward links to the repository itself are relatively rare, and
> probably negligible in the total. Almost no-one really goes to a
> repository to search its content except locally - its value is in
> federation. The exceptions are (1) central repositories such as CERN,
> RepEc, ArXiv, etc, and (2) exemplar repositories such as Southampton and
> QUT. The component is hugely biased towards these repositories.

If the measure highlights exemplary repositories, isn't that what it's meant to do, so long as the measure is not predicated on these repositories? It reveals repositories demonstrating the effect being sought.

>* The majority of links to institutional repositories on the Web are
> probably from depositor's home pages, linking to their research records.
> In UTas we will gain 600-1000 such links once it is in the standard staff
> member template. Is this visibility? Or does it measure university size?

This effect could be eliminated.

>* In a few cases, viewers may link to a paper. However to do this they
> have to value the paper significantly, then copy the URL, and then post it
> to a public website or blog. I expect this is a minority in the total of
> links. Any data otherwise? In any case it is dependent on an author's
> importance in the field, not the repository value.

I guess some papers on Webmetrics could tell us something about this distinction between what have been called formal and informal links, e.g. I came across this recently:

Kousha, K. and Thelwall, M. (2007) The Web impact of open access social science research
http://dx.doi.org/10.1016/j.lisr.2007.05.003
preprint
http://www.scit.wlv.ac.uk/~cm1993/papers/OpenAccessSocialSciencePreprint.doc

Blogs are a growing element of scholarly discourse and are a valid effect. If these links are not pointing towards repositories then it's the content problem again, and the content isn't always finding its way into IRs even when it is OA, e.g. the paper above.

Link-based visibility should be a factor in evaluating repositories.

Steve Hitchcock
IAM Group, School of Electronics and Computer Science
University of Southampton, SO17 1BJ, UK
Email: sh...@ecs.soton.ac.uk
Re: New Ranking of Central and Institutional Repositories
Isidro

As one of those that contributed to that discussion, may I be more specific?

The impact of a repository should be measured by things other than some of the measures that you use. PageRank and Size are both very weak indicators. I give examples below.

VISIBILITY
Visibility in the way you measure it has nothing to do with the purpose of repositories, and is only a minor factor in their impact. Let me give examples:

* Inward links to the repository itself are relatively rare, and probably negligible in the total. Almost no-one really goes to a repository to search its content except locally - its value is in federation. The exceptions are (1) central repositories such as CERN, RePEc, arXiv, etc, and (2) exemplar repositories such as Southampton and QUT. The component is hugely biased towards these repositories.

* The majority of links to institutional repositories on the Web are probably from depositors' home pages, linking to their research records. In UTas we will gain 600-1000 such links once it is in the standard staff member template. Is this visibility? Or does it measure university size?

* In a few cases, viewers may link to a paper. However to do this they have to value the paper significantly, then copy the URL, and then post it to a public website or blog. I expect this is a minority in the total of links. Any data otherwise? In any case it is dependent on an author's importance in the field, not the repository value.

REAL VISIBILITY
Real visibility in the case of a repository consists in (a) whether it provides a compliant OAI-PMH interface, and (b) whether that interface is harvested by federated services, such as ROAR, OAIster, etc. One might also add whether the repository is actively harvested as a flat file or via OAI by Google and Google Scholar, Scopus, or Thomson. Nothing else really matters in respect of visibility. All these are measurable. PageRank is irrelevant, sorry.

SIZE
Size is a terrible measure. Australia is full of examples where the repository has been populated by uploading zillions of old stub records going back to the 1930s or before. The full text is mostly missing, though sometimes a grant has funded image scanning of the document. This is fullness for the sake of fullness. To give one example in your list, the Australasian Digital Thesis Program has 110,000 records of this type of old PhD theses. The full-text simply says: contact the university for a photocopy. That's OK, but the weighting of size ought to be low - less than 20%.

If it is necessary to measure size, and it probably is, then I suggest a measure that counts the number of records with a publication date within the last five years. Choose 10 years if you want, but ancient record-keeping does not translate into impact.

ACTIVITY
It is quite clear from ROAR that deposit activity is a major measure of impact. There are three easy measures to derive.

* The number of acquisitions in the last 12 months. Easily discovered from the OAI interface.

* The number of acquisitions with a publication date in the last 12 months. Easily discovered from the OAI interface. This measures currency as well as activity.

* Some repositories are sporadic, some are continuous, the latter reflecting a deep-seated integration within the university's activity. A simple measure would be to derive a statistic from the traffic (see ROAR), such as
  + number of days in last 12 months with a deposit event
  + the Fourier spectrum of the last 12 months' deposit events having no component with a period longer than 7 days above 10% (I guess at what is significant and perhaps this can be turned into a score).

RICH TEXT
This is a reasonable measure, though subject to error. For example we sometimes put a full-text that gives instructions on how to ask for access to the item concerned, or a bio of the creator of an artwork.
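
The activity measures suggested under ACTIVITY above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not the method used by the ranking or by ROAR. The function names are hypothetical. The window is 364 days (52 whole weeks) rather than a calendar year, so that a weekly deposit rhythm falls exactly on a 7-day period. "Above 10%" is read here as 10% of the zero-frequency term, that is, of the total number of deposits. Datestamps are read from the header elements of an OAI-PMH ListIdentifiers or ListRecords response. An OAI datestamp records when a record was created or last changed, so it is only a proxy for the deposit date.

```python
import cmath
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Namespace of the OAI-PMH 2.0 response schema.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def datestamps_from_oai(xml_text):
    """Return the record datestamps found in an OAI-PMH response, as dates."""
    root = ET.fromstring(xml_text)
    # Datestamps are either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ; keep the day.
    return [date.fromisoformat(el.text.strip()[:10])
            for el in root.iter(OAI_NS + "datestamp")]


def daily_counts(stamps, end, days=364):
    """Deposits per day over the `days` days ending at `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    counts = [0] * days
    for d in stamps:
        if start <= d <= end:
            counts[(d - start).days] += 1
    return counts


def slow_component_share(counts, min_period=7):
    """Largest Fourier amplitude with a period longer than `min_period` days,
    as a fraction of the zero-frequency term (assumed reading of "10%")."""
    n = len(counts)
    total = sum(counts)
    if total == 0:
        return None  # no deposits at all: the statistic is undefined
    worst = 0.0
    k = 1
    while n / k > min_period:  # component k has a period of n / k days
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, c in enumerate(counts))
        worst = max(worst, abs(coeff) / total)
        k += 1
    return worst


def activity_summary(stamps, end, days=364, threshold=0.10):
    """Acquisitions, days with a deposit event, and the regularity check."""
    counts = daily_counts(stamps, end, days)
    share = slow_component_share(counts)
    return {
        "acquisitions": sum(counts),
        "deposit_days": sum(1 for c in counts if c > 0),
        "slow_share": share,
        "continuous": share is not None and share <= threshold,
    }
```

Under these assumptions a repository that receives deposits every weekday passes the regularity check, because its only rhythm is weekly, while one that loads a whole year's records in a single batch fails it. The second measure (acquisitions with a publication date in the last 12 months) would additionally need the publication date from each record's metadata, which this sketch does not parse.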
DOWNLOADS
I'd love to promote downloads as a measure of impact, but there is as yet no federated way to access this data.

I'm happy to continue this dialogue.

Arthur Sale
Professor of Computer Science
University of Tasmania

> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM@LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Isidro F. Aguillo
> Sent: Monday, 11 February 2008 6:53 PM
> To: american-scientist-open-access-fo...@listserver.sigmaxi.org
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] New
> Ranking of Central and Institutional Repositories
>
> Dear all:
>
> Thanks for your interest in the Ranking of repositories, part
> of our larger effort for ranking the web presence of universities
> and research centers. A few comments to your messages:
>
> - Currently the Ranking of repositories is a beta version. We
> welcome comments, suggestions and criticisms. Information
> about missing repositories is warmly welcomed. After f