Re: Ranking Web of Repositories: July 2010 Edition
-----Original Message----- From: American Scientist Open Access Forum [mailto:AMERICAN-SCIENTIST- open-access-fo...@listserver.sigmaxi.org] On Behalf Of Leslie Carr Sent: 12 July 2010 13:17 To: american-scientist-open-access-fo...@listserver.sigmaxi.org Subject: Re: Ranking Web of Repositories: July 2010 Edition

On 12 Jul 2010, at 06:25, Leslie Chan wrote: Why wait for Microsoft? What has the open source community been doing on this front? What about OpenOffice? Any good open source NLM DTD conversion tools out there? Why has it taken so long?

If there was something for OpenOffice then it would be trivial for repositories to apply it to Microsoft Word documents. -- Les Carr

Coincidentally, a colleague has just alerted me to the PKP Lemon8 project that does just this conversion, though I'm not sure it has an API. http://pkp.sfu.ca/lemon8

Rob.

-- Rob Ingram, Technical Developer (RSP), Centre for Research Communications, University of Nottingham rob.ing...@nottingham.ac.uk http://rsp.ac.uk http://crc.nottingham.ac.uk
Re: Ranking Web of Repositories: July 2010 Edition
On 7/11/10 6:49 AM, Leslie Carr l...@ecs.soton.ac.uk wrote: On 10 Jul 2010, at 15:37, Peter Suber wrote: For more detail on rich media or rich files, see the Webometrics page on methodology: "Only the number of text files in Acrobat format (.pdf) ... are considered." This is a bug, not a feature. A more useful ranking would try to count full-text scholarly or peer-reviewed articles regardless of format. I know that's hard to do. But it's a mistake to use any format as a surrogate for that status, and especially a format as flawed as PDF. Even if Webometrics wanted to reward some formats more than others, it should not reward PDF.

I think it should. The overwhelming majority of academic papers are distributed online as PDF; the overwhelming majority of things in repositories that are not PDF are not academic papers.

This is rather circular. The view that academic papers should be fixed in form and format is rather out of sync with the emergence of new forms of scholarly expression enabled by the web. Here is an interesting commentary in a recent THE: "Academics in the humanities and social sciences need to question whether the current narrowly conceived conventions of academic publication are in our best interests. If reality is multifaceted, then writing that responds to it needs to be multifaceted, too. Academics should be encouraged to explore a heterogeneous range of formats, reaching different audiences and finding new ways to write about research." http://www.timeshighereducation.co.uk/story.asp?storyCode=411466&sectioncode=26

I think this discussion raises a fundamental question about the design of IRs and their support for scholarship. IRs must do better to capture the diversity of scholarly contributions and formats, and make them count in a meaningful way.

The format is optimized for print or reading, not for use or reuse. PDFs are slow to load and often not even readable in bandwidth-poor parts of the world. They crash many browsers. They often lack working links; when they do have links, they require users to open in the same window rather than in a separate window, losing the file that took so long to load. Users can't deep-link to subsections. Publishers can lock them to prevent cutting and pasting. Publishers can insert scripts to make them unreadable offline or after a certain time. PDFs impede text processing by users, text mining by software, handicapped access (read-aloud software), and mark-up by third parties.

This is an argument about what software/data formats researchers *should* use; affecting their authoring and editorial processes is probably beyond the scope of what we can expect from this league table.

This points to the problem with league tables in general. Much like the league tables in the Journal Citation Report with journal ranking, such tables gloss over what is important to different disciplinary needs and authoring processes, and privilege quantitative measures that encourage spurious ranking and comparison. Do we really need more output-based comparisons?

PubMed Central scores low in the Webometric rankings because it has no PDFs.

It does have PDFs - it might ingest articles in XML, but it certainly exports them in PDF. Enquiring of Google (site:www.ncbi.nlm.nih.gov filetype:pdf) shows that it has about 6,690,000 PDFs.

So PMC is being penalized by the ranking system because it is dynamic?

But PMC is one of the most populated and useful OA repositories in the world. This is something that needs investigating.

If I had to guess why it ranks so low, it might be because no-one is linking INTO PubMed; rather they are linking to the original publishers.

How should we define the most useful? Should download and other usage stats be taken into consideration, instead of only in-bound links?

The format it uses instead of PDF, the NLM DTD coded in XML, is vastly superior to PDF for every scholarly purpose. I haven't had time to code my articles in XML. But since even HTML is superior to PDF for purposes of access and reuse, I self-archive in HTML rather than PDF whenever I can.

For the record, I completely agree with you about PDF / HTML / XHTML. If only Microsoft Word (and LaTeX) had decent export facilities that produced good semantic HTML.

Why wait for Microsoft? What has the open source community been doing on this front? What about OpenOffice? Any good open source NLM DTD conversion tools out there? Why has it taken so long?

Leslie (Chan)

-- Les Carr
Re: Ranking Web of Repositories: July 2010 Edition
Dear all: In fact we have already taken into account some of your comments in the latest editions of the ranking. Let me explain:

- The ranking is based on a 1:1 ratio between ACTIVITY and VISIBILITY, so publishing a lot of OA papers is as important as doing it in a way that others (worldwide) can recover, use and link them. The 1:1 ratio means the weight of each is 50%. As stated in previous messages, visibility is measured by counting the total number of external inlinks.

- Regarding activity, we decided to follow your advice, so the value is calculated giving more or less the same weight to these three variables:

* Number of papers, usually full-text articles, using as a proxy the number of items in Google Scholar
* Number of web pages: ALL the web pages (usually html or similar, but also other formats) of the website
* Number of documents: a subset of the former, those files in rich formats like pdf, ps, doc or ppt.

It is probably true that pdf is not the best format and perhaps we should consider other formats, but people are not using other formats. The numbers of files in OpenOffice formats, XML, or others are negligible, useless for ranking purposes. (A sketch of how this weighting could be computed follows this message.)

- PMC. Our policy is not to rank repositories without their own domain or subdomain. There are technical reasons but also visibility ones. The address of PMC is absurdly complex: www.ncbi.nlm.nih.gov/pmc Regarding UK PMC, they are included in the ranking but their position suffers because they do not use suffixes in their file names. They have hundreds of thousands of Adobe Acrobat (pdf) files without naming them as *.pdf. This prevents efficient filtering by file type by the major search engines. (A content-sniffing sketch for this case also follows below.)

Best regards,

On 11/07/2010 15:21, Peter Suber wrote: [message of 11 July, quoted in full below]
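[Editorial aside: read literally, the weighting Isidro describes combines four indicators: inlink visibility at 50%, and three activity variables sharing the other 50%. A minimal sketch of that arithmetic in Python, assuming the indicators have already been normalised to comparable scores; the function and variable names are illustrative, not Webometrics' actual code.]

    from statistics import mean

    def webometrics_style_score(inlinks, scholar_items, web_pages, rich_files):
        """Hypothetical reconstruction of the described 1:1 weighting:
        50% visibility (external inlinks) and 50% activity, where activity
        averages three roughly equally weighted variables."""
        activity = mean([scholar_items, web_pages, rich_files])
        return 0.5 * inlinks + 0.5 * activity

    # Example with made-up normalised scores in [0, 1]:
    print(webometrics_style_score(inlinks=0.9, scholar_items=0.4,
                                  web_pages=0.7, rich_files=0.2))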
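[Editorial aside: the UK PMC remark is a file-naming issue. Search engines' filetype: filters key on the URL suffix, so PDFs served without a .pdf extension go uncounted. Content sniffing shows why this is purely a naming problem; the sketch below is a generic illustration under that assumption, not anything UK PMC actually runs.]

    import os

    def looks_like_pdf(path):
        # PDF files begin with the magic bytes '%PDF-' regardless of filename.
        with open(path, "rb") as f:
            return f.read(5) == b"%PDF-"

    def count_unlabelled_pdfs(root):
        """Count files that are PDFs by content but lack the .pdf suffix,
        i.e. the files a filetype:pdf query would miss."""
        missed = 0
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if not name.lower().endswith(".pdf") and looks_like_pdf(full):
                    missed += 1
        return missed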
Re: Ranking Web of Repositories: July 2010 Edition
On 12/07/2010 7:25, Leslie Chan wrote: [quoting the Suber/Carr exchange above] ... This is rather circular. The view that academic papers should be fixed in form and format is rather out of sync with the emergence of new forms of scholarly expression enabled by the web. ... I think this discussion raises a fundamental question about the design of IRs and their support for scholarship. IRs must do better to capture the diversity of scholarly contributions and formats, and make them count in a meaningful way.

Dear Leslie: You are completely right, other formats should be used, far better than those currently available, and of course open source/open access. Now try to convince 1 billion Internet users to do that. NOBODY (well, a few thousand) is using these other formats today (yet). I have the figures.

... This points to the problem with league tables in general. Much like the league tables in the Journal Citation Report with journal ranking, such tables gloss over what is important to different disciplinary needs and authoring processes, and privilege quantitative measures that encourage spurious ranking and comparison. Do we really need more output-based comparisons?

I have only a few numbers to support the need for league tables. QS, the former editors of the THES Ranking of Universities, stated they received 18 million visitors per year; our Web Ranking is close to 5 million, and the Shanghai ranking probably reaches similar or even higher levels.

So PMC is being penalized by the ranking system because it is dynamic?

Nobody is saying that. PMC is excluded because it does not have its own domain or subdomain. You can disagree, but I dislike my papers being url-authored by the library.

But PMC is one of the most populated and useful OA repositories in the world. This is something that needs investigating. ... How should we define the most useful? Should download and other usage stats be taken into consideration, instead of only in-bound links?

As soon as (standardized) user statistics become available they will be used. Good indicators need to be useful but also feasible.

The format it uses instead of PDF, the NLM DTD coded in XML, is vastly superior to PDF for every scholarly purpose. I haven't had time to code my articles in XML. But since even HTML is superior to PDF for purposes of access and reuse, I self-archive in HTML rather than PDF whenever I can.
Re: Ranking Web of Repositories: July 2010 Edition
On 12 Jul 2010, at 06:25, Leslie Chan wrote: This is rather circular. The view that academic papers should be fixed in form and format is rather out of sync with the emergence of new forms of scholarly expression enabled by the web.

I don't wish to argue that academic writing SHOULD BE fixed in format, merely to observe that IT IS predominantly so.

Academics should be encouraged to explore a heterogeneous range of formats, reaching different audiences and finding new ways to write about research.

When they do, we'll find a way to measure it :-) If you believe they already are in a significant way, let's do it!

I think this discussion raises a fundamental question about the design of IRs and their support for scholarship. IRs must do better to capture the diversity of scholarly contributions and formats, and make them count in a meaningful way.

I wholeheartedly concur.

Do we really need more output-based comparisons?

We need a range of comparisons of many sorts to get as full a picture as possible.

How should we define the most useful? Should download and other usage stats be taken into consideration, instead of only in-bound links?

If we had access to those statistics, by all means let's use them.

Why wait for Microsoft? What has the open source community been doing on this front? What about OpenOffice? Any good open source NLM DTD conversion tools out there? Why has it taken so long?

If there was something for OpenOffice then it would be trivial for repositories to apply it to Microsoft Word documents.

-- Les Carr
Re: Ranking Web of Repositories: July 2010 Edition
Hello,

Personally I feel uncomfortable with this ranking because, to my mind, it is incomplete and imprecise.

It is incomplete because repositories hosted in a subdirectory (e.g. www.xxx.zz/repository) are not ranked, for technical reasons, even if, as Isidro noted, the number of these repositories is far lower than the number listed in ROAR and OpenDOAR.

It is imprecise because it is based on automated web queries that are very sensitive to noise. For example, this is the case for the visibility indicator (external inlinks). As far as I understand from Isidro's explanations, part of this indicator is calculated with the Yahoo linkdomain function: linkdomain:http://my_site -site:my_site

I tested this function on a few of the ranked repositories, including our own. More than 90% (and, in some cases, I guess more than 99%) of the inlinks are not significant because they come from:
- automatic spam web sites (e.g. www.find-pdf.com, www.mypdffiles.com, ... or automatic sites such as http://www.123people.fr)
- automatic links from OAI harvesters
- automatic links that come from other domains of the university (e.g. auto-citation through automatically generated personal author pages)
- automatic repetition of the same link: in some forums, a link on the main page will be duplicated automatically on all archive pages, so one manual, significant link can produce several hundred insignificant automatic links
- ...

(A sketch of the kind of filtering this would require follows this message.)

The other indicators (size, rich files, scholar) may also be hazardous for similar reasons. According to Isidro, all these points affect the numbers but not (much) the ranking. This should be confirmed...

Kind regards,

Fred

Isidro F. Aguillo wrote: [message of 12 July, quoted in full above]

-- Fred Merceur Ifremer / Bibliothèque La Pérouse frederic.merc...@ifremer.fr Archimer, Ifremer's Institutional Repository Avano, a marine and aquatic OAI harvester
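[Editorial aside: Fred's point lends itself to a concrete illustration. A minimal Python sketch of inlink cleaning under his description: one vote per external source domain, with known spam/harvester hosts and the repository's own university domains excluded. The blocklist entries come from his examples; the function name and data layout are illustrative, not how Webometrics actually filters.]

    from urllib.parse import urlparse

    # Spam/harvester hosts of the kind Fred names; illustrative, not exhaustive.
    BLOCKLIST = {"www.find-pdf.com", "www.mypdffiles.com", "www.123people.fr"}

    def significant_inlink_count(inlink_urls, university_domain):
        """Count distinct external source domains, skipping known spam sites and
        links from the repository's own university (auto-citation via personal
        pages). Counting each domain once also neutralises a link repeated
        across a forum's archive pages."""
        sources = set()
        for url in inlink_urls:
            host = urlparse(url).netloc.lower()
            if not host or host in BLOCKLIST or host.endswith(university_domain):
                continue
            sources.add(host)
        return len(sources)

    # Example: three raw inlinks collapse to a single significant source.
    links = ["http://www.find-pdf.com/x",
             "http://forum.example.org/p1",
             "http://forum.example.org/archive/p1"]
    print(significant_inlink_count(links, "nottingham.ac.uk"))  # -> 1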
Re: Ranking Web of Repositories: July 2010 Edition
Hi Les: You're arguing that Webometrics should count PDFs, and I fully agree. I was only arguing that Webometrics should not *limit* its count to PDFs. Sorry if I didn't make that clear.

BTW, I'd make the analogous case to publishers. Publish in PDF if you like, but never publish in PDF-only. If you offer PDF editions, then also offer XML or HTML editions.

Best, Peter

Peter Suber www.bit.ly/suber

-- On Sun, Jul 11, 2010 at 6:49 AM, Leslie Carr l...@ecs.soton.ac.uk wrote: On 10 Jul 2010, at 15:37, Peter Suber wrote: For more detail on rich media or rich files, see the Webometrics page on methodology: "Only the number of text files in Acrobat format (.pdf) ... are considered." This is a bug, not a feature. A more useful ranking would try to count full-text scholarly or peer-reviewed articles regardless of format. I know that's hard to do. But it's a mistake to use any format as a surrogate for that status, and especially a format as flawed as PDF. Even if Webometrics wanted to reward some formats more than others, it should not reward PDF.

I think it should. The overwhelming majority of academic papers are distributed online as PDF; the overwhelming majority of things in repositories that are not PDF are not academic papers.

The format is optimized for print or reading, not for use or reuse. PDFs are slow to load and often not even readable in bandwidth-poor parts of the world. They crash many browsers. They often lack working links; when they do have links, they require users to open in the same window rather than in a separate window, losing the file that took so long to load. Users can't deep-link to subsections. Publishers can lock them to prevent cutting and pasting. Publishers can insert scripts to make them unreadable offline or after a certain time. PDFs impede text processing by users, text mining by software, handicapped access (read-aloud software), and mark-up by third parties.

This is an argument about what software/data formats researchers *should* use; affecting their authoring and editorial processes is probably beyond the scope of what we can expect from this league table.

PubMed Central scores low in the Webometric rankings because it has no PDFs.

It does have PDFs - it might ingest articles in XML, but it certainly exports them in PDF. Enquiring of Google (site:www.ncbi.nlm.nih.gov filetype:pdf) shows that it has about 6,690,000 PDFs. But PMC is one of the most populated and useful OA repositories in the world. This is something that needs investigating. If I had to guess why it ranks so low, it might be because no-one is linking INTO PubMed; rather they are linking to the original publishers.

The format it uses instead of PDF, the NLM DTD coded in XML, is vastly superior to PDF for every scholarly purpose. I haven't had time to code my articles in XML. But since even HTML is superior to PDF for purposes of access and reuse, I self-archive in HTML rather than PDF whenever I can.

For the record, I completely agree with you about PDF / HTML / XHTML. If only Microsoft Word (and LaTeX) had decent export facilities that produced good semantic HTML.

-- Les Carr
Re: Ranking Web of Repositories: July 2010 Edition
On Thu, Jul 8, 2010 at 8:59 AM, Leslie Carr l...@ecs.soton.ac.uk wrote: [...] If you assume that a repository is full of locally-authored research literature then you will find all sorts of counter-examples in one area or another. The Rich Media criterion goes some way to filtering out non-documents, but whether the items are scholarly or local or equivalent to those in other repositories is very difficult to ascertain.

For more detail on rich media or rich files, see the Webometrics page on methodology: "Only the number of text files in Acrobat format (.pdf) ... are considered." http://repositories.webometrics.info/methodology_rep.html

This is a bug, not a feature. A more useful ranking would try to count full-text scholarly or peer-reviewed articles regardless of format. I know that's hard to do. But it's a mistake to use any format as a surrogate for that status, and especially a format as flawed as PDF.

Even if Webometrics wanted to reward some formats more than others, it should not reward PDF. The format is optimized for print or reading, not for use or reuse. PDFs are slow to load and often not even readable in bandwidth-poor parts of the world. They crash many browsers. They often lack working links; when they do have links, they require users to open in the same window rather than in a separate window, losing the file that took so long to load. Users can't deep-link to subsections. Publishers can lock them to prevent cutting and pasting. Publishers can insert scripts to make them unreadable offline or after a certain time. PDFs impede text processing by users, text mining by software, handicapped access (read-aloud software), and mark-up by third parties.

PubMed Central scores low in the Webometric rankings because it has no PDFs. But PMC is one of the most populated and useful OA repositories in the world. The format it uses instead of PDF, the NLM DTD coded in XML, is vastly superior to PDF for every scholarly purpose. I haven't had time to code my articles in XML. But since even HTML is superior to PDF for purposes of access and reuse, I self-archive in HTML rather than PDF whenever I can.

Peter

Peter Suber www.bit.ly/suber
Re: Ranking Web of Repositories: July 2010 Edition
Dear Stevan: A lot of interesting stuff to think about. We are already working on some of those proposals but it is not easy. However, perhaps you will like this page we prepared for the university rankings, related to UK universities' commitment to OA: http://www.webometrics.info/openac.html

Thanks for your useful comments,

On 08/07/2010 18:34, Stevan Harnad wrote: [message quoted in full below]
Re: Ranking Web of Repositories: July 2010 Edition
On 9 Jul 2010, at 08:12, Isidro F. Aguillo wrote: However perhaps you will like this page we prepared for the university rankings, related to UK universities' commitment to OA: http://www.webometrics.info/openac.html

Thanks for preparing the page - it is very informative and helpful in answering questions about the interpretation of the IR ranking, relating to the discrepancy between the relative ordering of institutions in the IR list and other (independent) research rankings. As you point out, much of the difference is explained by the relative openness of each institution's literature. Since 50% of the score is devoted to in-links, and there is little motivation to link to an empty bibliographic record, a high proportion of OA papers will tend to attract more links, more traffic and hence a more impactful repository.

Some institutions have therefore benefited from their efforts to deposit OA papers, becoming more visible and hence more highly rated. Others are seeing the opposite effect - institutions that would normally be at the top of any research list are much lower down than expected. Some of these institutions don't have very effective repositories and some do but hide them behind firewalls. Either way the net effect is the same - not much visible public literature to attract links or traffic.

I hope that the effect of this league table will be to encourage institutions to redouble their efforts in regard to Open Access. I also hope that it will be possible to have further public dialogue so that the process can be increasingly open and the community can better understand, verify and trust your metrics. Thanks again for your contribution!

-- Les Carr

On 9 Jul 2010, at 08:12, Isidro F. Aguillo wrote: [message quoted in full above]
Re: Ranking Web of Repositories: July 2010 Edition
Isidro,

Thank you for your Ranking Web of World Repositories and for informing us about the best quality repositories! Being French, I am delighted to see HAL so well ranked, and I take this opportunity to congratulate Franck Laloe for having set up such a good national repository, as well as the CCSD team for continuing to maintain and improve it. Nevertheless, there is a problem in your ranking that I have already had occasion to point out to you in private messages. May I remind you that:

Correction for the top 800 ranking: The ranking should either index HyperHAL alone, or index both HAL/INRIA and HAL/SHS, but not all three repositories at the same time: HyperHAL includes both HAL/INRIA and HAL/SHS. (A small de-duplication sketch follows this message.)

Correction for the ranking of institutional repositories: Not only does HyperHAL (#1) include both HAL/INRIA (#3) and HAL/SHS (#5), as noted above, but HyperHAL is a multidisciplinary repository, intended to collect all French research output, across all institutions. Hence it should not be classified and ranked against individual institutional repositories but as a national, central repository. Indeed, even HAL/SHS is multi-institutional in the usual sense of the word: single universities or research institutions. The classification is perhaps being misled by the polysemous use of the word "institution".

Not to seem to be biased against my homeland, I would also point out that, among the top 10 of the top 800 institutional repositories, CERN (#2) is to a certain extent hosting multi-institutional output too, and is hence not strictly comparable to true single-institution repositories. In addition, "California Institute of Technology Online Archive of California" (#9) is misnamed -- it is the Online Archive of California http://www.oac.cdlib.org/ (CDLIB, not CalTech) and as such it too is multi-institutional. And "Digital Library and Archives Virginia Tech University" (#4) may also be anomalous, as it includes the archives of electronic journals with multi-institutional content.

Most of the multi-institutional anomalies in the Top 800 Institutional seem to be among the top 10 -- as one would expect if multiple institutions' content is inflating the apparent size of a repository. Beyond the top 10 or so, the repositories look to be mostly true institutional ones.

I hope that this will help in improving the next release of your increasingly useful ranking!

Best wishes
Hélène Bosc

----- Original Message ----- From: Stevan Harnad har...@ecs.soton.ac.uk To: american-scientist-open-access-fo...@listserver.sigmaxi.org Sent: Tuesday, July 06, 2010 6:07 PM Subject: Fwd: Ranking Web of Repositories: July 2010 Edition

Begin forwarded message: From: Isidro F. Aguillo isidro.agui...@cchs.csic.es Date: July 6, 2010 11:13:58 AM EDT To: sigmetr...@listserv.utk.edu Subject: [SIGMETRICS] Ranking Web of Repositories: July 2010 Edition

Ranking Web of Repositories: July 2010 Edition

The second edition of the 2010 Ranking Web of Repositories was published the same day OR2010 started here in Madrid. The ranking is available from the following URL: http://repositories.webometrics.info/

The main novelty is the substantial increase in the number of repositories analyzed (close to 1000). The Top 800 are ranked according to their web presence and visibility. As usual, thematic repositories (CiteSeer, RePEc, Arxiv) lead the ranking, but the French research institutes (CNRS, INRIA, SHS) using HAL are very close.

Two things have changed from previous editions from a methodological point of view: the use of Bing's engine data has been discarded due to irregularities in the figures obtained, and MS Excel files have been excluded again.

At the end of July the new edition of the rankings of universities, research centers and hospitals will be published. Comments, suggestions and additional information are greatly appreciated.

-- === Isidro F. Aguillo, HonPhD Cybermetrics Lab (3C1) IPP-CCHS-CSIC Albasanz, 26-28 28037 Madrid. Spain Editor of the Rankings Web ===
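[Editorial aside: Hélène's first correction is mechanical enough to sketch. Assuming each ranked repository can be represented by the set of item identifiers it holds (a simplification; the names below are illustrative), dropping entries whose holdings are a proper subset of another listed entry would keep HyperHAL but not HAL/INRIA or HAL/SHS.]

    def drop_redundant(repositories):
        """Keep only repositories whose holdings are not a proper subset of
        another ranked repository's holdings. `repositories` maps a repository
        name to the set of item identifiers it exposes."""
        return [
            name for name, items in repositories.items()
            if not any(other != name and items < other_items
                       for other, other_items in repositories.items())
        ]

    # Toy example mirroring the HAL case: HyperHAL aggregates the other two.
    repos = {
        "HyperHAL": {"a", "b", "c", "d"},
        "HAL/INRIA": {"a", "b"},
        "HAL/SHS": {"c"},
    }
    print(drop_redundant(repos))  # -> ['HyperHAL']

[Hélène's alternative policy (rank the parts, drop the aggregate) would simply invert the subset test.]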
Re: Ranking Web of Repositories: July 2010 Edition
Dear Hélène:

Thank you for your message, but I disagree with your proposal. We are not measuring only contents but contents AND visibility in the web. Certainly HyperHAL covers the contents of all its participants, but the impact of these contents depends on other factors. Probably researchers prefer to link to the paper in INRIA because of the prestige of this institution, the affiliation of the author or the marketing of their institutional repository.

But here is a more important aspect. If I were the president of INRIA I would prefer people using my institutional repository instead of CCSD. No problem with the latter, they are doing a great job and increasing the reach of INRIA, but the papers deposited are a very important (the most important?) asset of INRIA.

Regarding the other comments, we are going to correct those with mistakes, but it is very difficult for us to realize that Virginia Tech University is faking its institutional repository with contents authored by external scholars.

Best regards,

On 07/07/2010 23:03, Hélène Bosc wrote: [message quoted in full above]
Re: Ranking Web of Repositories: July 2010 Edition
Hélène,

"Institution" is indeed not a very precise concept, but the repository ranking will not be improved if one were to spend much time trying to decide which repository is institutional and which is not (e.g. how about also deleting No 10 because it is only a departmental repository?). Also, it is a bad idea to define repositories as institutional only if they restrict themselves to the output of a single institution. We already have too many repository managers who succumb to this kind of institutionalist logic - and reject OA content only because it is not from their own institution.

The CSIC has a sound methodology for ranking repositories, and it is not their job to define exclusively what is an IR and what is not. And in cyberspace it is much more interesting to compare repositories according to the domains and services they offer...

Moreover, it would help if we could move beyond the often narrow understanding of what an institutional repository is and what it is not, and acknowledge more clearly that a strategy of privileging institutional repositories as such has not helped. The value and sustainability of IRs (individually, as isolated instances, if not embedded in a national system) is rather limited for both scholarship and open access. Hence, it is very welcome that more determined efforts are underway at building viable networks of research repositories and integrating IRs in national systems (e.g. Ireland as the latest instance). For a sustained argument, please see:

Armbruster/Romary (2010) Comparing Repository Types: Challenges and Barriers for Subject-Based Repositories, Research Repositories, National Repository Systems and Institutional Repositories in Serving Scholarly Communication. (accepted for publication in IJDLS) http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1506905

Romary/Armbruster (2010) Beyond Institutional Repositories. IJDLS 1(1):44-61 http://ssrn.com/abstract=1425692

Regards, Chris

-----Original Message----- From: American Scientist Open Access Forum On Behalf Of Hélène Bosc Sent: Wed 7/7/2010 23:03 To: american-scientist-open-access-fo...@listserver.sigmaxi.org Subject: Re: Ranking Web of Repositories: July 2010 Edition

[Hélène Bosc's message, quoted in full above]
Re: Ranking Web of Repositories: July 2010 Edition
On 8 Jul 2010, at 09:43, Isidro F. Aguillo wrote: Regarding the other comments, we are going to correct those with mistakes, but it is very difficult for us to realize that Virginia Tech University is faking its institutional repository with contents authored by external scholars.

This (and the HAL-based problems) are interpretive issues that bedevil services that analyse repositories. If you assume that a repository is full of locally-authored research literature then you will find all sorts of counter-examples in one area or another. The Rich Media criterion goes some way to filtering out non-documents, but whether the items are scholarly or local or equivalent to those in other repositories is very difficult to ascertain.

-- Les Carr

On 07/07/2010 23:03, Hélène Bosc wrote: [message quoted in full above]
Re: Ranking Web of Repositories: July 2010 Edition
On 2010-07-08, at 4:43 AM, Isidro F. Aguillo wrote: Dear Hélène: Thank you for your message, but I disagree with your proposal. We are not measuring only contents but contents AND visibility in the web. Dear Isidro, If I may intervene with some comments too, as this discussion has some wider implications: Yes, you are measuring both contents and visibility, but presumably you want the difference between (1) the ranking of the top 800 repositories and (2) the ranking of the top 800 *institutional* repositories to be based on the fact that the latter are institutional repositories whereas the former are all repositories (central, i.e., multi-institutional, as well as institutional). Moreover, if you list redundant repositories (some being the proper subsets of others) in the very same ranking, it seems to me the meaning of the ranking becomes rather vague. Certainly HyperHAL covers the contents of all its participants, but the impact of these contents depends of other factors. Probably researchers prefer to link to the paper in INRIA because of the prestige of this institution, the affiliation of the author or the marketing of their institutional repository. All true, but perhaps the significance and usefulness of the rankings would be greater if you either changed the weight of the factors (volume of full-text content, number of links) or, alternatively, you designed the rankings so the user could select and weight the criteria on which the rankings are displayed. Otherwise your weightings become like the h-index -- an a-priori combination of untested, unvalidated weights that many users may not be satisfied with, or fully informed by... But here is a more important aspect. If I were the president of INRIA I will prefer people using my institutional repository instead CCSD. No problem with the last one, they are makinng a great job and increasing the reach of INRIA, but the papers deposited are a very important (the most important?) asset of INRIA. But how much INRIA papers are linked, downloaded and cited is not necessarily (or even probably) a function of their direct locus! What is important for INRIA (and all institutions) is that as much as possible of their paper output should be OA, simpliciter, so that it can be linked, downloaded, read, applied, used and cited. It is entirely secondary, for INRIA (and all institutions), *where* their papers are OA, compared to the necessary condition *that* they are OA (and hence freely accessible, usaeble, harvestable). Hence (in my view) by far the most important ranking factor for institutional repositories is how much of their full-text institutional paper output is indeed deposited and OA. INRIA would have no reason to be disappointed if the locus from which its content is searched, retrieved and linked is some other, multi-institutional harvester. INRIA still gets the credit and benefits from all the links, downloads and citations of INRIA content! (Having said that, locus of deposit *does* matter, very much, for deposit mandates, Deposit mandates are necessary in order to generate OA content. And, for strategic reasons that are elaborated in my reply to Chris Armbruster, it makes a big practical difference for success in agreeing on the adoption of a mandate that both institutional and funder mandates should require convergent *institutional* deposit, rather than divergent and competing institutional vs. institution-extermal deposit. 
Here too, your repository rankings would be much more helpful and informative if they gave greater weight to the relative size of each institutional repository's content and eliminated multi-institutional repositories from the institutional repository rankings -- or at least allowed institutional repositories to be ranked independently on content vs. links.

I think you are perhaps being misled here by the analogy with your sister rankings, the Ranking Web of World Universities (http://www.webometrics.info/), which rank universities rather than their repositories. In university rankings, the links to the university site itself matter a lot. But in repository rankings links matter much less than *how much institutional content is accessible*. For the degree of usage of that content, harvester sites may be more relevant measures, and, after all, downloads and citations, unlike links, carry their credits (to the authors and institutions) with them no matter where the transaction happens to occur...

Regarding the other comments, we are going to correct those with mistakes, but it is very difficult for us to believe that Virginia Tech University is faking its institutional repository with contents authored by external scholars.

I have called Gail McMillan at Virginia Tech about this, and she has explained it to me. The question was never whether Virginia Tech was faking! They simply host content over and above Virginia Tech content -- for example, OA journals whose content originates
Re: Ranking Web of Repositories: July 2010 Edition
Dear Chris Armbruster and Isidro Aguillo, Since Stevan Harnad has the advantage of being able to read and respond to messages first, I have nothing further to add. Had I replied first, I would have made some of the arguments he made in support of my view on the rankings, but in a less clear and complete way. Best wishes. Hélène Bosc

----- Original Message ----- From: Armbruster, Chris To: american-scientist-open-access-fo...@listserver.sigmaxi.org Sent: Thursday, July 08, 2010 11:44 AM Subject: Re: Ranking Web of Repositories: July 2010 Edition

Hélène, Institution is indeed not a very precise concept, but the repository ranking will not be improved by spending much time trying to decide which repository is institutional and which is not (e.g. why not also delete No. 10 because it is only a departmental repository?). Also, it is a bad idea to define repositories as institutional only if they restrict themselves to the output of a single institution. We already have too many repository managers who succumb to this kind of institutionalist logic -- and reject OA content only because it is not from their own institution. The CSIC has a sound methodology for ranking repositories, and it is not its job to decide exclusively what is an IR and what is not. And in cyberspace it is much more interesting to compare repositories according to the domains and services they offer...

Moreover, it would help if we could move beyond the often narrow understanding of what an institutional repository is and what it is not, and acknowledge more clearly that a strategy of privileging institutional repositories as such has not helped. The value and sustainability of IRs (individually, as isolated instances, if not embedded in a national system) are rather limited for both scholarship and open access. Hence, it is very welcome that more determined efforts are underway to build viable networks of research repositories and to integrate IRs into national systems (e.g. Ireland as the latest instance). For a sustained argument, please see:

Armbruster/Romary (2010) Comparing Repository Types: Challenges and Barriers for Subject-Based Repositories, Research Repositories, National Repository Systems and Institutional Repositories in Serving Scholarly Communication (accepted for publication in IJDLS). http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1506905

Romary/Armbruster (2010) Beyond Institutional Repositories. IJDLS 1(1): 44-61. http://ssrn.com/abstract=1425692

Regards, Chris

----- Original Message ----- From: American Scientist Open Access Forum on behalf of Hélène.Bosc Sent: Wed 7/7/2010 23:03 To: american-scientist-open-access-fo...@listserver.sigmaxi.org Subject: Re: Ranking Web of Repositories: July 2010 Edition

Isidro, Thank you for your Ranking Web of World Repositories and for informing us about the best quality repositories! Being French, I am delighted to see HAL so well ranked, and I take this opportunity to congratulate Franck Laloe for having set up such a good national repository, as well as the CCSD team for continuing to maintain and improve it. Nevertheless, there is a problem in your ranking that I have already had occasion to point out to you in private messages. May I remind you that:

Correction for the top 800 ranking: The ranking should either index HyperHAL alone, or index both HAL/INRIA and HAL/SHS, but not all three repositories at the same time: HyperHAL includes both HAL/INRIA and HAL/SHS.
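To make the proposed correction concrete, here is a minimal sketch in Python of the first option -- dropping any repository whose holdings are a proper subset of another listed repository's. The membership sets are invented for illustration:

    # Drop repositories whose holdings are proper subsets of another's,
    # so an aggregate and its constituents are never ranked side by side.
    # The holdings below are invented placeholders, not real HAL contents.

    def drop_proper_subsets(holdings):
        return {name: items for name, items in holdings.items()
                if not any(items < other
                           for o, other in holdings.items() if o != name)}

    holdings = {
        "HyperHAL":  {"p1", "p2", "p3", "p4"},  # aggregates the HAL portals
        "HAL/INRIA": {"p1", "p2"},              # contained in HyperHAL
        "HAL/SHS":   {"p3"},                    # contained in HyperHAL
        "Arxiv":     {"p5", "p6"},              # independent holdings
    }
    print(sorted(drop_proper_subsets(holdings)))  # ['Arxiv', 'HyperHAL']

(The alternative option, indexing HAL/INRIA and HAL/SHS instead of HyperHAL, is the mirror image: drop the aggregate and keep the constituents.)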
Correction for the ranking of institutional repositories: Not only does HyperHAL (#1) include both HAL/INRIA (#3) and HAL/SHS (#5), as noted above, but HyperHAL is a multidisciplinary repository, intended to collect all French research output, across all institutions. Hence it should not be classified and ranked against individual institutional repositories but as a national, central repository. Indeed, even HAL/SHS is multi-institutional in the usual sense of the word "institution": single universities or research institutions. The classification is perhaps being misled by the polysemous use of the word "institution."

Not to seem biased against my homeland, I would also point out that, among the top 10 of the top 800 institutional repositories, CERN (#2) is to a certain extent hosting multi-institutional output too, and is hence not strictly comparable to true single-institution repositories. In addition, "California Institute of Technology Online Archive of California" (#9) is misnamed -- it is the Online Archive of California, http://www.oac.cdlib.org/ (CDLIB, not CalTech), and as such it too is multi-institutional. And "Digital Library and Archives, Virginia Tech University" (#4) may also be anomalous, as it includes the archives of electronic journals with multi-institutional content. Most of the multi-institutional anomalies in the Top 800 Institutional seem to be among the top 10 -- as one would expect if multiple institutional content is inflating the apparent size