I'm not sure that a knee-jerk reaction to an arbitrary list of bad practices is a good place to start; it seems like a really bad driver for software development.
Maybe we should be talking to our fellow implementers and building on the work of http://www.w3.org/Provider/Style/URI.html, http://www.w3.org/TR/cooluris/, http://www.openarchives.org/OAI/openarchivesprotocol.html, etc., to put together a compilation of _best_ practice.

Cheers
stuart

-----Original Message-----
From: Tim Donohue [mailto:tdono...@duraspace.org]
Sent: Wednesday, 3 September 2014 8:49 a.m.
To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net
Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net
Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories

Hello Isidro,

DuraSpace (the stewarding organization behind the DSpace and Fedora repository software) was planning to send you a compiled list of the concerns with your proposal. As you can tell from the previous email thread, many DSpace users have similar concerns. Rather than bombard you with all of them individually (which you could see by browsing the thread), we hoped to draft a response summarizing the concerns of the DSpace community.

Below you'll find an initial draft of the summarized concerns. The rule numbering below is based on the numbering at: http://repositories.webometrics.info/en/node/26

---
Concerns with the Proposal from Ranking Web of Repositories

* Rule #2 (IRs that don't use the institutional domain will be excluded) would exclude some IRs which are hosted by DSpace service providers. For example, some DSpaceDirect.org users have URLs of the form https://[something].dspacedirect.org, which would cause their exclusion because it is a non-institutional domain. Many other DSpace hosting providers assign similar non-institutional domain URLs by default.
* Rule #4 (Repositories using ports other than 80 or 8080) would wrongly exclude all DSpace sites which use HTTPS (port 443). Many institutions choose to run DSpace via HTTPS instead of HTTP.
* Rule #5 (IRs that use the name of the software in the hostname will be excluded) may also affect IRs which are hosted by service providers (like DSpaceDirect). Again, some DSpaceDirect customers have URLs under *.dspacedirect.org (which includes "dspace"). This rule would also exclude MIT's IR, which is the original "DSpace" and has used the same URL for the last 10+ years: http://dspace.mit.edu/
* Rule #6 (IRs that use more than 4 directory levels for the URL address of the full texts will be excluded) may accidentally exclude a large number of DSpace sites. The common download URLs for full text in DSpace are both at least 4 directory levels deep:
  - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
  - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]
  NOTE: "prefix" and "id" are parts of an Item's Handle (http://hdl.handle.net/), which is the persistent identifier assigned to the item via the Handle System. This is how a persistent URL like http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.
* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in their URLs will be excluded). It is unclear how they would determine this, and what the effect may be on DSpace sites worldwide. Again, looking at the common DSpace URL paths above, a file with a "numeric" name may be excluded, because DSpace URLs already include 2-3 numeric codes by default ([prefix], [id], and [sequence] are all numeric).
* Rule #8 (IRs with more than 50% of the records not linking to OA full text versions..). Again, it is unclear how they would determine this, and whether their method would accidentally exclude some major DSpace sites. For example, some major DSpace sites include a large number of Theses/Dissertations. These Theses/Dissertations may not be 100% Open Access to the world, but may be fully accessible to everyone "on campus".
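The proposal does not say how ports or domains would actually be checked. As a hypothetical sketch in Python, assuming the crawler infers the port from the URL scheme when none is given and treats the institution's own domain (or any subdomain of it) as "institutional", Rules #2 and #4 might be read like this. The helper names and the hostnames `repository.example.edu`, `example.edu` and `example.dspacedirect.org` are placeholders invented for illustration, not anything Webometrics has published.

```python
from urllib.parse import urlparse

# Hypothetical reading of Rule #4: only ports 80 and 8080 are accepted.
ALLOWED_PORTS = {80, 8080}
# Ports a client uses when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Port a client actually connects to, falling back to the scheme default.

    Only http and https are handled; other schemes raise KeyError.
    """
    parts = urlparse(url)
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


def passes_port_rule(url: str) -> bool:
    """Rule #4 sketch: accept only URLs served on port 80 or 8080."""
    return effective_port(url) in ALLOWED_PORTS


def on_institutional_domain(url: str, institution_domain: str) -> bool:
    """Rule #2 sketch: hostname must be the institution's domain or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = institution_domain.lower()
    return host == domain or host.endswith("." + domain)
```

Under this reading, a site served over HTTPS on the default port has an effective port of 443 and fails Rule #4 even though nothing about the repository itself differs, and a hosted site such as `https://example.dspacedirect.org/` fails Rule #2 for `example.edu` regardless of which institution runs it.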
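Rules #5 to #7 likewise do not say how hostnames, "directory levels" or "numeric codes" are measured. The Python sketch below is one hypothetical reading applied to the DSpace URL patterns listed above: a directory level is any path segment before the file name, and a numeric code is a distinct path segment made of digits (optionally dot-separated, like the Handle prefix 1721.1) once any alphabetic file extension is dropped. The file names (`thesis.pdf`, `report.pdf`, `2014001.pdf`), the `/xmlui` context path, the `repository.example.edu` hostname and all function names are invented for illustration; Webometrics' actual checks may differ.

```python
import re
from urllib.parse import urlparse

# Thresholds come from the rule text; how each quantity is measured is assumed.
MAX_DIRECTORY_LEVELS = 4      # Rule #6: "more than 4 directory levels"
MAX_NUMERIC_CODES = 3         # Rule #7: "more than 3 different numeric codes"
SOFTWARE_NAMES = ("dspace",)  # Rule #5: software name in the hostname

# Digits, optionally dot-separated (matches "26706" and the Handle prefix "1721.1").
_NUMERIC = re.compile(r"\d+(?:\.\d+)*")
# An alphabetic file extension such as ".pdf" (does not match ".1" in "1721.1").
_EXTENSION = re.compile(r"\.[A-Za-z]\w*$")


def path_segments(url: str) -> list[str]:
    """Non-empty path segments of a URL, in order."""
    return [s for s in urlparse(url).path.split("/") if s]


def directory_levels(url: str) -> int:
    """Number of directories above the final file name."""
    return max(len(path_segments(url)) - 1, 0)


def numeric_codes(url: str) -> int:
    """Number of distinct path segments that look like numeric codes."""
    stems = {_EXTENSION.sub("", s) for s in path_segments(url)}
    return sum(1 for stem in stems if _NUMERIC.fullmatch(stem))


def names_software(url: str) -> bool:
    """True if the hostname contains the software name."""
    host = (urlparse(url).hostname or "").lower()
    return any(name in host for name in SOFTWARE_NAMES)


def violations(url: str) -> list[str]:
    """Which of the (hypothetically interpreted) rules #5-#7 a URL would trip."""
    found = []
    if names_software(url):
        found.append("rule 5")
    if directory_levels(url) > MAX_DIRECTORY_LEVELS:
        found.append("rule 6")
    if numeric_codes(url) > MAX_NUMERIC_CODES:
        found.append("rule 7")
    return found


# MIT's Handle from the text, with a made-up file name: trips Rule #5 only.
print(violations("http://dspace.mit.edu/bitstream/handle/1721.1/26706/thesis.pdf"))
# XMLUI under a /xmlui context path: 5 directory levels, trips Rule #6.
print(violations("https://repository.example.edu/xmlui/bitstream/handle/123456789/42/report.pdf"))
# JSPUI with a numeric file name: 4 numeric codes, trips Rule #7.
print(violations("https://repository.example.edu/bitstream/123456789/42/1/2014001.pdf"))
```

Under this reading, a stock XMLUI or JSPUI download URL sits exactly at the Rule #6 limit, so a single extra context path is enough to cross it, and a numeric file name alone pushes a JSPUI URL over the Rule #7 limit.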
---
Another, perhaps more serious, concern is the timeline you propose. You suggest January 2015 as the date when these newly proposed rules would be in place. Yet, if these rules were to go into place, some of them may require changes to the DSpace software itself (as I laid out above, some rules may not mesh well with DSpace software as it is, unless I'm misunderstanding the rule itself). Unfortunately, based on our DSpace open source release timelines, we have only ONE new release (DSpace 5.0) planned between now and January 2015. Even if we were able to implement some of these recommended changes at a software level, the vast majority (likely >80-90%) of DSpace instances would likely NOT be able to upgrade to the latest DSpace version before your January deadline (as the 5.0 release is scheduled for Nov/Dec). Therefore, as is, your January 2015 ranking may accidentally exclude a large number of DSpace sites, even though DSpace is still the most widely used Institutional Repository software in the world.

So, in general, I think our response is that these proposed rules/guidelines are a bit concerning to many users of DSpace (as you can see from this long thread of concerns from various people and institutions). We worry that a large number of DSpace instances would be accidentally excluded from the rankings, which would make the final ranking less useful to users of DSpace overall.

I know DuraSpace would be open to discussing this with you and your colleagues. Perhaps there's a middle ground here, or a way to gradually "roll out" some of your recommended changes. This could give DSpace developers more time to enhance the DSpace software itself, and give users of DSpace more time to upgrade so that they are included in the Rankings. (Note: we've had similar discussions with the Google Scholar team to gradually add improvements to DSpace that better meet their indexing needs, so it seems the same could happen with the Webometrics team.)
I've copied our DuraSpace Chief Strategy Officer, Jonathan Markow, on this message as well.

Tim Donohue
Technical Lead for DSpace & DSpaceDirect
DuraSpace.org | DSpace.org | DSpaceDirect.org

_______________________________________________
Dspace-general mailing list
dspace-gene...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-general

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette