I'm not sure that a knee-jerk reaction to an arbitrary list of bad practices is 
a good place to start, and it seems like a really bad driver for software 
development.

Maybe we should be talking to our fellow implementers and building on the work 
of http://www.w3.org/Provider/Style/URI.html, http://www.w3.org/TR/cooluris/, 
http://www.openarchives.org/OAI/openarchivesprotocol.html, etc. to build a 
compilation of _best_ practice.

Cheers
stuart

-----Original Message-----
From: Tim Donohue [mailto:tdono...@duraspace.org] 
Sent: Wednesday, 3 September 2014 8:49 a.m.
To: Isidro F. Aguillo; dspace-tech@lists.sourceforge.net
Cc: Jonathan Markow; dspace-gene...@lists.sourceforge.net
Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository 
software) was planning to send you a compiled list of the concerns with your 
proposal. As you can tell from the previous email thread, many of the users of 
DSpace have similar concerns. Rather than bombard you with all of them 
individually (which you could see from browsing the thread), we hoped to draft 
up a response summarizing the concerns of the DSpace community.

Below you'll find an initial draft of the summarized concerns. The rule 
numbering below is based on the numbering at: 
http://repositories.webometrics.info/en/node/26

--- Concerns with the Proposal from Ranking Web of Repositories

* Rule #2 (IRs that don't use the institutional domain will be excluded) would 
cause the exclusion of some IRs which are hosted by DSpace service providers. 
As an example, some DSpaceDirect.org users have URLs 
https://[something].dspacedirect.org which would cause their exclusion as it is 
a non-institutional domain. Many other DSpace hosting providers have similar 
non-institutional domain URLs by default.

* Rule #4 (Repositories using ports other than 80 or 8080) would wrongly 
exclude all DSpace sites which use HTTPS (port 443). Many institutions choose 
to run DSpace via HTTPS instead of HTTP.

* Rule #5 (IRs that use the name of the software in the hostname would be 
excluded) may also affect IRs which are hosted by service providers (like 
DSpaceDirect). Again, some DSpaceDirect customers have URLs which use 
*.dspacedirect.org (includes "dspace"). This rule would also exclude MIT's IR 
which is the original "DSpace" (and has used the same URL for the last 10+ 
years): http://dspace.mit.edu/

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the 
full texts will be excluded.) may accidentally exclude a large number of DSpace 
sites. The common download URLs for full text in DSpace are both at least 4 
directory levels deep:

    - XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
    - JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]

NOTE: "prefix" and "id" are parts of an Item's Handle (http://hdl.handle.net/), 
which is the persistent identifier assigned to the item via the Handle System. 
So, this is how a persistent URL like
http://hdl.handle.net/1721.1/26706 redirects to an Item in MIT's DSpace.

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in 
their URLs will be excluded.). It is unclear how they would determine this, and 
what the effect may be on DSpace sites worldwide. Again, looking at the common 
DSpace URL paths above, if a file had a "numeric" 
name, it may be excluded as DSpace URLs already include 2-3 numeric codes by 
default ([prefix],[id], and [sequence] are all numeric).

* Rule #8 (IRs with more than 50% of the records not linking to OA full text 
versions..). Again, unclear how they would determine this, and whether the way 
they are doing so would accidentally exclude some major DSpace sites. For 
example, there are major DSpace sites which include a large number of 
Theses/Dissertations. These Theses/Dissertations may not be 100% Open Access to 
the world, but may be fully accessible to everyone "on campus".
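
To make the concerns about Rules #4, #6 and #7 concrete, here is a rough Python sketch of how the two common DSpace download URL patterns above might be measured. It is only an illustration under stated assumptions: the proposal does not say how directory levels or "numeric codes" would actually be counted, so the counting rules below are guesses, and the hostname and filename are hypothetical (only the handle 1721.1/26706 comes from the example above).

```python
from urllib.parse import urlsplit

# Hypothetical URLs following the XMLUI and JSPUI patterns described above.
# "repository.example.edu" and "thesis.pdf" are made up for illustration.
XMLUI_URL = "https://repository.example.edu/bitstream/handle/1721.1/26706/thesis.pdf"
JSPUI_URL = "http://repository.example.edu:8080/bitstream/1721.1/26706/1/thesis.pdf"


def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]


def directory_levels(url):
    """Count the directories above the file (all segments except the last).

    Assumption: this is one plausible reading of "directory levels";
    the proposal does not define the term.
    """
    return max(len(path_segments(url)) - 1, 0)


def numeric_segments(url):
    """Return path segments made only of digits and dots, e.g. "1721.1".

    Assumption: this is a guess at what a "numeric code" might mean.
    """
    return [seg for seg in path_segments(url) if seg.replace(".", "").isdigit()]


def effective_port(url):
    """Return the explicit port, or the scheme default (80 for HTTP, 443 for HTTPS)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return {"http": 80, "https": 443}.get(parts.scheme)


for url in (XMLUI_URL, JSPUI_URL):
    print(url)
    print("  directory levels:", directory_levels(url))
    print("  numeric segments:", numeric_segments(url))
    print("  port:", effective_port(url))
```

Under these assumed counting rules, both patterns sit right at 4 directory levels before any site-specific path prefix is added, the JSPUI pattern already carries 3 numeric segments, and any HTTPS site reports port 443.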

---

Another, perhaps more serious, concern is the timeline you propose. 
You suggest January 2015 as the date when these newly proposed rules would be 
in place. Yet, if these rules were to go in place, some rules may require 
changes to the DSpace software itself (as I laid out above, some rules may not 
mesh well with DSpace software as it is, unless I'm misunderstanding the rule 
itself).

Unfortunately, based on our DSpace open source release timelines, we have ONE 
new release (DSpace 5.0) planned between now and January 2015. 
Even if we were able to implement some of these recommended changes at a 
software level, the vast majority (likely >80-90%) of DSpace instances would 
likely NOT be able to upgrade to the latest DSpace version before your January 
deadline (as the 5.0 release is scheduled for Nov/Dec). 
Therefore, as is, your January 2015 ranking may accidentally exclude a large 
number of DSpace sites from your rankings, and DSpace is still the most widely 
used Institutional Repository software in the world.

So, in general, I think our response is that these proposed rules/guidelines 
are a bit concerning to many users of DSpace (as you can see from this long 
thread of concerns from various people and institutions). We worry that a 
large number of DSpace instances would be accidentally excluded from the 
rankings, which makes the final ranking less useful to users of DSpace overall.

I know DuraSpace would be open to discussing this with you and your colleagues. 
Perhaps there's a middle ground here, or a way to slowly "roll out" some of 
your recommended changes. This could allow DSpace developers more time to 
enhance DSpace software itself, and allow users of DSpace more time to upgrade 
to ensure they are included in the Rankings. (Note: we've similarly had 
discussions with the Google Scholar team to help gradually add improvements to 
DSpace to better meet their indexing needs...so it seems like the same could 
occur with the Webometrics team.)

I've copied our DuraSpace Chief Strategy Officer, Jonathan Markow, on this 
message as well.

Tim Donohue
Technical Lead for DSpace & DSpaceDirect DuraSpace.org | DSpace.org | 
DSpaceDirect.org

------------------------------------------------------------------------------
Slashdot TV.  
Video for Nerds.  Stuff that matters.
http://tv.slashdot.org/
_______________________________________________
Dspace-general mailing list
dspace-gene...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-general

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
