Re: Making empirical data code available

Dan Stowell Fri, 17 Feb 2012 06:07:59 -0800

Hi all,

IMO the most important things are to have:

- permanently accessible URLs or other references (e.g. DOIs). For this it helps to choose a website or archive that you believe will last a long time. Github is great for software sharing but I don't see any reason to expect their URL scheme to be fixed in stone. I don't know of a public service that attaches DOIs to arbitrary datasets (shame), but I use archive.org for publishing datasets (e.g. <http://www.archive.org/details/beatboxset1>) - it is a US library-oriented service whose explicit mission is to preserve digital data for a very long time.

- clear licensing that allows sharing (ideally open data such as CC, and open source code such as GPL or BSD).

The advantage of open licensing is that if github or archive.org goes bust, long after I have moved on to other interests, other people can re-host my data and code. I don't see any particularly compelling reason to gather things into one archive, though it does seem to help in a community-building kind of sense.


Dan


On 17/02/2012 13:53, Derek M Jones wrote:

Neil,

There are some efforts underway to do this. I'm familiar with
http://datacite.org/ and http://figshare.com. A couple of SE groups
have started data and model problem repositories, such as
http://promisedata.org.


Thanks for the links. figshare looks interesting.

The challenge is getting everyone on board. For now, I don't see a
compelling reason to use these places.


People could just as easily use git-hub, https://github.com/
which is used by a lot of researchers to make their code freely
available (git-hub make their money from people paying for hosting
of privately available code).

Your paper "Automated topic naming to support cross-project analysis
of software maintenance activities" is in my pile of interesting ones
to read in more detail. You can read about my own interest in naming
in www.knosof.co.uk/cbook/sent792.pdf

I suspect it won't happen until journals and conferences begin to
insist on it. There is a reason why retraction rates are so low in CS
and SE: no way to reproduce results to confirm.

Cameron Neylon is a good point man on the issues around Science 2.0
and open access (http://cameronneylon.net/)


Neil Ernst
http://neilernst.net

On 2012-02-16, at 7:15, Derek M Jones wrote:

Lindsay,

A couple of researchers I have contacted to obtain data
told me that they have either lost it or did not make an
effort to keep it.

Having someplace that people could automatically upload their
data to might help preserve more of it, as well as making
life easier for others by cutting down on search time.

A while back I was asked to prepare an area on the PPIG website
where people could upload data for public consumption (surrounded by
appropriate caveats of course). The data I was preparing for didn't
ever turn up so the area remains hidden, but I can certainly expose
this in some way if people wish to use it.


--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd blog:shape-of-code.coding-guidelines.com
Source code analysis http://www.knosof.co.uk

--
The Open University is incorporated by Royal Charter (RC 000391), an
exempt charity in England & Wales and a charity registered in Scotland
(SC 038302).



--
Dan Stowell
Postdoctoral Research Assistant
Centre for Digital Music
Queen Mary, University of London
Mile End Road, London E1 4NS
http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm
http://www.mcld.co.uk/
