Making empirical data+code available

2012-02-16 Thread Derek M Jones

All,

Continuing on the theme of empirical research.

There is a growing trend for researchers to make their
experimental data available.

Promise is probably one of the better-known sites:
http://promisedata.org/

What is also needed is the code used to analyze it.
I have been having a hard time trying to get the numbers
reported in some papers from the data that has been made
available.

You can find mine here (only the 2011 experiment has all the code
needed to perform the analysis; I'm working on fixing that):
http://www.knosof.co.uk/dev-experiment.html

I hope list members will reply with where their own data can be
downloaded.

--
Derek M. Jones            tel: +44 (0) 1252 520 667
Knowledge Software Ltd    blog: shape-of-code.coding-guidelines.com
Source code analysis      http://www.knosof.co.uk

--
The Open University is incorporated by Royal Charter (RC 000391), an exempt charity 
in England & Wales and a charity registered in Scotland (SC 038302).



RE: Making empirical data+code available

2012-02-16 Thread Lindsay Marshall
A while back I was asked to prepare an area on the PPIG website where people 
could upload data for public consumption (surrounded by appropriate caveats of 
course). The data I was preparing for didn't ever turn up so the area remains 
hidden, but I can certainly expose this in some way if people wish to use it.

L.






Re: Making empirical data+code available

2012-02-16 Thread Derek M Jones

Lindsay,

A couple of researchers I have contacted to obtain data
told me that they have either lost it or did not make an
effort to keep it.

Having someplace that people could automatically upload their
data to might help preserve more of it, as well as making
life easier for others by cutting down on search time.


> A while back I was asked to prepare an area on the PPIG website where people
> could upload data for public consumption (surrounded by appropriate caveats of
> course). The data I was preparing for didn't ever turn up so the area remains
> hidden, but I can certainly expose this in some way if people wish to use it.






Re: Making empirical data+code available

2012-02-16 Thread Richard O'Keefe

On 17/02/2012, at 2:53 AM, Derek M Jones wrote:
> You can find mine here (only the 2011 experiment has all the code
> needed to perform the analysis; I'm working on fixing that):
> http://www.knosof.co.uk/dev-experiment.html

This is a wonderful thing you have done.
I note that these days, when I see a lot of subjects (well, 30)
with a bunch of discrete attributes, correspondence analysis is
one of the things I reach for to get some insight.
There's the corresp() function in library(MASS)
and Fionn Murtagh's code to go with his correspondence analysis
book is available over the web.

While playing with the data, I was struck by two prominent
lines I kept seeing:
table(loc_written)
0 1 2 3 4 5 6 7
2 3 3 8 1 7 3 3
      ^   ^

I don't suppose it has any significance at all for your results,
but I wonder why the loc_written data were so clumpy.





Re: Making empirical data+code available

2012-02-16 Thread Derek M Jones

Richard,


> There's the corresp() function in library(MASS)
> and Fionn Murtagh's code to go with his correspondence analysis
> book is available over the web.


This is very common practice with R books.


> While playing with the data, I was struck by two prominent
> lines I kept seeing:
> table(loc_written)
> 0 1 2 3 4 5 6 7
> 2 3 3 8 1 7 3 3
>       ^   ^
>
> I don't suppose it has any significance at all for your results,
> but I wonder why the loc_written data were so clumpy.


That 8 caught my eye; it should be 7 (a typo).
I checked the other numbers and they are correct.

What this is saying is that developers don't have a clue how many lines
of code they have read/written (see extract of question below).
In places their answers are not even consistent, and there is a poor
correlation with experience (0s indicate no answer given, which should
really be NA).

---
How many lines of code would you estimate you have \fBwritten\fR in
different languages over your career:
.RS
.IP i)
50,000
.IP ii)
75,000
.IP iii)
100,000
.IP iv)
150,000
.IP v)
200,000
.IP vi)
275,000
.IP vii)
350,000+
.RE
.IP b)
How many lines of code would you estimate you have \fBread\fR in
different languages over your career:
.RS
.IP i)
75,000
.IP ii)
100,000
.IP iii)
150,000
.IP iv)
200,000
.IP v)
300,000
.IP vi)
500,000
.IP vii)
800,000+
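
Editorial sketch (not part of the original message): the point that 0 means
"no answer" and should really be NA can be illustrated with a small
tabulation. The example below is in Python rather than R and uses only the
counts posted earlier in the thread, including the 8 that Derek says is a
typo for 7. It assumes, without confirmation from the thread, that codes 1-7
in loc_written correspond to options i)-vii) of the "written" question. The
helper names (expand, recode_missing, tabulate) are hypothetical.

```python
from collections import Counter

# Counts as posted in the thread (answer code -> number of subjects).
# Derek notes that the 8 is a typo for 7; the posted figure is kept as-is.
posted_counts = {0: 2, 1: 3, 2: 3, 3: 8, 4: 1, 5: 7, 6: 3, 7: 3}

# ASSUMPTION: codes 1-7 map to options i)-vii) of the "written" question.
written_labels = {
    1: "50,000", 2: "75,000", 3: "100,000", 4: "150,000",
    5: "200,000", 6: "275,000", 7: "350,000+",
}


def expand(counts):
    """Rebuild a flat list of responses from a code -> count table."""
    return [code for code, n in sorted(counts.items()) for _ in range(n)]


def recode_missing(responses, missing_code=0):
    """Replace the 'no answer' code with None, the analogue of R's NA."""
    return [None if r == missing_code else r for r in responses]


def tabulate(responses):
    """Count answered responses and report missing ones separately."""
    answered = Counter(r for r in responses if r is not None)
    missing = sum(1 for r in responses if r is None)
    return answered, missing


responses = recode_missing(expand(posted_counts))
answered, missing = tabulate(responses)

for code in sorted(answered):
    print(f"{written_labels[code]:>9}  {answered[code]}")
print(f"{'NA':>9}  {missing}")
```

Keeping the missing responses out of the main table stops the two
non-answers from being read as "zero lines written".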





