Re: [Rd] [RFC] A case for freezing CRAN

Michael Weylandt Wed, 19 Mar 2014 19:47:27 -0700


On Mar 19, 2014, at 22:17, Gavin Simpson <ucfa...@gmail.com> wrote:


> Michael,
> 
> I think the issue is that Jeroen wants to take that responsibility out
> of the hands of the person trying to reproduce a work. If it used R
> 3.0.x and packages A, B and C then it would be trivial to install
> that version of R and then pull down the stable versions of A B and C
> for that version of R. At the moment, one might note the packages used
> and even their versions, but what about the versions of the packages
> that the used packages rely upon & so on? What if developers don't
> state known working versions of dependencies?

Doesn't sessionInfo() give all of this?

If you want to worry about every last bit, I suppose it should also include 
options(), compiler flags, compiler version, BLAS details, etc. (There is a good 
talk on the dregs of a floating-point number and how hard it is to reproduce 
them across processors: http://www.youtube.com/watch?v=GIlp4rubv8U)
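A minimal sketch of saving that extra context alongside an analysis, using only base R. The function name record_session() and the file name are hypothetical. Compiler flags and compiler version are not captured here; in recent R versions sessionInfo() does report the BLAS/LAPACK libraries in use.

```r
# Hypothetical helper: snapshot the session details discussed above into one
# .rds file that can be shipped with a paper or analysis.
record_session <- function(path = "session-record.rds") {
  rec <- list(
    session = sessionInfo(), # R version, platform, attached and loaded packages
    options = options(),     # global options in effect at recording time
    machine = .Machine,      # numerical characteristics of this R build
    sysinfo = Sys.info()     # OS and machine details (may be NULL on some platforms)
  )
  saveRDS(rec, path)
  invisible(rec)
}
```

A reader can later compare readRDS("session-record.rds")$session against their own sessionInfo() to see where the two environments differ.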

> 
> The problem is how the heck do you know which versions of packages are
> needed if developers don't record these dependencies in sufficient
> detail? The suggested solution is to freeze CRAN at intervals
> alongside R releases. Then you'd know what the stable versions were.

Only if you knew which R release was used. 

> 
> Or we could just get package developers to be more thorough in
> documenting dependencies. Or R CMD check could refuse to pass if a
> package is listed as a dependency but with no version qualifiers. Or
> have R CMD build add an upper bound (taken from the version of each
> dependency on CRAN at build time) if the package developer didn't
> include an upper bound. Or... The first is unlikely to happen
> consistently, and no-one wants *more* checks and hoops to jump through
> :-)
> 
> To my mind it is incumbent upon those wanting reproducibility to build
> the tools to enable users to reproduce works.

But the tools already allow it with minimal effort. If the author can't even 
include session info, how can we be sure the version of R is known? If we can't 
know which version of R was used, can we ever change R at all? And so on, to 
absurdity. 

My (serious) point is that the tools are in place, but ramming them down folks' 
throats by intentionally keeping them on older versions by default is too much. 

> When you write a paper
> or release a tool, you will have tested it with a specific set of
> packages. It is relatively easy to work out what those versions are
> (there are tools in R for this). What is required is an automated way
> to record that info in an agreed upon way in an approved
> file/location, and have a tool that facilitates setting up a package
> library with which to reproduce a work. That approval
> doesn't need to come from CRAN or R Core - we can store anything in
> ./inst.
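As a rough sketch of the "record that info in a file" step Gavin describes, the base-R function below writes the installed versions of some packages, plus everything they recursively depend on, to a CSV under inst/. The function name record_versions() and the path inst/package-versions.csv are hypothetical; no such convention is defined by CRAN or R Core.

```r
# Hypothetical helper: record exact installed versions of pkgs and of their
# recursive hard dependencies (Depends, Imports, LinkingTo).
record_versions <- function(pkgs,
                            path = file.path("inst", "package-versions.csv")) {
  db <- installed.packages()
  # If a package is installed in several libraries, keep the copy found first
  # along .libPaths(), which is the one library() would load
  db <- db[!duplicated(db[, "Package"]), , drop = FALSE]
  rownames(db) <- db[, "Package"]
  deps <- tools::package_dependencies(pkgs, db = db, recursive = TRUE)
  all_pkgs <- unique(c(pkgs, unlist(deps, use.names = FALSE)))
  # Base packages are versioned together with R itself, so drop them
  base <- rownames(installed.packages(priority = "base"))
  all_pkgs <- sort(intersect(setdiff(all_pkgs, base), rownames(db)))
  versions <- data.frame(Package = all_pkgs,
                         Version = unname(db[all_pkgs, "Version"]),
                         stringsAsFactors = FALSE)
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  write.csv(versions, path, row.names = FALSE)
  invisible(versions)
}
```

For the example in the thread, record_versions(c("A", "B", "C")) run from a package source directory would write one row per non-base package actually used. Setting up a matching library from that file is the separate, harder half of the problem.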

I think the package version and published paper cases are different. 

For the latter, the recipe is simple: if you want the same results, use the 
same software (as noted by sessionInfoPlus() or equivalent).

For the former, I think you start straying into this NP-complete problem: 
http://people.debian.org/~dburrows/model.pdf 

Yes, a good config can (and should) be recorded, but isn't that exactly what 
sessionInfo() gives?

> 
> Reproducibility is a very important part of doing "science", but not
> everyone using CRAN is doing that. Why force everyone to march to the
> reproducibility drum? I would place the onus elsewhere to make this
> work.
> 

Agreed: the onus of reproducibility is on the author, not the reader.


> Gavin
> A scientist, very much interested in reproducibility of my work and others.

Michael
In finance, where we call it "Auditability" and care very much as well :-)



______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
