Ross,

Reproducibility is extremely important (I work on the subject myself),
but I do hope you will display a prominent warning along the lines of
"This analysis was run using Bioconductor package XXX v Y.Y.Y, which
required an out-of-cycle bug fix. Such out-of-cycle fixes often indicate
major bugs that can invalidate results generated by the package in certain
situations. Consider rerunning the analysis under version Y.Y.(Y+k) to
ensure that the results do not depend on the fixed bug."

I view reproducibility not as an end goal in itself, but as serving two
purposes: validation and forensics. By validation I mean that we want to
know that the results we are reading are correct (valid). A result being
reproducible suggests (though does not guarantee!) that it is valid, so
reproducible results are much more trustworthy than those that aren't. By
forensics I mean that if we know or suspect, for some external reason, that
a reproducible result IS invalid, reproducibility allows us to investigate
how the result was arrived at and try to detect the error in reasoning or
computation (or the bad luck, in stochastic cases) that gave rise to it.

A warning such as the one above would assist in both arenas.
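
As a rough illustration only, a wrapper script could print such a warning
whenever it pins a package version that was later superseded by an
out-of-cycle fix. The function name out_of_cycle_warning and its three
arguments (package, pinned version, fixed version) are hypothetical, not
part of Galaxy or Bioconductor; the edgeR versions are the ones mentioned
in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not an existing Galaxy or Bioconductor command):
# print a reproducibility warning for a pinned package version that was
# followed by an out-of-cycle bug fix.
#   $1 = package name, $2 = pinned version, $3 = version containing the fix
out_of_cycle_warning() {
    pkg=$1
    pinned=$2
    fixed=$3
    printf 'WARNING: this analysis was run using Bioconductor package %s v%s,\n' "$pkg" "$pinned"
    printf 'which later required an out-of-cycle bug fix (v%s).\n' "$fixed"
    printf 'Consider rerunning the analysis under v%s or later to check that\n' "$fixed"
    printf 'the results do not depend on the fixed bug.\n'
}

# Example call, using the edgeR versions discussed in this thread.
out_of_cycle_warning edgeR 2.2.0 2.2.5
```

How a tool would learn that a given version was fixed out of cycle is left
open here; in this sketch the caller simply supplies both version numbers.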

Just my $.02

~G


On Mon, Jul 8, 2013 at 11:01 PM, Ross Lazarus <
ross.laza...@channing.harvard.edu> wrote:

> Thanks Dan and Martin for the advice.
>
> It's interesting that this doesn't seem to have come up much before given
> our shared obsession with reproducible analyses - I'm sure you'd agree that
> (eg) any third party source update should mean a new, explicit dependency
> chain for a pipeline - truly reproducible analysis implies consistently
> reproducible bugs unfortunately IMHO.
>
> Dan's solution looks do-able - it will give the incantation needed to
> package up absolutely specific BioC dependencies and when we write an
> installation script, it can clone, compile and install. That will get us
> what we need assuming your svn server is happy with the extra load? With
> the new toolshed automated dependency mechanism, every Galaxy instance
> installing any Galaxy tool that needs a BioC package will hit your svn
> server to get a revision specific clone, which may be effective but is a
> little inefficient. On that score, for any given BioC package going
> forward, given that we won't be freezing anything other than *released*
> packages, a mechanism that provides easy to discover and stable urls for
> archived released package versions would help lower the load on your svn
> server - although if we're the only ones wanting this, I understand that
> it's going to be a very low priority for the bioc team.
>
> We'll test this out!
>
>
> On Tue, Jul 9, 2013 at 2:53 PM, Dan Tenenbaum <dtene...@fhcrc.org> wrote:
>
> > On Mon, Jul 8, 2013 at 9:14 PM, Ross Lazarus
> > <ross.laza...@channing.harvard.edu> wrote:
> > > Hi, Bioconductor devs,
> > >
> > > In very rare cases (eg
> > > http://article.gmane.org/gmane.science.biology.informatics.conductor/35266/match=update+edgeR )
> > > where BioC package authors have released an urgent bug fix within a given
> > > BioC update cycle, the usual automated biocLite installation process does
> > > not appear to support recreating a very specific R/Bioc environment
> > > containing a precisely specified package release (say the previous edgeR
> > > 2.2.0 fixed by 2.2.5). You might argue a user should never do this, but
> > > since we want truly reproducible analyses (the context is the new toolshed
> > > dependency control mechanisms in Galaxy), we need to control *all*
> > > dependencies for a given release of a (eg edgeR) wrapper at this very fine
> > > level of granularity, acknowledging that reproducible =/= valid.
> > >
> > > Take that edgeR update as a test case. I know about
> > > http://bioconductor.org/checkResults/ and eg
> > > https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/edgeR/ but I
> > > have been unable to figure out how to track down the 2.2.0 edgeR archive
> > > (or for that matter the 2.2.5 bugfix) - I'm sure it is in svn somewhere.
> > > Any advice on how I can identify a long term reliable svn or other url to
> > > script the download of a specific (even if known buggy) historical archive
> > > of (eg) edgeR 2.2.0?
> > >
> >
> > In the checked-out edgeR working directory, do:
> >
> > svn log --diff DESCRIPTION > diff.txt
> >
> > (this requires subversion >= 1.7)
> >
> > Then look in diff.txt for "Version: 2.2.0". This ends up being revision
> > 54800.
> >
> > Then you can check that out to a different directory with
> >
> > svn co -r54800 https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/edgeR/
> >
> > Dan
> >
> >
> > For more on the topic you bring up, see the thread started by
> > https://stat.ethz.ch/pipermail/bioconductor/2013-March/051224.html
> >
> >
> >
> > > _______________________________________________
> > > Bioc-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
