Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite

Tim Triche, Jr. Wed, 06 Mar 2013 11:13:14 -0800

Have you seen QuasR (and/or gmapR, Rsubread, etc.)? One can run Bowtie,
GSNAP, etc. from R.


This certainly makes it easier for me to remember how I did some ChIP-seq,
BS-seq, or RNA-seq processing a year ago, when it turns out I need to add
a sample or samples and carry on with an existing analysis pipeline.




On Wed, Mar 6, 2013 at 10:17 AM, Cook, Malcolm <m...@stowers.org> wrote:

> Thanks David, I've looked into them both a bit, and I don't think they
> provide an approach for R (or Perl, for that matter) library management,
> which is the wicket I'm trying to get less sticky now.
>
> They could be quite useful for managing the various installed versions of R
> and of analysis software (we're talking a lot of next-generation sequencing,
> so Bowtie, TopHat, and friends), similarly in service of enabling
> reproducible results.
>
> Thanks for your thoughts, and if you know of others similar to
> dotkit/modules, I'd be keen to hear of them.
>
> ~Malcolm
>
>
>  .-----Original Message-----
>  .From: Lapointe, David [mailto:david.lapoi...@umassmed.edu]
>  .Sent: Wednesday, March 06, 2013 7:46 AM
>  .To: Cook, Malcolm; 'Paul Gilbert'
>  .Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 'r-discuss...@listserv.stowers.org'
>  .Subject: RE: [BioC] [Rd] enabling reproducible research & R package management & install.package.version & BiocLite
>  .
>  .There are utilities (e.g. dotkit and modules) that facilitate version
>  .management, basically by creating PATH and environment setups on the
>  .fly, if you are comfortable keeping all that around.
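As a rough illustration of what "creating PATH setups on the fly" amounts to, the Python sketch below prepends a per-version bin directory to PATH and removes it again. It is not the actual interface of dotkit or modules, which use their own configuration files. The /opt/apps/<tool>/<version>/bin layout and the helper names are hypothetical.

```python
import os

# Hypothetical install layout (an assumption, not from the thread):
# each tool version lives in its own directory, e.g. /opt/apps/R/2.15.3/bin.
APPS_ROOT = "/opt/apps"


def tool_bin(tool, version):
    """Directory holding the executables for one version of one tool."""
    return os.path.join(APPS_ROOT, tool, version, "bin")


def load_tool(env, tool, version):
    """Return a copy of env with this tool version's bin directory first on PATH."""
    new_env = dict(env)
    old_path = new_env.get("PATH", "")
    new_env["PATH"] = tool_bin(tool, version) + (os.pathsep + old_path if old_path else "")
    return new_env


def unload_tool(env, tool, version):
    """Return a copy of env with this tool version's bin directory removed from PATH."""
    new_env = dict(env)
    bin_dir = tool_bin(tool, version)
    kept = [p for p in new_env.get("PATH", "").split(os.pathsep) if p and p != bin_dir]
    new_env["PATH"] = os.pathsep.join(kept)
    return new_env


# Usage: switch an environment from one (hypothetical) R build to another.
env = {"PATH": "/usr/bin"}
env = load_tool(env, "R", "2.15.2")
env = unload_tool(env, "R", "2.15.2")
env = load_tool(env, "R", "2.15.3")
print(env["PATH"])
```

Because each helper returns a new dict, the caller's environment is never modified in place. That mirrors how such tools only affect the shell session that loads them.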
>  .
>  .David
>  .
>  .-----Original Message-----
>  .From: bioconductor-boun...@r-project.org [mailto:bioconductor-boun...@r-project.org] On Behalf Of Cook, Malcolm
>  .Sent: Tuesday, March 05, 2013 6:08 PM
>  .To: 'Paul Gilbert'
>  .Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 'r-discuss...@listserv.stowers.org'
>  .Subject: Re: [BioC] [Rd] enabling reproducible research & R package management & install.package.version & BiocLite
>  .
>  .Paul,
>  .
>  .I think your balanced and reasoned approach addresses all my current
>  .concerns.  Nice!  I will likely adopt your methods.  Let me
>  .ruminate.  Thanks for this.
>  .
>  .~ Malcolm
>  .
>  . .-----Original Message-----
>  . .From: Paul Gilbert [mailto:pgilbert...@gmail.com]
>  . .Sent: Tuesday, March 05, 2013 4:34 PM
>  . .To: Cook, Malcolm
>  . .Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 'r-discuss...@listserv.stowers.org'
>  . .Subject: Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
>  . .
>  . .(More on the original question further below.)
>  . .
>  . .On 13-03-05 09:48 AM, Cook, Malcolm wrote:
>  . .> All,
>  . .>
>  . .> What got me started on this line of inquiry was my attempt at
>  . .> balancing the advantages of performing a periodic (daily or weekly)
>  . .> update to the 'release' version of locally installed R/Bioconductor
>  . .> packages on our institute-wide installation of R with the
>  . .> disadvantages of potentially changing the result of an analyst's
>  . .> workflow in mid-project.
>  . .
>  . .I have implemented a strategy to try to address this as follows:
>  . .
>  . .1/ Install a new version of R when it is released, and install packages
>  . .in the R version's site-library with package versions as available at
>  . .the time the R version is installed. Only upgrade these package
>  . .versions in case they are severely broken.
>  . .
>  . .2/ Install the same packages in site-library-fresh and upgrade these
>  . .package versions on a regular basis (e.g. daily).
>  . .
>  . .3/ When a new version of R is released, freeze but do not remove the
>  . .old R version, at least not for a fairly long time, and freeze
>  . .site-library-fresh for the old version. Begin with the new version as
>  . .in 1/ and 2/. The old version remains available, so "reverting" is
>  . .trivial.
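The three rules can be read as a small policy that a scheduled (e.g. cron) upgrade job might consult before touching a library. This is a hypothetical Python sketch, not code from RoboAdmin. It assumes that "freeze" in 3/ means no upgrades at all for an old R version's libraries, and the version strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    r_version: str  # R version the library was installed for (placeholder values)
    name: str       # "site-library" or "site-library-fresh"


def may_upgrade(lib, current_r_version, severely_broken=False):
    """Decide whether an automated job may upgrade packages in lib."""
    if lib.name not in ("site-library", "site-library-fresh"):
        raise ValueError(f"unknown library: {lib.name}")
    if lib.r_version != current_r_version:
        return False              # 3/: libraries of older R versions stay frozen
    if lib.name == "site-library":
        return severely_broken    # 1/: frozen at install, fixed only if badly broken
    return True                   # 2/: site-library-fresh is upgraded on every run


print(may_upgrade(Library("2.15.3", "site-library-fresh"), "2.15.3"))
print(may_upgrade(Library("2.15.2", "site-library-fresh"), "2.15.3"))
```

Keeping the decision in one function makes the policy easy to audit. The "severely broken" override stays a deliberate human input rather than something the cron job infers.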
>  . .
>  . .
>  . .The analysts are then responsible for choosing the R version they use,
>  . .and the library they use. This means they do not have to change R and
>  . .package versions mid-project, but they can if they wish. I think the
>  . .above two libraries will cover most cases, but it is possible that a few
>  . .projects will need their own special library with a combination of
>  . .package versions. In this case the user could create their own library,
>  . .or you might prefer some more official mechanism.
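One way an analyst might select an R version and library per project is the R_LIBS environment variable. R reads it at startup and searches those directories before its site and default libraries. The Python launcher helper below is a sketch that assumes an /opt/R/<version> layout. The paths, the function name, and the optional project library are hypothetical.

```python
import os

# Hypothetical layout (an assumption, not RoboAdmin's actual paths):
#   /opt/R/<version>/bin/R
#   /opt/R/<version>/site-library        frozen when that R was installed
#   /opt/R/<version>/site-library-fresh  upgraded regularly
R_ROOT = "/opt/R"


def r_launch_env(r_version, library="site-library", project_lib=None, base_env=None):
    """Return (path_to_R, env) for running a chosen R version and library.

    Directories listed in R_LIBS are searched before R's site and default
    libraries, so an optional project-specific library is placed first.
    """
    if library not in ("site-library", "site-library-fresh"):
        raise ValueError(f"unknown library: {library}")
    version_dir = os.path.join(R_ROOT, r_version)
    libs = [os.path.join(version_dir, library)]
    if project_lib is not None:
        libs.insert(0, project_lib)
    env = dict(os.environ if base_env is None else base_env)
    env["R_LIBS"] = os.pathsep.join(libs)
    return os.path.join(version_dir, "bin", "R"), env


r_bin, env = r_launch_env("2.15.3", "site-library-fresh", base_env={})
print(r_bin, env["R_LIBS"])
```

A wrapper script could then start R with `subprocess.run([r_bin], env=env)`. R only keeps R_LIBS entries that exist at startup, so a mistyped project library is silently skipped and is worth checking for beforehand.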
>  . .
>  . .The idea of the above strategy is to provide the stability one might
>  . .want for an ongoing project, and the possibility of an upgraded package
>  . .if necessary, but not to encourage analysts to remain indefinitely with
>  . .old versions (by, say, putting new packages in an old R version library).
>  . .
>  . .This strategy has been implemented in a set of makefiles in the project
>  . .RoboAdmin, available at http://automater.r-forge.r-project.org/. It can
>  . .be done entirely automatically with a cron job. Constructive comments
>  . .are always appreciated.
>  . .
>  . .(IT departments sometimes think that there should be only one version of
>  . .everything available, which they test and approve. So the initial
>  . .reaction to this approach could be negative. I think they have not
>  . .really thought about the advantages. They usually cannot test/approve an
>  . .upgrade without user input, and timing is often extremely complicated
>  . .because of ongoing user needs. This strategy simply shifts
>  . .responsibility and timing to the users, or user departments, that can
>  . .actually do the testing and approving.)
>  . .
>  . .Regarding NFS mounts, updating over NFS is relatively robust. There can
>  . .be occasional problems, especially for users who have a habit of keeping
>  . .an R session open for days at a time and using site-library-fresh
>  . .packages. In my experience this did not happen often enough to worry
>  . .about a "blackout period".
>  . .
>  . .Regarding the original question, I would like to think it could be
>  . .possible to keep enough information to reproduce the exact environment,
>  . .but I think for potentially sensitive numerical problems that is
>  . .optimistic. As others have pointed out, results can depend not only on R
>  . .and package versions, configuration, OS versions, and library and
>  . .compiler versions, but also on the underlying hardware. You might have
>  . .some hope using something like an Amazon core instance. (BTW, this
>  . .problem is not specific to R.)
>  . .
>  . .It is true that restricting to a fixed computing environment at your
>  . .institution may ease things somewhat, but if you occasionally upgrade
>  . .hardware or the OS then you will probably lose reproducibility.
>  . .
>  . .An alternative that I recommend is that you produce a set of tests that
>  . .confirm the results of any important project. These can be conveniently
>  . .put in the tests/ directory of an R package, which is then maintained
>  . .locally, not on CRAN, and built/tested whenever a new R and packages are
>  . .installed. (Tools for this are also available at the web site indicated
>  . .above.) This approach means that you continue to reproduce the old
>  . .results or, if not, discover differences/problems in the old or new
>  . .version of R and/or packages that may be important to you. I have been
>  . .successfully using a variant of this since about 1993, using R and
>  . .package tests/ since they became available.
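In an R package such a check would normally be a script under tests/ using stopifnot() and all.equal(). The Python sketch below shows the same idea. It compares current results with stored reference values, exactly for counts and within a relative tolerance for floating-point values. The reference names and numbers are made up for illustration. The default tolerance of 1.5e-8 is chosen to be roughly comparable to all.equal()'s default, not identical in behaviour.

```python
import math

# Hypothetical reference values recorded when the project's results were
# first accepted; in practice these would be read from a file under tests/.
REFERENCE = {"mean_expression": 7.25, "n_peaks": 1532}


def compare_to_reference(current, reference=REFERENCE, rel_tol=1.5e-8):
    """Return a list of human-readable differences (an empty list means pass)."""
    problems = []
    for name, expected in reference.items():
        if name not in current:
            problems.append(f"{name}: missing from current results")
            continue
        got = current[name]
        if isinstance(expected, float):
            ok = math.isclose(got, expected, rel_tol=rel_tol)  # allow FP noise
        else:
            ok = got == expected                               # counts must match exactly
        if not ok:
            problems.append(f"{name}: expected {expected!r}, got {got!r}")
    return problems


print(compare_to_reference({"mean_expression": 7.3, "n_peaks": 1532}))
```

Running this after every new R or package installation turns "did anything change?" into a list that can be reviewed. A change is not necessarily a regression, since the new result may be the better one.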
>  . .
>  . .Paul
>  . .
>  . .>
>  . .> I just got the "green light" to institute such periodic updates, which
>  . .> I have been arguing are in our collective best interest.  In return,
>  . .> I promised my best effort to provide a means for preserving or
>  . .> reverting to a working R library configuration.
>  . .>
>  . .> Please note that the reproducibility I am most eager to provide is
>  . .> limited to reproducibility within the computing environment of our
>  . .> institute, which perhaps takes away some of the dragons' nests,
>  . .> though certainly not all.
>  . .>
>  . .> There are technical issues with updating package installations on an
>  . .> NFS mount that might have files/libraries open on it from running R
>  . .> sessions.  I am interested in learning of approaches for
>  . .> minimizing/eliminating exposure to these issues as well.  The
>  . .> first/best approach seems to be to institute a 'blackout' period
>  . .> when users should expect the installed library to change.  Perhaps
>  . .> there are improvements to this?
>  . .>
>  . .> Best,
>  . .>
>  . .> Malcolm
>  . .>
>  . .>
>  . .> .-----Original Message-----
>  . .> .From: Mike Marchywka [mailto:marchy...@hotmail.com]
>  . .> .Sent: Tuesday, March 05, 2013 5:24 AM
>  . .> .To: amac...@virginia.edu; Cook, Malcolm
>  . .> .Cc: r-devel@r-project.org; bioconduc...@r-project.org; r-discuss...@listserv.stowers.org
>  . .> .Subject: RE: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
>  . .> .
>  . .> .I hate to ask what got this thread started, but it sounds like someone
>  . .> .was counting on exact numeric reproducibility, or was there a bug in a
>  . .> .specific release? In actual fact, the best way to determine
>  . .> .reproducibility is to run the code in a variety of packages.
>  . .> .Alternatively, you can do everything in Java and not assume that
>  . .> .calculations commute or associate as the code is modified, but that
>  . .> .seems pointless. Sensitivity determination would seem to lead to more
>  . .> .reproducible results than trying to keep a specific set of code quirks.
>  . .> .
>  . .> .I also seem to recall that the FPU may have random lower-order bits in
>  . .> .some cases, so the same code/data give different results. Always assume
>  . .> .FP is stochastic and plan on analyzing the "noise."
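The point about not assuming that calculations associate is easy to see in IEEE 754 double precision. Regrouping the same three additions changes the result in the last place. This difference is deterministic, not random. Python is used here only for brevity; on typical hardware R produces the same doubles.

```python
import math

a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.30000000000000004 + 0.3
right = a + (b + c)   # 0.1 + 0.5
print(left, right, left == right)   # 0.6000000000000001 0.6 False

# Treating the difference as numerical noise: compare with a tolerance instead.
print(math.isclose(left, right, rel_tol=1e-12))   # True
```

Any refactoring that reorders a sum, such as a vectorised or multithreaded BLAS, can produce this kind of change. Comparing results within a tolerance, rather than exactly, is therefore the usual choice.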
>  . .> .----------------------------------------
>  . .> .> From: amac...@virginia.edu
>  . .> .> Date: Mon, 4 Mar 2013 16:28:48 -0500
>  . .> .> To: m...@stowers.org
>  . .> .> CC: r-devel@r-project.org; bioconduc...@r-project.org; r-discuss...@listserv.stowers.org
>  . .> .> Subject: Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
>  . .> .>
>  . .> .> On Mon, Mar 4, 2013 at 4:13 PM, Cook, Malcolm <m...@stowers.org> wrote:
>  . .> .>
>  . .> .> > * where do the dragons lurk
>  . .> .>
>  . .> .> webs of interconnected dynamically loaded libraries, identical versions
>  . .> .> of R compiled with different BLAS/LAPACK options, etc. Go with the VM if
>  . .> .> you really, truly, want this level of exact reproducibility.
>  . .> .>
>  . .> .> An alternative (and arguably more useful) strategy would be to cache
>  . .> .> results of each computational step, and report when results differ upon
>  . .> .> re-execution with identical inputs; if you cache sessionInfo along with
>  . .> .> each result, you can identify which package(s) changed, and begin to
>  . .> .> hunt down why the change occurred (possibly for the better); couple this
>  . .> .> with the concept of keeping both code *and* results in version control,
>  . .> .> then you can move forward with a (re)analysis without being crippled by
>  . .> .> out-of-date software.
>  . .> .>
>  . .> .> -Aaron
>  . .> .>
>  . .> .> --
>  . .> .> Aaron J. Mackey, PhD
>  . .> .> Assistant Professor
>  . .> .> Center for Public Health Genomics
>  . .> .> University of Virginia
>  . .> .> amac...@virginia.edu
>  . .> .> http://www.cphg.virginia.edu/mackey
>  . .> .>
>  . .> .> ______________________________________________
>  . .> .> R-devel@r-project.org mailing list
>  . .> .> https://stat.ethz.ch/mailman/listinfo/r-devel
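Aaron's suggestion can be sketched as a cache keyed by a step's inputs, which stores each result together with the environment that produced it. The Python below is a hypothetical illustration. session_info stands in for what R's sessionInfo() reports, and the package names and versions in the usage lines are placeholders. In R itself one might persist results with saveRDS() instead of an in-memory dict.

```python
import hashlib
import json


def step_key(step_name, inputs):
    """Cache key: hash of the step name plus its JSON-serialisable inputs."""
    payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(cache, step_name, func, inputs, session_info):
    """Run func(**inputs) and compare it with the cached result for the same inputs.

    session_info is a dict of component -> version (a stand-in for sessionInfo()).
    Returns (result, report); report is None when nothing differs, otherwise it
    holds the old and new results and the components whose versions changed.
    """
    key = step_key(step_name, inputs)
    result = func(**inputs)
    cached = cache.get(key)
    if cached is None:
        # First run: record the result together with the environment.
        cache[key] = {"result": result, "session_info": dict(session_info)}
        return result, None
    if cached["result"] == result:
        return result, None
    old_info = cached["session_info"]
    changed = {
        name: (old_info.get(name), session_info.get(name))
        for name in sorted(set(old_info) | set(session_info))
        if old_info.get(name) != session_info.get(name)
    }
    # The cached baseline is kept, so the difference is reported until reviewed.
    report = {"step": step_name, "old": cached["result"], "new": result, "changed": changed}
    return result, report


def mean_of(xs):
    return sum(xs) / len(xs)


def shifted_mean_of(xs):
    # Pretend an upgraded (hypothetical) package changed the answer.
    return sum(xs) / len(xs) + 1


cache = {}
run_step(cache, "summary", mean_of, {"xs": [1, 2, 3]}, {"R": "2.15.2", "pkgA": "1.0"})
_, report = run_step(cache, "summary", shifted_mean_of, {"xs": [1, 2, 3]},
                     {"R": "2.15.2", "pkgA": "1.1"})
print(report["changed"])   # {'pkgA': ('1.0', '1.1')}
```

The key covers only the inputs, not the code. Following Aaron's note, the code itself would be tracked in version control alongside the cached results.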
>  .
>  ._______________________________________________
>  .Bioconductor mailing list
>  .bioconduc...@r-project.org
>  .https://stat.ethz.ch/mailman/listinfo/bioconductor
>  .Search the archives:
>  .http://news.gmane.org/gmane.science.biology.informatics.conductor
>



-- 
*A model is a lie that helps you see the truth.*
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
