I'll add a section to the BiocParallel docs.

Valerie

On 06/04/2015 07:55 AM, Kasper Daniel Hansen wrote:
Yes, based on the documentation that particular random stream generator
would work with mclapply.

This is absolutely a subject which ought to be covered in the BiocParallel
documentation.

And commenting on another set of recommendations: please NEVER use
set.seed inside a function.  Unfortunately, because of the way R works,
this is a really bad idea.  So are functions with arguments like (set.seed =
FALSE).  Users need to be educated about this.  The main issue with using
set.seed arises when your work is wrapped into other people's code, for
example with an external bootstrap or similar.  I understand the desire for
reproducibility, but the design of the random generator in R is such that
this should really be left to the user.
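
A minimal sketch of the bootstrap problem described above.  The functions
noisy_mean_bad and noisy_mean_good are hypothetical, invented only for this
illustration; the point is that a set.seed call inside a function resets the
caller's global RNG state on every call, so an outer resampling loop keeps
restarting from the same state.

```r
# Hypothetical package function that fixes the seed internally.
noisy_mean_bad <- function(x) {
  set.seed(42)                      # resets the caller's global RNG state
  mean(x + rnorm(length(x)))
}

# Same computation, leaving RNG control to the caller.
noisy_mean_good <- function(x) {
  mean(x + rnorm(length(x)))
}

x <- 1:20

# A user-level "external bootstrap": replicates are expected to differ.
set.seed(1)
boot_bad <- replicate(5, noisy_mean_bad(sample(x, replace = TRUE)))

set.seed(1)
boot_good <- replicate(5, noisy_mean_good(sample(x, replace = TRUE)))

length(unique(boot_bad))   # number of distinct bootstrap values
length(unique(boot_good))
```

In the first loop every replicate is computed from the same RNG state, so the
bootstrap no longer resamples anything; the second loop varies as intended
and is still reproducible through the user's own set.seed(1).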

Kasper

On Thu, Jun 4, 2015 at 10:39 AM, Vincent Carey <st...@channing.harvard.edu>
wrote:

It does appear to me that the doRNG vignette sec 1.1 describes a solution
to the problem posed.  It is less clear to me that this method is readily
adopted with BiocParallel unless registerDoPar is in use....  Should we
address this topic explicitly in the vignette?

On Thu, Jun 4, 2015 at 9:50 AM, Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:

Note you're not guaranteed that two random streams starting with different
seeds will be (approximately) independent, so the suggestion on SO makes
the numbers reproducible but technically wrong.

If you want true independence you either need to use a parallel version of
the random number generator or you do what I suggested.  Because of how
mclapply works (via fork) it is not clear to me that it is possible to use
a parallel version of the random number generator, but I am not sure about
this.  The snippet from the documentation quoted above suggests I am
wrong.
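
One way this could look, as a sketch only: create one L'Ecuyer-CMRG stream per
task (not per core) with parallel::nextRNGStream, and have each task install
its own stream before drawing.  The names streams and permute_with_stream are
hypothetical, and mc.cores = 2 assumes a platform where forking is available
(not Windows).

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(0)                         # the user controls the starting point

# One stream per task, derived from the user's seed.
n_tasks <- 10
streams <- vector("list", n_tasks)
s <- .Random.seed
for (i in seq_len(n_tasks)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# Each task installs its own stream, then draws from it.
# Note: with mc.cores = 1 this runs in the calling session and therefore
# changes that session's RNG state.
permute_with_stream <- function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  sample(1:10)
}

m2 <- mclapply(streams, permute_with_stream, mc.cores = 2)
m1 <- mclapply(streams, permute_with_stream, mc.cores = 1)
identical(m1, m2)                   # compare results across core counts
```

Because the stream is tied to the task rather than to the worker process, the
result does not depend on how tasks are distributed over cores.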

Best,
Kasper

On Wed, Jun 3, 2015 at 11:25 PM, Vladislav Petyuk <pet...@gmail.com>
wrote:

There are different ways set.seed can be used.  The way suggested in the
aforementioned Stack Overflow post is basically a two-stage process.
First, a seed is provided by the user (set.seed(1)).  That is, the user can
change the outcome from run to run.  Based on that seed, a vector of
randomized seeds is generated
(seeds <- sample.int(length(input), replace=TRUE)).
Those seeds are basically arguments to the function under mclapply/lapply
that help to control random number generation for each iteration
(set.seed(seeds[idx])).
So set.seed plays two different roles.  The first lets the user control
random number generation, and the second (within the function) makes sure
that it is the same for individual iterations regardless of how the loop is
executed.
Does that make sense?
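
The two-stage process above can be sketched as follows.  The names input and
permute_one are hypothetical, and plain lapply in two different orders stands
in for different scheduling under mclapply.  This only illustrates the
mechanics; it does not address the objection raised elsewhere in the thread
that streams started from different seeds are not guaranteed to be
independent.

```r
input <- 1:10

# Stage 1: the user picks the seed, which determines the per-iteration seeds.
set.seed(1)
seeds <- sample.int(length(input), replace = TRUE)

# Stage 2: each iteration reseeds from its own entry before drawing.
permute_one <- function(idx) {
  set.seed(seeds[idx])
  sample(1:10)
}

# Run the iterations in two different orders and compare.
forward  <- lapply(seq_along(input), permute_one)
backward <- rev(lapply(rev(seq_along(input)), permute_one))
identical(forward, backward)

# Because the seeds are drawn with replacement, two iterations can receive
# the same seed and then produce the same permutation.
anyDuplicated(seeds)
```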

On Wed, Jun 3, 2015 at 7:07 PM, Yu, Guangchuang <g...@connect.hku.hk>
wrote:

There is one possible solution posted at

http://stackoverflow.com/questions/30610375/how-to-run-permutations-using-mclapply-in-a-reproducible-way-regardless-of-numbe/30627984#30627984

As Kasper suggested, it is not proper to use set.seed inside a package.

I suggest using a parameter, for example seed=FALSE, to disable set.seed by
default.  If the user wants the result to be reproducible, e.g. in a
demonstration, they can set seed=TRUE explicitly and set.seed will be run
inside the function.

Bests,
Guangchuang

On Wed, Jun 3, 2015 at 8:42 PM, Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:

For this situation, generate the permutation indexes outside of the
mclapply, and then do mclapply over a list with the indices.
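
A sketch of that approach, under stated assumptions: the data x, the labels
group and the statistic perm_stat are made up for illustration, and
mc.cores = 2 assumes a platform where forking is available (not Windows).
All random draws happen once, in the calling session, so the workers do no
random number generation at all.

```r
library(parallel)

# Hypothetical data: ten measurements in two groups of five.
x <- c(5.1, 4.9, 6.2, 5.8, 7.0, 6.4, 5.5, 6.1, 4.8, 6.9)
group <- rep(c("a", "b"), each = 5)

# Generate every permutation up front, under the user's seed.
set.seed(0)
perms <- replicate(10, sample(length(x)), simplify = FALSE)

# The worker function is deterministic given its permutation.
perm_stat <- function(idx) {
  g <- group[idx]
  mean(x[g == "a"]) - mean(x[g == "b"])
}

stat2 <- mclapply(perms, perm_stat, mc.cores = 2)
stat1 <- lapply(perms, perm_stat)
identical(stat1, stat2)             # compare parallel and serial results
```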

And btw., please don't use set.seed inside a package; that control should
completely be left to the user.

Best,
Kasper

On Wed, Jun 3, 2015 at 7:08 AM, Vincent Carey <
st...@channing.harvard.edu>
wrote:

This document indicates how to achieve reproducibility independent of the
underlying physical environment.

http://cran.r-project.org/web/packages/doRNG/vignettes/doRNG.pdf

Let me know if that satisfies the question.

On Wed, Jun 3, 2015 at 5:32 AM, Yu, Guangchuang <
g...@connect.hku.hk>
wrote:

Dear Vincent,

RNGkind("L'Ecuyer-CMRG") works in the same way as using mc.set.seed=FALSE.

When mc.cores changes, the output is not reproducible.

I think this issue is also of concern within the Bioconductor community,
as parallel versions of permutation tests are commonly used now.

Best Regards,

Guangchuang



On Wed, Jun 3, 2015 at 5:17 PM, Vincent Carey <
st...@channing.harvard.edu>
wrote:

Hi, this question belongs on R-help, but perhaps

https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/RngStream.html

will be useful.

Best regards

On Wed, Jun 3, 2015 at 3:11 AM, Yu, Guangchuang <
g...@connect.hku.hk>
wrote:

Dear all,

I have an issue with setting the seed value when using the parallel package.

library("parallel")
library("digest")

set.seed(0)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=2)
digest(m, 'crc32')
## [1] "4827c80c"

set.seed(0)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=2)
digest(m, 'crc32')
## [1] "e95b9134"

By default, set.seed() will be ignored since mclapply will set the seed
internally.

If we use mc.set.seed=FALSE to disable this feature, it works as shown
below:

set.seed(0)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=2, mc.set.seed = FALSE)
digest(m, 'crc32')
## [1] "6bbada78"

set.seed(0)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=2, mc.set.seed = FALSE)
digest(m, 'crc32')
## [1] "6bbada78"

The problem is that the results also depend on the number of cores.

set.seed(0)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=4, mc.set.seed = FALSE)
digest(m, 'crc32')
## [1] "a22e0aab"


Any ideas?

Best Regards,
Guangchuang
--
--~--~---------~--~----~------------~-------~--~----~
Guangchuang Yu, PhD Candidate
State Key Laboratory of Emerging Infectious Diseases
School of Public Health
The University of Hong Kong
Hong Kong SAR, China
www: http://ygc.name
-~----------~----~----~----~------~----~------~--~---


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: voben...@fredhutch.org
Phone: (206) 667-3158
