Re: [Bioc-devel] set.seed and BiocParallel

Martin Morgan Wed, 13 Mar 2019 02:21:50 -0700

Aaron points in the right direction: generate the random number streams in 
the serial part of the program, then send them to the workers in a 
consistent way. Use nextRNGStream() from the parallel package (see 
?nextRNGStream) to generate one stream per replicate on the manager, and 
assign that stream to .Random.seed on the worker before running the 
replicate. Assigning .Random.seed will probably generate a BiocCheck warning, 
but this is OK so long as the top-level generation of streams on the manager 
is under the control of the user. In other words, there is no need for your 
function to call `set.seed()`; users who want reproducibility can do that 
themselves in their own code.
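
A minimal sketch of that pattern, not a definitive implementation. The names 
makeStreams(), fitModels() and the `fit` argument are hypothetical; bpmapply() 
is used here only as a convenient way to pair each feature with its stream. 
The first stream is seeded from the user's current RNG, so a user-level 
set.seed() (with the same RNG kind) controls the result.

```r
library(parallel)      # nextRNGStream()
library(BiocParallel)  # bpmapply(), bpparam()

# Manager side: build n L'Ecuyer-CMRG streams (hypothetical helper).
# Switching RNG kind seeds the new generator from the current one,
# so the streams follow whatever set.seed() the user called beforehand.
makeStreams <- function(n) {
    oldKind <- RNGkind("L'Ecuyer-CMRG")[1]
    on.exit(RNGkind(oldKind))          # restore the user's RNG kind
    s <- get(".Random.seed", envir = globalenv())
    streams <- vector("list", n)
    for (i in seq_len(n)) {
        streams[[i]] <- s
        s <- nextRNGStream(s)          # jump to the next independent stream
    }
    streams
}

# `fit` is the stochastic per-feature function (an assumption of this sketch).
fitModels <- function(object, fit, BPPARAM = bpparam()) {
    streams <- makeStreams(length(object))
    bpmapply(function(x, stream) {
        # Worker side: install this feature's stream, then fit.
        assign(".Random.seed", stream, envir = globalenv())
        fit(x)
    }, object, streams, SIMPLIFY = FALSE, BPPARAM = BPPARAM)
}
```

Because each feature carries its own stream, the result should not depend on 
how features are distributed across workers; fitModels() itself never calls 
set.seed().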


Martin

On 3/12/19, 10:37 PM, "Bioc-devel on behalf of Aaron Lun" 
<[email protected] on behalf of 
[email protected]> wrote:

    I think Kylie is saying that she wants to use the same seed for each 
    feature across different runs, but the seed can be different across 
    features - which would make more sense.
    
    Multi-worker reproducibility is an issue that we discussed before (the 
    link goes into the middle of the thread):
    
    https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014505.html
    
    The key thing is that, in addition to reproducibility, there is the 
    issue of correctness with guaranteed independent streams.
    
    Some food for thought: in the vast majority of my parallelized 
    applications, the heavy lifting (including the RNG'ing) is done in C++. 
    If this is also the case for you, consider using the dqrng package to 
    provide the C++ PRNG. I usually generate all my seeds in the serial part 
    of the code, and then distribute seeds to the jobs where each job is set 
    to a different "stream" value so that the sequence of random numbers is 
    always different, regardless of the seed. As the serial seed generation 
    is under the control of set.seed(), this provides correctness and 
    reproducibility no matter how the jobs are distributed across workers.
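
The seed-plus-stream idea above can be sketched at the R level as follows. 
This is an illustration under stated assumptions: it uses dqrng's R interface 
(dqset.seed() and dqrnorm()) in place of the C++ PRNG described above, and 
fitWithStreams() is a hypothetical name.

```r
library(dqrng)         # dqset.seed(), dqrnorm()
library(BiocParallel)  # bpmapply(), bpparam()

fitWithStreams <- function(object, BPPARAM = bpparam()) {
    n <- length(object)
    # Serial part: seeds come from the user's RNG, so set.seed() controls them.
    seeds <- sample.int(.Machine$integer.max, n)
    bpmapply(function(x, seed, stream) {
        # Each job gets its own seed and a distinct stream number.
        dqrng::dqset.seed(seed, stream = stream)
        mean(x) + dqrng::dqrnorm(1)    # stand-in for the real computation
    }, object, seeds, seq_len(n), SIMPLIFY = FALSE, BPPARAM = BPPARAM)
}
```

A caller who wants reproducible output would call set.seed() before 
fitWithStreams(); the function itself does not.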
    
    -A
    
    On 12/03/2019 17:42, Kasper Daniel Hansen wrote:
    > But why do you want the same seed for the different features? That is not
    > the right way to use stochastic methods.
    > 
    > Best,
    > Kasper
    > 
    > On Tue, Mar 12, 2019 at 5:20 PM Bemis, Kylie <[email protected]>
    > wrote:
    > 
    >> Hi all,
    >>
    >> I remember similar questions coming up before, but couldn’t track any
    >> down that directly pertain to my situation.
    >>
    >> Suppose I want to use bplapply() in a function to fit models to many
    >> features, and I am applying over features. The models are stochastic,
    >> and I want the results to be reproducible, and preferably use the same
    >> RNG seed for each feature. So I could do:
    >>
    >> fitModels <- function(object, seed=1, BPPARAM=bpparam()) {
    >>     bplapply(object, function(x) {
    >>         set.seed(seed)
    >>         fitModel(x)
    >>     }, BPPARAM=BPPARAM)
    >> }
    >>
    >> But the BioC guidelines say not to use set.seed() inside function code,
    >> and I’ve seen other questions answered saying not to use “seed” as a
    >> function parameter in this way.
    >>
    >> Is it preferable to check and modify .Random.seed directly, or is there
    >> some other standard way of doing this?
    >>
    >> Thanks,
    >> Kylie
    >>
    >> ~~~
    >> Kylie Ariel Bemis
    >> Khoury College of Computer Sciences
    >> Northeastern University
    >> kuwisdelu.github.io<https://kuwisdelu.github.io>
    >>
    >> _______________________________________________
    >> [email protected] mailing list
    >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
