Aaron points in the right direction: generate the random number streams in the serial part of the program, then send them to the workers in a consistent way. Use nextRNGStream() (see ?nextRNGStream) to generate the stream for each replicate, and set .Random.seed to that stream on the worker. This will probably generate a BiocCheck warning, but it will be ok so long as the top-level generation of streams on the manager is under the control of the user. For example, there is no need for your function to call `set.seed()`; users who want reproducibility can do that themselves in their own code.
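As a rough illustration of the approach described above, here is a minimal sketch that uses only base R's parallel package, so it runs without BiocParallel. The names fitModel(), fitWithStream() and the APPLY argument are hypothetical and exist only for this example, and the sketch assumes the user has already selected the L'Ecuyer-CMRG generator, which nextRNGStream() requires.

```r
library(parallel)  # provides nextRNGStream()

# Hypothetical stand-in for the stochastic model fit in the original question.
fitModel <- function(x) mean(x) + rnorm(1)

# Runs on a worker: adopt the stream handed over by the manager, then fit.
fitWithStream <- function(x, stream) {
    assign(".Random.seed", stream, envir = globalenv())
    fitModel(x)
}

# Runs on the manager. There is no set.seed() here: the starting state is
# whatever the user's session currently holds. APPLY is a hypothetical hook
# so that the sketch can run serially with Map().
fitModels <- function(object, APPLY = Map) {
    if (RNGkind()[1] != "L'Ecuyer-CMRG")
        stop("call RNGkind(\"L'Ecuyer-CMRG\") before fitModels()")
    if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE))
        runif(1)  # forces creation of .Random.seed in a fresh session
    seed <- get(".Random.seed", envir = globalenv())
    # One stream per element, generated serially on the manager.
    streams <- vector("list", length(object))
    for (i in seq_along(object)) {
        seed <- nextRNGStream(seed)
        streams[[i]] <- seed
    }
    APPLY(fitWithStream, object, streams)
}

# User code: reproducibility is the caller's decision, not the function's.
RNGkind("L'Ecuyer-CMRG")
features <- list(a = 1:5, b = 6:10, c = 11:15)
set.seed(1); run1 <- fitModels(features)
set.seed(1); run2 <- fitModels(features)
```

With BiocParallel, the serial Map() could presumably be swapped for something like bpmapply(fitWithStream, object, streams, SIMPLIFY=FALSE, BPPARAM=BPPARAM). Because each element carries its own stream, the result should not depend on how elements are distributed across workers. When the fits run on separate worker processes, this sketch does not advance the manager's own RNG state, so two calls without reseeding in between would reuse the same streams.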
Martin

On 3/12/19, 10:37 PM, "Bioc-devel on behalf of Aaron Lun" <[email protected] on behalf of [email protected]> wrote:

I think Kylie is saying that she wants to use the same seed for each feature across different runs, but the seed can be different across features - which would make more sense.

Multi-worker reproducibility is an issue that we discussed before (the link goes into the middle of the thread):

https://stat.ethz.ch/pipermail/bioc-devel/2019-January/014505.html

The key thing is that, in addition to reproducibility, there is the issue of correctness with guaranteed independent streams.

Some food for thought: in the vast majority of my parallelized applications, the heavy lifting (including the RNG'ing) is done in C++. If this is also the case for you, consider using the dqrng package to provide the C++ PRNG. I usually generate all my seeds in the serial part of the code, and then distribute seeds to the jobs where each job is set to a different "stream" value so that the sequence of random numbers is always different, regardless of the seed. As the serial seed generation is under the control of set.seed(), this provides correctness and reproducibility no matter how the jobs are distributed across workers.

-A

On 12/03/2019 17:42, Kasper Daniel Hansen wrote:
> But why do you want the same seed for the different features? That is not
> the right way to use stochastic methods.
>
> Best,
> Kasper
>
> On Tue, Mar 12, 2019 at 5:20 PM Bemis, Kylie <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I remember similar questions coming up before, but couldn’t track any down
>> that directly pertain to my situation.
>>
>> Suppose I want to use bplapply() in a function to fit models to many
>> features, and I am applying over features. The models are stochastic, and I
>> want the results to be reproducible, and preferably use the same RNG seed
>> for each feature. So I could do:
>>
>> fitModels <- function(object, seed=1, BPPARAM=bpparam()) {
>>     bplapply(object, function(x) {
>>         set.seed(seed)
>>         fitModel(x)
>>     }, BPPARAM=BPPARAM)
>> }
>>
>> But the BioC guidelines say not to use set.seed() inside function code,
>> and I’ve seen other questions answered saying not to use “seed” as a
>> function parameter in this way.
>>
>> Is it preferable to check and modify .Random.seed directly, or is there
>> some other standard way of doing this?
>>
>> Thanks,
>> Kylie
>>
>> ~~~
>> Kylie Ariel Bemis
>> Khoury College of Computer Sciences
>> Northeastern University
>> kuwisdelu.github.io<https://kuwisdelu.github.io>
>>
>> _______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
