Is it explicitly specified anywhere that random number generators should always return a deterministic sequence of numbers given the same seed, or is that just a side effect of some hardware not having a better source of randomness, so a user-defined seed is used to kick off the generator's starting state?
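For reference, this is the determinism I mean, sketched with NumPy purely as a stand-in for any seeded generator:

import numpy as np

# Two generators constructed from the same seed replay the same
# sequence; this is a property of the PRNG algorithm itself, not of
# the hardware it runs on.
a = np.random.RandomState(42).uniform(size=3)
b = np.random.RandomState(42).uniform(size=3)
assert (a == b).all()

i.e. the same seed replays the same sequence regardless of where the generator runs.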
On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:

> Hello MXNet devs,
>
> I wanted to see what people thought about the following section of code,
> which I think has some subtle pros/cons:
> https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
>
> Tobi (tdomhan) from sockeye pointed it out to me after he spent some time
> debugging non-determinism in his model training.
>
> This functionality is well documented here:
> https://mxnet.incubator.apache.org/api/python/ndarray.html#mxnet.random.seed
> but I don't think the current API meets all use cases, due to this section:
>
> "Random number generators in MXNet are device specific. Therefore, random
> numbers generated from two devices can be different even if they are
> seeded using the same seed."
>
> I'm guessing this is a feature that makes distributed training easier in
> MXNet: you wouldn't want to train the same model on each GPU. However, the
> downside is that if you run unit tests on a multi-GPU system, or in a
> training environment where you don't have control over which GPU you use,
> you can't count on deterministic behaviour to assert results against. I
> have a feeling there are non-unit-test use cases where you'd also want
> deterministic behaviour independent of which GPU your code happens to be
> scheduled on.
>
> How do others feel about this? Would it make sense to have some optional
> args in the seed call to turn off the seed-per-device functionality?
>
> -Kellen
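To make the device-specific behaviour concrete, here is a rough sketch of what I understand Tobi hit (assuming a machine with at least two GPUs, and using mx.nd.random.uniform with its ctx argument as per the docs linked above):

import mxnet as mx

mx.random.seed(128)
a = mx.nd.random.uniform(shape=(3,), ctx=mx.gpu(0))

mx.random.seed(128)
b = mx.nd.random.uniform(shape=(3,), ctx=mx.gpu(1))

# Same seed, different device: per the docs, each device keeps its own
# generator state, so 'a' and 'b' need not match. A unit test asserting
# against values produced on gpu(0) can fail when the job lands on gpu(1).
print(a.asnumpy())
print(b.asnumpy())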