Sorry if I'm misunderstanding your question here, Chris.

On Tue, Jan 9, 2018 at 4:58 PM, kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> I think the convention is that random generators in most modern languages
> are always seeded, and always deterministic.  If a user seed isn't
> supplied, implementations generally provide their own seed, which they
> attempt to make unique.  Often they generate a seed that takes into account
> the current time.  This is at least the case for many mainstream languages.
>
> Java implementation: https://docs.oracle.com/javase/8/docs/api/java/util/Random.html
> Remarks: "If two instances of Random are created with the same seed, and
> the same sequence of method calls is made for each, they will generate and
> return identical sequences of numbers."
>
> C#: https://msdn.microsoft.com/en-us/library/ctssatww(v=vs.110).aspx
> Remarks: "Providing an identical seed value to different Random objects
> causes each instance to produce identical sequences of random numbers. This
> is often done when testing apps that rely on random number generators."
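>
> To make the convention concrete, here's a minimal C++ sketch of the same
> idea (not MXNet code, just an illustration):
>
>   #include <cassert>
>   #include <random>
>
>   int main() {
>     std::mt19937 a(42), b(42);       // two generators, same seed
>     for (int i = 0; i < 1000; ++i)
>       assert(a() == b());            // identical sequences, by design
>
>     // No user seed supplied: seed from an entropy source instead,
>     // so each run produces a different sequence.
>     std::mt19937 c(std::random_device{}());
>     (void)c;
>     return 0;
>   }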
>
> On Tue, Jan 9, 2018 at 4:27 PM, Chris Olivier <cjolivie...@gmail.com>
> wrote:
>
>> wait wait — i don’t think that random number generators should return
>> deterministic lists of numbers. i’m asking if something says it’s supposed
>> to. i know they tend to, but my understanding is that they tend to because
>> of the challenge of generating true random numbers from hardware.  IMHO
>> the ideal random number generator would not return a deterministic set of
>> numbers regardless of seed.
>>
>> On Tue, Jan 9, 2018 at 3:43 AM Pedro Larroy <pedro.larroy.li...@gmail.com>
>> wrote:
>>
>> > For enabling parallel deterministic testing we can set an environment
>> > variable and set the same seed on different devices for those cases
>> > where we want it, leaving the default as it is. I think this would be
>> > an easy solution that wouldn't change any behaviour in training on
>> > multi-gpu.
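>> >
>> > Something along these lines, purely as a sketch (the variable name
>> > MXNET_SAME_SEED_ALL_DEVICES is hypothetical; nothing like it exists yet):
>> >
>> >   #include <cstdlib>
>> >
>> >   // Returns true if the user asked for identical seeds on all devices.
>> >   inline bool SameSeedAllDevices() {
>> >     const char* v = std::getenv("MXNET_SAME_SEED_ALL_DEVICES");
>> >     return v != nullptr && std::atoi(v) != 0;
>> >   }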
>> >
>> > On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland
>> > <kellen.sunderl...@gmail.com> wrote:
>> > > Thanks Asmus, yes this is also the approach I would be in favour of.  I
>> > > think we should optionally allow the user to specify if they want
>> > > deterministic behaviour independent of the GPU they run on.  If MXNet is
>> > > going to support more arbitrary linear algebra operations I could see a
>> > > lot of use cases for this.  For example I want deterministic noise fed
>> > > into a deep-RL simulation so that I can compare a few different
>> > > algorithms without variance, and do it in parallel on my machine (that
>> > > happens to have two GPUs).
>> > >
>> > > On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel <asmushet...@yahoo.de.invalid>
>> > > wrote:
>> > >
>> > >> The issue is tricky. Number generators should return deterministic sets
>> > >> of numbers, as Chris said, but that usually only applies to
>> > >> non-distributed systems. And to some extent, we already have a
>> > >> distributed system as soon as one CPU and one GPU are involved.
>> > >> For the usual setup like distributed training, using different seeds on
>> > >> different devices is a must. You distribute a process that involves
>> > >> random number generation, and that means that you absolutely have to
>> > >> ensure that the sequences on the devices do not correlate. So this
>> > >> behaviour is intended and correct. We also cannot guarantee that random
>> > >> number generation is deterministic when running on CPU vs. running on
>> > >> GPU.
>> > >> So what we are dealing with here is generating repeatable results when
>> > >> the application/code section is running on a single GPU out of a bigger
>> > >> set of available GPUs, but we do not have control over which one. The
>> > >> crucial line in mxnet is this one (resource.cc):
>> > >>
>> > >> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed * kRandMagic;
>> > >>
>> > >> Here I think it would make sense to add a switch that optionally makes
>> > >> this setting independent of ctx.dev_id. But we would have to document
>> > >> really well that this is solely meant for specific types of
>> > >> debugging/unit testing.
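>> > >>
>> > >> Roughly, the switch could look something like this (just a sketch;
>> > >> "same_seed_all_devices" is a hypothetical flag, e.g. fed by the
>> > >> environment variable Pedro suggested):
>> > >>
>> > >>   const uint32_t device_term = same_seed_all_devices ? 0 : ctx.dev_id;
>> > >>   const uint32_t seed = device_term + i * kMaxNumGPUs + global_seed * kRandMagic;
>> > >>
>> > >> With the flag set, every device would draw from the same sequence, which
>> > >> is what you want for unit tests but must stay off by default for
>> > >> multi-device training.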
>> > >>
>> > >> On Monday, January 8, 2018, 19:30:02 CET, Chris Olivier <
>> > >> cjolivie...@gmail.com> wrote:
>> > >>
>> > >> Is it explicitly defined somewhere that random number generators should
>> > >> always return a deterministic set of numbers given the same seed, or is
>> > >> that just a side-effect of some hardware not having a better way to
>> > >> generate random numbers, so they use a user-defined seed to kick off the
>> > >> randomization starting point?
>> > >>
>> > >> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
>> > >> kellen.sunderl...@gmail.com> wrote:
>> > >>
>> > >> > Hello MXNet devs,
>> > >> >
>> > >> > I wanted to see what people thought about the following section of
>> > >> > code, which I think has some subtle pros/cons:
>> > >> > https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
>> > >> >
>> > >> > Tobi (tdomhan) from sockeye pointed it out to me after he spent some
>> > >> > time debugging non-determinism in his model training.
>> > >> >
>> > >> > This functionality is well documented here:
>> > >> > https://mxnet.incubator.apache.org/api/python/ndarray.html#mxnet.random.seed
>> > >> > but I don't think the current api meets all use cases due to this
>> > >> > section:
>> > >> >
>> > >> > "Random number generators in MXNet are device specific. Therefore,
>> > >> > random numbers generated from two devices can be different even if
>> > >> > they are seeded using the same seed."
>> > >> >
>> > >> > I'm guessing this is a feature that makes distributed training easier
>> > >> > in MXNet; you wouldn't want to train the same model on each GPU.
>> > >> > However, the downside of this is that if you run unit tests on a
>> > >> > multi-GPU system, or in a training environment where you don't have
>> > >> > control over which GPU you use, you can't count on deterministic
>> > >> > behaviour that you can assert results against.  I have a feeling there
>> > >> > are non-unit-test use cases where you'd also want deterministic
>> > >> > behaviour independent of which GPU you happen to have your code
>> > >> > scheduled to run on.
>> > >> >
>> > >> > How do others feel about this?  Would it make sense to have some
>> > >> > optional args in the seed call to have the seed-per-device
>> > >> > functionality turned off?
>> > >> >
>> > >> > -Kellen
>> > >> >
>> > >>
>> > >>
>> >
>>
>
>
