sxjscience edited a comment on issue #16705: Dropout inconsistency bug
URL: https://github.com/apache/incubator-mxnet/issues/16705#issuecomment-549683831

With the help of @xidulu, we have located the root cause of the issue:

The bug is triggered because we have multiple parallel GPU random resources:
https://github.com/apache/incubator-mxnet/blob/c583e44816a5e383493f35e69daaa92a47e40e39/src/resource.cc#L93-L94

When we create a new Dropout node, we attach a random resource to the node:
https://github.com/apache/incubator-mxnet/blob/c583e44816a5e383493f35e69daaa92a47e40e39/src/operator/nn/dropout.cc#L148-L164

Since there are multiple random resources, we select one in a round-robin fashion. Each resource has its own seed, which results in the inconsistent behavior (a standalone sketch of this mechanism follows the reproduction script below):
https://github.com/apache/incubator-mxnet/blob/c583e44816a5e383493f35e69daaa92a47e40e39/src/resource.cc#L344-L351

The simplest fix is to use a single GPU random generator. Thus, setting `os.environ['MXNET_GPU_PARALLEL_RAND_COPY'] = '1'` will fix this problem:

```python
import os
os.environ['MXNET_GPU_PARALLEL_RAND_COPY'] = '1'

import random

import mxnet as mx
import numpy as np
from numpy.testing import assert_allclose

base_y_np = None
for nrepeat in [1, 2, 3, 4]:
    seed = 123
    mx.random.seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    x = mx.nd.ones((3, 3), ctx=mx.gpu())
    # Create `nrepeat` Dropout nodes before the recorded one; with a single
    # GPU random generator the result no longer depends on how many nodes
    # were created first.
    for _ in range(nrepeat):
        y = mx.nd.Dropout(x, cudnn_off=True)
    with mx.autograd.record():
        y = mx.nd.Dropout(x, cudnn_off=True)
    y_np = y.asnumpy()
    if base_y_np is None:
        base_y_np = y_np
    else:
        assert_allclose(base_y_np, y_np)
```
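To make the round-robin explanation above concrete, here is a minimal, framework-free sketch (the names `ResourcePool` and `dropout_mask` are hypothetical, not MXNet source): when several independently seeded generators are handed out to operators in round-robin order, the mask an operator draws depends on *which* generator it happens to receive, so creating a different number of Dropout nodes before the one of interest changes the result even though the base seed is identical.

```python
import numpy as np

NUM_PARALLEL_RAND_COPY = 4  # analogous to MXNET_GPU_PARALLEL_RAND_COPY

class ResourcePool:
    """Hypothetical stand-in for a pool of parallel random resources."""
    def __init__(self, base_seed, copies):
        # Each copy gets its own derived seed.
        self.rngs = [np.random.RandomState(base_seed + i) for i in range(copies)]
        self.next_idx = 0

    def request(self):
        # Round-robin hand-out of random resources to newly created nodes.
        rng = self.rngs[self.next_idx]
        self.next_idx = (self.next_idx + 1) % len(self.rngs)
        return rng

def dropout_mask(rng, shape, p=0.5):
    # Toy dropout mask drawn from whichever resource was assigned.
    return (rng.uniform(size=shape) > p).astype(np.float32)

# Creating a different number of Dropout nodes before the one we care about
# shifts which generator it gets, so the mask differs between runs even
# though the base seed (123) never changes.
for warmup_nodes in [1, 2, 3]:
    pool = ResourcePool(base_seed=123, copies=NUM_PARALLEL_RAND_COPY)
    for _ in range(warmup_nodes):
        pool.request()                  # earlier nodes consume resources
    mask = dropout_mask(pool.request(), (3,))
    print(warmup_nodes, mask)
```

With `copies=1` (the effect of `MXNET_GPU_PARALLEL_RAND_COPY=1`), every node draws from the same generator and the printed masks coincide across the three runs.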