wenyangchu commented on issue #11341: Deterministic cudnn algorithms
URL: https://github.com/apache/incubator-mxnet/issues/11341#issuecomment-399475683

Hi @DickJC123,

Thanks for your reply. I needed this urgently, so I wrote an implementation this week and put it into a pull request, meant for discussion for now: https://github.com/apache/incubator-mxnet/pull/11361
Please check the last 2 commits.

To answer your questions:

1. If MXNET_PREFER_DETERMINISM is set and no deterministic algorithm can be found, I think it has to be a fatal error, because the user's requirement cannot be satisfied.

2. I think it is a good idea to have this for the entire platform, but I will try to solve it for cudnn first, because I suppose it is the most used backend. I do not see obvious issues in the other backends yet; maybe someone else can suggest where they might be non-deterministic? I have tested the CPU version with Intel MKL in a limited set of scenarios, and training was deterministic. I think we need advice and tests to figure out which parts of the other backends are non-deterministic.

3. I think it is good to have a global determinism control if feasible. If it is also possible to control individual layers, I think that is very good to have as well. In the pull request I added a deterministic parameter (default = False) to max pooling:

    nn.MaxPool2D(pool_size=(3,3), strides=(2,2), deterministic=True)

I also added environment variable values that select deterministic algorithms for the convolution backpropagation:

    os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "3"

Old:

    # Value of 1 chooses the best algo in a limited workspace
    # Value of 2 chooses the fastest algo whose memory requirements may be larger than the default workspace threshold

Added:

    # Value of 3 chooses the deterministic best algo in a limited workspace
    # Value of 4 chooses the deterministic fastest algo whose memory requirements may be larger than the default workspace threshold

These could be replaced by a global deterministic flag.

4. As you can see above, I actually think it is good to let the user select a deterministic algorithm according to a constraint: speed or memory size. The problem with this solution is that if cudnn picks different deterministic algos on different runs, repeatability can still fail. I think it would be good to have another mechanism that lets the user select the cudnn algorithm directly, if available.
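
A minimal sketch of the selection policy described in points 1, 3 and 4, in plain Python. It is not the code from the pull request and it calls neither MXNet nor cuDNN. `Algo`, `select_algo`, the `pinned` argument, and the sample timings and workspace sizes are hypothetical, for illustration only, and "best" is assumed to mean the lowest measured time.

```python
from collections import namedtuple

# Hypothetical record for one candidate algorithm (not a cuDNN or MXNet type).
Algo = namedtuple("Algo", ["name", "time_ms", "workspace_bytes", "deterministic"])


def select_algo(candidates, mode, workspace_limit, pinned=None):
    """Pick an algorithm following the proposed autotune values 1 to 4.

    1: fastest algo that fits in the workspace limit
    2: fastest algo, ignoring the workspace limit
    3: like 1, but deterministic algos only
    4: like 2, but deterministic algos only

    pinned (hypothetical) restricts the choice to one named algo, so the
    result no longer depends on measured timings.
    """
    if mode not in (1, 2, 3, 4):
        raise ValueError("unsupported mode: %r" % (mode,))

    pool = list(candidates)
    if pinned is not None:
        pool = [a for a in pool if a.name == pinned]
    if mode in (3, 4):
        pool = [a for a in pool if a.deterministic]
    if mode in (1, 3):
        pool = [a for a in pool if a.workspace_bytes <= workspace_limit]

    if not pool:
        # Point 1: fail loudly instead of silently falling back.
        raise RuntimeError("no algorithm satisfies mode %d" % mode)

    # The name is part of the key so that equal timings are always
    # resolved the same way.
    return min(pool, key=lambda a: (a.time_ms, a.name))


# Made-up candidates: the two fastest ones are not deterministic.
CANDIDATES = [
    Algo("algo_a", 1.0, 512, False),
    Algo("algo_b", 2.0, 64, False),
    Algo("algo_c", 3.0, 256, True),
    Algo("algo_d", 4.0, 32, True),
]

chosen = select_algo(CANDIDATES, mode=3, workspace_limit=128)
```

With measured timings as the only criterion, modes 3 and 4 can still pick a different deterministic algo if the timings shift between runs; passing `pinned` removes that dependency, which is the extra mechanism suggested in point 4.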