ptrendx commented on issue #16716: [Numpy] Fix collect_params().zero_grad() in gluon numpy interface
URL: https://github.com/apache/incubator-mxnet/pull/16716#issuecomment-551936503

@reminisce Huh, I did not know about that way of bulking imperative execution. You are right that, if it worked well, it would solve this issue. Unfortunately, I tested it with this script:

```
import mxnet as mx
import time

arrays = [mx.nd.ones((100, 100), ctx=mx.gpu()) for _ in range(500)]

# Warm-up pass so one-time allocation costs do not skew the timings.
for a in arrays:
    a[:] = 0
mx.nd.waitall()

# Baseline: one engine op per array.
start = time.time()
for _ in range(10):
    for a in arrays:
        a[:] = 0
mx.nd.waitall()
end = time.time()
print("normal: Elapsed ", end - start)

# Bulked execution: ops are batched, removing per-op synchronization.
mx.nd.waitall()
start = time.time()
with mx.engine.bulk(len(arrays)):
    for _ in range(10):
        for a in arrays:
            a[:] = 0
mx.nd.waitall()
end = time.time()
print("bulk: Elapsed ", end - start)

# Fused op: a single reset_arrays call zeroes all the arrays at once.
mx.nd.waitall()
start = time.time()
for _ in range(10):
    mx.nd.reset_arrays(*arrays, num_arrays=len(arrays))
mx.nd.waitall()
end = time.time()
print("reset_arrays: Elapsed ", end - start)
```

and got these results:

```
# python test.py
normal: Elapsed 0.8372836112976074
bulk: Elapsed 0.6354436874389648
reset_arrays: Elapsed 0.016309261322021484
```

(I also tried moving the `with mx.engine.bulk()` line inside the `for _ in range(10)` loop; the results were similar.)

Looking at the profile, bulking does work (as in, it removes the synchronization between ops), but it introduces HUGE gaps between the bulks (over 65 ms in this example). If we can fix that, then the `reset_arrays` approach will not be needed.
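For intuition on why one fused call can beat hundreds of individual writes, here is a minimal pure-Python analogy (this is not MXNet code, and the buffer sizes are arbitrary placeholders): clearing many buffers one call at a time versus a single write over one shared backing buffer, which is roughly the shape of the `reset_arrays` idea of turning N per-array ops into one op.

```python
import time

N, SIZE = 500, 10000  # 500 buffers, mirroring the 500 arrays in the benchmark

# Per-buffer zeroing: one call per buffer, analogous to `a[:] = 0` per NDArray.
bufs = [bytearray(b"\x01" * SIZE) for _ in range(N)]
zero = bytes(SIZE)
start = time.time()
for b in bufs:
    b[:] = zero
per_buffer = time.time() - start

# Fused zeroing: every "array" is a view into one backing buffer, so a
# single write clears all of them at once.
backing = bytearray(b"\x01" * (N * SIZE))
views = [memoryview(backing)[i * SIZE:(i + 1) * SIZE] for i in range(N)]
start = time.time()
backing[:] = bytes(N * SIZE)
fused = time.time() - start

print(f"per-buffer: {per_buffer:.6f}s  fused: {fused:.6f}s")
```

The analogy only captures the per-call overhead, not GPU kernel-launch or engine-scheduling costs, but the direction of the gap is the same.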
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services