ptrendx commented on issue #16716: [Numpy] Fix collect_params().zero_grad() in gluon numpy interface

URL: https://github.com/apache/incubator-mxnet/pull/16716#issuecomment-551230288

@reminisce No, that is not how it works, and please do not undo performance optimizations based on such false presumptions (especially since this is not user-facing code; `zero_grad` is, so what "feels more natural" should not really matter here).

Let me give you a bit of data and context. When pretraining BERT, zeroing the gradient arrays (which happened once every few iterations) took ~5% of total time, because each zeroing was launched as a separate operator. That is a ridiculously high overhead: the actual GPU cost of the zeroing was minuscule, but the cost of the operator launch and the sync at the end was 10x or more.

About the new implementation using `cudaMemsetAsync` per array: frankly, that was a "good enough" fix for that (BERT) use case. The full fix would be an aggregated operator that zeros all the arrays inside a single kernel; I bet that would improve the performance of zeroing gradients by an additional 2x or more.

This comment:

```
However, numpy serves as the first step to our longer-term evolvement and we should consider to solve it in a different way.
```

together with

```
First of all, it does not feel as natural as using for-loop for assigning zeros to a list of ndarrays.
```

frankly scares me. Doing NumPy-like, fully imperative execution has no chance of being actually fast, and leaning in that direction will not make MXNet any better. Gluon seems like a good compromise: sure, stay imperative while you debug, but hybridize when you are serious about getting something deployable. And the fix proposed in this PR, "if this is a NumPy-like array then do the slow thing" (especially since, as I understand it, the NumPy style will be promoted going forward), is super bad practice.
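The aggregation idea above can be sketched in plain NumPy as a CPU stand-in (the helper names here are hypothetical, not MXNet API): if every gradient array is a view into one contiguous buffer, N per-array zeroing calls collapse into a single fill, which is the same shape of fix as zeroing all arrays in one CUDA kernel instead of one `cudaMemsetAsync` per array.

```python
import numpy as np

def make_flat_grads(shapes, dtype=np.float32):
    """Hypothetical helper: back all gradient arrays with one
    contiguous buffer so a single fill can zero them all."""
    sizes = [int(np.prod(s)) for s in shapes]
    flat = np.zeros(sum(sizes), dtype=dtype)
    views, offset = [], 0
    for size, shape in zip(sizes, shapes):
        views.append(flat[offset:offset + size].reshape(shape))
        offset += size
    return flat, views

def zero_grad_per_array(grads):
    # The slow pattern: one assignment ("launch") per gradient array.
    for g in grads:
        g[...] = 0

def zero_grad_aggregated(flat):
    # The fast pattern: one fill over the shared backing buffer
    # zeros every view at once.
    flat.fill(0)

flat, grads = make_flat_grads([(2, 3), (4,), (5, 5)])
for g in grads:
    g += 1.0  # simulate accumulated gradients
zero_grad_aggregated(flat)
```

This sketch only illustrates the aggregation pattern; it obviously cannot reproduce the GPU launch-overhead numbers quoted above, which come from per-operator launch and sync costs dominating the tiny memset kernels.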