ptrendx commented on issue #16716: [Numpy] Fix collect_params().zero_grad() in 
gluon numpy interface
URL: https://github.com/apache/incubator-mxnet/pull/16716#issuecomment-551230288
 
 
   @reminisce No. That is not how it works, and please do not undo performance optimizations based on such false presumptions (especially since this is not user-facing code - `zero_grad` is - so what "feels more natural" should not really matter).
   
   Let me give you a bit of data and context - when pretraining BERT, zeroing the gradient arrays (which happened once every few iterations) took ~5% of the total time, because each zeroing was launched as its own operator. That is ridiculously high overhead: the actual GPU cost of the zeroing was minuscule, but the cost of the operator launches and the sync at the end was 10+x that.
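   To make the overhead concrete, here is a rough standalone sketch of the per-array pattern (assuming a GPU build of MXNet; the array count and shape are made up for illustration) - each assignment is dispatched as its own operator, so the fixed launch cost is paid once per gradient:
   ```python
   import mxnet as mx

   ctx = mx.gpu(0)  # assumes a GPU build; count and shape below are made up
   grads = [mx.nd.ones((1024,), ctx=ctx) for _ in range(512)]

   # Per-array zeroing: every assignment goes through the engine as its own
   # operator, so the fixed dispatch/launch cost is paid 512 times even
   # though each individual memset is tiny.
   for g in grads:
       g[:] = 0
   mx.nd.waitall()  # the sync at the end; the launches above dominate the cost
   ```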
   
   As for the new implementation using `cudaMemsetAsync` per array - frankly, that was a "good enough" fix for that (BERT) use case. The full fix would be to write an aggregated operator that zeros all the arrays inside a single kernel - I bet that would speed up gradient zeroing by an additional 2x or more.
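   For comparison, a sketch of the batched path (if I remember the operator correctly, `mx.nd.reset_arrays` is what the classic-ndarray branch of `zero_grad` calls - treat the exact name and signature here as an assumption): one operator launch covers the whole list, even though it still memsets each array separately under the hood, which is why a fully fused single-kernel version could buy the additional ~2x:
   ```python
   import mxnet as mx

   ctx = mx.gpu(0)
   grads = [mx.nd.ones((1024,), ctx=ctx) for _ in range(512)]

   # One operator call covers all gradients, so the engine/launch overhead is
   # paid once. Internally it still issues one memset per array; a truly fused
   # kernel walking all buffers would be a backend (C++/CUDA) change.
   mx.nd.reset_arrays(*grads, num_arrays=len(grads))
   mx.nd.waitall()
   ```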
   
   This comment:
   ```
   However, numpy serves as the first step to our longer-term evolvement and we 
should consider to solve it in a different way.
   ```
   together with
   ```
   First of all, it does not feel as natural as using for-loop for assigning 
zeros to a list of ndarrays. 
   ```
   scare me, frankly. Fully imperative, numpy-like execution has no chance of actually being fast, and leaning in that direction will not make MXNet any better. Gluon seems like a good compromise - sure, run imperatively while you debug, but hybridize when you are actually serious about getting something deployable. And the fix proposed in this PR - "if this is a numpy-like array, then do the slow thing" (especially since, as I understand it, the numpy style will be promoted going forward) - is super bad practice.
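   To make the hybridization point concrete, a minimal Gluon sketch using the standard public API (layer sizes are arbitrary) - the same network runs imperatively for debugging and is then hybridized for speed:
   ```python
   import mxnet as mx
   from mxnet.gluon import nn

   net = nn.HybridSequential()
   net.add(nn.Dense(128, activation='relu'), nn.Dense(10))
   net.initialize()

   x = mx.nd.random.uniform(shape=(4, 32))
   y = net(x)        # imperative mode: easy to step through while debugging

   net.hybridize()   # switch to a cached symbolic graph
   y = net(x)        # later calls skip most of the per-operator Python overhead
   ```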
