DickJC123 edited a comment on issue #14029: Out of memory error in 3d Conv for 
matrix splits > 10, CUDNN strange behaviour
URL: 
https://github.com/apache/incubator-mxnet/issues/14029#issuecomment-459466018
 
 
   As pointed out earlier, going from 10 to 11 splits is the threshold at which 
cudnn predicts the fft implementation is fastest.  That algo apparently has a huge 
workspace requirement, probably because the convolution is 3D.  There is no cudnn 
bug here.  You have a couple of remedies:
   
   1. Set MXNET_CUDNN_AUTOTUNE_DEFAULT=1.  That will result in all convolutions 
in your model having their algos chosen by cudnnFind(), subject to the limitation 
that the workspace is less than 1GB.  The detrimental fft algo will be avoided 
because its workspace is too large (although the model may run slower).
   2. Leave MXNET_CUDNN_AUTOTUNE_DEFAULT=0, but control the 3d convolution 
locally, e.g. with
   Convolution(..., cudnn_tune='fastest', ...).  Only the problem Convolution 
will have its algo determined by cudnnFind(), subject to a workspace limitation 
of 1GB.  If you don't like the 1GB, then override it locally with e.g. 
workspace=2048 to set the workspace to 2GB.
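   Concretely, the two remedies might look like this (a sketch; the layer shapes 
are made up, and the mxnet call is shown commented out since it needs a GPU build 
to actually bind):

```python
import os

# Remedy 1: enable global autotuning.  This must be set before mxnet is
# imported (or in the shell before launching the process) so the backend
# sees it.
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '1'

# Remedy 2: leave MXNET_CUDNN_AUTOTUNE_DEFAULT=0 and tune only the problem
# layer.  cudnn_tune and workspace (in MB) are existing Convolution
# parameters; the kernel/num_filter values below are illustrative only.
#
#   import mxnet as mx
#   data = mx.sym.Variable('data')
#   conv = mx.sym.Convolution(data=data, kernel=(3, 3, 3), num_filter=32,
#                             cudnn_tune='fastest', workspace=2048)

print(os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'])
```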
   
   There is currently no way to limit algos by workspace size without also 
running cudnnFind().  We could add this functionality in a backward-compatible 
way by adding a new supported value to MXNET_CUDNN_AUTOTUNE_DEFAULT:
       - Values: 0, 1, 2, **or 3** (default=1)
       - The default value of cudnn auto tuning for convolution layers.
       - Value of 0 means there is no auto tuning to pick the convolution algo
       - Performance tests are run to pick the convolution algo when value is 1 
or 2
       - Value of 1 chooses the best algo in a limited workspace
       - Value of 2 chooses the fastest algo whose memory requirements may be 
larger than the default workspace threshold
       - **Value of 3 means there is no auto tuning to pick the convolution 
algo, but the algo cannot have a workspace requirement greater than the limit.**
   
   There would be a locally set equivalent to this in the Convolution 
parameters:
       - cudnn_tune='off'                                     # use cudnnGet(), 
no workspace limit, even if set locally
       - **cudnn_tune='off_limited_workspace'   # use cudnnGet() subject to 1GB 
or locally-set limit**
       - cudnn_tune='limited_workspace'            # use cudnnFind() subject to 
1GB or locally-set limit
       - cudnn_tune='fastest'                              # use cudnnFind(), 
no workspace limit, even if set locally
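   The selection logic those four modes imply can be sketched in plain Python 
(hypothetical helper name; 'off_limited_workspace' is the proposed addition, not 
an existing value):

```python
def algo_selection(cudnn_tune, workspace_mb=1024):
    """Return (selector, limit_mb) for a cudnn_tune mode.

    limit_mb of None means no workspace limit is enforced.
    """
    modes = {
        'off':                   ('cudnnGet',  None),          # current
        'off_limited_workspace': ('cudnnGet',  workspace_mb),  # proposed
        'limited_workspace':     ('cudnnFind', workspace_mb),  # current
        'fastest':               ('cudnnFind', None),          # current
    }
    return modes[cudnn_tune]

# The proposed mode keeps cudnnGet()'s cheap heuristic choice but still
# rejects algos whose workspace exceeds the (default or local) limit.
print(algo_selection('off_limited_workspace'))  # → ('cudnnGet', 1024)
```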
   
   While we're at it, I'm not fond of the compiled-in default workspace size of 
1GB.  I'd suggest adding an environment variable:
   
   MXNET_CUDNN_WORKSPACE_LIMIT_DEFAULT   # If not set, then limit = 1024 (MB)
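   Usage would be the same as the other MXNET_* variables (note this variable is 
proposed here, not yet implemented):

```python
import os

# Proposed: override the compiled-in 1GB default workspace limit globally.
# Value is in MB; setdefault leaves any user-supplied value in place.
os.environ.setdefault('MXNET_CUDNN_WORKSPACE_LIMIT_DEFAULT', '2048')
print(os.environ['MXNET_CUDNN_WORKSPACE_LIMIT_DEFAULT'])
```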
