Dear all, 

I am facing extreme delays in MXNet initialization for distributed GPU training 
using Horovod. Once I launch my code (for debugging, on just 2 GPUs), it 
populates the GPUs up to some memory level and then does not start training for 
about 30 minutes (yes, that is minutes, and I am using 12 CPU cores per rank). 
Admittedly, the computation graph of these latest models is very complicated, 
but I cannot believe that this alone can be the issue. I hybridize the Gluon 
models (net and loss function) prior to training with 
`net.hybridize(static_alloc=True, static_shape=True)`. The problem is not 
resolved by defining the cache as described in [issue 
3239](https://github.com/apache/incubator-mxnet/issues/3239#issuecomment-265103568).
 

Any pointers/help most appreciated.





---
[Visit Topic](https://discuss.mxnet.io/t/very-slow-initialisation-of-gpu-distributed-training/6357/1) or reply to this email to respond.
