Hi @ChaiBapchya, thank you very much for your reply. 

The model is a building change detection model (similar to a Siamese UNet) with 
semantic segmentation as the output. The building blocks unfortunately have a 
complicated computation graph - and complication is bad :( - I apologize for 
that. I am a few weeks away from submitting for publication and am doing some 
last tests; I will be able to share the code afterwards. 

The problem is model-dependent; everything works fine with standard models. It 
also seems the problem is not Horovod-dependent: even a plain classification 
model built from these blocks (outside Horovod) takes a few minutes to launch, 
whereas a network with an identical backbone built from ResNet building blocks 
launches almost immediately. I ran this test yesterday. 

I just exported the model into a JSON file (115733 lines). I don't know if it 
gives more insight, but the last 3 lines say: 

```
  ],                                                                     
  "heads": [[15266, 0, 0], [15231, 0, 0], [15203, 0, 0], [15270, 0, 0]], 
  "attrs": {"mxnet_version": ["int", 10600]}                             
}                                                                        
```
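
For reference, this is roughly how the export was done. A minimal sketch: the 
`SiameseToy` block below is just a placeholder standing in for my real 
change-detection network, and the input shape is a dummy example.

```python
import mxnet as mx
from mxnet import gluon, nd

# Placeholder for the actual (much more complicated) change-detection HybridBlock;
# it only illustrates the hybridize -> forward -> export sequence.
class SiameseToy(gluon.nn.HybridBlock):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(8, kernel_size=3, padding=1)

    def hybrid_forward(self, F, x1, x2):
        return self.conv(x1) - self.conv(x2)

ctx = mx.gpu(0)
net = SiameseToy()
net.initialize(ctx=ctx)
net.hybridize()

xx = nd.random.uniform(shape=(1, 3, 256, 256), ctx=ctx)
outs = net(xx, xx)               # run once so the cached graph gets traced
net.export("change_detection")   # writes change_detection-symbol.json and -0000.params
```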

The environment is a local HPC cluster; I do my debugging tests by requesting 
2 GPUs (P100), 12 CPU cores (Xeon) per process, and 128 GB of memory. It seems 
these models require a lot of CPU memory as well. MXNet version: 
`cu101-1.6.0.dist-info`, CUDA 10.1.168

I can provide full system info, but I think the main question is: can I load 
the compiled model from a file (or keep it in memory) to avoid going through 
this operation every time? I think it's a GPU issue; when I load the models on 
CPU they fire up almost instantly. 
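
Concretely, is something like this the right way to reload without rebuilding 
the Python blocks? A sketch of what I have in mind, assuming the files written 
by `export()` above; the input names `data0`/`data1` are a guess based on how 
Gluon usually names exported inputs.

```python
import mxnet as mx
from mxnet import gluon

ctx = mx.gpu(0)

# Reload the exported graph + parameters instead of re-instantiating the blocks.
# File names assume the export("change_detection") call shown earlier.
net = gluon.nn.SymbolBlock.imports(
    "change_detection-symbol.json",
    ["data0", "data1"],                         # guessed names of the two siamese inputs
    param_file="change_detection-0000.params",
    ctx=ctx,
)

xx = mx.nd.random.uniform(shape=(1, 3, 256, 256), ctx=ctx)
outs = net(xx, xx)
```

What I don't know is whether this actually skips the slow per-launch setup on 
the GPU, or whether that cost is paid again regardless of how the graph is 
loaded.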

I also get the following warning; I don't know whether it is related: 

```
In [9]: outs = config['net'](xx,xx)

[17:40:48] src/imperative/cached_op.cc:192: Disabling fusion due to altered topological order of inputs.
```
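
In case it is relevant, I was also going to rerun the timing test with 
pointwise fusion turned off. My understanding (please correct me if this is not 
the right switch) is that `MXNET_USE_FUSION` controls it in 1.6 and must be set 
before MXNet is imported:

```python
import os

# Assumption: MXNET_USE_FUSION=0 disables the pointwise-fusion pass in MXNet 1.6.
# It has to be set before the first `import mxnet` to take effect.
os.environ["MXNET_USE_FUSION"] = "0"

import mxnet as mx
print(mx.__version__)
```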

Again, thank you very much for your time. 
Regards




