preethamam commented on issue #12982:
URL: 
https://github.com/apache/incubator-mxnet/issues/12982#issuecomment-1227472233

   > [This issue in PyTorch](https://github.com/pytorch/pytorch/issues/1637) 
may be relevant. Specifically, [this 
comment](https://github.com/pytorch/pytorch/issues/1637#issuecomment-338268158) 
about either disabling IOMMU or changing to software IOMMU. I've had issues 
with the threadripper requiring `iommu=soft`; I'd recommend giving that a shot.
   
   I have the same issue. I have 4 Nvidia RTX 2080 Ti GPUs, and whenever I use 
`nn.DataParallel` the system crashes a few seconds after the data is pushed to 
the GPUs. This happens on both Linux and Windows 11. I tried 
`NCCL_P2P_DISABLE=1` as well as `iommu=soft` and `iommu=off`, but none of them 
helped. I also went through the other comments, and none of their suggestions 
worked for me.
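
   When trying these workarounds it is easy to lose track of which ones are 
actually in effect, since `iommu=...` is a kernel boot parameter (it only 
applies after a reboot) and `NCCL_P2P_DISABLE` is an environment variable that 
has to be set before the training process starts. A minimal Linux-only sketch 
for checking both follows; the helper names `iommu_setting`, `p2p_disabled`, 
and `report` are made up for illustration and are not part of PyTorch, MXNet, 
or NCCL, and reading `/proc/cmdline` assumes a standard Linux procfs.

```python
import os
import re
from pathlib import Path


def iommu_setting(cmdline: str):
    """Return the value of the last standalone `iommu=` kernel parameter, or None.

    Vendor-specific parameters such as `amd_iommu=on` are deliberately not matched.
    """
    matches = re.findall(r"(?:^|\s)iommu=(\S+)", cmdline)
    return matches[-1] if matches else None


def p2p_disabled(env=os.environ) -> bool:
    """True when NCCL_P2P_DISABLE is set to "1" in the given environment mapping."""
    return env.get("NCCL_P2P_DISABLE") == "1"


def report(cmdline_path: str = "/proc/cmdline") -> dict:
    """Collect both settings; a missing cmdline file (e.g. on Windows) yields None."""
    path = Path(cmdline_path)
    cmdline = path.read_text() if path.exists() else ""
    return {"iommu": iommu_setting(cmdline), "nccl_p2p_disabled": p2p_disabled()}


if __name__ == "__main__":
    print(report())
```

   Running this in the same shell that launches the training script shows 
whether the settings reached the process; it does not diagnose the crash 
itself.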
   
   CPU: AMD Threadripper 2950X
   RAM: 128 GB
   GPUs: 4 Nvidia RTX 2080 Ti
   
   Any help is much appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

