zhreshold commented on issue #11872: "socket.error: [Errno 111] Connection refused" while training with multiple workers
URL: https://github.com/apache/incubator-mxnet/issues/11872#issuecomment-408217525

Temporary solutions:
1. Increase shared memory if it is too small. Use `df -h /dev/shm` to check the shared memory size and usage. To raise the limit, edit `/etc/sysctl.conf` and add or edit the line `kernel.shmmax = 4294967296`, for example, to allow up to 4 GB of shared memory (the value is in bytes and must be written without thousands separators).
2. Reduce `num_workers`. If you set `num_workers = 0`, no multiprocessing workers are used, but it is slower.
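The two workarounds above can be combined in a small check that runs before the data loader is created. This is only a sketch: `shm_usage` and `pick_num_workers` are hypothetical helper names, and the 1 GiB threshold is an arbitrary assumption, not a value from MXNet.

```python
import shutil


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for the filesystem mounted at `path`.

    Same information as `df -h /dev/shm`, but as raw byte counts.
    """
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


def pick_num_workers(requested, free_bytes, min_free_bytes=1 << 30):
    """Hypothetical heuristic: fall back to 0 workers when shared memory is low.

    `min_free_bytes` defaults to 1 GiB, an assumed threshold chosen only
    for illustration.
    """
    if free_bytes < min_free_bytes:
        return 0  # no multiprocessing workers: slower, but avoids shared memory
    return requested
```

A possible use, assuming a Linux machine where `/dev/shm` exists: call `num_workers = pick_num_workers(4, shm_usage()[2])` and pass the result as the `num_workers` argument when constructing the data loader.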