Hi,

Currently, we have two methods for single-machine communication: the
parameter server and NCCL ring reduction. Both have downsides. The
parameter server does not differentiate between NVLink and PCI-E
connections, so it uses the slower, higher-latency PCI-E links as
often as it uses NVLink. NCCL uses the ring-reduce algorithm, which
has higher theoretical latency than other algorithms. NCCL also
requires users to install another
dependency in order to use it. I am working on a topology-aware
approach that can address these limitations. The design proposal is
on cwiki:
https://cwiki.apache.org/confluence/display/MXNET/Single+machine+All+Reduce+Topology-aware+Communication
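
To make the latency point concrete, here is a rough back-of-the-envelope
sketch that counts sequential communication steps for a ring all-reduce
versus a binary-tree reduce-and-broadcast. The tree is only one example
of an alternative algorithm and is not necessarily what the proposal
implements; the function names are hypothetical, and the model ignores
bandwidth, message size, and the actual link topology.

```python
def ring_allreduce_steps(num_gpus):
    """Sequential steps for a ring all-reduce (simplified model).

    Reduce-scatter takes N-1 steps and all-gather takes another N-1.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return 2 * (num_gpus - 1)


def tree_allreduce_steps(num_gpus):
    """Sequential steps for a binary-tree reduce followed by a broadcast.

    Assumption: each phase takes ceil(log2(N)) steps.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    depth = (num_gpus - 1).bit_length()  # integer ceil(log2(num_gpus))
    return 2 * depth


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, ring_allreduce_steps(n), tree_allreduce_steps(n))
```

Under this model, the step count grows linearly with the number of GPUs
for the ring and logarithmically for the tree, which is the latency gap
referred to above.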

Please feel free to let me know if you have any suggestions.

Regards,
Carl
