Nice design document. Where does the default value of 10M for MXNET_KVSTORE_GPUARRAY_BOUND come from? Do you generate a tree for each GPU?
Pedro.

On Mon, Jun 18, 2018 at 2:30 PM Carl Yang <carl14...@gmail.com> wrote:
> Hi,
>
> Currently, we have two methods for single-machine communication:
> parameter server and NCCL ring reduction. Both of these methods have
> some downsides. Parameter server does not differentiate between NVLink
> connections and PCI-E, so it ends up using the higher latency and
> slower PCI-E connections as frequently as it does NVLink. NCCL uses
> the ring reduce algorithm, which has higher theoretical latency than
> other algorithms. I am working on a topology-aware approach that can
> address these limitations. Design proposal is on cwiki:
>
> https://cwiki.apache.org/confluence/display/MXNET/Single+machine+All+Reduce+Topology-aware+Communication
>
> Please feel free to let me know if you have any suggestions.
>
> Regards,
> Carl
>
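
For context on the latency point in the quoted message, here is a rough sketch that counts sequential communication steps for a ring all-reduce versus a tree-based one. It is a latency-only model and not the algorithm from the design proposal: it assumes a standard ring (reduce-scatter plus all-gather) and a binomial-tree reduce followed by a broadcast, and it ignores bandwidth, message size, and link type (NVLink vs. PCI-E). The function names are hypothetical and chosen only for illustration.

```python
def ring_allreduce_steps(num_gpus: int) -> int:
    """Sequential steps for a ring all-reduce.

    Assumes reduce-scatter (N - 1 steps) followed by all-gather (N - 1 steps).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return 2 * (num_gpus - 1)


def tree_allreduce_steps(num_gpus: int) -> int:
    """Sequential steps for a binomial-tree reduce followed by a broadcast.

    Each phase takes ceil(log2(N)) steps; (N - 1).bit_length() computes
    ceil(log2(N)) with integer arithmetic for N >= 1.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    depth = (num_gpus - 1).bit_length()
    return 2 * depth


if __name__ == "__main__":
    # Compare step counts for a few single-machine GPU counts.
    for n in (2, 4, 8, 16):
        print(n, ring_allreduce_steps(n), tree_allreduce_steps(n))
```

Under these assumptions the ring's step count grows linearly with the number of GPUs while the tree's grows logarithmically, which is the sense in which the ring has higher theoretical latency. For large arrays the bandwidth term matters more, so this sketch says nothing about which approach is faster overall.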