Re: Extend MXNET distributed training with MPI AllReduce

2018-03-26 Thread Nan Zhu
Hi, Patric. It's pretty nice work! A question: what would the future code structure look like when this allreduce module is added as a submodule? Would we then have two communication submodules? Is there any plan to provide a unified abstraction for communication so that a single communication submodule
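[Editor's note: a minimal, hypothetical sketch of the kind of unified communication abstraction asked about here, with parameter-server and AllReduce backends behind one interface. The class and method names are illustrative assumptions, not MXNet's actual API.]

```cpp
#include <vector>

// One abstract interface that both communication backends implement.
class Comm {
 public:
  virtual ~Comm() = default;
  // Synchronize one gradient buffer across all workers.
  virtual void Sync(std::vector<float>* grad) = 0;
};

// Parameter-server style backend (push gradients, pull updated weights).
class PSComm : public Comm {
 public:
  void Sync(std::vector<float>* grad) override {
    // push *grad to the parameter server, then pull the result back
  }
};

// AllReduce style backend (collective sum across workers, then average).
class AllReduceComm : public Comm {
 public:
  void Sync(std::vector<float>* grad) override {
    // run an allreduce over *grad, then divide by the number of workers
  }
};
```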

Extend MXNET distributed training with MPI AllReduce

2018-03-26 Thread Zhao, Patric
Hi MXNET owners/developers, As you know, AllReduce and Parameter Servers are two very popular distributed training modes in DL. Currently, MXNET only supports parameter server mode and lacks AllReduce mode. Other frameworks, like tensorflow, pytorch, caffe, etc., can work with
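[Editor's note: a minimal sketch of the AllReduce pattern referenced in this proposal, using plain MPI to average a gradient buffer across workers. This is an illustration of the technique, not the proposed MXNet integration code.]

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Stand-in for the local gradients computed on this worker.
  std::vector<float> grad(4, static_cast<float>(rank + 1));
  std::vector<float> sum(grad.size(), 0.0f);

  // Element-wise sum of the gradients across all workers.
  MPI_Allreduce(grad.data(), sum.data(), static_cast<int>(grad.size()),
                MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

  // Average so every worker ends up with the same synchronized gradient.
  for (auto& v : sum) v /= static_cast<float>(size);

  if (rank == 0) std::printf("averaged grad[0] = %f\n", sum[0]);
  MPI_Finalize();
  return 0;
}
```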

Re: Extend MXNET distributed training with MPI AllReduce

2018-03-26 Thread Chris Olivier
great! nice work! On Mon, Mar 26, 2018 at 6:31 PM Zhao, Patric wrote: > Hi MXNET owners/developers, > > As you know, AllReduce and Parameter Servers are two very popular > distributed training modes in DL. > > Currently, MXNET only supports parameter server mode and

Re: MXNet C++ package improvements

2018-03-26 Thread Pedro Larroy
Thank you for your feedback and comments on the document. It seems we have to iterate more. I want to point out that with the pimpl idiom we wouldn't have the problem of breaking backward compatibility, since the internal C++ API is not really exposed. Only the public-facing one of the pimpl facade
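[Editor's note: a minimal sketch of the pimpl idiom referred to above. The public header exposes only an opaque pointer, so the internal C++ API can change without breaking client code. The class names here are hypothetical, not the actual MXNet C++ package types.]

```cpp
// --- ndarray_facade.h (public, stable interface) ---
#include <memory>

class NDArrayFacade {
 public:
  NDArrayFacade();
  ~NDArrayFacade();
  void WaitToRead();

 private:
  class Impl;                   // defined only in the .cc file
  std::unique_ptr<Impl> impl_;  // internal layout never leaks to users
};

// --- ndarray_facade.cc (internal, free to change) ---
class NDArrayFacade::Impl {
 public:
  void WaitToRead() { /* call into the internal C++ API here */ }
};

NDArrayFacade::NDArrayFacade() : impl_(new Impl()) {}
NDArrayFacade::~NDArrayFacade() = default;  // Impl is complete here
void NDArrayFacade::WaitToRead() { impl_->WaitToRead(); }
```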

Re: MXNet C++ package improvements

2018-03-26 Thread Pedro Larroy
About MinGW: most Windows users will use our binary versions, which we can compile with the platform tool supported by Microsoft, which is Visual Studio. Do we have data on how many people are using MinGW? We can't support every possible toolchain available. I'm not sure the GNU tools for Windows