@yizhi, please go ahead. I'm at NIPS this week and probably won't have enough time to dive deep into the code.

On Tue, Dec 5, 2017 at 1:51 PM, YiZhi Liu <javeli...@gmail.com> wrote:

> to my understanding, this is not about fault-tolerance, i.e., restart when
> worker/server fail, right?
>
> I can help to review. ping @Mu for advice.
>
> 2017-12-05 13:13 GMT-08:00 CodingCat <coding...@apache.org>:
>
> > ping
> >
> > On Sat, Dec 2, 2017 at 10:04 AM, CodingCat <coding...@apache.org> wrote:
> >
> > > ping
> > >
> > > On Fri, Dec 1, 2017 at 12:18 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> > >
> > >> Hi, all
> > >>
> > >> I have been working on integrating MXNet with Spark in a more
> > >> full-fledged manner.
> > >>
> > >> One of the most critical pre-conditions is to make parameter server in
> > >> mxnet support multiple workers per process. I created the PR in
> > >> https://github.com/dmlc/ps-lite/pull/121 (OK, sorry for being late....I
> > >> should have finished it earlier)
> > >>
> > >> This PR includes some refactoring of those too long methods, to highlight
> > >> the changes
> > >>
> > >> 1. https://github.com/dmlc/ps-lite/pull/112 includes the changes related
> > >> to refactoring
> > >>
> > >> 2. https://github.com/CodingCat/ps-lite/pull/3/files includes the
> > >> changes related to the key functionality
> > >>
> > >> 3. https://github.com/dmlc/ps-lite/pull/121 contains everything (Please
> > >> review this one)
> > >>
> > >>
> > >> I am not sure who is the current owner of ps-lite, please help to share
> > >> your thoughts on the implementation. Only after this PR is merged and
> > >> ps-lite version is synced in mxnet repo, I can file the successive PRs in
> > >> mxnet
> > >>
> > >> Thank you very much!
> > >>
> > >> Nan
> > >>
> > >
> >
>
> --
> Yizhi Liu
> DMLC member
> Amazon Web Services
> Vancouver, Canada