@yizhi, please go ahead. I'm at NIPS this week and probably won't have enough
time to dive deep into the code.

On Tue, Dec 5, 2017 at 1:51 PM, YiZhi Liu <javeli...@gmail.com> wrote:

> To my understanding, this is not about fault tolerance, i.e., restarting
> when a worker/server fails, right?
>
> I can help review. Ping @Mu for advice.
>
> 2017-12-05 13:13 GMT-08:00 CodingCat <coding...@apache.org>:
>
> > ping
> >
> > On Sat, Dec 2, 2017 at 10:04 AM, CodingCat <coding...@apache.org> wrote:
> >
> > > ping
> > >
> > > On Fri, Dec 1, 2017 at 12:18 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> > >
> > >> Hi, all
> > >>
> > >> I have been working on integrating MXNet with Spark in a more
> > >> full-fledged manner.
> > >>
> > >> One of the most critical preconditions is making the parameter server in
> > >> MXNet support multiple workers per process. I created a PR at
> > >> https://github.com/dmlc/ps-lite/pull/121 (OK, sorry for being late... I
> > >> should have finished it earlier).
> > >>
> > >> This PR also refactors some overly long methods. To highlight the
> > >> changes:
> > >>
> > >> 1. https://github.com/dmlc/ps-lite/pull/112 includes the changes
> > >> related to refactoring
> > >>
> > >> 2. https://github.com/CodingCat/ps-lite/pull/3/files includes the
> > >> changes related to the key functionality
> > >>
> > >> 3. https://github.com/dmlc/ps-lite/pull/121 contains everything (please
> > >> review this one)
> > >>
> > >>
> > >> I am not sure who the current owner of ps-lite is; please share your
> > >> thoughts on the implementation. Only after this PR is merged and the
> > >> ps-lite version is synced in the MXNet repo can I file the follow-up
> > >> PRs in MXNet.
> > >>
> > >> Thank you very much!
> > >>
> > >> Nan
> > >>
> > >
> > >
> >
>
>
>
> --
> Yizhi Liu
> DMLC member
> Amazon Web Services
> Vancouver, Canada
>
