To my understanding, this is not about fault tolerance, i.e., restarting
when a worker or server fails, right?

I can help review. Pinging @Mu for advice.

2017-12-05 13:13 GMT-08:00 CodingCat <coding...@apache.org>:

> ping
>
> On Sat, Dec 2, 2017 at 10:04 AM, CodingCat <coding...@apache.org> wrote:
>
> > ping
> >
> > On Fri, Dec 1, 2017 at 12:18 AM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> >
> >> Hi, all
> >>
> >> I have been working on integrating MXNet with Spark in a more
> >> full-fledged manner.
> >>
> >> One of the most critical pre-conditions is to make parameter server in
> >> mxnet support multiple workers per process. I created the PR in
> >> https://github.com/dmlc/ps-lite/pull/121 (OK, sorry for being late....I
> >> should have finished it earlier)
> >>
> >> This PR includes some refactoring of overly long methods. To highlight
> >> the changes:
> >>
> >> 1. https://github.com/dmlc/ps-lite/pull/112 includes the changes
> >> related to refactoring
> >>
> >> 2. https://github.com/CodingCat/ps-lite/pull/3/files includes the
> >> changes related to the key functionality
> >>
> >> 3. https://github.com/dmlc/ps-lite/pull/121 contains everything (Please
> >> review this one)
> >>
> >>
> >> I am not sure who the current owner of ps-lite is; please share your
> >> thoughts on the implementation. Only after this PR is merged and the
> >> ps-lite version is synced in the mxnet repo can I file the subsequent
> >> PRs in mxnet.
> >>
> >> Thank you very much!
> >>
> >> Nan
> >>
> >
> >
>



-- 
Yizhi Liu
DMLC member
Amazon Web Services
Vancouver, Canada
