[ https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363567#comment-16363567 ]
Janardhan commented on SYSTEMML-2083:
-------------------------------------

A lightweight parameter server interface is [ps-lite|https://github.com/dmlc/ps-lite], which serves as a simple example. In simple terms, let's say we have to calculate the weights with the help of gradients:

1. What does a parameter server look like? It consists of workers, a server, and the data.
!image-2018-02-14-12-18-48-932.png!

2. What does a worker do? It takes a small portion of the data, *calculates gradients* from it, and sends them to the server.
!image-2018-02-14-12-21-00-932.png!

3. What does the server do? It receives the gradients from the workers and *calculates the weights*.
!image-2018-02-14-12-22-39-736.png!

(A minimal code sketch of this pull/push loop follows the quoted issue description below.)

> Language and runtime for parameter servers
> ------------------------------------------
>
>                 Key: SYSTEMML-2083
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2083
>             Project: SystemML
>          Issue Type: Epic
>            Reporter: Matthias Boehm
>            Priority: Major
>              Labels: gsoc2018
>         Attachments: image-2018-02-14-12-18-48-932.png, image-2018-02-14-12-21-00-932.png
>
>
> SystemML already provides a rich set of execution strategies, ranging from local operations to large-scale computation on MapReduce or Spark. In this context, we support both data-parallel (multi-threaded or distributed operations) and task-parallel computation (multi-threaded or distributed parfor loops). This epic aims to complement the existing execution strategies with language and runtime primitives for parameter servers, i.e., model-parallel execution. We use the terminology of model-parallel execution with distributed data and distributed model to differentiate it from the existing data-parallel operations. Target applications are distributed deep learning and mini-batch algorithms in general. These new abstractions will help make SystemML a unified framework for small- and large-scale machine learning that supports all three major execution strategies in a single framework.
>
> A major challenge is the integration of stateful parameter servers and their common push/pull primitives into an otherwise functional (and thus stateless) language. We will approach this challenge via a new builtin function {{paramserv}}, which internally maintains state but at the same time fits into the runtime framework of stateless operations.
> Furthermore, we are interested in providing (1) different runtime backends (local and distributed), (2) different parameter server modes (synchronous, asynchronous, hogwild!, stale-synchronous), (3) different update frequencies (batch, multi-batch, epoch), as well as (4) different architectures for distributed data (1 parameter server, k workers) and distributed model (k1 parameter servers, k2 workers).
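To make the three steps from the comment concrete, here is a minimal, self-contained Java sketch of the pull/push loop. This is illustrative only, not SystemML or ps-lite code: the "model" is a single weight w fitted to y = 2x, the data shards and learning rate are made up for the example, and the "server" is just a shared weight plus synchronized push/pull methods.

{code:java}
import java.util.concurrent.*;

// Illustrative, self-contained sketch of the worker/server loop --
// NOT SystemML or ps-lite code. The "model" is a single weight w fitted
// to y = 2x by SGD; the server is just the shared weight plus push/pull.
public class ParamServerSketch {
    static double w = 0.0;          // server state: the current model weight

    // push: the server receives a gradient from a worker and updates w.
    static synchronized void push(double gradient) {
        double lr = 0.01;           // learning rate (made up for the example)
        w -= lr * gradient;
    }

    // pull: a worker fetches the current weight from the server.
    static synchronized double pull() {
        return w;
    }

    public static void main(String[] args) throws InterruptedException {
        double[][] shards = { {1, 2}, {3, 4} };  // one data shard per worker
        ExecutorService workers = Executors.newFixedThreadPool(shards.length);

        for (double[] shard : shards) {
            workers.submit(() -> {
                for (int epoch = 0; epoch < 100; epoch++) {
                    for (double x : shard) {
                        double y = 2 * x;                        // ground truth
                        double wLocal = pull();                  // 1) pull model
                        double grad = 2 * (wLocal * x - y) * x;  // 2) gradient of (wx - y)^2
                        push(grad);                              // 3) push gradient
                    }
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("learned w = " + w);  // converges to ~2.0
    }
}
{code}

Running it prints a learned w close to 2.0; with more than one worker the pushes interleave in arbitrary order, which already hints at the synchronous vs. asynchronous modes listed in the issue description.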
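The quoted description lists synchronous execution among the planned parameter server modes. A hedged sketch of what "synchronous" (BSP-style) means at runtime, again in plain Java rather than SystemML internals: all k workers must deliver their gradient for the current batch before the server-side update runs, which maps naturally onto a barrier per batch.

{code:java}
import java.util.concurrent.*;

// Illustrative sketch (not SystemML internals) of the synchronous (BSP-style)
// mode: every one of the k workers must deliver its gradient for the current
// batch before the server-side update runs -- i.e., a barrier per batch.
public class BspModeSketch {
    public static void main(String[] args) throws InterruptedException {
        int k = 4;  // number of workers
        // Barrier action: runs once per superstep, after all k workers arrive;
        // it stands in for "server aggregates k gradients and updates weights".
        CyclicBarrier barrier = new CyclicBarrier(k,
            () -> System.out.println("server: " + k + " gradients in, model updated"));

        ExecutorService pool = Executors.newFixedThreadPool(k);
        for (int id = 0; id < k; id++) {
            int worker = id;
            pool.submit(() -> {
                for (int batch = 0; batch < 3; batch++) {
                    System.out.println("worker " + worker + ": gradient for batch " + batch);
                    try {
                        barrier.await();  // BSP: block until all workers finish this batch
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
{code}

Dropping the barrier (each push applied as it arrives) gives the asynchronous mode; bounding how far the fastest worker may run ahead of the slowest gives stale-synchronous.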