[jira] [Comment Edited] (SYSTEMML-2083) Language and runtime for parameter servers

2018-02-22 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363526#comment-16363526
 ] 

Matthias Boehm edited comment on SYSTEMML-2083 at 2/22/18 7:59 AM:
---

Awesome, [~chamath] - that sounds very good. As next steps for both of you, 
[~chamath] and [~mpgovinda], I would recommend the following:

1) SystemML: Familiarize yourself with SystemML (e.g., read the paper referenced 
above), the documentation, and its algorithms (including the nn library), and 
maybe run a simple linear regression algorithm over dense and sparse matrices 
with the following data generator and algorithm scripts (a quick smoke-test 
sketch follows this list):
https://github.com/apache/systemml/blob/master/scripts/datagen/genRandData4LinearRegression.dml
https://github.com/apache/systemml/blob/master/scripts/algorithms/LinearRegCG.dml

2) Understanding the Problem: Unless you're already familiar with typical 
parameter server architectures, I would recommend starting from a recent paper 
and its references (e.g., 
https://ds3lab.org/wp-content/uploads/2017/07/sigmod2017_jiang.pdf, which does 
a good job of summarizing existing systems). Ultimately, we want to build 
compiler and runtime support for multiple different update strategies. So ask 
yourself if you would be interested in contributing to the internals of 
SystemML.

3) Project Discussion: Subsequently, we would discuss the actual project in 
more detail. This epic is large enough to allow multiple interesting 
sub-projects on which individual students can work. Based on your ideas, 
collaboration preferences, and technical interests, we can scope these projects 
accordingly. The goal is to work toward a high-quality project proposal in an 
interactive manner. 

4) GSoC Application: According to the GSoC timeline, you would then submit your 
proposal to the ASF as the mentoring organization by March 27. For more 
details, please see http://community.apache.org/gsoc.html. 
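
To make step 1 concrete, here is a minimal DML sketch (not the linked 
generator/algorithm scripts; the matrix sizes, noise level, and ridge term are 
arbitrary choices for illustration) that generates random regression data and 
fits a linear model via the normal equations. It could be run, e.g., through 
SystemML's command-line entry point: {{java -cp SystemML.jar 
org.apache.sysml.api.DMLScript -f lr_smoke_test.dml}}.

{code}
# Minimal DML sketch: random regression data + closed-form fit.
# All sizes/constants are arbitrary choices for illustration.
n = 10000
d = 100
X = rand(rows=n, cols=d, sparsity=1.0)   # set sparsity < 1.0 to test sparse inputs
b_true = rand(rows=d, cols=1)
y = X %*% b_true + rand(rows=n, cols=1, min=-0.01, max=0.01)  # small noise

# Normal equations with a small ridge term for numerical stability
A = t(X) %*% X + diag(matrix(0.001, rows=d, cols=1))
beta = solve(A, t(X) %*% y)
print("first learned coefficient: " + as.scalar(beta[1,1]))
{code}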


> Language and runtime for parameter servers
> --
>
> Key: SYSTEMML-2083
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2083
> Project: SystemML
>  Issue Type: Epic
>Reporter: Matthias Boehm
>Priority: Major
>  Labels: gsoc2018
> Attachments: image-2018-02-14-12-18-48-932.png, 
> image-2018-02-14-12-21-00-932.png, image-2018-02-14-12-31-37-563.png
>
>
> SystemML already provides a rich set of execution strategies ranging from 
> local operations to large-scale computation on MapReduce or Spark. In this 
> context, we support both data-parallel (multi-threaded or distributed 
> operations) as well as task-parallel computation (multi-threaded or 
> distributed parfor loops). This epic aims to complement the existing 
> execution strategies by language and runtime primitives for parameter 
> servers, i.e., model-parallel execution. We use the terminology of 
> model-parallel execution with distributed data and distributed model to 
> differentiate them from the existing data-parallel operations. Target 
> applications are distributed deep learning and mini-batch algorithms in 
> general. These new abstractions will help make SystemML a unified framework 
> for small- and large-scale machine learning that supports all three major 
> execution strategies in a single framework.
>  
> A major challenge is the integration of stateful parameter servers and their 
> common push/pull primitives into an otherwise functional (and thus, 
> stateless) language. We will approach this challenge via a new builtin 
> function {{paramserv}} which internally maintains state but at the same time 
> fits into the runtime framework of stateless operations.
> Furthermore, we are interested in providing (1) different runtime backends 
> (local and distributed), (2) different parameter server modes (synchronous, 
> asynchronous, hogwild!, stale-synchronous), (3) different update frequencies 
> (batch, multi-batch, epoch), as well as (4) different architectures for 
> distributed data (1 parameter server, k workers) and distributed model (k1 
> parameter servers, k2 workers). 
>  
> *Note for GSoC students:* This is a large project which will be broken down 
> into sub-projects, so everybody will have their share of the pie.
> *Prerequisites:* Java; machine learning experience is a plus but not required.

[jira] [Comment Edited] (SYSTEMML-2083) Language and runtime for parameter servers

2018-02-18 Thread Janardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368784#comment-16368784
 ] 

Janardhan edited comment on SYSTEMML-2083 at 2/19/18 4:58 AM:
--

Hi [~lahiru94] - great to see you here, looking forward to the discussion later.

Hey [~Guobao] - please go through Matthias Boehm's comment for the first 
step. 

If any one of you faces installation problems, please let us know.

Best dev configuration:
 # *Linux:* CentOS 7, 64-bit -> I kept the commands for [installing 
spark|https://github.com/j143/install/blob/master/CentOS/spark.sh] and [running 
systemml|https://github.com/j143/install/blob/master/CentOS/systemml.sh] at my 
repo.







[jira] [Comment Edited] (SYSTEMML-2083) Language and runtime for parameter servers

2018-02-13 Thread Janardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363568#comment-16363568
 ] 

Janardhan edited comment on SYSTEMML-2083 at 2/14/18 7:04 AM:
--

As a simple example of a lightweight parameter server interface, see 
[ps-lite|https://github.com/dmlc/ps-lite].

In simple terms (this explanation takes about 7 minutes to read), let's say we 
have to calculate weights with the help of gradients.
 

1. What does a parameter server look like? It contains workers, a server, and 
data.

!image-2018-02-14-12-18-48-932.png!

2. What does a worker do? It takes a small portion of the data, *calculates 
gradients* from it, and sends them to the server.

!image-2018-02-14-12-21-00-932.png!

3. What does the server do? It collects the gradients from the workers and 
*calculates the weights*.

!image-2018-02-14-12-31-37-563.png!
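
To make the three pictures above concrete, the following DML-style sketch 
simulates one synchronous update round on a single node. This is illustrative 
pseudocode only (not the actual paramserv API); the feature matrix {{X}}, 
labels {{y}}, model vector {{W}}, and all constants are assumed or arbitrary.

{code}
# One synchronous round, simulated on a single node (illustration only).
# Assumes X, y, and a model vector W exist, and that k divides nrow(X).
k = 4                                      # number of simulated workers
lr = 0.1                                   # learning rate (arbitrary)
blk = as.integer(nrow(X) / k)
grad_sum = matrix(0, rows=ncol(X), cols=1)
for (i in 1:k) {
  X_i = X[((i-1)*blk+1):(i*blk), ]         # worker i's data shard
  y_i = y[((i-1)*blk+1):(i*blk), ]
  g_i = t(X_i) %*% (X_i %*% W - y_i)       # worker computes its gradient ("push")
  grad_sum = grad_sum + g_i                # server accumulates the gradients
}
W = W - lr * (grad_sum / k)                # server updates weights; workers "pull" W
{code}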










[jira] [Comment Edited] (SYSTEMML-2083) Language and runtime for parameter servers

2018-02-13 Thread Govinda Malavipathirana (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361498#comment-16361498
 ] 

Govinda Malavipathirana edited comment on SYSTEMML-2083 at 2/13/18 1:54 PM:


Hi,
  I am Govinda Malavipathirana, a 4th-year undergraduate from the University of 
Moratuwa, Faculty of Information Technology, in Sri Lanka. I read the initial 
documentation and found it exciting and very interesting. I would like to 
contribute to this project. I have good knowledge of deep learning, neural 
networks, machine learning, Python, and related technologies like NumPy, 
pandas, and Git. I am really enthusiastic about deep-learning-driven software 
development and would love to contribute to a DL open source project. Could 
you describe the project in more detail, e.g., the current approach and 
expected extensions? Thank you.

Sincerely,
Govinda.
 








[jira] [Comment Edited] (SYSTEMML-2083) Language and runtime for parameter servers

2018-02-12 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361878#comment-16361878
 ] 

Matthias Boehm edited comment on SYSTEMML-2083 at 2/13/18 6:27 AM:
---

Great - thanks for your interest, [~mpgovinda]. Below I try to give you a couple 
of pointers and a more concrete idea of the project. Additional details 
regarding the individual sub-tasks will follow later and will likely evolve over 
time. Note that we're happy to help out and guide you where needed. 

SystemML is an existing system for a broad range of machine learning 
algorithms. Users write their algorithms in an R- or Python-like syntax with 
abstract data types for scalars, matrices, and frames, as well as operations 
such as linear algebra, element-wise operations, aggregations, indexing, and 
statistical functions. SystemML then automatically compiles these scripts into 
hybrid runtime plans of single-node and distributed operations on MapReduce or 
Spark according to data and cluster characteristics. For more details, please 
refer to our website (https://systemml.apache.org/) as well as our "SystemML on 
Spark" paper (http://www.vldb.org/pvldb/vol9/p1425-boehm.pdf).

In the past, we primarily focused on data- and task-parallel execution 
strategies (as described above), but in recent years we have also added support 
for deep learning, including an nn script library, various builtin functions 
for specific layers, as well as native and GPU operations. 

This epic aims to extend these capabilities by execution strategies for 
parameter servers. We want to build alternative runtime backends as a 
foundation which would already enable users to easily select their preferred 
strategy for local or distributed execution. Later (not part of this project), 
we would like to further extend this to the automatic selection of these 
strategies.

Specifically, this project aims to introduce a new builtin function, called 
{{paramserv}} that can be called at script level.
{code}
[model'] = paramserv(model, X, y, X_val, y_val, fun1,
  mode=ASYNC, freq=EPOCH, agg=..., epochs=100, batchsize=64, k=7,
  checkpointing=...)
{code}
where we pass an existing (e.g., for transfer learning) or otherwise 
initialized {{model}}, the training feature and label matrices {{X}}, {{y}}, 
the validation features and labels {{X_val}}, {{y_val}}, a batch update 
function {{fun1}} specified in SystemML's R- or Python-like language, an update 
strategy {{mode}} along with its frequency {{freq}} (e.g., per batch or epoch), 
an aggregation function {{agg}}, the number of {{epochs}}, the {{batchsize}}, 
the degree of parallelism {{k}}, and a checkpointing strategy. 

The core of the project then deals with implementing the runtime for this 
builtin function in Java, for both local multi-threaded execution and 
distributed execution on top of Spark. The advantage of building the distributed 
parameter servers on top of the data-parallel Spark framework is a seamless 
integration with the rest of SystemML (e.g., where the input feature matrix 
{{X}} can be a large RDD). Since the update and aggregation functions are 
expressed in SystemML's language, we can simply reuse the existing runtime 
(control flow, instructions, and matrix operations) and concentrate on building 
the alternative parameter update mechanisms.
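
For illustration, the batch update and aggregation functions passed to 
{{paramserv}} might look roughly as follows in DML, here for a simple linear 
model with squared loss. This is only a sketch: the names {{gradients}} and 
{{aggregation}} and the exact signatures are hypothetical, since the final 
function contract of {{paramserv}} is precisely what this project will define.

{code}
# Hypothetical batch update function (one gradient computation per batch).
gradients = function(matrix[double] model, matrix[double] X_batch,
                     matrix[double] y_batch)
  return (matrix[double] grads)
{
  # Gradient of squared loss for a linear model (sketch only)
  grads = t(X_batch) %*% (X_batch %*% model - y_batch) / nrow(X_batch)
}

# Hypothetical aggregation function applied at the parameter server.
aggregation = function(matrix[double] model, matrix[double] grads)
  return (matrix[double] model_new)
{
  lr = 0.01  # fixed learning rate, arbitrary for this sketch
  model_new = model - lr * grads
}
{code}

With such functions in place, the script-level training code would reduce to 
the single {{paramserv}} call shown above, while the chosen mode, frequency, 
and backend determine how pushes and pulls are scheduled.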





[jira] [Comment Edited] (SYSTEMML-2083) Language and runtime for parameter servers

2018-01-30 Thread Janardhan (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345279#comment-16345279
 ] 

Janardhan edited comment on SYSTEMML-2083 at 1/30/18 3:59 PM:
--

Hi all, this issue is a replica of the other two issues to which I had assigned 
myself.
 # the JIRAs are: https://issues.apache.org/jira/browse/SYSTEMML-739






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)