[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: The objective of the “paramserv” built-in function is to update 
an initial or existing model with a given configuration. An initial function 
signature would be _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, 
mode=SYNC, freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7, 
checkpointing=rollback)_. We are interested in providing the model (which 
will be a struct-like data structure consisting of the weights, the biases 
and the hyperparameters), the training features and labels, the validation 
features and labels, the batch update function, the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
or mini-batch), the gradient aggregation function, the number of epochs, the 
batch size, the degree of parallelism, as well as the checkpointing strategy 
(e.g. rollback recovery). The function will return a trained model in the 
format of a struct.  (was: The objective of the “paramserv” built-in function 
is to update an initial or existing model with a given configuration. An 
initial function signature would be _model'=paramserv(model, X, y, X_val, 
y_val, g_cal_fun, upd=fun1, mode=SYNC, freq=EPOCH, agg=fun2, epochs=100, 
batchsize=64, k=7, checkpointing=rollback)_. We are interested in providing 
the model, the training features and labels, the validation features and 
labels, the gradient calculation function, the batch update function, the 
update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update 
frequency (e.g. epoch or batch), the aggregation function, the number of 
epochs, the batch size, the degree of parallelism, as well as the 
checkpointing strategy (e.g. rollback recovery).)

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial 
> or existing model with a given configuration. An initial function signature 
> would be _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, mode=SYNC, 
> freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7, 
> checkpointing=rollback)_. We are interested in providing the model (which 
> will be a struct-like data structure consisting of the weights, the biases 
> and the hyperparameters), the training features and labels, the validation 
> features and labels, the batch update function, the update strategy (e.g. 
> sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
> or mini-batch), the gradient aggregation function, the number of epochs, 
> the batch size, the degree of parallelism, as well as the checkpointing 
> strategy (e.g. rollback recovery). The function will return a trained model 
> in the format of a struct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: The objective of the “paramserv” built-in function is to update 
an initial or existing model with a given configuration. An initial function 
signature would be _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, 
mode=SYNC, freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7, 
checkpointing=rollback)_. We are interested in providing the model (which 
will be a struct-like data structure consisting of the weights, the biases 
and the hyperparameters), the training features and labels, the validation 
features and labels, the batch update function, the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
or mini-batch), the gradient aggregation function, the number of epochs, the 
batch size, the degree of parallelism, as well as the checkpointing strategy 
(e.g. rollback recovery). The function will return a trained model in struct 
format.  (was: The objective of the “paramserv” built-in function is to 
update an initial or existing model with a given configuration. An initial 
function signature would be _model'=paramserv(model, X, y, X_val, y_val, 
upd=fun1, mode=SYNC, freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7, 
checkpointing=rollback)_. We are interested in providing the model (which 
will be a struct-like data structure consisting of the weights, the biases 
and the hyperparameters), the training features and labels, the validation 
features and labels, the batch update function, the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
or mini-batch), the gradient aggregation function, the number of epochs, the 
batch size, the degree of parallelism, as well as the checkpointing strategy 
(e.g. rollback recovery). The function will return a trained model in the 
format of a struct.)

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial 
> or existing model with a given configuration. An initial function signature 
> would be _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, mode=SYNC, 
> freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7, 
> checkpointing=rollback)_. We are interested in providing the model (which 
> will be a struct-like data structure consisting of the weights, the biases 
> and the hyperparameters), the training features and labels, the validation 
> features and labels, the batch update function, the update strategy (e.g. 
> sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
> or mini-batch), the gradient aggregation function, the number of epochs, 
> the batch size, the degree of parallelism, as well as the checkpointing 
> strategy (e.g. rollback recovery). The function will return a trained model 
> in struct format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2298) Preparation of dev environment

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2298:

Summary: Preparation of dev environment  (was: Creation of a test dml 
script based on NN library)

> Preparation of dev environment
> --
>
> Key: SYSTEMML-2298
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2298
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> During the community bonding period, the development environment should be 
> fully prepared. A test dml script that leverages the new "paramserv" 
> function to rewrite the training function in the [MNIST LeNet 
> Example|https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet.dml]
>  could also be prepared.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2298) Preparation of dev environment

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2298:

Description: During the community bonding period, the development environment 
should be fully prepared. The native library OpenBLAS should be installed in 
order to run the MNIST LeNet example. Then, by leveraging the InfiMNIST data 
generator ([http://leon.bottou.org/projects/infimnist]), we could generate 
256k instances to train the model.  (was: During the community bonding 
period, the development environment should be fully prepared. A test dml 
script that leverages the new "paramserv" function to rewrite the training 
function in the [MNIST LeNet 
Example|https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet.dml]
 could also be prepared.)

> Preparation of dev environment
> --
>
> Key: SYSTEMML-2298
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2298
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> During the community bonding period, the development environment should be 
> fully prepared. The native library OpenBLAS should be installed in order to 
> run the MNIST LeNet example. Then, by leveraging the InfiMNIST data 
> generator ([http://leon.bottou.org/projects/infimnist]), we could generate 
> 256k instances to train the model.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2306) Implementation of a script with paramserv func

2018-05-09 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2306:
---

 Summary: Implementation of a script with paramserv func
 Key: SYSTEMML-2306
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2306
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


This task aims to write a dml script that uses the paramserv function. We 
could easily reuse the MNIST LeNet example and adapt it by creating a 
struct-like model and passing the update function as well as the aggregation 
function. In this case, the update function, which will be executed in the 
workers, should compute the gradients by running the forward and backward 
passes over the batch. The aggregation function, which will be run in the 
parameter server, should update the weights and biases by aggregating the 
received gradients.
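
As a rough sketch, the two user functions could take the following shape in 
DML. The list-typed signatures are an assumption (relying on the struct/list 
data types proposed separately), and the bodies are placeholders rather than 
the actual LeNet logic.

{code}
# hypothetical worker update function: forward and backward pass on one batch
fun1 = function(list[unknown] model, matrix[double] features,
                matrix[double] labels)
  return (list[unknown] gradients) {
  # run the forward pass, then the backward pass, and collect the
  # gradients of all weights and biases (placeholder body)
  gradients = model
}

# hypothetical server aggregation function: apply the received gradients
fun2 = function(list[unknown] model, list[unknown] gradients)
  return (list[unknown] model) {
  # e.g. a plain SGD step per parameter: W = W - lr * dW (placeholder body)
  model = gradients
}
{code}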



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Due Date: 17/May/18  (was: 21/May/18)

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial 
> or existing model with a given configuration. An initial function signature 
> would be _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, mode=SYNC, 
> freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7, 
> checkpointing=rollback)_. We are interested in providing the model (which 
> will be a struct-like data structure consisting of the weights, the biases 
> and the hyperparameters), the training features and labels, the validation 
> features and labels, the batch update function, the update strategy (e.g. 
> sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
> or mini-batch), the gradient aggregation function, the number of epochs, 
> the batch size, the degree of parallelism, as well as the checkpointing 
> strategy (e.g. rollback recovery). The function will return a trained model 
> in struct format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Due Date: 16/May/18  (was: 17/May/18)

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial 
> or existing model with a given configuration. An initial function signature 
> would be _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, mode=SYNC, 
> freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7, 
> checkpointing=rollback)_. We are interested in providing the model (which 
> will be a struct-like data structure consisting of the weights, the biases 
> and the hyperparameters), the training features and labels, the validation 
> features and labels, the batch update function, the update strategy (e.g. 
> sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
> or mini-batch), the gradient aggregation function, the number of epochs, 
> the batch size, the degree of parallelism, as well as the checkpointing 
> strategy (e.g. rollback recovery). The function will return a trained model 
> in struct format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2306) Implementation of a script with paramserv func

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2306:

Due Date: 18/May/18

> Implementation of a script with paramserv func
> --
>
> Key: SYSTEMML-2306
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2306
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> This task aims to write a dml script that uses the paramserv function. We 
> could easily reuse the MNIST LeNet example and adapt it by creating a 
> struct-like model and passing the update function as well as the 
> aggregation function. In this case, the update function, which will be 
> executed in the workers, should compute the gradients by running the 
> forward and backward passes over the batch. The aggregation function, which 
> will be run in the parameter server, should update the weights and biases 
> by aggregating the received gradients.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single-node parameter server acts as a data-parallel parameter server. A 
multi-node model-parallel parameter server will be discussed if time permits. 
 # In the case of a local multi-threaded parameter server, it is easy to 
maintain a concurrent hashmap (with the parameters as values under defined 
keys) inside the CP. The workers are launched in a multi-threaded way to 
execute the gradient calculation function and push the gradients to the 
hashmap. Another thread will be launched to pull the gradients from the 
hashmap and call the aggregation function to update the parameters. 
 # In the case of the Spark distributed backend, we could launch a remote 
single parameter server outside of the CP (as a worker) to provide the pull 
and push service. For the moment, all the weights and biases are saved in 
this single server. The exchange between the server and the workers will be 
implemented over TCP. Hence, we could easily broadcast the IP address and the 
port number to the workers, and the workers can then send the gradients and 
retrieve the new parameters via TCP sockets. 

We may also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies, e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain a vector clock in the server 
consisting of all workers' clocks. Each time an iteration finishes, the 
worker sends a request to the server, and the server sends back a response 
indicating whether the worker should wait or not.

A diagram of the parameter server architecture is shown below.

  was:
A single-node parameter server acts as a data-parallel parameter server. A 
multi-node model-parallel parameter server will be discussed if time permits. 
 # In the case of a local multi-threaded parameter server, it is easy to 
maintain a concurrent hashmap (with the parameters as values under defined 
keys) inside the CP. The workers are launched in a multi-threaded way to 
execute the gradient calculation function and push the gradients to the 
hashmap. Another thread will be launched to pull the gradients from the 
hashmap and call the aggregation function to update the parameters. 
 # In the case of the Spark distributed backend, we could launch a remote 
single parameter server outside of the CP (as a worker) to provide the pull 
and push service. For the moment, all the weights and biases are saved in 
this single server. The exchange between the server and the workers will be 
implemented over TCP. Hence, we could easily broadcast the IP address and the 
port number to the workers, and the workers can then send the gradients and 
retrieve the new parameters via TCP sockets. 

We may also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies, e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain a vector clock in the server 
consisting of all workers' clocks. Each time an iteration finishes, the 
worker sends a request to the server, and the server sends back a response 
indicating whether the worker should wait or not.


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server. A 
> multi-node model-parallel parameter server will be discussed if time 
> permits. 
>  # In the case of a local multi-threaded parameter server, it is easy to 
> maintain a concurrent hashmap (with the parameters as values under defined 
> keys) inside the CP. The workers are launched in a multi-threaded way to 
> execute the gradient calculation function and push the gradients to the 
> hashmap. Another thread will be launched to pull the gradients from the 
> hashmap and call the aggregation function to update the parameters. 
>  # In the case of the Spark distributed backend, we could launch a remote 
> single parameter server outside of the CP (as a worker) to provide the pull 
> and push service. For the moment, all the weights and biases are saved in 
> this single server. The exchange between the server and the workers will be 
> implemented over TCP. Hence, we could easily broadcast the IP address and 
> the port number to the workers, and the workers can then send the gradients 
> and retrieve the new parameters via TCP sockets.

[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single-node parameter server acts as a data-parallel parameter server. A 
multi-node model-parallel parameter server will be discussed if time permits. 
 # In the case of a local multi-threaded parameter server, it is easy to 
maintain a concurrent hashmap (with the parameters as values under defined 
keys) inside the CP. The workers are launched in a multi-threaded way to 
execute the gradient calculation function and push the gradients to the 
hashmap. Another thread will be launched to pull the gradients from the 
hashmap and call the aggregation function to update the parameters. 
 # In the case of the Spark distributed backend, we could launch a remote 
single parameter server outside of the CP (as a worker) to provide the pull 
and push service. For the moment, all the weights and biases are saved in 
this single server. The exchange between the server and the workers will be 
implemented over TCP. Hence, we could easily broadcast the IP address and the 
port number to the workers, and the workers can then send the gradients and 
retrieve the new parameters via TCP sockets. 

We may also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies, e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain a vector clock in the server 
consisting of all workers' clocks. Each time an iteration finishes, the 
worker sends a request to the server, and the server sends back a response 
indicating whether the worker should wait or not.

  was:A single-node parameter server acts as a data-parallel parameter 
server. A multi-node model-parallel parameter server will be discussed if 
time permits. The idea is to run a single-node parameter server by 
maintaining a hashmap inside the CP (Control Program), with the parameter as 
a value under a defined key. For example, inserting the global parameter 
under a key named “worker-param-replica” allows the workers to retrieve the 
parameter replica. Hence, in the context of the local multi-threaded backend, 
workers can communicate directly with this hashmap in the same process. In 
the context of the Spark distributed backend, the CP first needs to fork a 
thread to start a parameter server which maintains a hashmap; the workers can 
then send intermediates and retrieve parameters by connecting to the 
parameter server via a TCP socket. Since SystemML has good cache management, 
we only need to maintain a matrix reference pointing to a file location in 
the hashmap instead of the real data instance. If time permits, in order to 
introduce the async and staleness update strategies, we would need to 
implement the synchronization by leveraging a vector clock.


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server. A 
> multi-node model-parallel parameter server will be discussed if time 
> permits. 
>  # In the case of a local multi-threaded parameter server, it is easy to 
> maintain a concurrent hashmap (with the parameters as values under defined 
> keys) inside the CP. The workers are launched in a multi-threaded way to 
> execute the gradient calculation function and push the gradients to the 
> hashmap. Another thread will be launched to pull the gradients from the 
> hashmap and call the aggregation function to update the parameters. 
>  # In the case of the Spark distributed backend, we could launch a remote 
> single parameter server outside of the CP (as a worker) to provide the pull 
> and push service. For the moment, all the weights and biases are saved in 
> this single server. The exchange between the server and the workers will be 
> implemented over TCP. Hence, we could easily broadcast the IP address and 
> the port number to the workers, and the workers can then send the gradients 
> and retrieve the new parameters via TCP sockets. 
> We may also need to implement the synchronization between the workers and 
> the parameter server in order to support more parameter update strategies, 
> e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
> define the waiting interval. The idea is to maintain a vector clock in the 
> server consisting of all workers' clocks. Each time an iteration finishes, 
> the worker sends a request to the server, and the server sends back a 
> response indicating whether the worker should wait or not.

[jira] [Updated] (SYSTEMML-2086) Initial version of local backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:

Description: 
This part aims to design and implement a local execution backend for the 
compiled “paramserv” function. It consists of partitioning the data for the 
worker threads, launching the single-node parameter server in the CP, 
shipping and calling the compiled statistical functions, and creating the 
different update strategies. We will focus on implementing the BSP execution 
strategy, i.e., the synchronous update strategy, both per epoch and per 
batch. Other update strategies (e.g. asynchronous, stale-synchronous) and 
checkpointing strategies are optional and will be added if time permits. The 
architecture for the synchronous per-epoch update strategy is illustrated 
below.

> Initial version of local backend
> 
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to design and implement a local execution backend for the 
> compiled “paramserv” function. It consists of partitioning the data for the 
> worker threads, launching the single-node parameter server in the CP, 
> shipping and calling the compiled statistical functions, and creating the 
> different update strategies. We will focus on implementing the BSP 
> execution strategy, i.e., the synchronous update strategy, both per epoch 
> and per batch. Other update strategies (e.g. asynchronous, 
> stale-synchronous) and checkpointing strategies are optional and will be 
> added if time permits. The architecture for the synchronous per-epoch 
> update strategy is illustrated below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2086) Initial version of local backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:

Description: 
This part aims to design and implement a local execution backend for the 
compiled “paramserv” function. It consists of partitioning the data for the 
worker threads, launching the single-node parameter server in the CP, 
shipping and calling the compiled statistical functions, and creating the 
different update strategies. We will focus on implementing the BSP execution 
strategy, i.e., the synchronous update strategy, both per epoch and per 
batch. Other update strategies (e.g. asynchronous, stale-synchronous) and 
checkpointing strategies are optional and will be added if time permits. The 
architecture for the synchronous per-epoch update strategy is illustrated 
below.

The idea is to spawn a thread to launch the local parameter server, which is 
responsible for maintaining the parameter hashmap and executing the 
aggregation work. A number of workers will then be forked according to the 
level of parallelism. Each worker loads its data partition, performs the 
per-batch parameter updates, pushes the gradients, and retrieves the new 
parameters from the server. The server retrieves the gradients of each worker 
using the related keys in a round-robin way, aggregates the parameters, and 
pushes the new global parameters under the parameter-related keys. At last, 
the paramserv main thread should wait for the server aggregator thread to 
join and obtain the final global parameters as the result. Hence, the 
pull/push primitives bring more flexibility and make it easier to implement 
other update strategies.

  was:
This part aims to design and implement a local execution backend for the 
compiled “paramserv” function. It consists of partitioning the data for the 
worker threads, launching the single-node parameter server in the CP, 
shipping and calling the compiled statistical functions, and creating the 
different update strategies. We will focus on implementing the BSP execution 
strategy, i.e., the synchronous update strategy, both per epoch and per 
batch. Other update strategies (e.g. asynchronous, stale-synchronous) and 
checkpointing strategies are optional and will be added if time permits. The 
architecture for the synchronous per-epoch update strategy is illustrated 
below.


> Initial version of local backend
> 
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to design and implement a local execution backend for the 
> compiled “paramserv” function. It consists of partitioning the data for the 
> worker threads, launching the single-node parameter server in the CP, 
> shipping and calling the compiled statistical functions, and creating the 
> different update strategies. We will focus on implementing the BSP 
> execution strategy, i.e., the synchronous update strategy, both per epoch 
> and per batch. Other update strategies (e.g. asynchronous, 
> stale-synchronous) and checkpointing strategies are optional and will be 
> added if time permits. The architecture for the synchronous per-epoch 
> update strategy is illustrated below.
> The idea is to spawn a thread to launch the local parameter server, which 
> is responsible for maintaining the parameter hashmap and executing the 
> aggregation work. A number of workers will then be forked according to the 
> level of parallelism. Each worker loads its data partition, performs the 
> per-batch parameter updates, pushes the gradients, and retrieves the new 
> parameters from the server. The server retrieves the gradients of each 
> worker using the related keys in a round-robin way, aggregates the 
> parameters, and pushes the new global parameters under the 
> parameter-related keys. At last, the paramserv main thread should wait for 
> the server aggregator thread to join and obtain the final global parameters 
> as the result. Hence, the pull/push primitives bring more flexibility and 
> make it easier to implement other update strategies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single-node parameter server acts as a data-parallel parameter server. A 
multi-node model-parallel parameter server will be discussed if time permits. 

Push/Pull service: 

In general, we could launch a parameter server inside (local multi-threaded 
backend) or outside (Spark distributed backend) of the CP to provide the pull 
and push service. For the moment, all the weights and biases are saved in a 
hashmap under a key, e.g., "global parameter". Each worker's gradients will 
be put into the hashmap separately under a given key. The exchange between 
the server and the workers will be implemented over TCP. Hence, we could 
easily broadcast the IP address and the port number to the workers, and the 
workers can then send the gradients and retrieve the new parameters via TCP 
sockets. The server will also spawn a thread which retrieves the gradients by 
polling the hashmap using the relevant keys and aggregates them. At last, it 
updates the global parameters in the hashmap.

Synchronization:

We also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies, e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain a vector clock in the server 
recording all workers' clocks. Each time an iteration inside a worker 
finishes, it waits for a signal from the server, i.e., it sends a request for 
calculating the staleness according to the vector clock. When the server 
receives gradients from a certain worker, it increments the vector clock for 
this worker. So we could define BSP as "staleness==0", ASP as "staleness==-1" 
and SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.
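
To make the staleness rule concrete, below is a hedged DML-style sketch of 
the server-side check, with the vector clock kept as a 1 x k matrix of worker 
clocks; all names are illustrative, and the actual runtime would implement 
this check inside the parameter server itself.

{code}
# decide whether a worker may start its next iteration
canProceed = function(matrix[double] clocks, int worker, int staleness)
  return (boolean ok) {
  # ASP (staleness==-1): never wait; BSP (==0) and SSP (==N): the worker
  # may be at most 'staleness' iterations ahead of the slowest worker
  ok = (staleness < 0) |
       (as.scalar(clocks[1, worker]) - min(clocks) <= staleness)
}
{code}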

  was:
A single-node parameter server acts as a data-parallel parameter server. A 
multi-node model-parallel parameter server will be discussed if time permits. 
 # In the case of a local multi-threaded parameter server, it is easy to 
maintain a concurrent hashmap (with the parameters as values under defined 
keys) inside the CP. The workers are launched in a multi-threaded way to 
execute the gradient calculation function and push the gradients to the 
hashmap. Another thread will be launched to pull the gradients from the 
hashmap and call the aggregation function to update the parameters. 
 # In the case of the Spark distributed backend, we could launch a remote 
single parameter server outside of the CP (as a worker) to provide the pull 
and push service. For the moment, all the weights and biases are saved in 
this single server. The exchange between the server and the workers will be 
implemented over TCP. Hence, we could easily broadcast the IP address and the 
port number to the workers, and the workers can then send the gradients and 
retrieve the new parameters via TCP sockets. 

We may also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies, e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain a vector clock in the server 
consisting of all workers' clocks. Each time an iteration finishes, the 
worker sends a request to the server, and the server sends back a response 
indicating whether the worker should wait or not.

A diagram of the parameter server architecture is shown below.


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server. A 
> multi-node model-parallel parameter server will be discussed if time 
> permits. 
> Push/Pull service: 
> In general, we could launch a parameter server inside (local multi-threaded 
> backend) or outside (Spark distributed backend) of the CP to provide the 
> pull and push service. For the moment, all the weights and biases are saved 
> in a hashmap under a key, e.g., "global parameter". Each worker's gradients 
> will be put into the hashmap separately under a given key. The exchange 
> between the server and the workers will be implemented over TCP. Hence, we 
> could easily broadcast the IP address and the port number to the workers, 
> and the workers can then send the gradients and retrieve the new parameters 
> via TCP sockets. The server will also spawn a thread which retrieves the 
> gradients by polling the hashmap using the relevant keys and aggregates 
> them.

[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Attachment: ps.png

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server. A 
> multi-node model-parallel parameter server will be discussed if time 
> permits. 
> Push/Pull service: 
> In general, we could launch a parameter server inside (local multi-threaded 
> backend) or outside (Spark distributed backend) of the CP to provide the 
> pull and push service. For the moment, all the weights and biases are saved 
> in a hashmap under a key, e.g., "global parameter". Each worker's gradients 
> will be put into the hashmap separately under a given key. The exchange 
> between the server and the workers will be implemented over TCP. Hence, we 
> could easily broadcast the IP address and the port number to the workers, 
> and the workers can then send the gradients and retrieve the new parameters 
> via TCP sockets. The server will also spawn a thread which retrieves the 
> gradients by polling the hashmap using the relevant keys and aggregates 
> them. At last, it updates the global parameters in the hashmap.
> Synchronization:
> We also need to implement the synchronization between the workers and the 
> parameter server in order to support more parameter update strategies, 
> e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
> define the waiting interval. The idea is to maintain a vector clock in the 
> server recording all workers' clocks. Each time an iteration inside a 
> worker finishes, it waits for a signal from the server, i.e., it sends a 
> request for calculating the staleness according to the vector clock. When 
> the server receives gradients from a certain worker, it increments the 
> vector clock for this worker. So we could define BSP as "staleness==0", ASP 
> as "staleness==-1" and SSP as "staleness==N".
> A diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Attachment: (was: ps.png)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server. A 
> multi-node model-parallel parameter server will be discussed if time 
> permits. 
> Push/Pull service: 
> In general, we could launch a parameter server inside (local multi-threaded 
> backend) or outside (Spark distributed backend) of the CP to provide the 
> pull and push service. For the moment, all the weights and biases are saved 
> in a hashmap under a key, e.g., "global parameter". Each worker's gradients 
> will be put into the hashmap separately under a given key. The exchange 
> between the server and the workers will be implemented over TCP. Hence, we 
> could easily broadcast the IP address and the port number to the workers, 
> and the workers can then send the gradients and retrieve the new parameters 
> via TCP sockets. The server will also spawn a thread which retrieves the 
> gradients by polling the hashmap using the relevant keys and aggregates 
> them. At last, it updates the global parameters in the hashmap.
> Synchronization:
> We also need to implement the synchronization between the workers and the 
> parameter server in order to support more parameter update strategies, 
> e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
> define the waiting interval. The idea is to maintain a vector clock in the 
> server recording all workers' clocks. Each time an iteration inside a 
> worker finishes, it waits for a signal from the server, i.e., it sends a 
> request for calculating the staleness according to the vector clock. When 
> the server receives gradients from a certain worker, it increments the 
> vector clock for this worker. So we could define BSP as "staleness==0", ASP 
> as "staleness==-1" and SSP as "staleness==N".
> A diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2086) Initial version of local backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:

Description: This part aims to design and implement a local execution backend 
for the compiled “paramserv” function. The idea is to spawn a thread in the 
CP for running the parameter server, and the workers are also launched in a 
multi-threaded way in the CP.  (was: This part aims to design and implement a 
local execution backend for the compiled “paramserv” function. It consists of 
partitioning the data for the worker threads, launching the single-node 
parameter server in the CP, shipping and calling the compiled statistical 
functions, and creating the different update strategies. We will focus on 
implementing the BSP execution strategy, i.e., the synchronous update 
strategy, both per epoch and per batch. Other update strategies (e.g. 
asynchronous, stale-synchronous) and checkpointing strategies are optional 
and will be added if time permits. The architecture for the synchronous 
per-epoch update strategy is illustrated below.

The idea is to spawn a thread to launch the local parameter server, which is 
responsible for maintaining the parameter hashmap and executing the 
aggregation work. A number of workers will then be forked according to the 
level of parallelism. Each worker loads its data partition, performs the 
per-batch parameter updates, pushes the gradients, and retrieves the new 
parameters from the server. The server retrieves the gradients of each worker 
using the related keys in a round-robin way, aggregates the parameters, and 
pushes the new global parameters under the parameter-related keys. At last, 
the paramserv main thread should wait for the server aggregator thread to 
join and obtain the final global parameters as the result. Hence, the 
pull/push primitives bring more flexibility and make it easier to implement 
other update strategies.)

> Initial version of local backend
> 
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to design and implement a local execution backend for the 
> compiled “paramserv” function. The idea is to spawn a thread in the CP for 
> running the parameter server, and the workers are also launched in a 
> multi-threaded way in the CP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2086) Initial version of local backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:

Description: This part aims to implement a local execution backend for the 
compiled “paramserv” function. The idea is to spawn a thread in the CP for 
running the parameter server, and the workers are also launched in a 
multi-threaded way in the CP.  (was: This part aims to design and implement a 
local execution backend for the compiled “paramserv” function. The idea is to 
spawn a thread in the CP for running the parameter server, and the workers 
are also launched in a multi-threaded way in the CP.)

> Initial version of local backend
> 
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement a local execution backend for the compiled 
> “paramserv” function. The idea is to spawn a thread in the CP for running 
> the parameter server, and the workers are also launched in a multi-threaded 
> way in the CP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2307) New structured data types

2018-05-09 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-2307:


 Summary: New structured data types
 Key: SYSTEMML-2307
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2307
 Project: SystemML
  Issue Type: Epic
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2309) Length and right indexing operations over lists

2018-05-09 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-2309:


 Summary: Length and right indexing operations over lists
 Key: SYSTEMML-2309
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2309
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm
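
A short DML sketch of the intended operations; the constructor and indexing 
syntax below are assumptions at this design stage.

{code}
l = list(W1, b1, W2, b2)   # assumed list constructor (see SYSTEMML-2308)
n = length(l)              # length over a list, here n = 4
e = l[1]                   # right indexing: select the first element
s = l[2:3]                 # range indexing: select a sub-list
{code}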






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2310) Length and right indexing operations over structs

2018-05-09 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-2310:


 Summary: Length and right indexing operations over structs
 Key: SYSTEMML-2310
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2310
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2308) New data types list and struct

2018-05-09 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-2308:


 Summary: New data types list and struct
 Key: SYSTEMML-2308
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2308
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2308) New data types list and struct, incl constructors

2018-05-09 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-2308:
-
Summary: New data types list and struct, incl constructors  (was: New data 
types list and struct)
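
A hedged DML sketch of the two constructor flavors this could cover; the 
syntax is an assumption, not a committed design.

{code}
# plain list of unnamed entries
l = list(W1, b1, W2, b2)
# named entries give struct-like access, e.g. for a paramserv model
model = list(W1=W1, b1=b1, W2=W2, b2=b2, lr=0.01)
{code}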

> New data types list and struct, incl constructors
> -
>
> Key: SYSTEMML-2308
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2308
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2087:

Description: This part aims to implement BSP for the Spark distributed 
backend. Hence, the idea is to be able to launch a remote parameter server 
and the workers.

> Initial version of distributed spark backend
> 
>
> Key: SYSTEMML-2087
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2087
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement BSP for the Spark distributed backend. Hence, 
> the idea is to be able to launch a remote parameter server and the workers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2311) Allow lists and structs in function calls

2018-05-09 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-2311:


 Summary: Allow lists and structs in function calls
 Key: SYSTEMML-2311
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2311
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2302) Second version of execution backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2302:

Description: This part aims to complement the update strategies by adding ASP 
and SSP.

> Second version of execution backend
> ---
>
> Key: SYSTEMML-2302
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2302
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to complement the update strategies by adding ASP and SSP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2086) Initial version of local backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:

Description: This part aims to implement the BSP strategy for the local 
execution backend. The idea is to spawn a thread in the CP for running the 
parameter server, and the workers are also launched in a multi-threaded way 
in the CP.  (was: This part aims to implement a local execution backend for 
the compiled “paramserv” function. The idea is to spawn a thread in the CP 
for running the parameter server, and the workers are also launched in a 
multi-threaded way in the CP.)

> Initial version of local backend
> 
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement the BSP strategy for the local execution 
> backend. The idea is to spawn a thread in the CP for running the parameter 
> server, and the workers are also launched in a multi-threaded way in the CP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2084) Implementation of language and compiler extension

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2084:

Due Date: 25/May/18  (was: 28/May/18)

> Implementation of language and compiler extension
> -
>
> Key: SYSTEMML-2084
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2084
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to add language support for the “paramserv” function in 
> order to be able to compile this new function. Since SystemML already 
> supports parameterized built-in functions, we can easily add an additional 
> operation type and generate a new instruction for the “paramserv” function. 
> Recently, a new “eval” built-in function was also added, which can take a 
> function pointer as an argument so that the function can be called at 
> runtime. Similarly, we would need to extend the inter-procedural analysis 
> to avoid removing seemingly unused functions in the presence of the 
> second-order “paramserv” function, because the referenced functions, i.e., 
> the aggregation and update functions, must be present at runtime.
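
A small DML sketch of why this matters for inter-procedural analysis; the 
eval call follows the existing built-in, while the string-valued function 
references in paramserv are an assumption of this design.

{code}
# fun1/fun2 have no direct call sites, only string references, so IPA
# must not remove them as seemingly dead code (hypothetical sketch)
r = eval("fun1", X)
model2 = paramserv(model=model, X=X, y=y, upd="fun1", agg="fun2")
{code}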



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Due Date: 1/Jun/18  (was: 4/Jun/18)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server. A 
> multi-node model-parallel parameter server will be discussed if time 
> permits. 
> Push/Pull service: 
> In general, we could launch a parameter server inside (local multi-threaded 
> backend) or outside (Spark distributed backend) of the CP to provide the 
> pull and push service. For the moment, all the weights and biases are saved 
> in a hashmap under a key, e.g., "global parameter". Each worker's gradients 
> will be put into the hashmap separately under a given key. The exchange 
> between the server and the workers will be implemented over TCP. Hence, we 
> could easily broadcast the IP address and the port number to the workers, 
> and the workers can then send the gradients and retrieve the new parameters 
> via TCP sockets. The server will also spawn a thread which retrieves the 
> gradients by polling the hashmap using the relevant keys and aggregates 
> them. At last, it updates the global parameters in the hashmap.
> Synchronization:
> We also need to implement the synchronization between the workers and the 
> parameter server in order to support more parameter update strategies, 
> e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
> define the waiting interval. The idea is to maintain a vector clock in the 
> server recording all workers' clocks. Each time an iteration inside a 
> worker finishes, it waits for a signal from the server, i.e., it sends a 
> request for calculating the staleness according to the vector clock. When 
> the server receives gradients from a certain worker, it increments the 
> vector clock for this worker. So we could define BSP as "staleness==0", ASP 
> as "staleness==-1" and SSP as "staleness==N".
> A diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2086) Initial version of local backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:

Due Date: 22/Jun/18  (was: 25/Jun/18)

> Initial version of local backend
> 
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement the BSP strategy for the local execution 
> backend. The idea is to spawn a thread in the CP for running the parameter 
> server, and the workers are also launched in a multi-threaded way in the CP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2087:

Due Date: 6/Jul/18  (was: 9/Jul/18)

> Initial version of distributed spark backend
> 
>
> Key: SYSTEMML-2087
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2087
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement BSP for the Spark distributed backend. Hence, 
> the idea is to be able to launch a remote parameter server and the workers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2302) Second version of execution backend

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2302:

Due Date: 27/Jul/18  (was: 6/Aug/18)

> Second version of execution backend
> ---
>
> Key: SYSTEMML-2302
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2302
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to complement the update strategies by adding ASP and SSP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2308) New data type list for lists and structs

2018-05-09 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-2308:
-
Summary: New data type list for lists and structs  (was: New data types 
list and struct, incl constructors)

> New data type list for lists and structs
> 
>
> Key: SYSTEMML-2308
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2308
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)