[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Attachment: ps.png

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server; a
> multi-node, model-parallel parameter server will be discussed if time
> permits. The idea is to run the single-node parameter server by maintaining
> a hashmap inside the CP (Control Program) in which each parameter is stored
> as a value under a defined key. For example, inserting the global parameter
> under the key “worker-param-replica” allows the workers to retrieve the
> parameter replica. In the local multi-threaded backend, workers can
> therefore communicate directly with this hashmap within the same process. In
> the distributed Spark backend, the CP first needs to fork a thread that
> starts a parameter server maintaining the hashmap; the workers can then send
> intermediates and retrieve parameters by connecting to the parameter server
> over a TCP socket. Since SystemML has good cache management, the hashmap
> only needs to hold a matrix reference pointing to a file location rather
> than the actual data instance. If time permits, introducing the asynchronous
> and staleness-based update strategies would require implementing the
> synchronization with vector clocks.
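
As a rough illustration of the local multi-threaded case described above, the following Python sketch keeps parameters in a dictionary guarded by a lock and lets worker threads pull and push against it. The class name `LocalParamServer`, the `put`/`pull`/`push` method names, and the plain gradient-step update are assumptions made for this example; they are not taken from the issue or from SystemML. The sketch also stores values directly, whereas the description proposes storing only matrix references.

```python
import threading


class LocalParamServer:
    """Hypothetical in-process parameter server backed by a dict."""

    def __init__(self, learning_rate=0.1):
        self._store = {}                # key -> parameter (list of floats)
        self._lock = threading.Lock()   # guards concurrent worker access
        self._lr = learning_rate

    def put(self, key, params):
        # Insert or overwrite the global parameter under the given key.
        with self._lock:
            self._store[key] = list(params)

    def pull(self, key):
        # Return a copy so that workers never mutate the shared entry.
        with self._lock:
            return list(self._store[key])

    def push(self, key, gradient):
        # Assumed update rule: one plain gradient step per pushed gradient.
        with self._lock:
            current = self._store[key]
            self._store[key] = [p - self._lr * g for p, g in zip(current, gradient)]


def worker(server, key, gradient, steps):
    for _ in range(steps):
        server.pull(key)            # retrieve the current parameter replica
        server.push(key, gradient)  # send back an intermediate result


server = LocalParamServer(learning_rate=0.5)
server.put("worker-param-replica", [4.0, 4.0])

threads = [
    threading.Thread(target=worker, args=(server, "worker-param-replica", [1.0, 2.0], 2))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Four pushes of [1.0, 2.0] with a step size of 0.5 were applied in total.
result = server.pull("worker-param-replica")
```

Because every access goes through the same lock, the order in which the two workers interleave does not change the final value in this example.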



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: A single-node parameter server acts as a data-parallel parameter
server; a multi-node, model-parallel parameter server will be discussed if
time permits. The idea is to run the single-node parameter server by
maintaining a hashmap inside the CP (Control Program) in which each parameter
is stored as a value under a defined key. For example, inserting the global
parameter under the key “worker-param-replica” allows the workers to retrieve
the parameter replica. In the local multi-threaded backend, workers can
therefore communicate directly with this hashmap within the same process. In
the distributed Spark backend, the CP first needs to fork a thread that starts
a parameter server maintaining the hashmap; the workers can then send
intermediates and retrieve parameters by connecting to the parameter server
over a TCP socket. Since SystemML has good cache management, the hashmap only
needs to hold a matrix reference pointing to a file location rather than the
actual data instance. If time permits, introducing the asynchronous and
staleness-based update strategies would require implementing the
synchronization with vector clocks.  (was: A single-node parameter server
acts as a data-parallel parameter server; a multi-node, model-parallel
parameter server will be discussed if time permits.)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
>
> A single-node parameter server acts as a data-parallel parameter server; a
> multi-node, model-parallel parameter server will be discussed if time
> permits. The idea is to run the single-node parameter server by maintaining
> a hashmap inside the CP (Control Program) in which each parameter is stored
> as a value under a defined key. For example, inserting the global parameter
> under the key “worker-param-replica” allows the workers to retrieve the
> parameter replica. In the local multi-threaded backend, workers can
> therefore communicate directly with this hashmap within the same process. In
> the distributed Spark backend, the CP first needs to fork a thread that
> starts a parameter server maintaining the hashmap; the workers can then send
> intermediates and retrieve parameters by connecting to the parameter server
> over a TCP socket. Since SystemML has good cache management, the hashmap
> only needs to hold a matrix reference pointing to a file location rather
> than the actual data instance. If time permits, introducing the asynchronous
> and staleness-based update strategies would require implementing the
> synchronization with vector clocks.
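
The description leaves open how vector clocks would drive the synchronization. One possible reading, sketched below in Python, tracks one logical clock per worker and holds back a worker that runs more than a fixed number of steps ahead of the slowest one. The class `StalenessClock`, its methods, and the staleness bound are assumptions for illustration only; nothing here is taken from SystemML.

```python
class StalenessClock:
    """Hypothetical per-worker clock table for a staleness-bounded scheme."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers  # one logical clock per worker
        self.staleness = staleness       # max allowed lead over the slowest worker

    def can_proceed(self, worker_id):
        # A worker may start its next step only if it is at most
        # `staleness` steps ahead of the slowest worker.
        return self.clocks[worker_id] - min(self.clocks) <= self.staleness

    def tick(self, worker_id):
        # Called after the worker has pushed its update for the current step.
        self.clocks[worker_id] += 1


clock = StalenessClock(num_workers=2, staleness=1)

clock.tick(0)                          # worker 0 finishes a step: clocks [1, 0]
ahead_by_one = clock.can_proceed(0)    # lead of 1 is within the bound

clock.tick(0)                          # worker 0 finishes another: clocks [2, 0]
ahead_by_two = clock.can_proceed(0)    # lead of 2 exceeds the bound

clock.tick(1)                          # worker 1 catches up: clocks [2, 1]
after_catch_up = clock.can_proceed(0)  # lead is back to 1
```

With `staleness=0` the same check degenerates to fully synchronous steps, and a very large bound approximates a fully asynchronous update.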





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: A single-node parameter server acts as a data-parallel parameter
server; a multi-node, model-parallel parameter server will be discussed if
time permits.  (was: A parameter server persists the model parameters in a
distributed manner. It is used especially in large-scale machine learning to
train models. The parameter computation is done with data parallelism across
the workers. The data-parallel parameter server architecture is illustrated in
Figure 2. Inspired by a lightweight parameter server interface [1], we plan to
provide push and pull methods as internal primitives, i.e., not exposed at the
script level, which allow the workers to exchange intermediates.)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
>
> A single-node parameter server acts as a data-parallel parameter server; a
> multi-node, model-parallel parameter server will be discussed if time
> permits.





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A parameter server persists the model parameters in a distributed manner. It
is used especially in large-scale machine learning to train models. The
parameter computation is done with data parallelism across the workers. The
data-parallel parameter server architecture is illustrated in Figure 2.
Inspired by a lightweight parameter server interface [1], we plan to provide
push and pull methods as internal primitives, i.e., not exposed at the script
level, which allow the workers to exchange intermediates.

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
>
> A parameter server persists the model parameters in a distributed manner. It
> is used especially in large-scale machine learning to train models. The
> parameter computation is done with data parallelism across the workers. The
> data-parallel parameter server architecture is illustrated in Figure 2.
> Inspired by a lightweight parameter server interface [1], we plan to provide
> push and pull methods as internal primitives, i.e., not exposed at the
> script level, which allow the workers to exchange intermediates.





[jira] [Updated] (SYSTEMML-2084) Implementation of language and compiler extension

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2084:

Description: This part aims to add language support for the “paramserv”
function so that this new function can be compiled. Since SystemML already
supports parameterized builtin functions, we can easily add a new operation
type and generate a new instruction for the “paramserv” function. Recently, we
have also added a new “eval” built-in function, which accepts a function
pointer as an argument so that the function can be called at runtime.
Similarly, we would need to extend the inter-procedural analysis so that it
does not remove functions that appear unused in the presence of the
second-order “paramserv” function, because the referenced functions, i.e., the
aggregate function and the update function, must be present at runtime.

> Implementation of language and compiler extension
> -
>
> Key: SYSTEMML-2084
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2084
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
>
> This part aims to add language support for the “paramserv” function so that
> this new function can be compiled. Since SystemML already supports
> parameterized builtin functions, we can easily add a new operation type and
> generate a new instruction for the “paramserv” function. Recently, we have
> also added a new “eval” built-in function, which accepts a function pointer
> as an argument so that the function can be called at runtime. Similarly, we
> would need to extend the inter-procedural analysis so that it does not
> remove functions that appear unused in the presence of the second-order
> “paramserv” function, because the referenced functions, i.e., the aggregate
> function and the update function, must be present at runtime.
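
To make the inter-procedural analysis point concrete, here is a small Python sketch of a reachability pass that treats functions passed by name to a second-order builtin as live, in addition to functions that are called directly. The data structures and all function names are hypothetical and chosen for illustration; this does not reflect SystemML's actual compiler classes.

```python
def reachable_functions(call_graph, second_order_refs, entry="main"):
    """Return the set of functions that must survive dead-function removal.

    call_graph:        dict mapping a function name to the functions it calls directly
    second_order_refs: dict mapping a function name to the functions it passes by
                       name to a second-order builtin such as paramserv or eval
    """
    keep, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn in keep:
            continue
        keep.add(fn)
        # Follow direct calls and by-name references alike.
        stack.extend(call_graph.get(fn, []))
        stack.extend(second_order_refs.get(fn, []))
    return keep


# Hypothetical program: main calls load_data directly and hands two functions
# to a second-order builtin by name; gradients in turn calls forward.
call_graph = {"main": ["load_data"], "gradients": ["forward"]}
second_order_refs = {"main": ["gradients", "aggregate"]}
all_functions = {"main", "load_data", "gradients", "forward", "aggregate", "unused_helper"}

kept = reachable_functions(call_graph, second_order_refs)
removed = all_functions - kept
```

Without the `second_order_refs` edges, `gradients`, `forward`, and `aggregate` would look unused and be removed along with `unused_helper`.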





[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: The objective of the “paramserv” built-in function is to update
an initial or existing model according to a given configuration. An initial
function signature would be _model'=paramserv(model, X, y, X_val, y_val,
g_cal_fun, upd=fun1, mode=SYNC, freq=EPOCH, agg=fun2, epochs=100,
batchsize=64, k=7, checkpointing=rollback)_. The arguments are the model, the
training features and labels, the validation features and labels, the gradient
calculation function, the batch update function, the update strategy (e.g.
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or
batch), the aggregation function, the number of epochs, the batch size, the
degree of parallelism, and the checkpointing strategy (e.g. rollback
recovery).  (was: The objective of the “paramserv” built-in function is to
update an initial or existing model according to a given configuration. An
initial function signature is illustrated in Figure 1. The arguments are the
model, the training features and labels, the validation features and labels,
the gradient calculation function, the batch update function, the update
strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency
(e.g. epoch or batch), the aggregation function, the number of epochs, the
batch size, the degree of parallelism, and the checkpointing strategy (e.g.
rollback recovery).)

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial
> or existing model according to a given configuration. An initial function
> signature would be _model'=paramserv(model, X, y, X_val, y_val, g_cal_fun,
> upd=fun1, mode=SYNC, freq=EPOCH, agg=fun2, epochs=100, batchsize=64, k=7,
> checkpointing=rollback)_. The arguments are the model, the training features
> and labels, the validation features and labels, the gradient calculation
> function, the batch update function, the update strategy (e.g. sync, async,
> hogwild!, stale-synchronous), the update frequency (e.g. epoch or batch),
> the aggregation function, the number of epochs, the batch size, the degree
> of parallelism, and the checkpointing strategy (e.g. rollback recovery).
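
The proposed signature can be mimicked with a small single-process Python stand-in, shown below as a sketch under several assumptions. It covers only `mode="SYNC"` with `freq="BATCH"`, simulates the `k` workers sequentially rather than in parallel, partitions rows round-robin, and accepts `X_val`, `y_val`, and `checkpointing` without using them. The helper functions `squared_error_gradient`, `mean_aggregate`, and `sgd_update` are hypothetical and exist only for this example; none of this is the SystemML implementation.

```python
import math


def paramserv(model, X, y, X_val, y_val, g_cal_fun, *, upd, agg,
              mode="SYNC", freq="EPOCH", epochs=100, batchsize=64, k=7,
              checkpointing="rollback"):
    # X_val, y_val and checkpointing only mirror the proposed signature;
    # this sketch does not use them.
    if mode != "SYNC" or freq != "BATCH":
        raise NotImplementedError("sketch covers only mode='SYNC', freq='BATCH'")
    # Assumed scheme: round-robin partitioning of the rows over k logical workers.
    parts = [(X[i::k], y[i::k]) for i in range(k)]
    parts = [(px, py) for px, py in parts if px]
    steps = max(math.ceil(len(px) / batchsize) for px, _ in parts)
    for _ in range(epochs):
        for s in range(steps):
            grads = []
            for px, py in parts:
                bx = px[s * batchsize:(s + 1) * batchsize]
                by = py[s * batchsize:(s + 1) * batchsize]
                if bx:  # workers with fewer rows may have run out of batches
                    grads.append(g_cal_fun(model, bx, by))
            # Synchronous step: aggregate all worker gradients, then update once.
            model = upd(model, agg(grads))
    return model


def squared_error_gradient(model, bx, by):
    # Gradient of the mean squared error for the one-weight model y = w * x.
    w = model[0]
    return [sum(2.0 * (w * row[0] - t) * row[0] for row, t in zip(bx, by)) / len(bx)]


def mean_aggregate(grads):
    # Element-wise mean of the worker gradients.
    return [sum(g[j] for g in grads) / len(grads) for j in range(len(grads[0]))]


def sgd_update(model, grad, lr=0.05):
    return [p - lr * g for p, g in zip(model, grad)]


X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]
trained = paramserv([0.0], X, y, X, y, squared_error_gradient,
                    upd=sgd_update, agg=mean_aggregate,
                    freq="BATCH", epochs=20, batchsize=2, k=2)
```

On this toy data the learned weight approaches 2, the slope that generated `y`. Keyword-only `upd` and `agg` are a Python convenience here and say nothing about how the DML builtin would bind its arguments.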





[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: The objective of the “paramserv” built-in function is to update
an initial or existing model according to a given configuration. An initial
function signature is illustrated in Figure 1. The arguments are the model,
the training features and labels, the validation features and labels, the
gradient calculation function, the batch update function, the update strategy
(e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g.
epoch or batch), the aggregation function, the number of epochs, the batch
size, the degree of parallelism, and the checkpointing strategy (e.g. rollback
recovery).  (was: The objective of the “paramserv” built-in function is to
update an initial or existing model according to a given configuration. An
initial function signature is illustrated in Figure 1. The arguments are the
model, the training features and labels, the validation features and labels,
the batch update function, the update strategy (e.g. sync, async, hogwild!,
stale-synchronous), the update frequency (e.g. epoch or batch), the
aggregation function, the number of epochs, the batch size, the degree of
parallelism, and the checkpointing strategy (e.g. rollback recovery).)

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial
> or existing model according to a given configuration. An initial function
> signature is illustrated in Figure 1. The arguments are the model, the
> training features and labels, the validation features and labels, the
> gradient calculation function, the batch update function, the update
> strategy (e.g. sync, async, hogwild!, stale-synchronous), the update
> frequency (e.g. epoch or batch), the aggregation function, the number of
> epochs, the batch size, the degree of parallelism, and the checkpointing
> strategy (e.g. rollback recovery).





[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: The objective of the “paramserv” built-in function is to update
an initial or existing model according to a given configuration. An initial
function signature is illustrated in Figure 1. The arguments are the model,
the training features and labels, the validation features and labels, the
batch update function, the update strategy (e.g. sync, async, hogwild!,
stale-synchronous), the update frequency (e.g. epoch or batch), the
aggregation function, the number of epochs, the batch size, the degree of
parallelism, and the checkpointing strategy (e.g. rollback recovery).

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial
> or existing model according to a given configuration. An initial function
> signature is illustrated in Figure 1. The arguments are the model, the
> training features and labels, the validation features and labels, the batch
> update function, the update strategy (e.g. sync, async, hogwild!,
> stale-synchronous), the update frequency (e.g. epoch or batch), the
> aggregation function, the number of epochs, the batch size, the degree of
> parallelism, and the checkpointing strategy (e.g. rollback recovery).


