[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, features=X, labels=Y, val_features=X_val, 
val_labels=Y_val, upd="fun1", agg="fun2", mode="BSP", freq="BATCH", epochs=100, 
batchsize=64, k=7, scheme="disjoint_contiguous", hyperparam=params, 
checkpointing="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * features : training features matrix
 * labels : training label matrix
 * val_features : validation features matrix
 * val_labels : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize [optional]: the batch size; if the update frequency is "EPOCH", 
this argument is ignored
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpointing (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices
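
For illustration only, a minimal Java sketch of the contracts that the 
referenced "upd" (gradient calculation) and "agg" (gradient aggregation) 
functions would have to satisfy; the Java names below are assumptions for the 
sake of the sketch and are not part of the proposed DML API:
{code:java}
import java.util.List;

// Hypothetical contracts, sketched only to illustrate the API design above;
// these names are assumptions and not part of the proposed DML API.
public final class ParamservContractsSketch {

  /** Batch update function ("upd"): computes the gradients for one batch. */
  interface GradientFunction {
    // model and hyperparams are opaque lists, mirroring the struct-like model.
    List<double[][]> compute(List<double[][]> model, List<Double> hyperparams,
                             double[][] features, double[][] labels);
  }

  /** Aggregation function ("agg"): folds the workers' gradients into the model. */
  interface AggregationFunction {
    List<double[][]> aggregate(List<double[][]> model, List<Double> hyperparams,
                               List<double[][]> gradients);
  }
}
{code}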

  was:
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, Y, X_val, Y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="BATCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpointing="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * Y : training label matrix
 * X_val : validation features matrix
 * Y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize [optional]: the batch size; if the update frequency is "EPOCH", 
this argument is ignored
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpointing (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices


> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective 

[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, Y, X_val, Y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="BATCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpointing="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * Y : training label matrix
 * X_val : validation features matrix
 * Y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize [optional]: the batch size; if the update frequency is "EPOCH", 
this argument is ignored
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpointing (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices

  was:
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpointing="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpointing (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices


> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of “paramserv” built-in function is to update an initial or 
> existing model with configuration. An initial function signature would be: 
> {co

[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpointing="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpointing (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices

  was:
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpointing="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpoint (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices


> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of “paramserv” built-in function is to update an initial or 
> existing model with configuration. An initial function signature would be: 
> {code:java}
> model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", 
> m

[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpointing="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpoint (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices

  was:
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpoint="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpoint (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices


> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of “paramserv” built-in function is to update an initial or 
> existing model with configuration. An initial function signature would be: 
> {code:java}
> model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", 
> mode="BS

[jira] [Updated] (SYSTEMML-2323) Checkpointing

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2323:

Description: It aims to add the auxiliary checkpointing service. We would 
like to support types such as NONE, EPOCH, and EPOCH10, to indicate the 
frequency at which we perform model checkpointing.  (was: It aims to add the 
auxiliary checkpointing service.)
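
As a rough illustration of the intended semantics (assumed here, not specified 
in the issue), the three checkpoint frequencies could map to a per-epoch 
decision like the following hypothetical sketch:
{code:java}
// Hypothetical sketch of the checkpoint frequencies NONE, EPOCH, EPOCH10.
enum Checkpointing { NONE, EPOCH, EPOCH10 }

final class CheckpointPolicy {
  // Returns true if a checkpoint should be written after the given epoch (1-based).
  static boolean shouldCheckpoint(Checkpointing freq, int epoch) {
    switch (freq) {
      case EPOCH:   return true;             // after every epoch
      case EPOCH10: return epoch % 10 == 0;  // after every 10th epoch
      default:      return false;            // NONE: never checkpoint
    }
  }
}
{code}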

> Checkpointing
> -
>
> Key: SYSTEMML-2323
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2323
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> It aims to add the auxiliary checkpointing service. We would like to support 
> types such as NONE, EPOCH, and EPOCH10, to indicate the frequency at which we 
> perform model checkpointing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2322) Local workers

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2322:

Description: It aims to implement the local workers. It also covers the data 
management, such as data distribution and program separation via function 
replication. We would like to support four schemes for data distribution: 
disjoint_contiguous (contiguous splits of X and y), disjoint_round_robin (X 
and y distributed row-wise in a round-robin fashion), disjoint_random, and 
overlap_reshuffle (every worker gets all the data, but reshuffled in a 
different random order).  (was: It aims to implement the local workers. It 
also covers the data management, such as data distribution and program 
separation via function replication.)
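
A minimal, hypothetical Java sketch (an assumption, not the actual SystemML 
implementation) of how the disjoint_contiguous scheme could split the rows of 
X and y across k workers:
{code:java}
// Hypothetical sketch: contiguous, disjoint row ranges for k workers.
final class DisjointContiguousSketch {
  // Returns for each worker the [start, end) row range of X (and y).
  static int[][] partition(int numRows, int k) {
    int[][] ranges = new int[k][2];
    int base = numRows / k, rest = numRows % k, start = 0;
    for (int w = 0; w < k; w++) {
      int size = base + (w < rest ? 1 : 0); // spread the remainder rows
      ranges[w][0] = start;
      ranges[w][1] = start + size;
      start += size;
    }
    return ranges;
  }
}
{code}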

> Local workers
> -
>
> Key: SYSTEMML-2322
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2322
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> It aims to implement the local workers. It also covers the data management, 
> such as data distribution and program separation via function replication. We 
> would like to support four schemes for data distribution: disjoint_contiguous 
> (contiguous splits of X and y), disjoint_round_robin (X and y distributed 
> row-wise in a round-robin fashion), disjoint_random, and overlap_reshuffle 
> (every worker gets all the data, but reshuffled in a different random order).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Initial version of local backend

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Attachment: (was: ps.png)

> Initial version of local backend
> 
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> A single node parameter server acts as a data-parallel parameter server. A 
> diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Initial version of local backend

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: It aims to implement the local backend for the paramserv 
function.  (was: A single node parameter server acts as a data-parallel 
parameter server. A diagram of the parameter server architecture is shown 
below.)

> Initial version of local backend
> 
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> It aims to implement the local backend for the paramserv function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473556#comment-16473556
 ] 

LI Guobao commented on SYSTEMML-2299:
-

[~mboehm7] I still have a question about the function design. How should we 
decide whether the local or the Spark backend executes the function? Should we 
specify it explicitly, or infer it from the data size?

> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial or 
> existing model with a given configuration. An initial function signature would be: 
> {code:java}
> model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", 
> mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, 
> scheme="disjoint_contiguous", hyperparam=params, checkpoint="NONE"){code}
> We are interested in providing the model (which will be a struct-like data 
> structure consisting of the weights, the biases and the hyperparameters), the 
> training features and labels, the validation features and labels, the batch 
> update function (i.e., the gradient calculation function), the update strategy 
> (e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
> per epoch or per mini-batch), the gradient aggregation function, the number of 
> epochs, the batch size, the degree of parallelism, the data partitioning 
> scheme, a list of additional hyperparameters, as well as the checkpointing 
> strategy. The function will return a trained model in struct format.
> *Inputs*:
>  * model : a list consisting of the weight and bias matrices
>  * X : training features matrix
>  * y : training label matrix
>  * X_val : validation features matrix
>  * y_val : validation label matrix
>  * upd : the name of the gradient calculation function
>  * agg : the name of the gradient aggregation function
>  * mode (options: BSP, ASP, SSP): the update mode
>  * freq (options: EPOCH, BATCH): the frequency of updates
>  * epochs : the number of epochs
>  * batchsize : the batch size
>  * k : the degree of parallelism
>  * scheme (options: disjoint_contiguous, disjoint_round_robin, 
> disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
> the data is distributed across the workers
>  * hyperparam [optional]: a list of additional hyperparameters, e.g., 
> learning rate, momentum
>  * checkpoint (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
> checkpoint strategy; a checkpoint can be set after each epoch or every 10 
> epochs 
> *Output*:
>  * model' : a list consisting of the updated weight and bias matrices



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2086) Push/pull service

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:

Description: This part aims to implement the push/pull service for the local 
backend.  (was: This part aims to implement the BSP strategy for the local 
execution backend. The idea is to spawn a thread in CP for running the 
parameter server, and the workers are also launched in a multi-threaded way in 
CP.)
Summary: Push/pull service  (was: Initial version of local backend)

> Push/pull service
> -
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement the push/pull service for the local backend.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Initial version of local backend

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: A single node parameter server acts as a data-parallel 
parameter server. A diagram of the parameter server architecture is shown 
below.  (was: A single node parameter server acts as a data-parallel parameter 
server. And a multi-node model parallel parameter server will be discussed if 
time permits. 

A diagram of the parameter server architecture is shown below.)
Summary: Initial version of local backend  (was: Single-node parameter 
server primitives)

> Initial version of local backend
> 
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single node parameter server acts as a data-parallel parameter server. A 
> diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2324) Synchronization

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2324:
---

 Summary: Synchronization
 Key: SYSTEMML-2324
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2324
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


We also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies; e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain, in the server, a vector clock 
recording all workers' clocks. Each time an iteration inside a worker 
finishes, the worker waits for a signal from the server, i.e., it sends a 
request to calculate the staleness according to the vector clock. When the 
server receives the gradients from a certain worker, it increments the vector 
clock for this worker. We could then define BSP as "staleness==0", ASP as 
"staleness==-1" and SSP as "staleness==N".
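
A minimal, hypothetical Java sketch of the staleness check described above 
(BSP as staleness==0, ASP as staleness==-1, SSP as staleness==N); the class 
and method names are assumptions, not the actual SystemML code:
{code:java}
import java.util.Arrays;

// Hypothetical sketch of the vector-clock based staleness check.
final class VectorClockSketch {
  private final int[] clocks;   // one logical clock per worker
  private final int staleness;  // 0 = BSP, -1 = ASP, N = SSP

  VectorClockSketch(int numWorkers, int staleness) {
    this.clocks = new int[numWorkers];
    this.staleness = staleness;
  }

  // Called when the server receives gradients from a worker.
  synchronized void onGradientsReceived(int workerId) {
    clocks[workerId]++;
  }

  // A worker may start its next iteration only if it is not further ahead of
  // the slowest worker than the allowed staleness (ASP never waits).
  synchronized boolean mayProceed(int workerId) {
    if (staleness < 0) return true; // ASP
    int slowest = Arrays.stream(clocks).min().orElse(0);
    return clocks[workerId] - slowest <= staleness;
  }
}
{code}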



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single node parameter server acts as a data-parallel parameter server. And a 
multi-node model parallel parameter server will be discussed if time permits. 

A diagram of the parameter server architecture is shown below.

  was:
A single node parameter server acts as a data-parallel parameter server. And a 
multi-node model parallel parameter server will be discussed if time permits. 

Synchronization:

We also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies; e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain, in the server, a vector clock 
recording all workers' clocks. Each time an iteration inside a worker 
finishes, the worker waits for a signal from the server, i.e., it sends a 
request to calculate the staleness according to the vector clock. When the 
server receives the gradients from a certain worker, it increments the vector 
clock for this worker. We could then define BSP as "staleness==0", ASP as 
"staleness==-1" and SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single node parameter server acts as a data-parallel parameter server. And 
> a multi-node model parallel parameter server will be discussed if time 
> permits. 
> A diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single node parameter server acts as a data-parallel parameter server. And a 
multi-node model parallel parameter server will be discussed if time permits. 

Synchronization:

We also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies; e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain, in the server, a vector clock 
recording all workers' clocks. Each time an iteration inside a worker 
finishes, the worker waits for a signal from the server, i.e., it sends a 
request to calculate the staleness according to the vector clock. When the 
server receives the gradients from a certain worker, it increments the vector 
clock for this worker. We could then define BSP as "staleness==0", ASP as 
"staleness==-1" and SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.

  was:
A single node parameter server acts as a data-parallel parameter server. And a 
multi-node model parallel parameter server will be discussed if time permits. 

Push/Pull service: 

In general, we could launch a parameter server inside (local multi-threaded 
backend) or outside (Spark distributed backend) of CP to provide the pull and 
push services. For the moment, all the weights and biases are saved in a 
hashmap under a key, e.g., "global parameter". Each worker's gradients are put 
into the hashmap separately under a given key. The exchange between the server 
and the workers will be implemented over TCP. Hence, we can easily broadcast 
the IP address and the port number to the workers, and the workers can then 
send the gradients and retrieve the new parameters via a TCP socket. The 
server will also spawn a thread which retrieves the gradients by polling the 
hashmap with the relevant keys and aggregates them. Finally, it updates the 
global parameter in the hashmap.

Synchronization:

We also need to implement the synchronization between the workers and the 
parameter server in order to support more parameter update strategies; e.g., 
the stale-synchronous strategy needs a hyperparameter "staleness" to define 
the waiting interval. The idea is to maintain, in the server, a vector clock 
recording all workers' clocks. Each time an iteration inside a worker 
finishes, the worker waits for a signal from the server, i.e., it sends a 
request to calculate the staleness according to the vector clock. When the 
server receives the gradients from a certain worker, it increments the vector 
clock for this worker. We could then define BSP as "staleness==0", ASP as 
"staleness==-1" and SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single node parameter server acts as a data-parallel parameter server. And 
> a multi-node model parallel parameter server will be discussed if time 
> permits. 
> Synchronization:
> We also need to implement the synchronization between the workers and the 
> parameter server in order to support more parameter update strategies; e.g., 
> the stale-synchronous strategy needs a hyperparameter "staleness" to define 
> the waiting interval. The idea is to maintain, in the server, a vector clock 
> recording all workers' clocks. Each time an iteration inside a worker 
> finishes, the worker waits for a signal from the server, i.e., it sends a 
> request to calculate the staleness according to the vector clock. When the 
> server receives the gradients from a certain worker, it increments the vector 
> clock for this worker. We could then define BSP as "staleness==0", ASP as 
> "staleness==-1" and SSP as "staleness==N".
> A diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2323) Checkpointing

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2323:

Description: It aims to add the auxiliary checkpointing service.  (was: It 
aims to add the auxilary checkpointing service.)

> Checkpointing
> -
>
> Key: SYSTEMML-2323
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2323
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> It aims to add the auxiliary checkpointing service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2323) Checkpointing

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2323:
---

 Summary: Checkpointing
 Key: SYSTEMML-2323
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2323
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


It aims to add the auxilary checkpointing service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2322) Local workers

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2322:
---

 Summary: Local workers
 Key: SYSTEMML-2322
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2322
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


It aims to implement the local workers. It also covers the data management, 
such as data distribution and program separation via function replication.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2321) Aggregation service

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2321:
---

 Summary: Aggregation service
 Key: SYSTEMML-2321
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2321
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


The aggregation service is independent of local or remote workers. It is 
responsible for executing the parameter updates.
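
As a rough, hypothetical illustration of what the aggregation service could do 
(the names below are assumptions; the actual update is delegated to the 
user-defined "agg" function):
{code:java}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.BinaryOperator;

// Hypothetical sketch of an aggregation loop: it consumes gradients pushed by
// workers and applies the user-defined aggregation function to the model.
final class AggregationServiceSketch implements Runnable {
  private final BlockingQueue<List<double[][]>> gradientQueue;
  private final BinaryOperator<List<double[][]>> aggregate; // stands in for "agg"
  private volatile List<double[][]> model;

  AggregationServiceSketch(BlockingQueue<List<double[][]>> gradientQueue,
                           BinaryOperator<List<double[][]>> aggregate,
                           List<double[][]> initialModel) {
    this.gradientQueue = gradientQueue;
    this.aggregate = aggregate;
    this.model = initialModel;
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        List<double[][]> gradients = gradientQueue.take(); // wait for a worker's push
        model = aggregate.apply(model, gradients);         // fold gradients into the model
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // shut down cleanly
    }
  }

  List<double[][]> currentModel() {
    return model;
  }
}
{code}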



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2087:

Description: This part aims to implement the parameter server for the Spark 
distributed backend. In general, we could launch a parameter server on a host 
to provide the pull and push services. For the moment, all the weights and 
biases are saved in a hashmap under a key, e.g., "global parameter". Each 
worker's gradients are put into the hashmap separately under a given key. The 
exchange between the server and the workers will be implemented with Netty 
RPC. Hence, we can easily broadcast the IP address and the port number to the 
workers, and the workers can then send the gradients and retrieve the new 
parameters via Netty RPC. The server will also spawn a thread which retrieves 
the gradients by polling the hashmap with the relevant keys and aggregates 
them. Finally, it updates the global parameter in the hashmap.  (was: This 
part aims to implement the parameter server for the Spark distributed backend. 
In general, we could launch a parameter server on a host to provide the pull 
and push services. For the moment, all the weights and biases are saved in a 
hashmap under a key, e.g., "global parameter". Each worker's gradients are put 
into the hashmap separately under a given key. The exchange between the server 
and the workers will be implemented with Netty RPC. Hence, we can easily 
broadcast the IP address and the port number to the workers, and the workers 
can then send the gradients and retrieve the new parameters via a TCP socket. 
The server will also spawn a thread which retrieves the gradients by polling 
the hashmap with the relevant keys and aggregates them. Finally, it updates 
the global parameter in the hashmap.)
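
Purely as an illustration of the hashmap-based push/pull idea described above 
(the class name, the use of a ConcurrentHashMap, and the omission of the 
RPC/TCP transport are all assumptions of this sketch):
{code:java}
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory sketch of the push/pull service described above.
final class ParamServerStoreSketch {
  static final String GLOBAL_KEY = "global parameter";
  private final ConcurrentHashMap<String, List<double[][]>> store = new ConcurrentHashMap<>();

  // Workers push their gradients under a per-worker key.
  void push(String workerKey, List<double[][]> gradients) {
    store.put(workerKey, gradients);
  }

  // Workers pull the current global parameters.
  List<double[][]> pull() {
    return store.get(GLOBAL_KEY);
  }

  // The aggregation thread polls a worker key and removes the gradients;
  // after aggregation (not shown) it writes back the global parameters.
  List<double[][]> poll(String workerKey) {
    return store.remove(workerKey);
  }

  void updateGlobal(List<double[][]> newParams) {
    store.put(GLOBAL_KEY, newParams);
  }
}
{code}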

> Initial version of distributed spark backend
> 
>
> Key: SYSTEMML-2087
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2087
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement the parameter server for the Spark distributed 
> backend. In general, we could launch a parameter server on a host to provide 
> the pull and push services. For the moment, all the weights and biases are 
> saved in a hashmap under a key, e.g., "global parameter". Each worker's 
> gradients are put into the hashmap separately under a given key. The exchange 
> between the server and the workers will be implemented with Netty RPC. Hence, 
> we can easily broadcast the IP address and the port number to the workers, and 
> the workers can then send the gradients and retrieve the new parameters via 
> Netty RPC. The server will also spawn a thread which retrieves the gradients 
> by polling the hashmap with the relevant keys and aggregates them. Finally, it 
> updates the global parameter in the hashmap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2087:

Description: This part aims to implement the parameter server for the Spark 
distributed backend. In general, we could launch a parameter server on a host 
to provide the pull and push services. For the moment, all the weights and 
biases are saved in a hashmap under a key, e.g., "global parameter". Each 
worker's gradients are put into the hashmap separately under a given key. The 
exchange between the server and the workers will be implemented with Netty 
RPC. Hence, we can easily broadcast the IP address and the port number to the 
workers, and the workers can then send the gradients and retrieve the new 
parameters via a TCP socket. The server will also spawn a thread which 
retrieves the gradients by polling the hashmap with the relevant keys and 
aggregates them. Finally, it updates the global parameter in the hashmap.  
(was: This part aims to implement BSP for the Spark distributed backend. 
Hence, the idea is to be able to launch a remote parameter server and the 
workers.)

> Initial version of distributed spark backend
> 
>
> Key: SYSTEMML-2087
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2087
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to implement the parameter server for the Spark distributed 
> backend. In general, we could launch a parameter server on a host to provide 
> the pull and push services. For the moment, all the weights and biases are 
> saved in a hashmap under a key, e.g., "global parameter". Each worker's 
> gradients are put into the hashmap separately under a given key. The exchange 
> between the server and the workers will be implemented with Netty RPC. Hence, 
> we can easily broadcast the IP address and the port number to the workers, and 
> the workers can then send the gradients and retrieve the new parameters via a 
> TCP socket. The server will also spawn a thread which retrieves the gradients 
> by polling the hashmap with the relevant keys and aggregates them. Finally, it 
> updates the global parameter in the hashmap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Issue Type: Technical task  (was: Sub-task)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png
>
>
> A single node parameter server acts as a data-parallel parameter server. And 
> a multi-node model parallel parameter server will be discussed if time 
> permits. 
> Push/Pull service: 
> In general, we could launch a parameter server inside (local multi-threaded 
> backend) or outside (Spark distributed backend) of CP to provide the pull and 
> push services. For the moment, all the weights and biases are saved in a 
> hashmap under a key, e.g., "global parameter". Each worker's gradients are put 
> into the hashmap separately under a given key. The exchange between the server 
> and the workers will be implemented over TCP. Hence, we can easily broadcast 
> the IP address and the port number to the workers, and the workers can then 
> send the gradients and retrieve the new parameters via a TCP socket. The 
> server will also spawn a thread which retrieves the gradients by polling the 
> hashmap with the relevant keys and aggregates them. Finally, it updates the 
> global parameter in the hashmap.
> Synchronization:
> We also need to implement the synchronization between the workers and the 
> parameter server in order to support more parameter update strategies; e.g., 
> the stale-synchronous strategy needs a hyperparameter "staleness" to define 
> the waiting interval. The idea is to maintain, in the server, a vector clock 
> recording all workers' clocks. Each time an iteration inside a worker 
> finishes, the worker waits for a signal from the server, i.e., it sends a 
> request to calculate the staleness according to the vector clock. When the 
> server receives the gradients from a certain worker, it increments the vector 
> clock for this worker. We could then define BSP as "staleness==0", ASP as 
> "staleness==-1" and SSP as "staleness==N".
> A diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2084) Implementation of language and compiler extension

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2084:

Issue Type: Technical task  (was: Sub-task)

> Implementation of language and compiler extension
> -
>
> Key: SYSTEMML-2084
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2084
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
>
> This part aims to add language support for the “paramserv” function in order 
> to be able to compile this new function. Since SystemML already supports 
> parameterized built-in functions, we can easily extend an additional operation 
> type and generate a new instruction for the “paramserv” function. Recently, we 
> have also added a new “eval” built-in function which is capable of passing a 
> function pointer as an argument so that it can be called at runtime. 
> Similarly, we would need to extend the inter-procedural analysis to avoid 
> removing unused constructed functions in the presence of the second-order 
> “paramserv” function, because the referenced functions, i.e., the aggregation 
> function and the update function, must be present at runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2320) Parfor integration

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2320:
---

 Summary: Parfor integration
 Key: SYSTEMML-2320
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2320
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


It aims to guarantee robustness for the case where the paramserv function is 
used inside a parfor statement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2319) IPA integration

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2319:
---

 Summary: IPA integration
 Key: SYSTEMML-2319
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2319
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


It aims to extend the IPA to avoid removing the referenced functions, given 
that the paramserv function is a second-order function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2318) Hops, lops, instruction generation

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2318:
---

 Summary: Hops, lops, instruction generation
 Key: SYSTEMML-2318
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2318
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


It aims to implement the extension of hops, lops, and instruction generation 
for the new paramserv function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SYSTEMML-2317) Implementation of language extension

2018-05-13 Thread LI Guobao (JIRA)
LI Guobao created SYSTEMML-2317:
---

 Summary: Implementation of language extension
 Key: SYSTEMML-2317
 URL: https://issues.apache.org/jira/browse/SYSTEMML-2317
 Project: SystemML
  Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao


It aims to extend the parsing and validation at language level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
hyperparam=params, checkpoint="NONE"){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpoint (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices

  was:
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of the gradient calculation function
 * agg : the name of the gradient aggregation function
 * mode (options: BSP, ASP, SSP): the update mode
 * freq (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how 
the data is distributed across the workers
 * hyperparam [optional]: a list of additional hyperparameters, e.g., learning 
rate, momentum
 * checkpoint (options: NONE (default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be set after each epoch or every 10 
epochs 

*Output*:
 * model' : a list consisting of the updated weight and bias matrices


> API design of the paramserv function
> 
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: LI Guobao
>Assignee: LI Guobao
>Priority: Major
>
> The objective of “paramserv” built-in function is to update an initial or 
> existing model with configuration. An initial function signature would be: 
> {code:java}
> model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", 
> mode="BSP", fre

[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of the “paramserv” built-in function is to update an initial or 
existing model with a given configuration. An initial function signature would 
be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., the gradient calculation function), the update strategy 
(e.g., sync, async, Hogwild!, stale-synchronous), the update frequency (e.g., 
per epoch or per mini-batch), the gradient aggregation function, the number of 
epochs, the batch size, the degree of parallelism, the data partitioning 
scheme, a list of additional hyperparameters, as well as the checkpointing 
strategy. The function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of gradient calculation function
 * agg : the name of gradient aggregation function
 * mode  (options: BSP, ASP, SSP): the updating mode
 * freq  (options: EPOCH, BATCH): the frequence of updates
 * epochs : the number of epoch
 * batchsize : the size of batch
 * k : the degree of parallelism
 * scheme  (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
the data is distributed across workers
 * hyperparam  [optional]: a list consisting of the additional hyper 
parameters, e.g., learning rate, momentum
 * checkpoint  (options: NONE(default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be taken after each epoch or after every 10 epochs

*Output*:
 * model' : a list consisting of the updated weight and bias matrices
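
As a sketch of what the upd and agg functions named above might look like, here is a hedged DML example for a single-layer linear model with squared loss. The exact signatures paramserv will expect are still part of this design; the shapes below (model and gradients as positional lists, hyper parameters with the learning rate as the first entry) are assumptions.
{code:java}
# Assumed shapes: the gradient function gets the model, one batch of features
# and labels plus the hyper parameters, and returns the gradients as a list;
# the aggregation function folds one gradient list back into the model.
gradFn = function(list[unknown] model, matrix[double] features,
                  matrix[double] labels, list[unknown] hyperparams)
  return (list[unknown] gradients)
{
  W = as.matrix(model[1])
  b = as.matrix(model[2])
  pred = features %*% W + b                   # forward pass of the linear model
  err  = pred - labels
  dW = t(features) %*% err / nrow(features)   # gradient of the squared loss
  db = colSums(err) / nrow(features)
  gradients = list(dW, db)
}

aggFn = function(list[unknown] model, list[unknown] gradients,
                 list[unknown] hyperparams)
  return (list[unknown] model_result)
{
  lr = as.scalar(hyperparams[1])              # assumed: first entry = learning rate
  W = as.matrix(model[1]) - lr * as.matrix(gradients[1])
  b = as.matrix(model[2]) - lr * as.matrix(gradients[2])
  model_result = list(W, b)
}
{code}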

  was:
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism, the data partition scheme, a list of 
additional hyper parameters, as well as the checkpointing strategy. And the 
function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of gradient calculation function
 * agg : the name of gradient aggregation function
 * mode  (options: BSP, ASP, SSP): the updating mode
 * freq  (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme  (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
the data is distributed across workers
 * hyperparam  [optional]: a list consisting of the additional hyper 
parameters, e.g., learning rate, momentum
 * checkpoint  (options: NONE(default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be taken after each epoch or after every 10 epochs

Output:
 * model' : a list consisting of the updated weight and bias matrices



[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism, the data partition scheme, a list of 
additional hyper parameters, as well as the checkpointing strategy. And the 
function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of gradient calculation function
 * agg : the name of gradient aggregation function
 * mode  (options: BSP, ASP, SSP): the updating mode
 * freq  (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme  (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
the data is distributed across workers (see the illustration below)
 * hyperparam  [optional]: a list consisting of the additional hyper 
parameters, e.g., learning rate, momentum
 * checkpoint  (options: NONE(default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be taken after each epoch or after every 10 epochs

Output:
 * model' : a list consisting of the updated weight and bias matrices
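
The semantics of the first two partition schemes above can be pictured with plain DML primitives. This is only an illustration of which rows a worker would see; the variable names and the use of removeEmpty are not meant to describe the actual runtime implementation.
{code:java}
# Illustration only: rows of X assigned to worker i (1-based) out of k workers.
X = rand(rows=1000, cols=10)   # stand-in training data
k = 7                          # degree of parallelism
i = 1                          # index of the worker we look at

n = nrow(X)

# disjoint_contiguous: each worker gets one contiguous slab of rows
size  = ceil(n / k)
begin = (i - 1) * size + 1
last  = min(i * size, n)
X_contiguous = X[begin:last, ]

# disjoint_round_robin: worker i gets every k-th row, starting at row i
pick = ((seq(1, n) - 1) %% k) == (i - 1)
X_round_robin = removeEmpty(target = X, margin = "rows", select = pick)
{code}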

  was:
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism, the data partition scheme, a list of 
additional hyper parameters, as well as the checkpointing strategy. And the 
function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of gradient calculation function
 * agg : the name of gradient aggregation function
 * mode  (options: BSP, ASP, SSP): the updating mode
 * freq  (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme  (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
the data is distributed across workers
 * hyperparam  [optional]: a list consisting of the additional hyper 
parameters, e.g., learning rate, momentum
 * checkpoint  (options: NONE(default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be taken after each epoch or after every 10 epochs



[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism, the data partition scheme, a list of 
additional hyper parameters, as well as the checkpointing strategy. And the 
function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of gradient calculation function
 * agg : the name of gradient aggregation function
 * mode  (options: BSP, ASP, SSP): the updating mode
 * freq  (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme  (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
the data is distributed across workers
 * hyperparam  [optional]: a list consisting of the additional hyper 
parameters, e.g., learning rate, momentum
 * checkpoint  (options: NONE(default), EPOCH, EPOCH10) [optional]: the 
checkpoint strategy; a checkpoint can be taken after each epoch or after every 10 epochs
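
The checkpoint options can be read as a simple frequency rule: never, after every epoch, or after every 10 epochs. The small DML loop below illustrates only that rule; the file path, the binary format, and the idea of persisting a single weight matrix are assumptions, and rollback/recovery itself is not sketched here.
{code:java}
# Illustration of the checkpoint frequencies only; not the paramserv runtime.
W = rand(rows=784, cols=10)       # stand-in for the model state
epochs = 100
checkpoint = "EPOCH10"            # one of NONE, EPOCH, EPOCH10

for (e in 1:epochs) {
  # ... one epoch of parameter updates would happen here ...
  if (checkpoint == "EPOCH" | (checkpoint == "EPOCH10" & (e %% 10 == 0))) {
    fname = "ckpt/W_epoch_" + e    # assumed checkpoint location
    write(W, fname, format = "binary")
  }
}
{code}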

  was:
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism, the data partition scheme, a list of 
additional hyper parameters, as well as the checkpointing strategy. And the 
function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of gradient calculation function
 * agg : the name of gradient aggregation function
 * mode  (options: BSP, ASP, SSP): the updating mode
 * freq  (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme  (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
the data is distributed across workers
 * hyperparam : a list consisting of the additional hyper parameters, 
e.g., learning rate, momentum
 * checkpoint  (options: NONE, EPOCH, EPOCH10): the checkpoint 
strategy; a checkpoint can be taken after each epoch or after every 10 epochs



[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", 
freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism, the data partition scheme, a list of 
additional hyper parameters, as well as the checkpointing strategy. And the 
function will return a trained model in struct format.

*Inputs*:
 * model : a list consisting of the weight and bias matrices
 * X : training features matrix
 * y : training label matrix
 * X_val : validation features matrix
 * y_val : validation label matrix
 * upd : the name of gradient calculation function
 * agg : the name of gradient aggregation function
 * mode  (options: BSP, ASP, SSP): the updating mode
 * freq  (options: EPOCH, BATCH): the frequency of updates
 * epochs : the number of epochs
 * batchsize : the batch size
 * k : the degree of parallelism
 * scheme  (options: disjoint_contiguous, disjoint_round_robin, 
disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
the data is distributed across workers
 * hyperparam : a list consisting of the additional hyper parameters, 
e.g., learning rate, momentum
 * checkpoint  (options: NONE, EPOCH, EPOCH10): the checkpoint 
strategy; a checkpoint can be taken after each epoch or after every 10 epochs

  was:
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 

 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd=fun1, agg=fun2, mode=BSP, 
freq=EPOCH, epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
 

We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism as well as the checkpointing strategy (e.g. 
rollback recovery). And the function will return a trained model in struct 
format.



[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 

 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd=fun1, agg=fun2, mode=BSP, 
freq=EPOCH, epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, 
hyperparam=params, checkpoint=NONE){code}
 

We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism as well as the checkpointing strategy (e.g. 
rollback recovery). And the function will return a trained model in struct 
format.

  was:
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 

 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd=fun1, agg=fun2, mode=BSP, 
freq=EPOCH, epochs=100, batchsize=64, k=7, hyperparam=params, 
checkpoint=NONE){code}
 

We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism as well as the checkpointing strategy (e.g. 
rollback recovery). And the function will return a trained model in struct 
format.







[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2299:

Description: 
The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be: 

 
{code:java}
model'=paramserv(model, X, y, X_val, y_val, upd=fun1, agg=fun2, mode=BSP, 
freq=EPOCH, epochs=100, batchsize=64, k=7, hyperparam=params, 
checkpoint=NONE){code}
 

We are interested in providing the model (which will be a struct-like data 
structure consisting of the weights, the biases and the hyperparameters), the 
training features and labels, the validation features and labels, the batch 
update function (i.e., gradient calculation func), the update strategy (e.g. 
sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism as well as the checkpointing strategy (e.g. 
rollback recovery). And the function will return a trained model in struct 
format.

  was:The objective of “paramserv” built-in function is to update an initial or 
existing model with configuration. An initial function signature would be 
_model'=paramserv(model, X, y, X_val, y_val, upd=fun1, mode=SYNC, freq=EPOCH, 
agg=fun2, epochs=100, batchsize=64, k=7, checkpointing=rollback)_. We are 
interested in providing the model (which will be a struct-like data structure 
consisting of the weights, the biases and the hyperparameters), the training 
features and labels, the validation features and labels, the batch update 
function (i.e., gradient calculation func), the update strategy (e.g. sync, 
async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
mini-batch), the gradient aggregation function, the number of epochs, the batch 
size, the degree of parallelism as well as the checkpointing strategy (e.g. 
rollback recovery). And the function will return a trained model in struct 
format.




