[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function
[ https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2299:

Description:

The objective of the “paramserv” built-in function is to update an initial or existing model with a given configuration. An initial function signature would be:

{code:java}
model' = paramserv(model, features=X, labels=Y, val_features=X_val, val_labels=Y_val,
                   upd="fun1", agg="fun2", mode="BSP", freq="BATCH", epochs=100,
                   batchsize=64, k=7, scheme="disjoint_contiguous",
                   hyperparam=params, checkpointing="NONE")
{code}

We are interested in providing: the model (a struct-like data structure consisting of the weights, the biases, and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., the gradient calculation function), the update strategy (e.g., synchronous, asynchronous, Hogwild!, stale-synchronous), the update frequency (per epoch or per mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partitioning scheme, a list of additional hyperparameters, and the checkpointing strategy. The function returns the trained model in the same struct-like format.
*Inputs*:
* model: a list consisting of the weight and bias matrices
* features: training feature matrix
* labels: training label matrix
* val_features: validation feature matrix
* val_labels: validation label matrix
* upd: the name of the gradient calculation function
* agg: the name of the gradient aggregation function
* mode (options: BSP, ASP, SSP): the update mode
* freq (options: EPOCH, BATCH): the frequency of updates
* epochs: the number of epochs
* batchsize [optional]: the batch size; if the update frequency is "EPOCH", this argument is ignored
* k: the degree of parallelism
* scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the data partitioning scheme, i.e., how the data is distributed across the workers
* hyperparam [optional]: a list of additional hyperparameters, e.g., learning rate, momentum
* checkpointing (options: NONE (default), EPOCH, EPOCH10) [optional]: the checkpointing strategy; a checkpoint can be set after each epoch or after every 10 epochs

*Output*:
* model': a list consisting of the updated weight and bias matrices

was: (the previous revision of this description, identical except that the signature and input list used the positional names X, Y, X_val, Y_val instead of features, labels, val_features, val_labels)

> API design of the paramserv function
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
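As a hypothetical, language-agnostic sketch (not SystemML code), the semantics implied by `mode="BSP"` with `freq="BATCH"` can be written as a loop in which every worker computes gradients on its data partition via `upd` and the server combines them via `agg` before any worker proceeds; the `upd`/`agg` callables and the partition layout here are illustrative assumptions.

```python
def paramserv_bsp(model, parts, upd, agg, epochs, batchsize):
    """BSP sketch: parts is a list of (X, y) per-worker partitions."""
    for _ in range(epochs):
        n_batches = max(len(X) for X, _ in parts) // batchsize
        for b in range(n_batches):
            grads = []
            for X, y in parts:                       # one pass per worker
                lo, hi = b * batchsize, (b + 1) * batchsize
                grads.append(upd(model, X[lo:hi], y[lo:hi]))
            model = agg(model, grads)                # barrier: all workers synced
    return model
```

An ASP variant would apply each worker's gradient as it arrives instead of waiting at the per-batch barrier.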
[jira] [Updated] (SYSTEMML-2323) Checkpointing
[ https://issues.apache.org/jira/browse/SYSTEMML-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2323:

Description: It aims to add the auxiliary checkpointing service. We would like to support types such as NONE, EPOCH, and EPOCH10 to indicate the frequency at which model checkpointing is performed.

(was: It aims to add the auxiliary checkpointing service.)

> Checkpointing
>
> Key: SYSTEMML-2323
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2323
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
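A minimal sketch of how the three checkpointing types might be interpreted, assuming epochs are numbered from 1 and that EPOCH10 means "after every 10th epoch" (both assumptions, since the issue does not pin them down):

```python
def should_checkpoint(strategy, epoch):
    """Decide whether to checkpoint the model after the given epoch."""
    if strategy == "NONE":          # never checkpoint
        return False
    if strategy == "EPOCH":         # checkpoint after every epoch
        return True
    if strategy == "EPOCH10":       # checkpoint after every 10th epoch
        return epoch % 10 == 0
    raise ValueError(f"unknown checkpointing strategy: {strategy}")
```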
[jira] [Updated] (SYSTEMML-2322) Local workers
[ https://issues.apache.org/jira/browse/SYSTEMML-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2322:

Description: It aims to implement the local workers. It also covers data management, such as data distribution and program separation via function replication. We would like to support four schemes for data distribution: disjoint_contiguous (contiguous splits of X and y), disjoint_round_robin (X and y distributed row-wise in round-robin fashion), disjoint_random, and overlap_reshuffle (every worker gets all the data, but reshuffled in a different random order).

(was: It aims to implement the local workers. It also covers data management, such as data distribution and program separation via function replication.)

> Local workers
>
> Key: SYSTEMML-2322
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2322
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
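The four schemes above can be sketched as row-index assignments over `n_rows` rows for `k` workers; this is an illustrative interpretation, not SystemML's actual partitioner, and the `seed` handling is an assumption added for reproducibility.

```python
import random

def partition(scheme, n_rows, k, seed=42):
    """Return the list of row indices assigned to each of k workers."""
    rows = list(range(n_rows))
    size = (n_rows + k - 1) // k                     # ceil(n_rows / k)
    if scheme == "disjoint_contiguous":              # contiguous splits
        return [rows[i * size:(i + 1) * size] for i in range(k)]
    if scheme == "disjoint_round_robin":             # row-wise round robin
        return [rows[i::k] for i in range(k)]
    if scheme == "disjoint_random":                  # shuffle, then split
        random.Random(seed).shuffle(rows)
        return [rows[i * size:(i + 1) * size] for i in range(k)]
    if scheme == "overlap_reshuffle":                # all rows, per-worker order
        return [random.Random(seed + i).sample(rows, n_rows) for i in range(k)]
    raise ValueError(f"unknown scheme: {scheme}")
```

Note that the three disjoint schemes cover every row exactly once across workers, while overlap_reshuffle gives each worker the full data set in its own random order.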
[jira] [Updated] (SYSTEMML-2085) Initial version of local backend
[ https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2085:

Attachment: (was: ps.png)

> Initial version of local backend
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Technical task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
[jira] [Updated] (SYSTEMML-2085) Initial version of local backend
[ https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2085:

Description: It aims to implement the local backend for the paramserv function.

(was: A single-node parameter server acts as a data-parallel parameter server. A diagram of the parameter server architecture is shown below.)

> Initial version of local backend
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Technical task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
[jira] [Commented] (SYSTEMML-2299) API design of the paramserv function
[ https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473556#comment-16473556 ] LI Guobao commented on SYSTEMML-2299:

[~mboehm7] I still have a question about the function design. How do we decide whether the local or the Spark backend should execute the function? Do we need to specify it explicitly, or should it be inferred from the data size?

> API design of the paramserv function
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
[jira] [Updated] (SYSTEMML-2086) Push/pull service
[ https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2086:

Description: This part aims to implement the push/pull service for the local backend.

(was: This part aims to implement the BSP strategy for the local execution backend. The idea is to spawn a thread in CP to run the parameter server; the workers are likewise launched in a multi-threaded fashion in CP.)

Summary: Push/pull service (was: Initial version of local backend)

> Push/pull service
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
[jira] [Updated] (SYSTEMML-2085) Initial version of local backend
[ https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2085:

Description: A single-node parameter server acts as a data-parallel parameter server. A diagram of the parameter server architecture is shown below.

(was: A single-node parameter server acts as a data-parallel parameter server. A multi-node, model-parallel parameter server will be discussed if time permits. A diagram of the parameter server architecture is shown below.)

Summary: Initial version of local backend (was: Single-node parameter server primitives)

> Initial version of local backend
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Technical task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
> Attachments: ps.png
[jira] [Created] (SYSTEMML-2324) Synchronization
LI Guobao created SYSTEMML-2324:
---

Summary: Synchronization
Key: SYSTEMML-2324
URL: https://issues.apache.org/jira/browse/SYSTEMML-2324
Project: SystemML
Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao

We also need to implement the synchronization between the workers and the parameter server in order to support more parameter update strategies; e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to define the waiting interval. The idea is to maintain, in the server, a vector clock recording every worker's clock. Each time an iteration finishes inside a worker, the worker waits for the server to give a signal, i.e., it sends a request to calculate its staleness according to the vector clock. When the server receives gradients from a given worker, it increments that worker's entry in the vector clock. We could then define BSP as "staleness==0", ASP as "staleness==-1", and SSP as "staleness==N".
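The staleness rule described above can be sketched as follows; this is an illustrative interpretation (the class and method names are invented for the example): a worker may start its next iteration only if its clock is at most `staleness` steps ahead of the slowest worker, with staleness -1 meaning "never wait".

```python
class VectorClock:
    """Server-side vector clock over k workers (sketch)."""
    def __init__(self, k):
        self.clock = [0] * k

    def on_gradients(self, worker):
        # Server received gradients from this worker: advance its clock.
        self.clock[worker] += 1

    def may_proceed(self, worker, staleness):
        # staleness == -1 (ASP): no coordination at all.
        if staleness == -1:
            return True
        # BSP (staleness == 0) / SSP (staleness == N): bounded lead over
        # the slowest worker.
        return self.clock[worker] - min(self.clock) <= staleness
```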
[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives
[ https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2085:

Description: A single-node parameter server acts as a data-parallel parameter server. A multi-node, model-parallel parameter server will be discussed if time permits. A diagram of the parameter server architecture is shown below.

(was: the same description plus the "Synchronization" paragraph, which now lives in SYSTEMML-2324)

> Single-node parameter server primitives
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Technical task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
> Attachments: ps.png
[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives
[ https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2085: Description: A single node parameter server acts as a data-parallel parameter server. And a multi-node model parallel parameter server will be discussed if time permits. Synchronization: We also need to implement the synchronization between workers and parameter server to be able to bring more parameter update strategies, e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to define the waiting interval. The idea is to maintain a vector clock recording all workers' clock in the server. Each time when an iteration in side of worker finishes, it waits server to give a signal, i.e., to send a request for calculating the staleness according to the vector clock. And when the server receives the gradients from certain worker, it will increment the vector clock for this worker. So we could define BSP as "staleness==0", ASP as "staleness==-1" and SSP as "staleness==N". A diagram of the parameter server architecture is shown below. was: A single node parameter server acts as a data-parallel parameter server. And a multi-node model parallel parameter server will be discussed if time permits. Push/Pull service: In general, we could launch a parameter server inside (local multi-thread backend) or outside (spark distributed backend) of CP to provide the pull and push service. For the moment, all the weights and biases are saved in a hashmap using a key, e.g., "global parameter". Each worker's gradients will be put into the hashmap seperately with a given key. And the exchange between server and workers will be implemented by TCP. Hence, we could easily broadcast the IP address and the port number to the workers. And then the workers can send the gradients and retrieve the new parameters via TCP socket. 
The server will also spawn a thread which retrieves the gradients by polling the hashmap using relevant keys and aggregates them. At last, it updates the global parameter in the hashmap. Synchronization: We also need to implement the synchronization between workers and parameter server to be able to bring more parameter update strategies, e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to define the waiting interval. The idea is to maintain a vector clock recording all workers' clocks in the server. Each time an iteration inside a worker finishes, it waits for the server to give a signal, i.e., it sends a request for calculating the staleness according to the vector clock. And when the server receives the gradients from a certain worker, it will increment the vector clock for this worker. So we could define BSP as "staleness==0", ASP as "staleness==-1" and SSP as "staleness==N". A diagram of the parameter server architecture is shown below. > Single-node parameter server primitives > --- > > Key: SYSTEMML-2085 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2085 > Project: SystemML > Issue Type: Technical task >Reporter: Matthias Boehm >Assignee: LI Guobao >Priority: Major > Attachments: ps.png > > > A single node parameter server acts as a data-parallel parameter server. And > a multi-node model parallel parameter server will be discussed if time > permits. > Synchronization: > We also need to implement the synchronization between workers and parameter > server to be able to bring more parameter update strategies, e.g., the > stale-synchronous strategy needs a hyperparameter "staleness" to define the > waiting interval. The idea is to maintain a vector clock recording all > workers' clocks in the server. Each time an iteration inside a worker > finishes, it waits for the server to give a signal, i.e., it sends a request for > calculating the staleness according to the vector clock. 
And when the server > receives the gradients from a certain worker, it will increment the vector > clock for this worker. So we could define BSP as "staleness==0", ASP as > "staleness==-1" and SSP as "staleness==N". > A diagram of the parameter server architecture is shown below. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
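The staleness rule above (BSP as "staleness==0", ASP as "staleness==-1", SSP as "staleness==N") can be sketched as a small server-side check. This is a hypothetical illustration, not SystemML code; the class and method names (VectorClockSketch, onGradients, canProceed) are invented for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of the server-side vector clock described above.
// One logical clock per worker; staleness 0 = BSP, -1 = ASP, N = SSP.
public class VectorClockSketch {
    private final int[] clocks;
    private final int staleness;

    public VectorClockSketch(int numWorkers, int staleness) {
        this.clocks = new int[numWorkers];
        this.staleness = staleness;
    }

    // Called when the server receives gradients from a worker:
    // increment that worker's entry in the vector clock.
    public synchronized void onGradients(int workerId) {
        clocks[workerId]++;
    }

    // Called when a worker finishes an iteration and asks for the signal:
    // it may run ahead of the slowest worker by at most 'staleness' clocks.
    public synchronized boolean canProceed(int workerId) {
        if (staleness < 0)  // ASP: never wait
            return true;
        int slowest = Arrays.stream(clocks).min().getAsInt();
        return clocks[workerId] - slowest <= staleness;
    }
}
```

With staleness 0 this reduces to bulk-synchronous behavior: a worker that is even one clock ahead of the slowest worker must wait.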
[jira] [Updated] (SYSTEMML-2323) Checkpointing
[ https://issues.apache.org/jira/browse/SYSTEMML-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2323: Description: It aims to add the auxiliary checkpointing service. (was: It aims to add the auxilary checkpointing service.) > Checkpointing > - > > Key: SYSTEMML-2323 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2323 > Project: SystemML > Issue Type: Sub-task >Reporter: LI Guobao >Assignee: LI Guobao >Priority: Major > > It aims to add the auxiliary checkpointing service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SYSTEMML-2323) Checkpointing
LI Guobao created SYSTEMML-2323: --- Summary: Checkpointing Key: SYSTEMML-2323 URL: https://issues.apache.org/jira/browse/SYSTEMML-2323 Project: SystemML Issue Type: Sub-task Reporter: LI Guobao Assignee: LI Guobao It aims to add the auxiliary checkpointing service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
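The checkpoint strategies named in the paramserv API (NONE, EPOCH, EPOCH10) boil down to a per-epoch decision. A minimal sketch, assuming the EPOCH10 option means "checkpoint every 10th epoch" as the API description suggests; the class and method names are illustrative, not SystemML internals.

```java
// Hypothetical sketch of the checkpoint-strategy decision: given the
// configured strategy and the just-finished epoch (1-based), decide
// whether a checkpoint of the model should be written.
public class CheckpointSketch {
    public static boolean shouldCheckpoint(String strategy, int epoch) {
        switch (strategy) {
            case "EPOCH":   return true;             // after every epoch
            case "EPOCH10": return epoch % 10 == 0;  // after every 10th epoch
            default:        return false;            // "NONE"
        }
    }
}
```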
[jira] [Created] (SYSTEMML-2322) Local workers
LI Guobao created SYSTEMML-2322: --- Summary: Local workers Key: SYSTEMML-2322 URL: https://issues.apache.org/jira/browse/SYSTEMML-2322 Project: SystemML Issue Type: Sub-task Reporter: LI Guobao Assignee: LI Guobao It aims to implement the local workers. It also covers data management such as data distribution and program separation via function replication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SYSTEMML-2321) Aggregation service
LI Guobao created SYSTEMML-2321: --- Summary: Aggregation service Key: SYSTEMML-2321 URL: https://issues.apache.org/jira/browse/SYSTEMML-2321 Project: SystemML Issue Type: Sub-task Reporter: LI Guobao Assignee: LI Guobao The aggregation service is independent of local or remote workers. It is responsible for executing the parameter updates. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend
[ https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2087: Description: This part aims to implement the parameter server for spark distributed backend. In general, we could launch a parameter server in a host to provide the pull and push service. For the moment, all the weights and biases are saved in a hashmap using a key, e.g., "global parameter". Each worker's gradients will be put into the hashmap separately with a given key. And the exchange between server and workers will be implemented by netty RPC. Hence, we could easily broadcast the IP address and the port number to the workers. And then the workers can send the gradients and retrieve the new parameters via netty RPC. The server will also spawn a thread which retrieves the gradients by polling the hashmap using relevant keys and aggregates them. At last, it updates the global parameter in the hashmap. (was: This part aims to implement the parameter server for spark distributed backend. In general, we could launch a parameter server in a host to provide the pull and push service. For the moment, all the weights and biases are saved in a hashmap using a key, e.g., "global parameter". Each worker's gradients will be put into the hashmap separately with a given key. And the exchange between server and workers will be implemented by netty RPC. Hence, we could easily broadcast the IP address and the port number to the workers. And then the workers can send the gradients and retrieve the new parameters via TCP socket. The server will also spawn a thread which retrieves the gradients by polling the hashmap using relevant keys and aggregates them. At last, it updates the global parameter in the hashmap.) 
> Initial version of distributed spark backend > > > Key: SYSTEMML-2087 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2087 > Project: SystemML > Issue Type: Sub-task >Reporter: Matthias Boehm >Assignee: LI Guobao >Priority: Major > > This part aims to implement the parameter server for spark distributed > backend. In general, we could launch a parameter server in a host to provide > the pull and push service. For the moment, all the weights and biases are > saved in a hashmap using a key, e.g., "global parameter". Each worker's > gradients will be put into the hashmap separately with a given key. And the > exchange between server and workers will be implemented by netty RPC. Hence, > we could easily broadcast the IP address and the port number to the workers. > And then the workers can send the gradients and retrieve the new parameters > via netty RPC. The server will also spawn a thread which retrieves the > gradients by polling the hashmap using relevant keys and aggregates them. At > last, it updates the global parameter in the hashmap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SYSTEMML-2087) Initial version of distributed spark backend
[ https://issues.apache.org/jira/browse/SYSTEMML-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2087: Description: This part aims to implement the parameter server for spark distributed backend. In general, we could launch a parameter server in a host to provide the pull and push service. For the moment, all the weights and biases are saved in a hashmap using a key, e.g., "global parameter". Each worker's gradients will be put into the hashmap separately with a given key. And the exchange between server and workers will be implemented by netty RPC. Hence, we could easily broadcast the IP address and the port number to the workers. And then the workers can send the gradients and retrieve the new parameters via TCP socket. The server will also spawn a thread which retrieves the gradients by polling the hashmap using relevant keys and aggregates them. At last, it updates the global parameter in the hashmap. (was: This part aims to implement the BSP for spark distributed backend. Hence the idea is to be able to launch a remote parameter server and the workers.) > Initial version of distributed spark backend > > > Key: SYSTEMML-2087 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2087 > Project: SystemML > Issue Type: Sub-task >Reporter: Matthias Boehm >Assignee: LI Guobao >Priority: Major > > This part aims to implement the parameter server for spark distributed > backend. In general, we could launch a parameter server in a host to provide > the pull and push service. For the moment, all the weights and biases are > saved in a hashmap using a key, e.g., "global parameter". Each worker's > gradients will be put into the hashmap separately with a given key. And the > exchange between server and workers will be implemented by netty RPC. Hence, > we could easily broadcast the IP address and the port number to the workers. 
> And then the workers can send the gradients and retrieve the new parameters > via TCP socket. The server will also spawn a thread which retrieves the > gradients by polling the hashmap using relevant keys and aggregates them. At > last, it updates the global parameter in the hashmap. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
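The push/pull service described above (parameters and per-worker gradients in a keyed hashmap, with an aggregator thread polling the worker keys) can be sketched as follows. This is a hypothetical illustration: the class and method names (ParamServerSketch, push, pull, aggregate) are invented, plain method calls stand in for the TCP / netty RPC transport, and a simple averaged SGD step stands in for the pluggable aggregation function.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the keyed-hashmap parameter server described above.
public class ParamServerSketch {
    private static final String GLOBAL = "global parameter";
    private final Map<String, double[]> store = new ConcurrentHashMap<>();

    public ParamServerSketch(double[] initialParams) {
        store.put(GLOBAL, initialParams);
    }

    // push: a worker deposits its gradients under a worker-specific key
    public void push(int workerId, double[] gradients) {
        store.put("gradients-" + workerId, gradients);
    }

    // pull: a worker retrieves the current global parameters
    public double[] pull() {
        return store.get(GLOBAL);
    }

    // aggregator thread body: poll the worker keys, average the gradients,
    // apply the update, and write the global parameter back to the hashmap
    public void aggregate(List<Integer> workerIds, double learningRate) {
        double[] params = store.get(GLOBAL);
        for (int id : workerIds) {
            double[] g = store.remove("gradients-" + id);
            if (g == null) continue;  // this worker has not pushed yet
            for (int i = 0; i < params.length; i++)
                params[i] -= learningRate * g[i] / workerIds.size();
        }
        store.put(GLOBAL, params);
    }
}
```

In the real design, the server's address and port would be broadcast to the workers, and push/pull would travel over the wire rather than through shared memory.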
[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives
[ https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2085: Issue Type: Technical task (was: Sub-task) > Single-node parameter server primitives > --- > > Key: SYSTEMML-2085 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2085 > Project: SystemML > Issue Type: Technical task >Reporter: Matthias Boehm >Assignee: LI Guobao >Priority: Major > Attachments: ps.png > > > A single node parameter server acts as a data-parallel parameter server. And > a multi-node model parallel parameter server will be discussed if time > permits. > Push/Pull service: > In general, we could launch a parameter server inside (local multi-thread > backend) or outside (spark distributed backend) of CP to provide the pull and > push service. For the moment, all the weights and biases are saved in a > hashmap using a key, e.g., "global parameter". Each worker's gradients will > be put into the hashmap separately with a given key. And the exchange between > server and workers will be implemented by TCP. Hence, we could easily > broadcast the IP address and the port number to the workers. And then the > workers can send the gradients and retrieve the new parameters via TCP > socket. The server will also spawn a thread which retrieves the gradients by > polling the hashmap using relevant keys and aggregates them. At last, it > updates the global parameter in the hashmap. > Synchronization: > We also need to implement the synchronization between workers and parameter > server to be able to bring more parameter update strategies, e.g., the > stale-synchronous strategy needs a hyperparameter "staleness" to define the > waiting interval. The idea is to maintain a vector clock recording all > workers' clocks in the server. Each time an iteration inside a worker > finishes, it waits for the server to give a signal, i.e., it sends a request for > calculating the staleness according to the vector clock. 
And when the server > receives the gradients from a certain worker, it will increment the vector > clock for this worker. So we could define BSP as "staleness==0", ASP as > "staleness==-1" and SSP as "staleness==N". > A diagram of the parameter server architecture is shown below. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SYSTEMML-2084) Implementation of language and compiler extension
[ https://issues.apache.org/jira/browse/SYSTEMML-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2084: Issue Type: Technical task (was: Sub-task) > Implementation of language and compiler extension > - > > Key: SYSTEMML-2084 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2084 > Project: SystemML > Issue Type: Technical task >Reporter: Matthias Boehm >Assignee: LI Guobao >Priority: Major > > This part aims to add an additional language support for the “paramserv” > function in order to be able to compile this new function. Since SystemML > already supports the parameterized builtin function, we can easily extend an > additional operation type and generate a new instruction for the “paramserv” > function. Recently, we have also added a new “eval” built-in function which > is capable of passing a function pointer as an argument so that it can be called at > runtime. Similarly, we would need to extend the inter-procedural analysis > to avoid removing unused constructed functions in the presence of the > second-order “paramserv” function, because the referenced functions, i.e., > the aggregate function and update function, must be present at runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SYSTEMML-2320) Parfor integration
LI Guobao created SYSTEMML-2320: --- Summary: Parfor integration Key: SYSTEMML-2320 URL: https://issues.apache.org/jira/browse/SYSTEMML-2320 Project: SystemML Issue Type: Sub-task Reporter: LI Guobao Assignee: LI Guobao It aims to guarantee robustness for the case where the paramserv function is used inside a parfor statement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SYSTEMML-2319) IPA integration
LI Guobao created SYSTEMML-2319: --- Summary: IPA integration Key: SYSTEMML-2319 URL: https://issues.apache.org/jira/browse/SYSTEMML-2319 Project: SystemML Issue Type: Sub-task Reporter: LI Guobao Assignee: LI Guobao It aims to extend the IPA to avoid removing the referenced functions, because the paramserv function is a second-order function. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SYSTEMML-2318) Hops, lops, instruction generation
LI Guobao created SYSTEMML-2318: --- Summary: Hops, lops, instruction generation Key: SYSTEMML-2318 URL: https://issues.apache.org/jira/browse/SYSTEMML-2318 Project: SystemML Issue Type: Sub-task Reporter: LI Guobao Assignee: LI Guobao It aims to implement the extension of hops, lops and instruction for the new paramserv function. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (SYSTEMML-2317) Implementation of language extension
LI Guobao created SYSTEMML-2317: --- Summary: Implementation of language extension Key: SYSTEMML-2317 URL: https://issues.apache.org/jira/browse/SYSTEMML-2317 Project: SystemML Issue Type: Sub-task Reporter: LI Guobao Assignee: LI Guobao It aims to extend the parsing and validation at language level. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function
[ https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2299: Description: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", hyperparam=params, checkpoint="NONE"){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. And the function will return a trained model in struct format. 
*Inputs*: * model : a list consisting of the weight and bias matrices * X : training features matrix * y : training label matrix * X_val : validation features matrix * y_val : validation label matrix * upd : the name of the gradient calculation function * agg : the name of the gradient aggregation function * mode (options: BSP, ASP, SSP): the updating mode * freq (options: EPOCH, BATCH): the frequency of updates * epochs : the number of epochs * batchsize : the batch size * k : the degree of parallelism * scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how the data is distributed across workers * hyperparam [optional]: a list consisting of the additional hyperparameters, e.g., learning rate, momentum * checkpoint (options: NONE(default), EPOCH, EPOCH10) [optional]: the checkpoint strategy, we could set a checkpoint for every epoch or every 10 epochs *Output*: * model' : a list consisting of the updated weight and bias matrices was: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, hyperparam=params, checkpoint=NONE){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. 
And the function will return a trained model in struct format. *Inputs*: * model : a list consisting of the weight and bias matrices * X : training features matrix * y : training label matrix * X_val : validation features matrix * y_val : validation label matrix * upd : the name of the gradient calculation function * agg : the name of the gradient aggregation function * mode (options: BSP, ASP, SSP): the updating mode * freq (options: EPOCH, BATCH): the frequency of updates * epochs : the number of epochs * batchsize : the batch size * k : the degree of parallelism * scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how the data is distributed across workers * hyperparam [optional]: a list consisting of the additional hyperparameters, e.g., learning rate, momentum * checkpoint (options: NONE(default), EPOCH, EPOCH10) [optional]: the checkpoint strategy, we could set a checkpoint for every epoch or every 10 epochs *Output*: * model' : a list consisting of the updated weight and bias matrices > API design of the paramserv function > > > Key: SYSTEMML-2299 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2299 > Project: SystemML > Issue Type: Sub-task >Reporter: LI Guobao >Assignee: LI Guobao >Priority: Major > > The objective of the “paramserv” built-in function is to update an initial or > existing model with configuration. An initial function signature would be: > {code:java} > model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", > mode="BSP", fre
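The partition schemes listed above (disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle) decide which training rows each of the k workers receives. A minimal sketch of the simplest one, disjoint_contiguous, where rows are split into k contiguous, non-overlapping ranges; the class and method names are illustrative, not SystemML internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "disjoint_contiguous" data partition scheme:
// assign each of k workers a contiguous, non-overlapping [start, end) row
// range, spreading any remainder rows over the first workers.
public class PartitionSketch {
    public static List<int[]> disjointContiguous(int numRows, int k) {
        List<int[]> ranges = new ArrayList<>();
        int base = numRows / k, rem = numRows % k, start = 0;
        for (int w = 0; w < k; w++) {
            int size = base + (w < rem ? 1 : 0);  // first 'rem' workers get one extra row
            ranges.add(new int[]{start, start + size});
            start += size;
        }
        return ranges;
    }
}
```

The disjoint_round_robin and disjoint_random variants would instead deal rows out cyclically or after a random shuffle, and overlap_reshuffle would allow the same row to reach several workers.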
[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function
[ https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2299: Description: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, hyperparam=params, checkpoint=NONE){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. And the function will return a trained model in struct format. 
*Inputs*: * model : a list consisting of the weight and bias matrices * X : training features matrix * y : training label matrix * X_val : validation features matrix * y_val : validation label matrix * upd : the name of the gradient calculation function * agg : the name of the gradient aggregation function * mode (options: BSP, ASP, SSP): the updating mode * freq (options: EPOCH, BATCH): the frequency of updates * epochs : the number of epochs * batchsize : the batch size * k : the degree of parallelism * scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how the data is distributed across workers * hyperparam [optional]: a list consisting of the additional hyperparameters, e.g., learning rate, momentum * checkpoint (options: NONE(default), EPOCH, EPOCH10) [optional]: the checkpoint strategy, we could set a checkpoint for every epoch or every 10 epochs *Output*: * model' : a list consisting of the updated weight and bias matrices was: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, hyperparam=params, checkpoint=NONE){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. 
And the function will return a trained model in struct format. *Inputs*: * model : a list consisting of the weight and bias matrices * X : training features matrix * y : training label matrix * X_val : validation features matrix * y_val : validation label matrix * upd : the name of the gradient calculation function * agg : the name of the gradient aggregation function * mode (options: BSP, ASP, SSP): the updating mode * freq (options: EPOCH, BATCH): the frequency of updates * epochs : the number of epochs * batchsize : the batch size * k : the degree of parallelism * scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how the data is distributed across workers * hyperparam [optional]: a list consisting of the additional hyperparameters, e.g., learning rate, momentum * checkpoint (options: NONE(default), EPOCH, EPOCH10) [optional]: the checkpoint strategy, we could set a checkpoint for every epoch or every 10 epochs Output: * model' : a list consisting of the updated weight and bias matrices > API design of the paramserv function > > > Key: SYSTEMML-2299 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2299 > Project: SystemML > Issue Type: Sub-task >Reporter: LI Guobao >Assignee: LI Guobao >Priority: Major > > The objective of the “paramserv” built-in function is to update an initial or > existing model with configuration. An initial function signature would be: > {code:java} > model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", > mode="BSP", freq="EPO
[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function
[ https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2299: Description: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, hyperparam=params, checkpoint=NONE){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. And the function will return a trained model in struct format. 
*Inputs*: * model : a list consisting of the weight and bias matrices * X : training features matrix * y : training label matrix * X_val : validation features matrix * y_val : validation label matrix * upd : the name of the gradient calculation function * agg : the name of the gradient aggregation function * mode (options: BSP, ASP, SSP): the updating mode * freq (options: EPOCH, BATCH): the frequency of updates * epochs : the number of epochs * batchsize : the batch size * k : the degree of parallelism * scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how the data is distributed across workers * hyperparam [optional]: a list consisting of the additional hyperparameters, e.g., learning rate, momentum * checkpoint (options: NONE(default), EPOCH, EPOCH10) [optional]: the checkpoint strategy, we could set a checkpoint for every epoch or every 10 epochs Output: * model' : a list consisting of the updated weight and bias matrices was: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, hyperparam=params, checkpoint=NONE){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. 
And the function will return a trained model in struct format. *Inputs*: * model : a list consisting of the weight and bias matrices * X : training features matrix * y : training label matrix * X_val : validation features matrix * y_val : validation label matrix * upd : the name of the gradient calculation function * agg : the name of the gradient aggregation function * mode (options: BSP, ASP, SSP): the updating mode * freq (options: EPOCH, BATCH): the frequency of updates * epochs : the number of epochs * batchsize : the batch size * k : the degree of parallelism * scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how the data is distributed across workers * hyperparam [optional]: a list consisting of the additional hyperparameters, e.g., learning rate, momentum * checkpoint (options: NONE(default), EPOCH, EPOCH10) [optional]: the checkpoint strategy, we could set a checkpoint for every epoch or every 10 epochs > API design of the paramserv function > > > Key: SYSTEMML-2299 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2299 > Project: SystemML > Issue Type: Sub-task >Reporter: LI Guobao >Assignee: LI Guobao >Priority: Major > > The objective of the “paramserv” built-in function is to update an initial or > existing model with configuration. An initial function signature would be: > {code:java} > model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", > mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, > scheme=disjoint_contiguous, hyperparam=par
[jira] [Updated] (SYSTEMML-2299) API design of the paramserv function
[ https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LI Guobao updated SYSTEMML-2299: Description: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, hyperparam=params, checkpoint=NONE){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. And the function will return a trained model in struct format. 
*Inputs*: * model : a list consisting of the weight and bias matrices * X : training features matrix * y : training label matrix * X_val : validation features matrix * y_val : validation label matrix * upd : the name of the gradient calculation function * agg : the name of the gradient aggregation function * mode (options: BSP, ASP, SSP): the updating mode * freq (options: EPOCH, BATCH): the frequency of updates * epochs : the number of epochs * batchsize : the batch size * k : the degree of parallelism * scheme (options: disjoint_contiguous, disjoint_round_robin, disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how the data is distributed across workers * hyperparam [optional]: a list consisting of the additional hyperparameters, e.g., learning rate, momentum * checkpoint (options: NONE(default), EPOCH, EPOCH10) [optional]: the checkpoint strategy, we could set a checkpoint for every epoch or every 10 epochs was: The objective of the “paramserv” built-in function is to update an initial or existing model with configuration. An initial function signature would be: {code:java} model'=paramserv(model, X, y, X_val, y_val, upd="fun1", agg="fun2", mode="BSP", freq="EPOCH", epochs=100, batchsize=64, k=7, scheme=disjoint_contiguous, hyperparam=params, checkpoint=NONE){code} We are interested in providing the model (which will be a struct-like data structure consisting of the weights, the biases and the hyperparameters), the training features and labels, the validation features and labels, the batch update function (i.e., gradient calculation function), the update strategy (e.g. sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or mini-batch), the gradient aggregation function, the number of epochs, the batch size, the degree of parallelism, the data partition scheme, a list of additional hyperparameters, as well as the checkpointing strategy. And the function will return a trained model in struct format. 
> API design of the paramserv function
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
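To make the four values of the `scheme` argument concrete, here is a small Python model of the row-assignment logic each scheme implies. This is an illustrative sketch of the partitioning semantics, not SystemML's actual implementation; function names and the seed are assumptions for the example.

```python
import random

def disjoint_contiguous(n, k):
    """Split rows [0, n) into k contiguous, disjoint blocks."""
    size, rem = divmod(n, k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        parts.append(list(range(start, end)))
        start = end
    return parts

def disjoint_round_robin(n, k):
    """Row i goes to worker i mod k; partitions are disjoint."""
    return [list(range(w, n, k)) for w in range(k)]

def disjoint_random(n, k, seed=42):
    """Shuffle all rows once, then split contiguously: disjoint random blocks."""
    rows = list(range(n))
    random.Random(seed).shuffle(rows)
    size, rem = divmod(n, k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        parts.append(rows[start:end])
        start = end
    return parts

def overlap_reshuffle(n, k, seed=42):
    """Every worker sees all rows, each in its own random order (overlapping)."""
    rng = random.Random(seed)
    return [rng.sample(range(n), n) for _ in range(k)]
```

The three `disjoint_*` schemes partition the data so each row belongs to exactly one worker, while `overlap_reshuffle` gives every worker the full dataset in a worker-local order.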
-- This message was sent by Atlassian JIRA (v7.6.3#76005)