[jira] [Comment Edited] (SUBMARINE-548) [Umbrella] Predefined Experiment

2020-07-30 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168293#comment-17168293
 ] 

Wangda Tan edited comment on SUBMARINE-548 at 7/30/20, 11:22 PM:
-

[~jotjohnting], thanks for working on this, I just reviewed 
[https://github.com/apache/submarine/pull/351]

I think we missed a part of the design: 

The design doc: 
[https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment]
 defines the spec for submitting a pre-defined template, which is 
sufficient for submission from CLI/REST/UI. However, it is not enough to 
*register/define* a pre-defined template. 

The differences between registering and submitting a pre-defined template are: 
 * *Registering* an experiment template requires information about how 
Submarine can run the experiment. For example, it needs to include the 
resources required for workers; the environment (docker image, conda kernel); 
command-line options for workers/ps, etc. 
 * In contrast, *submitting* an experiment template only requires filling in 
the required/optional parameters.

So to register a pre-defined template, we need to include *not only* the 
ExperimentTemplate but also a description of how Submarine can run it. 

*So the predefined template registration should include the following:* 

*1) A template of the Experiment YAML.* For example, take an experiment 
from our doc: 
[https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/run-tensorflow-experiment.md]
{code:java}
meta:
  name: "tf-mnist-yaml"
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M"
{code}
We can create a template of the YAML (with placeholders) using syntax like:
{code:java}
meta:
  name: "{{name}}"
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --input {{input}} --log_dir=/train/log --learning_rate={{training.learning_rate}} --batch_size={{training.batch_size}}"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M"
{code}
The above template defines 4 variables (placeholders): 
 * name 
 * input
 * training.learning_rate
 * training.batch_size

(The above YAML placeholder syntax is based on [https://stackoverflow.com/a/41620747])
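
As a rough illustration of the placeholder idea, the sketch below finds the {{...}} variables in the cmd string and substitutes values for them. This is not Submarine code: the regular expression, the function names find_placeholders and render, and the sample value /data/mnist are all assumptions made for the example.

```python
import re

# Matches {{ dotted.name }} with optional spaces inside the braces.
# The exact placeholder grammar is an assumption for this sketch.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def find_placeholders(template: str) -> list[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in PLACEHOLDER.finditer(template):
        seen.setdefault(match.group(1), None)
    return list(seen)


def render(template: str, values: dict[str, object]) -> str:
    """Substitute every placeholder; raise KeyError if a value is missing."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return str(values[name])

    return PLACEHOLDER.sub(replace, template)


# The cmd line from the template above.
cmd_template = (
    "python /var/tf_mnist/mnist_with_summaries.py --input {{input}} "
    "--log_dir=/train/log --learning_rate={{training.learning_rate}} "
    "--batch_size={{training.batch_size}}"
)

print(find_placeholders(cmd_template))
print(render(cmd_template, {
    "input": "/data/mnist",  # hypothetical path, for illustration only
    "training.learning_rate": 0.01,
    "training.batch_size": 150,
}))
```

Failing loudly on a missing value is a design choice made here so that a half-rendered command is never run; a real implementation might fall back to declared defaults instead.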

*2) A list of parameters (Similar to ExperimentTemplate)*

*So I think we need the following objects:* 

*a. RegisterExperimentTemplateSpec*
{code:java}
{
   template_name: Name of the template
   experiment_spec: The spec for the experiment, with placeholders
   parameters: 
      List of parameter definitions
}
{code}
*b. SubmissionExperimentTemplateSpec*
{code:java}
{
   experiment_name: Name of the running experiment
   template_name: Name of the template
   parameters: 
      List of parameters (with values)
}
{code}
Does this make sense? cc: [~pingsutw], [~ztang] for suggestions.

And if we agree with the proposal, we need to update our experiment spec design 
accordingly: 
[https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment]
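
To make the two proposed objects concrete, here is a minimal sketch of how they could be modelled. It is only an illustration, not the Submarine API. The two class names and their top-level fields follow the proposal above. The ParameterDefinition class and its fields (required, description, default) are assumptions, as are the sample values.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ParameterDefinition:
    """One entry in the template's parameter list (fields are assumptions)."""
    name: str
    required: bool = True
    description: str = ""
    default: Optional[Any] = None


@dataclass
class RegisterExperimentTemplateSpec:
    """What is stored when a template is registered."""
    template_name: str
    experiment_spec: str  # experiment YAML text containing {{placeholders}}
    parameters: list[ParameterDefinition] = field(default_factory=list)


@dataclass
class SubmissionExperimentTemplateSpec:
    """What a user sends to run an experiment from a registered template."""
    experiment_name: str
    template_name: str
    parameters: dict[str, Any] = field(default_factory=dict)


# Hypothetical values, for illustration only.
registration = RegisterExperimentTemplateSpec(
    template_name="tf-mnist-template",
    experiment_spec='meta:\n  name: "{{name}}"\n',
    parameters=[ParameterDefinition(name="name", description="Experiment name")],
)
submission = SubmissionExperimentTemplateSpec(
    experiment_name="my-run",
    template_name="tf-mnist-template",
    parameters={"name": "my-run"},
)
print(registration.template_name == submission.template_name)
```

The registration object carries everything Submarine needs to run the job, while the submission object carries only a template reference and parameter values, which mirrors the register/submit split described above.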



[jira] [Comment Edited] (SUBMARINE-548) [Umbrella] Predefined Experiment

2020-08-02 Thread Kevin Su (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169515#comment-17169515
 ] 

Kevin Su edited comment on SUBMARINE-548 at 8/2/20, 11:22 AM:
--

[~wangda], a few questions to make sure I understand this correctly.
 # If we want to use a predefined template to submit an experiment, we would 
first register the *ExperimentTemplateSpec*.
 The *ExperimentTemplateSpec* would look like this:

{code:java}
{
 template_name: mnist_template
 experiment_spec: 
   meta:
     name: name
     namespace: "default"
     framework: "TensorFlow"
     cmd: "python /var/tf_mnist/mnist_with_summaries.py
       --input {{input.train_data}}
       --log_dir=/train/log
       --learning_rate={{training.learning_rate}}
       --batch_size={{training.batch_size}}"
     envVars:
       ENV_1: "ENV1"
   environment:
     image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
   spec:
     Ps:
       replicas: 1
       resources: "cpu=1,memory=1024M"
     Worker:
       replicas: 1
       resources: "cpu=1,memory=1024M"
 parameters:
   - name: input.train_data
     required: true
     description: >
       Train data is expected in SVM format, and can be stored in HDFS/S3
   - name: training.learning_rate
     required: true
     default: 0.001
     description: >
       Learning rate for mnist model, default is 0.001
   - name: training.batch_size
     required: true
     description: >
       Integer or `None`. Number of samples per gradient update. If
       unspecified, `batch_size` will default to 32
}
{code}

 Should we add *Author* and *description* to *ExperimentTemplateSpec*, as 
mentioned in 
[https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment]?

 2. After registering, we would submit a list of parameters to run an 
experiment, like this:

{code:java}
{
  experiment_name: mnist_example
  template_name: mnist_template
  parameters:
    input.train_data: "hdfs://foo/bar"
    training.learning_rate: 0.01
    training.batch_size: 64
}
{code}

IIUC, this is a great proposal: users can easily submit an experiment 
with just a list of parameters, without worrying about system resources 
or the environment.



> [Umbrella] Predefined Experiment
> 
>
> Key: SUBMARINE-548
> URL: https://issues.apache.org/jira/browse/SUBMARINE-548
> Project: Apache Submarine
>  Issue Type: New Feature
>  Components: experiment templ

[jira] [Comment Edited] (SUBMARINE-548) [Umbrella] Predefined Experiment

2020-08-02 Thread Kevin Su (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169515#comment-17169515
 ] 

Kevin Su edited comment on SUBMARINE-548 at 8/2/20, 11:23 AM:
--

[~wangda], a few questions to make sure I understand this correctly.
 # If we want to use a predefined template to submit an experiment, we would 
first register the *ExperimentTemplateSpec*.
 The *ExperimentTemplateSpec* would look like this:

{code:java}
{
 template_name: mnist_template
 experiment_spec: 
   meta:
     name: name
     namespace: "default"
     framework: "TensorFlow"
     cmd: "python /var/tf_mnist/mnist_with_summaries.py
       --input {{input.train_data}}
       --log_dir=/train/log
       --learning_rate={{training.learning_rate}}
       --batch_size={{training.batch_size}}"
     envVars:
       ENV_1: "ENV1"
   environment:
     image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
   spec:
     Ps:
       replicas: 1
       resources: "cpu=1,memory=1024M"
     Worker:
       replicas: 1
       resources: "cpu=1,memory=1024M"
 parameters:
   - name: input.train_data
     required: true
     description:
       "Train data is expected in SVM format, and can be stored in HDFS/S3"
   - name: training.learning_rate
     required: true
     default: 0.001
     description:
       "Learning rate for mnist model, default is 0.001"
   - name: training.batch_size
     required: true
     description:
       "Integer or `None`. Number of samples per gradient update. If unspecified, `batch_size` will default to 32"
}
{code}
Should we add *Author* and *description* to *ExperimentTemplateSpec*, as 
mentioned in 
[https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment]?

 2. After registering, we would submit a list of parameters to run an 
experiment, like this:
{code:java}
{
  experiment_name: mnist_example
  template_name: mnist_template
  parameters:
    input.train_data: "hdfs://foo/bar"
    training.learning_rate: 0.01
    training.batch_size: 64
}
{code}
IIUC, this is a great proposal: users can easily submit an experiment 
with just a list of parameters, without worrying about system resources 
or the environment.



[jira] [Comment Edited] (SUBMARINE-548) [Umbrella] Predefined Experiment

2020-08-02 Thread Kevin Su (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169515#comment-17169515
 ] 

Kevin Su edited comment on SUBMARINE-548 at 8/2/20, 11:25 AM:
--

[~wangda], a few questions to make sure I understand this correctly.
 # If we want to use a predefined template to submit an experiment, we would 
first register the *ExperimentTemplateSpec*.
 The *ExperimentTemplateSpec* would look like this:

{code:java}
{
 template_name: mnist_template
 experiment_spec: 
   meta:
     name: name
     namespace: "default"
     framework: "TensorFlow"
     cmd: "python /var/tf_mnist/mnist_with_summaries.py
       --input {{input.train_data}}
       --log_dir=/train/log
       --learning_rate={{training.learning_rate}}
       --batch_size={{training.batch_size}}"
     envVars:
       ENV_1: "ENV1"
   environment:
     image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
   spec:
     Ps:
       replicas: 1
       resources: "cpu=1,memory=1024M"
     Worker:
       replicas: 1
       resources: "cpu=1,memory=1024M"
 parameters:
   - name: input.train_data
     required: true
     description:
       "Train data is expected in SVM format, and can be stored in HDFS/S3"
   - name: training.learning_rate
     required: true
     default: 0.001
     description:
       "Learning rate for mnist model, default is 0.001"
   - name: training.batch_size
     required: true
     description:
       "Integer or `None`. Number of samples per gradient update. If unspecified, `batch_size` will default to 32"
}
{code}
Should we add *Author* and *description* to *ExperimentTemplateSpec*, as 
mentioned in 
[https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment]?

 2. After registering, we would submit a list of parameters to run an 
experiment, like this:
{code:java}
{
  experiment_name: "mnist_example"
  template_name: "mnist_template"
  parameters:
    input.train_data: "hdfs://foo/bar"
    training.learning_rate: 0.01
    training.batch_size: 64
}
{code}
IIUC, this is a great proposal: users can easily submit an experiment 
with just a list of parameters, without worrying about system resources 
or the environment.



[jira] [Comment Edited] (SUBMARINE-548) [Umbrella] Predefined Experiment

2020-08-02 Thread Kevin Su (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169515#comment-17169515
 ] 

Kevin Su edited comment on SUBMARINE-548 at 8/2/20, 11:31 AM:
--

[~wangda], a few questions to make sure I understand this correctly.
 # If we want to use a predefined template to submit an experiment, we would 
first register the *ExperimentTemplateSpec*.
 The *ExperimentTemplateSpec* would look like this:

{code:java}
{
 template_name: mnist_template
 experiment_spec: 
   meta:
     name: name
     namespace: "default"
     framework: "TensorFlow"
     cmd: "python /var/tf_mnist/mnist_with_summaries.py
       --input {{input.train_data}}
       --log_dir=/train/log
       --learning_rate={{training.learning_rate}}
       --batch_size={{training.batch_size}}"
     envVars:
       ENV_1: "ENV1"
   environment:
     image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
   spec:
     Ps:
       replicas: 1
       resources: "cpu=1,memory=1024M"
     Worker:
       replicas: 1
       resources: "cpu=1,memory=1024M"
 parameters:
   - name: input.train_data
     required: true
     description:
       "Train data is expected in SVM format, and can be stored in HDFS/S3"
   - name: training.learning_rate
     required: true
     default: 0.001
     description:
       "Learning rate for mnist model, default is 0.001"
   - name: training.batch_size
     required: true
     description:
       "Integer or `None`. Number of samples per gradient update. If unspecified, `batch_size` will default to 32"
}
{code}
Should we add *Author* and *description* to *ExperimentTemplateSpec*, as 
mentioned in 
[https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment]?

 2. After registering, we would submit a list of parameters to run an 
experiment, like this:
{code:java}
{
  experiment_name: "mnist_example"
  template_name: "mnist_template"
  parameters:
    input.train_data: "hdfs://foo/bar"
    training.learning_rate: 0.01
    training.batch_size: 64
}
{code}
IIUC, this is a great proposal: users can easily submit an experiment 
with just a list of parameters, without worrying about system resources 
or the environment.
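
One way to picture the submit step is a small resolver that merges the submitted values with the template's parameter definitions. The sketch below is an assumption, not Submarine behaviour: the function name resolve_parameters is hypothetical, and so are its policies (a declared default fills in an omitted value even when required is true, and unknown parameter names are rejected).

```python
from typing import Any


def resolve_parameters(definitions: list[dict[str, Any]],
                       submitted: dict[str, Any]) -> dict[str, Any]:
    """Merge submitted values with defaults and check required parameters."""
    known = {d["name"] for d in definitions}
    unknown = sorted(set(submitted) - known)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")

    resolved: dict[str, Any] = {}
    missing: list[str] = []
    for d in definitions:
        name = d["name"]
        if name in submitted:
            resolved[name] = submitted[name]   # explicit value wins
        elif "default" in d:
            resolved[name] = d["default"]      # fall back to declared default
        elif d.get("required", False):
            missing.append(name)               # required, no value, no default
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return resolved


# Parameter definitions taken from the template above (descriptions omitted).
definitions = [
    {"name": "input.train_data", "required": True},
    {"name": "training.learning_rate", "required": True, "default": 0.001},
    {"name": "training.batch_size", "required": True},
]

# Here the learning rate is left out, so the declared default applies.
resolved = resolve_parameters(definitions, {
    "input.train_data": "hdfs://foo/bar",
    "training.batch_size": 64,
})
print(resolved)
```

The resolved values would then be substituted into the placeholders of experiment_spec before the experiment is launched.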

