[ https://issues.apache.org/jira/browse/SUBMARINE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168293#comment-17168293 ]
Wangda Tan commented on SUBMARINE-548:
--------------------------------------

[~jotjohnting], thanks for working on this. I just reviewed [https://github.com/apache/submarine/pull/351], and I think we missed part of the design.

The design doc, [https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#predefined-experiment-template-api-to-run-experiment], defines the spec for how to submit a pre-defined template, which is sufficient for submission from CLI/REST/UI. However, it is not enough to *register/define* a pre-defined template.

The differences between registering and submitting a pre-defined template are:
* *Registering* an experiment template requires information about how Submarine can run the experiment. For example, it needs to include: resources required for workers; environment (docker image, conda kernel); command-line options for workers/ps, etc.
* In contrast, *submitting* an experiment template only requires filling in the required/optional parameters.

So to register a pre-defined template, we need to include *not only* the ExperimentTemplate but also a description of how Submarine can run it.

*So the predefined template registration should include the following:*

*1) A template of the Experiment YAML.* For example, take an experiment from our doc: [https://github.com/apache/submarine/blob/master/docs/userdocs/k8s/run-tensorflow-experiment.md]

{code:java}
meta:
  name: "tf-mnist-yaml"
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M"
{code}

We can create a template of the YAML (with placeholders) using syntax like:

{code:java}
meta:
  name: {{name}}
  namespace: "default"
  framework: "TensorFlow"
  cmd: "python /var/tf_mnist/mnist_with_summaries.py --input {{input}} --log_dir=/train/log --learning_rate={{training.learning_rate}} --batch_size={{training.batch_size}}"
  envVars:
    ENV_1: "ENV1"
environment:
  image: "gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0"
spec:
  Ps:
    replicas: 1
    resources: "cpu=1,memory=1024M"
  Worker:
    replicas: 1
    resources: "cpu=1,memory=1024M"
{code}

The above template defines 4 variables (placeholders):
* name
* input
* training.learning_rate
* training.batch_size

(The above YAML placeholder syntax is based on [https://stackoverflow.com/a/41620747])

*2) A list of parameters (similar to ExperimentTemplate)*

*So I think we need the following objects:*

*a. RegisterExperimentTemplateSpec*
{code:java}
{
  template_name: Name of the template
  experiment_spec: The spec for the experiment, with placeholders
  parameters: List of parameter definitions
}
{code}

*b. SubmissionExperimentTemplateSpec*
{code:java}
{
  experiment_name: Name of the running experiment
  template_name: Name of the template
  parameters: List of parameters (with values)
}
{code}

Does this make sense?

cc: [~pingsutw], [~ztang] for suggestions.

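To make the register/submit split concrete, here is a rough Python sketch of the two objects and of how a submission could be rendered into an experiment spec. It is only an illustration, not Submarine's implementation: the `ParameterDef` class, its `required`/`default` fields, the `render` function, and the choice to fill the `name` placeholder from `experiment_name` are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Matches {{name}} or {{training.learning_rate}} style placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


@dataclass
class ParameterDef:
    """Hypothetical parameter definition; the field names are assumptions."""
    name: str
    required: bool = True
    default: Optional[str] = None


@dataclass
class RegisterExperimentTemplateSpec:
    template_name: str
    experiment_spec: str  # experiment YAML text containing placeholders
    parameters: List[ParameterDef] = field(default_factory=list)


@dataclass
class SubmissionExperimentTemplateSpec:
    experiment_name: str
    template_name: str
    parameters: Dict[str, str] = field(default_factory=dict)


def render(template: RegisterExperimentTemplateSpec,
           submission: SubmissionExperimentTemplateSpec) -> str:
    """Fill the template's placeholders with the submitted values."""
    if submission.template_name != template.template_name:
        raise ValueError("submission targets a different template")

    known = {p.name for p in template.parameters}
    unknown = set(submission.parameters) - known
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    # Assumption: the {{name}} placeholder is filled from experiment_name.
    values = {"name": submission.experiment_name}
    for p in template.parameters:
        if p.name in submission.parameters:
            values[p.name] = str(submission.parameters[p.name])
        elif p.default is not None:
            values[p.name] = p.default
        elif p.required:
            raise ValueError(f"missing required parameter: {p.name}")

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"placeholder has no value: {key}")
        return values[key]

    return PLACEHOLDER.sub(substitute, template.experiment_spec)


if __name__ == "__main__":
    template = RegisterExperimentTemplateSpec(
        template_name="tf-mnist-template",
        experiment_spec=(
            "meta:\n"
            "  name: {{name}}\n"
            '  cmd: "python mnist.py --input {{input}}'
            ' --learning_rate={{training.learning_rate}}'
            ' --batch_size={{training.batch_size}}"\n'
        ),
        parameters=[
            ParameterDef("input"),
            ParameterDef("training.learning_rate", required=False, default="0.01"),
            ParameterDef("training.batch_size", required=False, default="150"),
        ],
    )
    submission = SubmissionExperimentTemplateSpec(
        experiment_name="tf-mnist-run-1",
        template_name="tf-mnist-template",
        parameters={"input": "/data/mnist", "training.batch_size": "64"},
    )
    print(render(template, submission))
```

With this split, registration carries everything Submarine needs to run the experiment (the full spec plus parameter definitions), while a submission only names the template and supplies values.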
> [Umbrella] Predefined Experiment
> --------------------------------
>
>                 Key: SUBMARINE-548
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-548
>             Project: Apache Submarine
>          Issue Type: New Feature
>          Components: experiment template
>            Reporter: JohnTing
>            Assignee: JohnTing
>            Priority: Major
>             Fix For: 0.5.0
>
>
> Predefined-experiment features
> * [API] Define Experiment API for pre-defined template
> * [SDK] Add Python SDK to support pre-defined experiment
> * [UI] Allow running a pre-defined experiment
> * [API] Define Swagger API for pre-defined template submission
> * [API] Define Swagger API for pre-defined template registration/delete, etc.
> * [Server] Support submitting a pre-defined template and translating it into an actual job
> [https://github.com/apache/submarine/blob/master/docs/design/experiment-implementation.md#support-predefined-experiment-templates]
> [https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap]