Kai-Hsun Chen created SUBMARINE-949:
---------------------------------------
Summary: [Umbrella] Refactor and stabilize experiment service in
submarine-server
Key: SUBMARINE-949
URL: https://issues.apache.org/jira/browse/SUBMARINE-949
Project: Apache Submarine
Issue Type: Improvement
Components: Backend Server, experiment
Reporter: Kai-Hsun Chen
Assignee: Kai-Hsun Chen
Fix For: 0.6.0
The experiment service is currently the most important feature in Apache
Submarine. However, it is neither stable nor user-friendly. The known problems
and the improvements needed are:
(1) The frontend workbench does not reflect the actual experiment status (e.g.,
OOM).
(2) The server does not enforce some constraints imposed by the Kubernetes Java
Client (e.g., if the experiment name contains the character "_", the k8s Java
API throws an exception).
(3) Unexpected out-of-memory errors: it is hard for users to predict the actual
memory usage before running an experiment. Using the memory request and memory
limit mechanism to allow memory overcommitment would therefore help users.
(4) Allow users to create experiments with the same name and to retrieve these
experiments by name.
(5) Allow users to set tags on experiments to divide them into categories, so
that they can retrieve these experiments by tag.
(6) The K8sSubmitter submits an experiment to the Kubernetes cluster as soon as
it is created, regardless of how much resource quota is left.
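
As an illustration of item (2), the server could reject an invalid name before
it ever reaches the Kubernetes API. The sketch below is only an assumption
about how such a check might look: the class name ExperimentNameValidator is
hypothetical, and it assumes experiment names must be RFC 1123 DNS labels
(lowercase alphanumerics and "-", at most 63 characters). The exact rule that
applies to a given Kubernetes resource type may be looser.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of submarine-server.
final class ExperimentNameValidator {
  // Assumed rule: RFC 1123 DNS label. Must start and end with a lowercase
  // alphanumeric character; "-" is allowed in between; "_" is not allowed.
  private static final Pattern DNS_LABEL =
      Pattern.compile("[a-z0-9]([-a-z0-9]*[a-z0-9])?");
  private static final int MAX_LENGTH = 63;

  private ExperimentNameValidator() {}

  // Returns true only if the name satisfies the assumed naming rule.
  static boolean isValid(String name) {
    return name != null
        && !name.isEmpty()
        && name.length() <= MAX_LENGTH
        && DNS_LABEL.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("mnist-tf-job")); // hyphens are accepted
    System.out.println(isValid("mnist_tf_job")); // underscores are rejected
  }
}
```

Validating on the server side would let the REST API return a clear error
message instead of surfacing an exception from the client library.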
For these reasons, it is necessary to refactor and stabilize the experiment
service in submarine-server.
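
One possible way to address item (6) is to check the remaining quota before
submitting and to hold back experiments that do not fit. The sketch below is
not the K8sSubmitter implementation; the class QuotaAwareSubmitter, its
methods, and the units (CPU in millicores, memory in MiB) are all hypothetical
and exist only to illustrate the idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the actual K8sSubmitter.
final class QuotaAwareSubmitter {
  private long remainingCpuMillis;
  private long remainingMemoryMiB;
  // Names of experiments that are waiting for quota to become available.
  private final Deque<String> pending = new ArrayDeque<>();

  QuotaAwareSubmitter(long cpuMillis, long memoryMiB) {
    this.remainingCpuMillis = cpuMillis;
    this.remainingMemoryMiB = memoryMiB;
  }

  // Returns true if the experiment fits in the remaining quota and is
  // "submitted"; otherwise queues it and returns false.
  boolean submit(String experimentName, long cpuMillis, long memoryMiB) {
    if (cpuMillis > remainingCpuMillis || memoryMiB > remainingMemoryMiB) {
      pending.addLast(experimentName);
      return false;
    }
    remainingCpuMillis -= cpuMillis;
    remainingMemoryMiB -= memoryMiB;
    // A real submitter would call the Kubernetes API at this point.
    return true;
  }

  int pendingCount() {
    return pending.size();
  }
}
```

With a quota of 4000 millicores and 8192 MiB, a first experiment requesting
2000 millicores and 4096 MiB would be submitted, while a second one requesting
3000 millicores would be queued until resources are released.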
--
This message was sent by Atlassian Jira
(v8.3.4#803005)