Kai-Hsun Chen created SUBMARINE-949:
---------------------------------------

             Summary: [Umbrella] Refactor and stabilize experiment service in submarine-server
                 Key: SUBMARINE-949
                 URL: https://issues.apache.org/jira/browse/SUBMARINE-949
             Project: Apache Submarine
          Issue Type: Improvement
          Components: Backend Server, experiment
            Reporter: Kai-Hsun Chen
            Assignee: Kai-Hsun Chen
             Fix For: 0.6.0


The experiment service is currently the most important feature in Apache Submarine. However, the service is neither stable nor user-friendly. For example:

(1) The frontend workbench cannot reflect the actual experiment status (e.g., OOM).

(2) The server does not enforce some constraints of the Kubernetes Java client (e.g., if the experiment name contains the character "_", the k8s Java API throws an exception).

(3) Unexpected out-of-memory errors: it is hard for users to predict the actual memory usage before running an experiment. Using the memory request and memory limit mechanism to allow memory overcommitment would therefore be helpful.

(4) Allow users to create experiments with the same name and to retrieve these experiments by name.

(5) Allow users to set tags on experiments to divide them into categories and to retrieve experiments by tag.

(6) The K8sSubmitter submits an experiment to the Kubernetes cluster as soon as it is created, no matter how much resource quota is left.

For these reasons, it is necessary to refactor and stabilize the experiment service in submarine-server.
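
As a rough illustration of item (2), the following is a minimal sketch of validating an experiment name on the server side before it reaches the Kubernetes client. It assumes the name must be a valid RFC 1123 DNS label (lowercase alphanumerics and "-", at most 63 characters), which is the stricter of the naming rules Kubernetes applies to object names; the exact rule for a given resource type may differ. The class `ExperimentNameValidator` and its methods are hypothetical and are not part of submarine-server.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not an existing submarine-server class. */
final class ExperimentNameValidator {

  // Assumption: names are checked against the RFC 1123 DNS label rule:
  // lowercase alphanumerics or '-', starting and ending with an alphanumeric.
  private static final Pattern DNS_1123_LABEL =
      Pattern.compile("[a-z0-9]([-a-z0-9]*[a-z0-9])?");

  // Maximum length of a DNS label.
  private static final int MAX_LENGTH = 63;

  private ExperimentNameValidator() {}

  /** Returns true if the name satisfies the DNS label rule assumed above. */
  static boolean isValid(String name) {
    return name != null
        && name.length() <= MAX_LENGTH
        && DNS_1123_LABEL.matcher(name).matches();
  }

  /** Rejects an invalid name early, with a message the workbench could show. */
  static void validate(String name) {
    if (!isValid(name)) {
      throw new IllegalArgumentException(
          "Invalid experiment name '" + name + "': use lowercase letters, digits "
              + "and '-', start and end with a letter or digit, at most "
              + MAX_LENGTH + " characters");
    }
  }
}
```

With a check of this kind, a name such as "mnist_tf" would be rejected by the server with a readable message, while "mnist-tf" would pass through to the submitter.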