Kai-Hsun Chen created SUBMARINE-949:
---------------------------------------

             Summary: [Umbrella] Refactor and stabilize experiment service in 
submarine-server
                 Key: SUBMARINE-949
                 URL: https://issues.apache.org/jira/browse/SUBMARINE-949
             Project: Apache Submarine
          Issue Type: Improvement
          Components: Backend Server, experiment
            Reporter: Kai-Hsun Chen
            Assignee: Kai-Hsun Chen
             Fix For: 0.6.0


Currently, the experiment service is the most important feature in Apache 
Submarine. However, it is neither stable nor user-friendly. For example:

(1) The frontend workbench does not reflect the actual experiment status 
(e.g., when an experiment fails because it runs out of memory).

(2) The server does not enforce some constraints required by the Kubernetes 
Java client (e.g., if the experiment name contains the character "_", the k8s 
Java API throws an exception).

(3) Unexpected out-of-memory errors: users find it hard to predict the actual 
memory usage of an experiment before running it. Using the memory request and 
memory limit mechanism to allow memory overcommitment would therefore help 
users.

(4) Allow users to create multiple experiments with the same name and to 
retrieve them by that name.

(5) Allow users to set different tags on experiments to divide them into 
categories, so that experiments can be retrieved by tag.

(6) The K8sSubmitter submits an experiment to the Kubernetes cluster as soon 
as it is created, regardless of how much resource quota is left.
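A minimal sketch of the missing name constraint from item (2), written in 
Python for brevity rather than in the Java used by submarine-server. It 
assumes the experiment name is used directly as the Kubernetes object name, 
which the Kubernetes API only accepts if it is a valid DNS-1123 subdomain 
(lowercase alphanumerics, '-' and '.', starting and ending with an 
alphanumeric, at most 253 characters). The helpers is_valid_k8s_name and 
to_k8s_name are hypothetical and not part of the Submarine codebase.

```python
import re

# DNS-1123 subdomain rules used by Kubernetes for most object names:
# lowercase alphanumerics, '-' and '.', starting and ending with an
# alphanumeric character, at most 253 characters in total.
_DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
_MAX_LEN = 253


def is_valid_k8s_name(name: str) -> bool:
    """Return True if `name` is a valid DNS-1123 subdomain."""
    return len(name) <= _MAX_LEN and _DNS1123_SUBDOMAIN.fullmatch(name) is not None


def to_k8s_name(name: str) -> str:
    """Map a user-facing experiment name to a DNS-1123-compatible name.

    Hypothetical helper: lowercases the name, replaces every disallowed
    character (such as '_') with '-', truncates to the maximum length and
    trims '-' and '.' from both ends. Raises ValueError if the result is
    still invalid (e.g. the name contained no alphanumerics at all).
    """
    candidate = re.sub(r"[^a-z0-9.-]", "-", name.lower())
    candidate = candidate[:_MAX_LEN].strip("-.")
    if not is_valid_k8s_name(candidate):
        raise ValueError(f"cannot derive a valid Kubernetes name from {name!r}")
    return candidate
```

Checking or normalizing the name in the server before calling the Kubernetes 
Java client would turn an opaque API exception into a clear validation error 
for the user.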
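Items (3) and (6) could be addressed together by giving each experiment a 
memory request smaller than its memory limit and checking the remaining quota 
before submission. The sketch below illustrates this idea under stated 
assumptions: the namespace's ResourceQuota constrains requests.memory only, 
quantities are integers with one of the listed suffixes, and MemorySpec, 
parse_memory and fits_quota are hypothetical names rather than Submarine or 
Kubernetes client APIs.

```python
from dataclasses import dataclass

# Binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes used in
# Kubernetes resource quantities; only integer values are handled here.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '2G' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


@dataclass
class MemorySpec:
    request: str  # amount the scheduler reserves, e.g. "2Gi"
    limit: str    # hard cap enforced at runtime, e.g. "4Gi"


def fits_quota(spec: MemorySpec, quota_hard: str, quota_used: str) -> bool:
    """Return True if the experiment's memory request fits the remaining quota.

    Hypothetical pre-submission check: a submitter could call this before
    creating the job and keep the experiment pending when it returns False,
    instead of submitting it unconditionally.
    """
    request = parse_memory(spec.request)
    limit = parse_memory(spec.limit)
    if request > limit:
        raise ValueError("memory request must not exceed memory limit")
    remaining = parse_memory(quota_hard) - parse_memory(quota_used)
    return request <= remaining
```

Because the check uses the request rather than the limit, several experiments 
can share memory up to their limits (overcommitment); a container that exceeds 
its own memory limit is still OOM-killed.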
  

For these reasons, the experiment service in submarine-server needs to be 
refactored and stabilized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
