[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-05-25 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116404#comment-17116404
 ] 

Manikandan R commented on SUBMARINE-507:


Closed all sub-tasks as those items will be covered in this JIRA itself.

> Submarine Environment Management
>
> Key: SUBMARINE-507
> URL: https://issues.apache.org/jira/browse/SUBMARINE-507
> Project: Apache Submarine
> Issue Type: New Feature
> Reporter: Manikandan R
> Assignee: Manikandan R
> Priority: Major
> Labels: pull-request-available
>
> The scope of this JIRA is to support environment management. It includes the 
> following:
> 1. Create Environment
> 2. Update Environment
> 3. Delete Environment
> 4. List Environments
> In addition, this JIRA should also ensure that environments are persisted, 
> as experiments are, so that they can be reused later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@submarine.apache.org
For additional commands, e-mail: dev-h...@submarine.apache.org



[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-05-27 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117911#comment-17117911
 ] 

Manikandan R commented on SUBMARINE-507:


Writing down my thoughts on storing environments. Please share your views.
 
1. The following tables can be created in the Submarine Metastore:
 
a) Table Name: environments
 
Columns:
 
environment_id int primary key
name varchar(255) unique not null
description string
location string
docker_id int references docker_images(docker_id)
kernel_id int  references kernel(kernel_id)
created_date timestamp
last_updated_date timestamp
 
"location" column captures hdfs path of the environment file.
 
b) Table Name: docker_images
 
docker_id int primary key
name varchar(255) unique not null
description string
created_date timestamp
last_updated_date timestamp
 
c) Table Name: kernel
 
kernel_id int primary key
name varchar(255) unique not null
description string
repository string
repository_type enum('private', 'public')
created_date timestamp
last_updated_date timestamp
 
Having separate tables for docker_images and kernel gives us a lot of flexibility 
when operating environments:
- Docker images and kernels can be created once and reused across many 
environments.
- We avoid creating the same image and kernel/conda env again and again in the 
registry and repository respectively.
(If required, we can clean up these two tables if they grow very large and 
become a bottleneck, but that is very unlikely.)
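
As a rough illustration of the three-table layout above, here is a minimal sketch using Python's built-in sqlite3 module. SQLite and its column types are stand-ins chosen for the example (the thread does not say which database the Submarine metastore uses), and the helper names build_demo_db and environments_using_image, as well as the sample rows, are hypothetical.

```python
import sqlite3

# Hypothetical DDL mirroring the proposed columns. SQLite types stand in for
# whatever the real metastore would use.
SCHEMA = """
CREATE TABLE docker_images (
    docker_id INTEGER PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE kernel (
    kernel_id INTEGER PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    repository TEXT,
    repository_type TEXT CHECK (repository_type IN ('private', 'public')),
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE environments (
    environment_id INTEGER PRIMARY KEY,
    name VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    location TEXT,
    docker_id INTEGER REFERENCES docker_images(docker_id),
    kernel_id INTEGER REFERENCES kernel(kernel_id),
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def build_demo_db():
    """Create an in-memory database with one image and one kernel shared by two environments."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO docker_images (docker_id, name) VALUES (1, 'base-image')")
    conn.execute(
        "INSERT INTO kernel (kernel_id, name, repository_type) VALUES (1, 'py37', 'public')"
    )
    for env_name in ("env_a", "env_b"):
        conn.execute(
            "INSERT INTO environments (name, docker_id, kernel_id) VALUES (?, 1, 1)",
            (env_name,),
        )
    conn.commit()
    return conn


def environments_using_image(conn, docker_id):
    """Return the names of all environments that reference the given image."""
    rows = conn.execute(
        "SELECT name FROM environments WHERE docker_id = ? ORDER BY name", (docker_id,)
    )
    return [row[0] for row in rows]
```

The point of the sketch is the reuse argument: a single docker_images row and a single kernel row serve any number of environments rows.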
 
2. How to store the environment file?
 
Create a directory hdfs://mycluster/submarine/environments/ if it doesn't 
exist and use the environment name as the file name. For example: 
 
hdfs://mycluster/submarine/environments/my_env.txt
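
The naming rule can be sketched as a small helper. This is only an illustration: the function name environment_file_path is hypothetical, the ".txt" extension is taken from the example path, and the check that rejects empty names and names containing "/" is an added assumption, not something proposed in the thread.

```python
ENV_ROOT = "hdfs://mycluster/submarine/environments/"  # directory from the proposal


def environment_file_path(env_name, root=ENV_ROOT, extension=".txt"):
    """Build the storage path for an environment file from the environment name."""
    # Assumption: names must be non-empty and must not contain path separators.
    if not env_name or "/" in env_name:
        raise ValueError("invalid environment name: %r" % (env_name,))
    return root.rstrip("/") + "/" + env_name + extension
```

Because the "name" column is declared unique, each environment would map to a distinct file under this scheme.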
 
3. How to store docker_images?
 
We could set up our own registry as part of starting up the server. Please 
refer to [https://www.docker.com/blog/how-to-use-your-own-registry/] for details. 
There are several options for this storage, as documented in 
[https://docs.docker.com/registry/configuration/#storage]. For a first cut, we 
can begin with the file system and iterate in later releases based on 
need.
 
4. How to store kernel/conda?
 
There are two types: 1. private, 2. public.
 
For a private repo, we will need to set up local repos, which can then be used.




[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-05-28 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118380#comment-17118380
 ] 

Zhankun Tang commented on SUBMARINE-507:


[~maniraj...@gmail.com], thanks for the update! In my understanding, "An 
environment = a base image + a conda env spec". The base Docker images can be 
split into two categories:
 # Submarine environment base images (OS version, base system libraries, 
Anaconda installed)
 # End-user custom base images (custom OS, system libraries and Anaconda)

The conda env spec seems to depend on the Anaconda version.

 

{code:java}
name: "my_submarine_env",
vm-image: "...",
docker-image: "...",
kernel:
  name: team_default_python_3.7
  channels:
  - defaults
  dependencies:
  - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0
  - alabaster=0.7.12=py37_0
  - anaconda=2020.02=py37_0
  - anaconda-client=1.7.2=py37_0
  - anaconda-navigator=1.9.12=py37_0{code}

For base image storage, one simple way to begin with might be to store the 
Submarine base images in, or release them to, Docker Hub's "apache/submarine", 
and to build an entry script that activates the conda spec at runtime.

Later we may set up a custom Docker registry or store the base image in the 
user's existing Docker registry.

 

For end-user custom base image storage, I'd prefer that we integrate with the 
end-user's existing Docker registry as a starting point.
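
The model described in this comment (an environment is a base image plus a conda env spec, with base images falling into two categories) could be sketched with plain dataclasses. All class, field, and function names below are hypothetical, and classifying an image by comparing its repository name to "apache/submarine" is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class BaseImageKind(Enum):
    SUBMARINE = "submarine"  # base image released by the project
    CUSTOM = "custom"        # base image supplied by the end user


@dataclass
class CondaEnvSpec:
    name: str
    channels: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)


@dataclass
class Environment:
    name: str
    docker_image: str
    kernel: CondaEnvSpec


def classify_image(image_name):
    """Assumption: anything in the 'apache/submarine' repository is a Submarine base image."""
    repository = image_name.split(":", 1)[0]
    if repository == "apache/submarine":
        return BaseImageKind.SUBMARINE
    return BaseImageKind.CUSTOM
```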

Thoughts? [~wangda]

 




[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-05-29 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119816#comment-17119816
 ] 

Wangda Tan commented on SUBMARINE-507:
--

Thanks [~maniraj...@gmail.com], [~ztang] for the comments. 

For the environment, I don't think we should get into the image storage / Docker 
registry setup business. It should just be a Docker image name, and K8s can 
figure out which credentials to use to pull the Docker image based on secrets 
stored in the namespace. Users are responsible for setting up the Docker registry 
and K8s secrets. 
{quote}For base image storage, one simple way to begin with might be to store the 
Submarine base images in, or release them to, Docker Hub's "apache/submarine", 
and to build an entry script that activates the conda spec at runtime.
{quote}
This makes sense to me, and it is just an example: users can choose whatever 
base Docker image they want to use, and it could borrow from apache/submarine's 
base image.
{quote}For end-user custom base image storage, I'd prefer that we integrate with 
the end-user's existing Docker registry as a starting point.
{quote}
Makes sense. 

One thing I realized: a user who only wants to use a Docker image (instead of 
Anaconda) would have to create an environment and put the Docker image in it, like: 
{code:java}
Environment: 
  name: my-env
  docker-image: example.com/my-docker-image:0.1.2 

# In the experiment, the user specifies:

Experiment: 
  name: "My-tf-job" 
  environment: "my-env"
  task: 
    script: ...
    resource: ...{code}
This is pure overhead and a bad user experience.  

How about we provide a shortcut to specify an "anonymous/embedded 
Environment" as part of the Experiment, for example:
{code:java}
Experiment: 
  name: "My-tf-job" 
  environment: "my-env" // point to a "named" environment, or
  environment: 
    docker-image: example.com/my-docker-image:0.1.2
    kernel-name: ...
  task: 
    script: ...
    resource: ... {code}
If possible, we should make the "embedded environment" part of 0.4.0 
itself, so we don't have to change this API when we release 0.5.0.
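
To make the two forms concrete, here is a minimal sketch of how a server might resolve the "environment" field of an experiment, accepting either the name of a stored environment or an embedded definition. The function name resolve_environment, the in-memory NAMED_ENVIRONMENTS lookup, and the rule that an embedded environment must carry a "docker-image" key are all assumptions for illustration, not part of the proposal.

```python
# Hypothetical store of "named" environments, keyed by environment name.
NAMED_ENVIRONMENTS = {
    "my-env": {"docker-image": "example.com/my-docker-image:0.1.2"},
}


def resolve_environment(experiment, named=NAMED_ENVIRONMENTS):
    """Return the environment definition an experiment should run with."""
    env = experiment.get("environment")
    if isinstance(env, str):
        # A string points to a named environment that was created earlier.
        if env not in named:
            raise KeyError("unknown environment: %r" % (env,))
        return dict(named[env])
    if isinstance(env, dict):
        # A mapping is an anonymous/embedded environment.
        if "docker-image" not in env:
            raise ValueError("embedded environment needs a docker-image")
        return dict(env)
    raise ValueError("experiment must reference or embed an environment")
```

With this shape, both forms end up as the same kind of definition, so the rest of the experiment pipeline would not need to know which one the user chose.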

Thoughts? [~ztang], [~maniraj...@gmail.com]




[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-05-31 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120450#comment-17120450
 ] 

Manikandan R commented on SUBMARINE-507:


Thanks for sharing your views.

IIUC, we would like to have only one table, "environments", in the Submarine Metastore. 

A few more questions:

1. How do we decide whether we need to talk to Docker Hub's "apache/submarine" or 
the end-user's existing Docker registry? By using the Docker image name?
2. For both categories, can we safely assume images will always be 
available for use at run time, so there is nothing to worry about while creating 
an environment?
3. For the first category of base images ("apache/submarine"), when and how are we 
going to create images and release them to "apache/submarine"? Manually, using a 
Dockerfile? Would an admin be doing this?
4. For the second category of base images, can we assume end users will have 
created the image in their Docker registry?
5. It seems we are in favour of storing the environment spec as-is in a 
string-based column in the "environments" table?
6. conda env activation happens at run time while running the notebook or 
experiment. When are we going to create an env in conda? 




[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-05-31 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120592#comment-17120592
 ] 

Wangda Tan commented on SUBMARINE-507:
--

[~maniraj...@gmail.com], 
{quote}1. How do we decide whether we need to talk to Docker Hub's 
"apache/submarine" or the end-user's existing Docker registry? By using the 
Docker image name?{quote}
Yes, it should be based on the image name and handled by the Docker daemon.
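
For readers unfamiliar with how the image name selects a registry, here is a simplified sketch of the convention Docker follows: if the part before the first "/" looks like a host name (it contains "." or ":", or is "localhost"), it is treated as the registry; otherwise the image is looked up on Docker Hub. The function name registry_for_image is hypothetical, and the sketch skips some details of Docker's real name normalization.

```python
DEFAULT_REGISTRY = "docker.io"  # Docker Hub


def registry_for_image(image_name):
    """Return the registry host implied by a Docker image name (simplified rule)."""
    first, slash, _rest = image_name.partition("/")
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if slash and looks_like_host:
        return first
    return DEFAULT_REGISTRY
```

Under this rule an "apache/submarine" image resolves to Docker Hub, while an image such as example.com/my-docker-image:0.1.2 resolves to the user's own registry, with no extra setting needed on the Submarine side.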

 
{quote}2. For both categories, can we safely assume images will always be 
available for use at run time, so there is nothing to worry about while creating 
an environment?
{quote}
Yes. If there is any issue (such as an image that cannot be pulled), the 
experiment run will fail naturally. 

 
{quote}3. For the first category of base images ("apache/submarine"), when and how 
are we going to create images and release them to "apache/submarine"? Manually, 
using a Dockerfile? Would an admin be doing this?
{quote}
This is part of the Submarine release process; I believe we are already doing this. 
cc: [~ztang] 
{quote}4. For the second category of base images, can we assume end users will 
have created the image in their Docker registry?
{quote}
Yes, for whatever base image is specified, the user needs to ensure it can be pulled.
{quote}5. It seems we are in favour of storing the environment spec as-is in a 
string-based column in the "environments" table?
{quote}
It looks like a good solution to me; I'm not sure whether there are any other 
alternatives. 
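
A minimal sketch of that single-table approach, again using Python's sqlite3 as a stand-in for the metastore: the whole environment spec is serialized to a string and stored in one column, and the four operations from the issue scope (create, update, delete, list) work on that row. The table layout, the use of JSON for serialization, and every function name here are assumptions for illustration.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory 'environments' table with the spec stored as a string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE environments (name TEXT PRIMARY KEY, spec TEXT NOT NULL)")
    return conn


def create_environment(conn, name, spec):
    conn.execute(
        "INSERT INTO environments (name, spec) VALUES (?, ?)", (name, json.dumps(spec))
    )


def update_environment(conn, name, spec):
    cur = conn.execute(
        "UPDATE environments SET spec = ? WHERE name = ?", (json.dumps(spec), name)
    )
    if cur.rowcount == 0:
        raise KeyError(name)  # nothing to update


def delete_environment(conn, name):
    conn.execute("DELETE FROM environments WHERE name = ?", (name,))


def list_environments(conn):
    rows = conn.execute("SELECT name, spec FROM environments ORDER BY name")
    return {name: json.loads(spec) for name, spec in rows}
```

Storing the spec as an opaque string keeps the schema stable when the spec format changes, at the cost of not being able to query individual spec fields in SQL.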
{quote}6. conda env activation happens at run time while running the notebook 
or experiment. When are we going to create an env in conda?
{quote}
Yes, activation only happens at run time. To create an env, I think we can have 
the following Python APIs: 
 
1) Create Environment using SDK (pseudo code)  
{code:java}
env = create_new_environment("my_env") 

# Set Docker image 
env.set_docker_image("apache/submarine-123:123") 

# Set conda kernel using spec 
env.add_conda_kernel(conda_kernel_from_spec("""
  name: team_default_python_3.7
  channels:
  - defaults
  dependencies:
  - _ipyw_jlab_nb_ext_conf=0.1.0=py37_0
  - alabaster=0.7.12=py37_0
  - anaconda=2020.02=py37_0
  - anaconda-client=1.7.2=py37_0
  - anaconda-navigator=1.9.12=py37_0
"""))

# Alternatively, set conda kernel using APIs 
conda_kernel = create_new_conda_kernel("my_kernel")
conda_kernel.set_channels(["..."])
conda_kernel.add_pip_dependency("...") 

# Finally, save the env; this saves it to the metadata store
submarine.register_env(env)
{code}
2) Similarly, we need List, Get, and Delete APIs for environments.
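
To show how the pseudo code and the List/Get/Delete operations could fit together, here is a small runnable sketch with an in-memory registry standing in for the metadata store. It is not the Submarine SDK: the class names and the list_envs, get_env, and delete_env methods are hypothetical, and only the method names set_docker_image, add_conda_kernel, set_channels, add_pip_dependency, and register_env are borrowed from the pseudo code above.

```python
class CondaKernel:
    def __init__(self, name):
        self.name = name
        self.channels = []
        self.pip_dependencies = []

    def set_channels(self, channels):
        self.channels = list(channels)

    def add_pip_dependency(self, requirement):
        self.pip_dependencies.append(requirement)


class Environment:
    def __init__(self, name):
        self.name = name
        self.docker_image = None
        self.kernels = []

    def set_docker_image(self, image):
        self.docker_image = image

    def add_conda_kernel(self, kernel):
        self.kernels.append(kernel)


class EnvironmentRegistry:
    """In-memory stand-in for the metadata store (assumption for this sketch)."""

    def __init__(self):
        self._envs = {}

    def register_env(self, env):
        if env.name in self._envs:
            raise ValueError("environment already exists: %r" % (env.name,))
        self._envs[env.name] = env

    def list_envs(self):
        return sorted(self._envs)

    def get_env(self, name):
        return self._envs[name]

    def delete_env(self, name):
        del self._envs[name]
```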

Thoughts?




[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-05-31 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120732#comment-17120732
 ] 

Zhankun Tang commented on SUBMARINE-507:


[~maniraj...@gmail.com], yeah. We've already published 
"apache/submarine:mini-0.3.0" under the "apache/submarine" repo. We can release 
more.

[~wangda], for the kernel spec, is the sample spec generated by "conda export"? 
It seems hard to write by hand.

Will the kernel spec have hard dependencies on the conda component versions when 
it is enabled?

For instance, suppose we provide a Submarine image with "anaconda=2019.02=py36", 
but the kernel spec says "2020.02=py37". Will the image fail to start due to the 
version mismatch?

In other words, is kernel spec portability a concern for us?
{code:java}
- anaconda=2020.02=py37_0
- anaconda-client=1.7.2=py37_0
- anaconda-navigator=1.9.12=py37_0
{code}
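
One way to at least surface such a difference before launch is to parse the pinned entries and compare their Python build tags. The sketch below assumes entries of the form name=version=build, as in the spec above. The helper names are hypothetical, and a differing tag only flags a potential portability issue; it does not by itself show that the image would fail to start.

```python
import re
from typing import NamedTuple


class CondaPin(NamedTuple):
    name: str
    version: str
    build: str


def parse_pin(line):
    """Parse an entry such as '- anaconda=2020.02=py37_0' into its three parts."""
    # Raises ValueError if the entry does not have exactly three '='-separated parts.
    name, version, build = line.strip().lstrip("- ").split("=")
    return CondaPin(name, version, build)


def python_tag(build):
    """Extract the leading Python tag (for example 'py37') from a build string."""
    match = re.match(r"py\d+", build)
    return match.group(0) if match else None


def same_python(pin_a, pin_b):
    """Return True when both pins were built for the same Python tag."""
    return python_tag(pin_a.build) == python_tag(pin_b.build)
```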




[jira] [Commented] (SUBMARINE-507) Submarine Environment Management

2020-06-01 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/SUBMARINE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121444#comment-17121444
 ] 

Wangda Tan commented on SUBMARINE-507:
--

[~ztang], good point. Here are my thoughts: 
{quote}Will the kernel spec have hard dependencies on the conda component 
versions when it is enabled?
{quote}
Based on Anaconda's typical usage, the Python version is part of the kernel, 
so in that case the Python inside the image will be ignored. (Anaconda uses 
environment variables so that only the kernel's own libraries are picked up once 
the kernel is activated.) 
{quote}[~wangda], for the kernel spec, is the sample spec generated by "conda 
export"? It seems hard to write by hand.
{quote}
That's correct; ideally users should not write the spec by hand. Once a kernel is 
activated, all activities like {{pip install}} are tracked, and when the 
user calls {{conda export}}, those changes become part of the exported conda spec.
