[ 
https://issues.apache.org/jira/browse/MESOS-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175226#comment-17175226
 ] 

Greg Mann commented on MESOS-10163:
-----------------------------------

{noformat}
commit fe0cd02a0697a4c4fcf5087fcafd6729beec0b41 (HEAD -> master, origin/master, 
origin/HEAD, merge)
Author: Greg Mann <g...@mesosphere.io>
Date:   Mon Aug 10 20:11:50 2020 -0700

    Added implementation of the CSI server.
    
    Review: https://reviews.apache.org/r/72716/
{noformat}

> Implement a new component to launch CSI plugins as standalone containers and 
> make CSI gRPC calls
> ------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-10163
>                 URL: https://issues.apache.org/jira/browse/MESOS-10163
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Qian Zhang
>            Assignee: Greg Mann
>            Priority: Major
>
> *Background:*
> Originally we wanted the `volume/csi` isolator to leverage the existing 
> [service manager|https://github.com/apache/mesos/blob/1.10.0/src/csi/service_manager.hpp#L50:L51]
>  to launch CSI plugins as standalone containers. Currently, the service 
> manager needs to call the following agent HTTP APIs:
>  # `GET_CONTAINERS` to get all standalone containers in its `recover` method.
>  # `KILL_CONTAINER` and `WAIT_CONTAINER` to kill the outdated standalone 
> containers in its `recover` method.
>  # `LAUNCH_CONTAINER` via the existing 
> [ContainerDaemon|https://github.com/apache/mesos/blob/1.10.0/src/slave/container_daemon.hpp#L41:L46]
>  to launch a CSI plugin as a standalone container when its `getEndpoint` 
> method is called.
> The problem with the above design is that the `volume/csi` isolator may need 
> to clean up orphan containers during agent recovery, which is triggered by 
> the containerizer (see 
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/containerizer.cpp#L1272:L1275]
>  for details). To clean up an orphan container that is using a CSI volume, 
> the `volume/csi` isolator needs to instantiate and recover the service 
> manager and get the CSI plugin’s endpoint from it (i.e., the service 
> manager’s `getEndpoint` method will be called by the `volume/csi` isolator 
> during agent recovery). As mentioned above, the service manager’s 
> `getEndpoint` may need to call `LAUNCH_CONTAINER` to launch the CSI plugin 
> as a standalone container, but since the agent is still recovering, the 
> agent will simply reject such an HTTP call. So we have to instantiate and 
> recover the service manager *after agent recovery is done*, but the 
> `volume/csi` isolator has no such information (i.e., no signal that agent 
> recovery is done).
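
The ordering problem above can be modeled in a few lines of C++. This is a simplified, hypothetical model rather than Mesos code: `AgentState`, `ApiResponse`, `handleApiCall` and `tryGetEndpoint` are illustrative names, and the 503 status and message are assumptions standing in for whatever rejection the real agent returns while it is still recovering.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical model of the agent's lifecycle (not Mesos code).
enum class AgentState { RECOVERING, RUNNING };

struct ApiResponse {
  int status;        // HTTP-style status code
  std::string body;  // human-readable explanation
};

// Hypothetical agent API handler: every call is refused until recovery
// has finished.
ApiResponse handleApiCall(AgentState state, const std::string& call) {
  if (state == AgentState::RECOVERING) {
    return {503, "Agent has not finished recovery"};
  }
  return {200, call + " accepted"};
}

// Models the service manager's getEndpoint: if the plugin container is not
// running yet, the endpoint can only be obtained by asking the agent to
// launch it via LAUNCH_CONTAINER.
bool tryGetEndpoint(AgentState state, bool pluginAlreadyRunning) {
  if (pluginAlreadyRunning) {
    return true;  // endpoint already known; no agent call needed
  }
  const ApiResponse response = handleApiCall(state, "LAUNCH_CONTAINER");
  return response.status == 200;
}
```

With `pluginAlreadyRunning == false`, `tryGetEndpoint(AgentState::RECOVERING, false)` fails while the same call succeeds once the state is `RUNNING`, which is why the isolator cannot rely on the service manager during orphan cleanup.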
> *Solution:*
> We need to implement a new component (like `CSIVolumeManager` or a better 
> name?) in the Mesos agent that is responsible for launching CSI plugins as 
> standalone containers (via the existing [service 
> manager|https://github.com/apache/mesos/blob/1.10.0/src/csi/service_manager.hpp#L50:L51])
>  and making CSI gRPC calls (via the existing [volume 
> manager|https://github.com/apache/mesos/blob/1.10.0/src/csi/volume_manager.hpp#L55:L56]).
>  * We can instantiate this new component in the agent’s `main` function and 
> pass it to both the containerizer and the agent (i.e., it will be a member 
> of the `Slave` object); the containerizer will in turn pass it to the 
> `volume/csi` isolator.
>  * Since this new component relies on the service manager, which calls 
> agent HTTP APIs, we need to pass the agent URL to it, e.g., 
> `process::http::URL(scheme, agentIP, agentPort, agentLibprocessId + "/api/v1")`; 
> see 
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L459:L471]
>  for an example.
>  * When the agent registers/reregisters with the master 
> (`Slave::registered` and `Slave::reregistered`), we should call this new 
> component’s `start` method (see 
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L1740:L1742]
>  and 
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L1825:L1827]
>  as examples), which will scan the directory given by 
> `--csi_plugin_config_dir` and create a `service manager - volume manager` 
> pair for each CSI plugin loaded from that directory.
>  * The `volume/csi` isolator needs to call this new component’s 
> `publishVolume` and `unpublishVolume` methods in its `prepare` and 
> `cleanup` methods, respectively.
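
The Solution bullets can be sketched as a self-contained C++ skeleton. This is a hypothetical model, not Mesos code: `CSIVolumeManagerSketch`, `ServiceManagerStub`, `VolumeManagerStub` and `agentApiUrl` are placeholder names, the stubs only stand in for the existing service manager and volume manager, the URL string format only approximates what `process::http::URL` produces, and keying plugins by config file name is an assumption (real plugin configs declare their own name inside the file).

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Builds an agent operator API URL of the assumed form
// scheme://ip:port/<libprocess id>/api/v1.
std::string agentApiUrl(const std::string& scheme,
                        const std::string& ip,
                        int port,
                        const std::string& libprocessId) {
  return scheme + "://" + ip + ":" + std::to_string(port) + "/" +
         libprocessId + "/api/v1";
}

// Hypothetical stand-ins for the existing service manager and volume manager;
// they only record what they were created with.
struct ServiceManagerStub {
  std::string agentUrl;    // used for *_CONTAINER agent API calls
  std::string configPath;  // plugin config this manager was created from
};

struct VolumeManagerStub {
  std::string pluginName;
};

// Hypothetical skeleton of the proposed component (name is a placeholder).
class CSIVolumeManagerSketch {
 public:
  explicit CSIVolumeManagerSketch(std::string agentUrl)
    : agentUrl_(std::move(agentUrl)) {
    assert(!agentUrl_.empty());
  }

  // Meant to be called from Slave::registered / Slave::reregistered.
  // Idempotent: a later reregistration does not rescan the directory.
  void start(const fs::path& pluginConfigDir) {
    if (started_) {
      return;
    }
    for (const fs::directory_entry& entry :
         fs::directory_iterator(pluginConfigDir)) {
      if (!entry.is_regular_file()) {
        continue;  // skip subdirectories and other non-files
      }
      // Assumption: keyed by file name; a real implementation would parse
      // the plugin config and use the plugin name it declares.
      const std::string name = entry.path().stem().string();
      plugins_.emplace(
          name,
          Managers{ServiceManagerStub{agentUrl_, entry.path().string()},
                   VolumeManagerStub{name}});
    }
    started_ = true;
  }

  bool started() const { return started_; }

  std::vector<std::string> pluginNames() const {
    std::vector<std::string> names;
    for (const auto& plugin : plugins_) {
      names.push_back(plugin.first);
    }
    return names;
  }

 private:
  // One service manager / volume manager pair per CSI plugin.
  struct Managers {
    ServiceManagerStub service;
    VolumeManagerStub volume;
  };

  std::string agentUrl_;
  std::map<std::string, Managers> plugins_;
  bool started_ = false;
};
```

Making `start` idempotent is one way to cope with it being called from both `Slave::registered` and `Slave::reregistered`; whether the real component should instead rescan on reregistration is left open by the proposal.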
> When cleaning up orphan containers during agent recovery, the `volume/csi` 
> isolator will just call this new component’s `unpublishVolume` method as 
> usual; it is this new component’s responsibility to make the actual CSI 
> gRPC call only after agent recovery is done and the agent has registered 
> with the master (e.g., when this new component’s `start` method is called).
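
The deferral described above can be sketched as a small queue that holds CSI calls until `start` is invoked. Assumptions: `DeferredCsiCalls` and the free `unpublishVolume` helper are hypothetical names, and the CSI `NodeUnpublishVolume` gRPC call is simulated by appending to a log instead of talking to a real plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: CSI calls submitted before start() are queued and
// replayed, in submission order, once start() is invoked.
class DeferredCsiCalls {
 public:
  using Call = std::function<void()>;

  // Runs the call immediately if started; otherwise queues it.
  void submit(Call call) {
    assert(call);  // an empty std::function would throw when invoked
    if (started_) {
      call();
    } else {
      pending_.push_back(std::move(call));
    }
  }

  // Intended to run after agent recovery, once the agent has (re)registered
  // with the master. Later calls are no-ops.
  void start() {
    if (started_) {
      return;
    }
    started_ = true;
    std::vector<Call> pending;
    pending.swap(pending_);
    for (Call& call : pending) {
      call();
    }
  }

  bool started() const { return started_; }
  std::size_t pendingCount() const { return pending_.size(); }

 private:
  bool started_ = false;
  std::vector<Call> pending_;
};

// Hypothetical unpublishVolume entry point used by the isolator's cleanup;
// the real CSI gRPC call is simulated by appending to `grpcLog`.
void unpublishVolume(DeferredCsiCalls& calls,
                     std::vector<std::string>& grpcLog,
                     const std::string& volumeId) {
  calls.submit([&grpcLog, volumeId] {
    grpcLog.push_back("NodeUnpublishVolume " + volumeId);
  });
}
```

Inside Mesos this would more naturally be expressed with libprocess, returning a `process::Future` that completes once the deferred call has actually run so the isolator's `cleanup` can wait on it; the plain `std::function` queue keeps the sketch dependency-free.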



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
