[
https://issues.apache.org/jira/browse/MESOS-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175226#comment-17175226
]
Greg Mann commented on MESOS-10163:
-----------------------------------
{noformat}
commit fe0cd02a0697a4c4fcf5087fcafd6729beec0b41 (HEAD -> master, origin/master,
origin/HEAD, merge)
Author: Greg Mann <[email protected]>
Date: Mon Aug 10 20:11:50 2020 -0700
Added implementation of the CSI server.
Review: https://reviews.apache.org/r/72716/
{noformat}
> Implement a new component to launch CSI plugins as standalone containers and
> make CSI gRPC calls
> ------------------------------------------------------------------------------------------------
>
> Key: MESOS-10163
> URL: https://issues.apache.org/jira/browse/MESOS-10163
> Project: Mesos
> Issue Type: Task
> Reporter: Qian Zhang
> Assignee: Greg Mann
> Priority: Major
>
> *Background:*
> Originally we wanted the `volume/csi` isolator to leverage the existing [service
> manager|https://github.com/apache/mesos/blob/1.10.0/src/csi/service_manager.hpp#L50:L51]
> to launch CSI plugins as standalone containers. Currently, the service manager
> needs to call the following agent HTTP APIs:
> # `GET_CONTAINERS` to get all standalone containers in its `recover` method.
> # `KILL_CONTAINER` and `WAIT_CONTAINER` to kill the outdated standalone
> containers in its `recover` method.
> # `LAUNCH_CONTAINER` via the existing
> [ContainerDaemon|https://github.com/apache/mesos/blob/1.10.0/src/slave/container_daemon.hpp#L41:L46]
> to launch the CSI plugin as a standalone container when its `getEndpoint`
> method is called.
> The problem with the above design is that the `volume/csi` isolator may need
> to clean up orphan containers during agent recovery, which is triggered by the
> containerizer (see
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/containerizer/mesos/containerizer.cpp#L1272:L1275]
> for details). To clean up an orphan container which is using a CSI volume,
> the `volume/csi` isolator needs to instantiate and recover the service manager
> and get the CSI plugin’s endpoint from it (i.e., the service manager’s
> `getEndpoint` method will be called by the `volume/csi` isolator during agent
> recovery). As mentioned above, the service manager’s `getEndpoint` may need to
> call `LAUNCH_CONTAINER` to launch the CSI plugin as a standalone container.
> Since the agent is still in the recovering state, such an agent HTTP call will
> be rejected by the agent. So we have to instantiate and recover the service
> manager *after agent recovery is done*, but in the `volume/csi` isolator we do
> not have such information (i.e., the signal that agent recovery is done).
> *Solution:*
> We need to implement a new component (like `CSIVolumeManager` or a better
> name?) in the Mesos agent which is responsible for launching CSI plugins as
> standalone containers (via the existing [service
> manager|https://github.com/apache/mesos/blob/1.10.0/src/csi/service_manager.hpp#L50:L51])
> and making CSI gRPC calls (via the existing [volume
> manager|https://github.com/apache/mesos/blob/1.10.0/src/csi/volume_manager.hpp#L55:L56]).
> * We can instantiate this new component in the `main` method of the agent and
> pass it to both the containerizer and the agent (i.e., it will be a member of
> the `Slave` object), and the containerizer will in turn pass it to the
> `volume/csi` isolator.
> * Since this new component relies on the service manager, which calls agent
> HTTP APIs, we need to pass the agent URL to it, like
> `process::http::URL(scheme, agentIP, agentPort, agentLibprocessId + "/api/v1")`;
> see
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L459:L471]
> for an example.
> * When the agent registers/reregisters with the master (`Slave::registered`
> and `Slave::reregistered`), we should call this new component’s `start`
> method (see
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L1740:L1742]
> and
> [here|https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L1825:L1827]
> as examples), which will scan the directory given by `--csi_plugin_config_dir` and
> create the `service manager - volume manager` pair for each CSI plugin loaded
> from that directory.
> * The `volume/csi` isolator needs to call this new component’s
> `publishVolume` and `unpublishVolume` methods in its `prepare` and `cleanup`
> methods, respectively.
> When cleaning up orphan containers during agent recovery, the `volume/csi`
> isolator will just call this new component’s `unpublishVolume` method as
> usual, and it is this new component’s responsibility to make the actual CSI
> gRPC call only after agent recovery is done and the agent has registered with
> the master (e.g., when this new component’s `start` method is called).
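
The deferral described in the last paragraph of the issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit above: the class name `DeferredCsiServer`, the `rpc` callback standing in for the real CSI gRPC call, and the `pendingCount` helper are all hypothetical. The sketch is single-threaded and uses plain callbacks, whereas the real agent is built on libprocess and would most likely use futures instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed component; NOT the Mesos API.
class DeferredCsiServer {
public:
  // `rpc` is an assumed placeholder for the actual CSI gRPC call that
  // would be made through the volume manager.
  explicit DeferredCsiServer(std::function<void(const std::string&)> rpc)
    : rpc_(std::move(rpc)) {}

  // Called by the isolator as usual, possibly while the agent is still
  // recovering. Before `start()`, the request is only queued.
  void unpublishVolume(const std::string& volumeId) {
    if (started_) {
      rpc_(volumeId);
    } else {
      pending_.push_back(volumeId);
    }
  }

  // Called once the agent has registered or reregistered with the master.
  // Flushes every queued request in arrival order; later calls are no-ops.
  void start() {
    if (started_) {
      return;
    }
    started_ = true;
    for (const std::string& volumeId : pending_) {
      rpc_(volumeId);
    }
    pending_.clear();
  }

  // Number of requests still waiting for `start()`.
  std::size_t pendingCount() const { return pending_.size(); }

private:
  std::function<void(const std::string&)> rpc_;
  bool started_ = false;
  std::vector<std::string> pending_;
};
```

With this shape, the isolator does not need to know whether agent recovery has finished: a request issued during orphan cleanup stays queued, and the underlying call is made only when `start()` runs.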