> On July 29, 2020, 11:27 p.m., Qian Zhang wrote:
> > src/slave/csi_server.cpp
> > Lines 233-235 (patched)
> > <https://reviews.apache.org/r/72716/diff/2/?file=2236468#file2236468line233>
> >
> > I am just curious what would happen if any of the initialization logic
> > fails: how will the failure be propagated back?
>
> Greg Mann wrote:
> I updated the server so that now `start()` returns a future associated
> with the initialization.
>
> Qian Zhang wrote:
> I see. And I guess `CSIServer::start()` will be called in
> `Slave::registered` and `Slave::reregistered`, right? I am just wondering how
> we are going to handle the returned future there. Are we going to register an
> `onAny` callback and log an error message if it is a failed future?
>
> Greg Mann wrote:
> Yea I think we have to decide how to handle failures of CSI server
> initialization. I might propose a timeout in the agent, after which we log an
> error? And we could provide a task status message perhaps when task launches
> fail because the CSI server failed to initialize?
>
> In any case, I think the interface offered by the current patch set will
> be sufficient to let us handle the failed initialization case, WDYT?

I took a look at the code of the local resource provider daemon and found that it just logs an error message in its `start` method:
https://github.com/apache/mesos/blob/1.10.0/src/slave/slave.cpp#L1740:L1742
https://github.com/apache/mesos/blob/1.10.0/src/resource_provider/daemon.cpp#L188:L191

Do you think we could do something similar?


> On July 29, 2020, 11:27 p.m., Qian Zhang wrote:
> > src/slave/csi_server.cpp
> > Lines 244-245 (patched)
> > <https://reviews.apache.org/r/72716/diff/2/?file=2236468#file2236468line244>
> >
> > Do we have to use `started` and `initializationCallbacks`? Can we do
> > something similar to
> > https://github.com/apache/mesos/blob/1.10.0/src/csi/v1_volume_manager.cpp#L1336
> > ?
>
> Greg Mann wrote:
> The reason it's more complicated here is that we may add more
> "initialization logic" after server construction if publish/unpublish calls
> are made before the server is started. So we need an approach which will
> allow us to add more function calls which are executed during startup. I
> explored another approach while coding but this is what I ended up settling
> on, but I'm happy to explore other options if we can think of something
> better.
>
> Qian Zhang wrote:
> I see currently you put the "initialization logic" (i.e. generate auth
> token and initialize plugins) in the constructor of `CSIServerProcess`. Can we
> instead do that in `CSIServerProcess::start()` and do the following in
> `CSIServer::start()`?
> ```
> Future<Nothing> CSIServer::start()
> {
>   started = process::dispatch(process.get(), &CSIServerProcess::start);
>   return started;
> }
> ```
>
> And then in `CSIServer::publishVolume` and `CSIServer::unpublishVolume`
> we could do the following:
> ```
> Future<string> CSIServer::publishVolume(
>     const Volume::Source::CSIVolume& volume)
> {
>   return started
>     .then(process::defer(
>         process.get(),
>         &CSIServerProcess::publishVolume,
>         volume));
> }
> ```
> So any publish and unpublish volume calls can only be executed after CSI
> server is started. HDYT?
>
> Greg Mann wrote:
> The reason I didn't follow this approach is that it doesn't guarantee
> that the order of publish/unpublish calls would be maintained when
> initializing, but maybe that's OK?
>
> I think that in our current implementation of `Future` in libprocess the
> order would be maintained, but this isn't guaranteed by the interface. Am I
> being too paranoid here? :-)

> it doesn't guarantee that the order of publish/unpublish calls would be
> maintained when initializing

Could you please elaborate a bit on why this is a problem? In the volume manager, I see we already have a sequence per volume to make sure all CSI gRPC calls on the same volume are processed in sequential order.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72716/#review221405
-----------------------------------------------------------


On Aug. 4, 2020, 2:58 a.m., Greg Mann wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72716/
> -----------------------------------------------------------
>
> (Updated Aug. 4, 2020, 2:58 a.m.)
>
>
> Review request for mesos, Andrei Budnik and Qian Zhang.
>
>
> Bugs: MESOS-10163
>     https://issues.apache.org/jira/browse/MESOS-10163
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Added implementation of the CSI server.
>
>
> Diffs
> -----
>
>   src/CMakeLists.txt 4e15e3d99aa2cce2403fe07e762fef2fb4a27dea
>   src/Makefile.am 447db323875e4cad46000977f4a61600baff8f89
>   src/slave/csi_server.cpp PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/72716/diff/4/
>
>
> Testing
> -------
>
> Details at the end of this chain.
>
>
> Thanks,
>
> Greg Mann
>
>