Hi,

It seems that the C++ scheduler driver doesn't detect loss of the
connection to the master when not using ZooKeeper.

A simple way to reproduce this is to start a master, passing it e.g.
"--ip=127.0.0.1", start the scheduler driver passing it "127.0.0.1:5050",
and then send a SIGKILL to the master. The scheduler logs the following:


I1220 10:56:11.679347 10635 process.cpp:2928] Resuming __reaper__(1)@192.168.65.76:34345 at 2019-12-20 10:56:11.679366144+00:00
I1220 10:56:11.679392 10635 clock.cpp:279] Created a timer for __reaper__(1)@192.168.65.76:34345 in 100ms in the future (2019-12-20 10:56:11.779389952+00:00)
I1220 10:56:11.690646 10631 process.cpp:2928] Resuming scheduler-6a93a8e3-5a8f-4195-bde2-718b5832d317@192.168.65.76:34345 at 2019-12-20 10:56:11.690665984+00:00
I1220 10:56:11.690775 10632 process.cpp:2928] Resuming __http__(1)@192.168.65.76:34345 at 2019-12-20 10:56:11.690784000+00:00
I1220 10:56:11.690806 10632 process.cpp:3088] Cleaning up __http__(1)@192.168.65.76:34345
I1220 10:56:11.690914 10632 process.cpp:2928] Resuming help@192.168.65.76:34345 at 2019-12-20 10:56:11.690921984+00:00

An strace confirms that the process receives EOF when reading from the
socket, but Scheduler::disconnected isn't called.
Is that expected?

Or is it assumed that the scheduler relies on ZooKeeper for detection?
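
To make the EOF observation from the strace concrete, here is a minimal
standalone sketch of the condition in question. It is not Mesos code: it
assumes a UNIX socketpair as a stand-in for the TCP connection to the
master, and read_after_peer_close is a hypothetical name used only for
this illustration.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not part of Mesos or libprocess.
// Returns what read() reports on a stream socket whose peer has gone
// away: 0 means EOF, -1 means an error (see errno).
ssize_t read_after_peer_close()
{
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return -1;
  }

  // Stand-in for the master being killed: its end of the connection
  // is closed without sending anything.
  close(fds[1]);

  // The surviving end sees an orderly shutdown, i.e. a zero-length read.
  char buf[16];
  ssize_t n = read(fds[0], buf, sizeof(buf));

  close(fds[0]);
  return n;
}
```

A return value of 0 here is the same end-of-file indication that the
strace shows on the scheduler's socket, which is the point at which I
would have expected a disconnection to be reported.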

Cheers,

Charles
