Hello everyone,
I would like to share some ideas about refactoring
WorkerServer/MasterServer for dolphin-scheduler.
Background
In the current implementation of dolphin-scheduler, task info is stored in
zookeeper, and worker-server relies on a zookeeper lock to keep executing
tasks. Each worker tries to acquire the lock; if it gets the lock, it can
execute a task (fetch the task from zk, get the task info from the db, etc.);
otherwise it has to wait for the lock. This is not a good design in terms of
performance, dependencies, or parallelism.
Proposal
I suggest that worker-server behave like a server listening on a tcp port
and receive task commands via rpc requests instead. This way,
worker-server no longer needs the lock or a db connection, and it can
execute tasks concurrently.
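
To make the idea of a task command sent over tcp more concrete, here is a minimal sketch of how one command frame might be encoded and decoded. The layout (one magic byte, one type byte, a 4-byte body length, then a UTF-8 body), the class name TaskCommandCodec, and the constants are all assumptions for illustration, not a settled protocol. In a real worker this logic would live inside the Netty pipeline; plain java.nio is used here only to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical codec for a single task command frame.
// Frame layout (an assumption): [magic:1][type:1][bodyLength:4][body:bodyLength]
class TaskCommandCodec {
    static final byte MAGIC = (byte) 0xDA;       // hypothetical marker byte
    static final byte TYPE_EXECUTE_TASK = 1;     // hypothetical command type
    static final int HEADER_LENGTH = 1 + 1 + 4;

    // Serialize a command type and its body (e.g. JSON task info) into one frame.
    static byte[] encode(byte type, String body) {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH + payload.length);
        buf.put(MAGIC).put(type).putInt(payload.length).put(payload);
        return buf.array();
    }

    // Read the command type from a frame after validating the header.
    static byte decodeType(byte[] frame) {
        return checkedHeader(frame).get(1);
    }

    // Read the body from a frame after validating the header.
    static String decodeBody(byte[] frame) {
        ByteBuffer buf = checkedHeader(frame);
        buf.position(HEADER_LENGTH);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Validate magic byte and declared length; reject anything malformed.
    private static ByteBuffer checkedHeader(byte[] frame) {
        if (frame.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("frame too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame);
        if (buf.get(0) != MAGIC) {
            throw new IllegalArgumentException("bad magic byte");
        }
        if (buf.getInt(2) != frame.length - HEADER_LENGTH) {
            throw new IllegalArgumentException("declared length does not match frame");
        }
        return buf;
    }
}
```

The length prefix is what would let the receiving side split a tcp byte stream back into whole commands before deserializing them.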
General Implementation
1. Refactor worker-server into a tcp server listening on a port, using
Netty for tcp communication, and define our own binary protocol for
scheduling.
2. After worker-server starts, it registers itself in zookeeper as an
ephemeral node of the form
{/dolphinscheduler/nodes/worker/$workerGroup/ip:port}, for example
/dolphinscheduler/nodes/worker/test/xxx.xxx.xxx.xxx:9800.
3. MasterServer takes responsibility for triggering the task and
choosing a worker-server to execute it.
- first, it has to listen for worker nodes in zookeeper and cache
the worker list in memory.
- second, when MasterScheduleThread acquires the lock to execute a
command (t_ds_command), it extracts the task info from the process instance,
then establishes a tcp connection to an available target worker using the
task info, and sends the command to that worker.
4. When worker-server receives a task command, it deserializes the
command into a Task and executes it in a thread pool or subprocess.
5. After worker-server executes the task, it has to report the result
to MasterServer over the already-established socket. If that socket has been
closed for any reason, WorkerServer has to connect to any other MasterServer
to report the result.
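
Steps 2 and 3 above could be sketched on the master side roughly as follows: an in-memory cache of workers keyed by worker group, filled from the znode paths, with a simple selection method. The class WorkerRegistry, its method names, the example addresses, and the round-robin policy are assumptions for illustration only; the proposal does not fix a selection strategy. The onNodeAdded/onNodeRemoved callbacks are assumed to be driven by a zookeeper watch on the worker path, which is left out to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory worker cache for MasterServer.
class WorkerRegistry {
    static final String WORKER_ROOT = "/dolphinscheduler/nodes/worker/";

    // workerGroup -> list of "ip:port" addresses
    private final Map<String, CopyOnWriteArrayList<String>> workersByGroup =
            new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    // Called when an ephemeral worker node appears in zookeeper.
    void onNodeAdded(String path) {
        String[] parts = parse(path);
        workersByGroup
                .computeIfAbsent(parts[0], g -> new CopyOnWriteArrayList<>())
                .addIfAbsent(parts[1]);
    }

    // Called when an ephemeral worker node disappears (worker stopped or lost).
    void onNodeRemoved(String path) {
        String[] parts = parse(path);
        CopyOnWriteArrayList<String> workers = workersByGroup.get(parts[0]);
        if (workers != null) {
            workers.remove(parts[1]);
        }
    }

    // Pick a worker for the group in round-robin order; null if none is available.
    String select(String workerGroup) {
        CopyOnWriteArrayList<String> workers = workersByGroup.get(workerGroup);
        if (workers == null) {
            return null;
        }
        Object[] snapshot = workers.toArray(); // stable view under concurrent updates
        if (snapshot.length == 0) {
            return null;
        }
        int index = Math.floorMod(counter.getAndIncrement(), snapshot.length);
        return (String) snapshot[index];
    }

    // Split ".../worker/$workerGroup/ip:port" into {workerGroup, "ip:port"}.
    static String[] parse(String path) {
        if (!path.startsWith(WORKER_ROOT)) {
            throw new IllegalArgumentException("not a worker path: " + path);
        }
        String rest = path.substring(WORKER_ROOT.length());
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("expected $workerGroup/ip:port: " + path);
        }
        return new String[] {rest.substring(0, slash), rest.substring(slash + 1)};
    }
}
```

With this shape, MasterScheduleThread would call select("test") to get an address such as 10.0.0.1:9800 and then open the tcp connection to it; a null result means no worker in that group is currently registered.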
More details on MasterServer/WorkerServer can be discussed in a later mail.
Simple graph
[image: image.png]
Tboy
[email protected]