Very good!

Can this be achieved in stages?

First, get one master and one worker communicating with each other.

Main points:
    1. The master sends tasks to the worker via RPC (task_queue, kill_queue).
    2. The worker receives the task and executes it.
    3. The master monitors the results of the worker's task execution
       (both normal task results and kill-task results).
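
A minimal sketch of the three points above, for discussion only. It uses
in-process queues as stand-ins for the RPC channels; the class names (Worker,
Master, Result) and the status strings are hypothetical and are not taken from
the DolphinScheduler code base.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Result:
    task_id: int
    kind: str    # "task" for a normal result, "kill" for a kill result
    status: str  # "SUCCESS", "FAILURE" or "KILLED"


class Worker:
    """Hypothetical worker; the two deques stand in for the RPC channels."""

    def __init__(self):
        self.task_queue = deque()  # (task_id, callable) pairs from the master
        self.kill_queue = deque()  # task ids the master wants killed

    def poll(self):
        """Process everything currently queued and return the results."""
        killed = set()
        while self.kill_queue:
            killed.add(self.kill_queue.popleft())
        results = []
        while self.task_queue:
            task_id, fn = self.task_queue.popleft()
            if task_id in killed:
                # The task is dropped before it starts; report a kill result.
                results.append(Result(task_id, "kill", "KILLED"))
                continue
            try:
                fn()
                results.append(Result(task_id, "task", "SUCCESS"))
            except Exception:
                results.append(Result(task_id, "task", "FAILURE"))
        return results


class Master:
    """Hypothetical master that submits work and tracks the reported state."""

    def __init__(self, worker):
        self.worker = worker
        self.states = {}

    def submit(self, task_id, fn):
        self.worker.task_queue.append((task_id, fn))
        self.states[task_id] = "SUBMITTED"

    def kill(self, task_id):
        self.worker.kill_queue.append(task_id)

    def monitor(self):
        # Both normal results and kill results update the same state table.
        for result in self.worker.poll():
            self.states[result.task_id] = result.status
        return self.states
```

In a real deployment the two deques would be replaced by RPC calls, and the
kill path would also have to stop tasks that are already running, which this
sketch does not attempt.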

    New features to consider:
    1. The master needs to choose a worker to execute each task instead of
       relying on the ZooKeeper preemptive lock, so a selection algorithm is
       needed.
    2. Suppose a worker completes its task but the master is down. At present,
       other masters are fault tolerant and take over the process instances,
       so workers would need to be aware of the other masters. In the current
       architecture, however, workers are not aware of the masters at all.
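
For point 1, one possible starting point is plain round-robin per worker
group. This is only an assumption for illustration; the thread has not settled
on an algorithm. The sketch assumes workers are registered under paths of the
form /dolphinscheduler/nodes/worker/$workerGroup/ip:port, as in the proposal
quoted below. RoundRobinSelector and parse_worker_path are hypothetical names.

```python
from collections import defaultdict

PREFIX = "/dolphinscheduler/nodes/worker/"


def parse_worker_path(path):
    """Split '<PREFIX><group>/<ip>:<port>' into (group, host, port)."""
    if not path.startswith(PREFIX):
        raise ValueError("not a worker path: " + path)
    group, _, address = path[len(PREFIX):].partition("/")
    host, _, port = address.rpartition(":")
    if not group or not host or not port.isdigit():
        raise ValueError("malformed worker path: " + path)
    return group, host, int(port)


class RoundRobinSelector:
    """Hypothetical master-side cache of workers with round-robin selection."""

    def __init__(self):
        self._workers = defaultdict(list)  # group -> sorted [(host, port)]
        self._next = defaultdict(int)      # group -> running counter

    def update(self, paths):
        """Replace the cached worker list, e.g. when the node list changes."""
        workers = defaultdict(list)
        for path in paths:
            group, host, port = parse_worker_path(path)
            workers[group].append((host, port))
        for group in workers:
            workers[group].sort()  # stable order so rotation is predictable
        self._workers = workers

    def select(self, group):
        """Return the next (host, port) in the group, rotating on each call."""
        candidates = self._workers.get(group)
        if not candidates:
            raise LookupError("no worker available in group " + group)
        index = self._next[group] % len(candidates)
        self._next[group] += 1
        return candidates[index]
```

Round-robin ignores worker load; a weighted or load-aware policy could be
swapped in behind the same select() call later.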

    thx
―――――――――――――
DolphinScheduler(Incubator)  PPMC
Zhanwei Qiao 乔占卫

[email protected]

From: guo jiwei<mailto:[email protected]>
Date: 2019-12-09 19:38
To: dev<mailto:[email protected]>
Subject: Re: A proposal for DolphinScheduler- refactor WorkerServer/MasterServer
Sorry for not attaching the image.

On Mon, Dec 9, 2019 at 7:19 PM guo jiwei <[email protected]> wrote:

Hello everyone,

    I would like to share some ideas about refactoring
WorkerServer/MasterServer for dolphin-scheduler.

Background
   In the current implementation of dolphin-scheduler, task info is stored in
ZooKeeper, and worker-server uses a ZooKeeper lock to keep executing tasks
continuously. Each worker tries to acquire the lock; if it gets the lock, it
can execute the task (fetch the task from zk, get the task info from the db,
etc.), otherwise it has to wait for the lock. This is not a good approach in
terms of performance, dependencies, or parallelism.

Proposal
    I suggest that worker-server listen on a TCP port and receive task
commands via RPC requests instead. This way, worker-server no longer uses the
lock or connects to the db, and it can execute tasks concurrently.

General Implementation
   1. Refactor worker-server as a TCP server listening on a port, using Netty
for TCP communication, and define our own binary protocol for scheduling.
   2. After worker-server starts, it registers itself in ZooKeeper as an
ephemeral node such as /dolphinscheduler/nodes/worker/test/xxx.xxx.xxx.xxx:9800
{/dolphinscheduler/nodes/worker/$workerGroup/ip:port}.
   3. MasterServer takes responsibility for triggering the task and choosing
a worker-server to execute it.
     - First, it listens for worker nodes in ZooKeeper and caches the worker
list in memory.
     - Second, when MasterScheduleThread acquires the lock to execute a
command (t_ds_command), it extracts the task info from the process instance,
then, using the task info, establishes a TCP connection to an available target
worker and sends the command to that worker.
   4. When worker-server receives a task command, it deserializes the command
into a Task and executes it in a thread pool or a subprocess.
   5. After worker-server executes the task, it reports the result to
MasterServer over the already-established socket. If that socket has been
closed for any reason, WorkerServer has to connect to another MasterServer to
report the result.
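
To make steps 1 and 4 more concrete, here is a rough sketch of what a
length-prefixed binary frame could look like. The layout (one magic byte, one
command-type byte, a 4-byte big-endian body length, then a JSON body) and the
constants are assumptions for discussion, not a defined protocol. With Netty,
the splitting would normally be left to a length-field frame decoder rather
than written by hand.

```python
import json
import struct

MAGIC = 0xD5                    # hypothetical marker byte
HEADER = struct.Struct(">BBI")  # magic, command type, body length
EXECUTE_TASK = 1                # hypothetical command types
TASK_RESULT = 2


def encode_frame(command_type, payload):
    """Serialize one command as header + JSON body."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return HEADER.pack(MAGIC, command_type, len(body)) + body


def decode_frames(buffer):
    """Decode all complete frames; return (frames, leftover_bytes).

    TCP is a byte stream, so a read may end in the middle of a frame.
    The leftover bytes must be prepended to the next read.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        magic, command_type, length = HEADER.unpack_from(buffer, offset)
        if magic != MAGIC:
            raise ValueError("bad magic byte")
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # body not fully received yet
        body = buffer[offset + HEADER.size:end]
        frames.append((command_type, json.loads(body.decode("utf-8"))))
        offset = end
    return frames, buffer[offset:]
```

The same frame type could carry the result report in step 5, so that a worker
reconnecting to another MasterServer only needs to resend the TASK_RESULT
frame.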


More details on MasterServer/WorkerServer can be discussed in a later mail.

Simple graph
   [image.png]




Tboy
[email protected]
