Discussion: Time setting

I wrote a function to delay the execution of the task, but in my discussion 
with my mentor, there was a disagreement on how to assign the start time in the 
task instance.

Problem Description

In the original design, the start time of the task instance will be initialized 
or changed in several places:

In the createTaskInstance() method in 
dolphinscheduler-server/.../server/master/runner/MasterExecThread.java
dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.javawill
 update the start time once the task is submitted
If a failed task tries to retry, it will be updated in the 
submitTaskInstanecToDB() method in 
dolphinscheduler-service/.../service/process/ProcessService.java
After the worker is dispatched, an ACK is sent in the process() method in 
dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.javato 
update the start time

Most tasks (such as tasks that are dispatched to workers) have two start times 
from the creation of an instance to the completion of execution. Let us briefly 
analyze what problems will arise in this way. For example, there is a function 
to view the running time of a task that uses this start time, and then a task 
is created at 10:00, and a worker receives the task at 10:05. According to the 
original design, I can see that the task has been running for 4 minutes at 
10:04, but I will see that the task has been running for 1 minute at 10:06 
(because the worker will send an ACK to update the start time at 10:05), which 
is obviously unreasonable.

Let me first define a few key time points (condition nodes and dependent nodes 
are considered separately):

Time when the master created the task instance
Worker receiving task instance time
Worker delay execution start time
Time when the worker starts to execute the task
Worker execution task completion time

Then define the time between each time point:

1->2: Waiting time in the queue (including submission to the database, removal 
from the database, submission to the task queue, and removal from the task 
queue)
3->4: Delay execution time
4->5: Task execution time
My mentor's point of view

He defines startTime in TaskInstance as time point 4. Before this time point, 
this value should not be null, and thinks that it is better to initialize to 
new Date(). At the same time, he believes that the original design should be 
maintained, and the value can be updated multiple times if necessary.

My point of view

I also defined the startTime in TaskInstance as time point 4, but I think this 
value should remain nullbefore this time point, and this value is assigned only 
once in the life cycle of a task instance. Will no longer be updated afterwards.

My reasons are as follows:

A variable should not be assigned before it meets its definition;
Multiple updates will lead to inconsistencies in reading data multiple times, 
such as the example of checking the running time of the task mentioned earlier;
There is a member submitTime in TaskInstance. In the original design, its value 
is very close to startTime. When the system load is small, they are almost 
always the same. So where you need to use before startTime initialization, you 
can use submitTime instead;
However, if the startTime before initialization needs to be used somewhere in 
the code, then whether this time should be called startTime is open to question;
Use IDEA for code analysis, startTime is rarely used.

In addition, I found that in the code for viewing the Gantt chart, there is a 
line of code that is written like this:

Date startTime = taskInstance.getStartTime() == null? New Date(): 
taskInstance.getStartTime();

It is considered here that startTime can be null, which supports my point from 
the side.

The above questions can be summed up as follows: The original design of the 
system is ambiguous in the definition of the task start time.

Impact of improvement

The code of startTime that has not been initialized may be used. I found two 
places:

checkTaskAfterWorkerStart() in 
dolphinscheduler-server/../server/zk/ZKMasterClient.java
getRemaintime() in 
dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java

For the first part, since I am not familiar with this part of the code, I don't 
know whether adding a judgment can complete the original logic. But for the 
second part, since the worker may crash at any time between time 2 and 5, if 
the startTime is initialized without sending an ACK before the crash, the 
master will not be able to determine whether it has timed out and take 
corresponding actions. I do not have a good solution to this problem for the 
time being, and welcome your comments.

At last

For the startTime of taskInstance, how do you think it is better to initialize? 
Do I need to make improvements based on my point of view? Or whether to keep 
the original design? Or have other suggestions?

PS:

The same problem exists for the endTime of taskInstance. After the worker 
finishes executing the task, the sendResult will update this time, but the 
master will also update this time after waitTaskQuit.

讨论:时间的设置

我编写了一个使任务延迟执行的功能,但是在我与我的导师的讨论中,就任务实例(taskInstance)中的开始时间(startTime)的如何赋值发生了分歧。

问题描述

在原先的设计中,任务实例的开始时间会在多处得到初始化或者变更:

dolphinscheduler-server/.../server/master/runner/MasterExecThread.java中的createTaskInstance()方法中
dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java会在提交任务后更新一次开始时间
如果是失败任务尝试重试,会在dolphinscheduler-service/.../service/process/ProcessService.java中的submitTaskInstanecToDB()方法中得到更新
worker被派发任务后,dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java中process()方法中发送了一个ACK去更新开始时间

大多数任务(例如被派发到worker的任务)从创建实例到执行完成,会有两个开始时间。让我们简单分析一下这样会出现什么问题。比如现在有一个查看任务已运行时间的功能使用到了这个开始时间,然后有一个任务在10:00创建,在10:05时有一个worker接收了这个任务。按照原先的设计,我在10:04时能够查看到任务已运行时间为4分钟,但是我将在10:06时查看到任务已运行时间变短为1分钟(因为10:05时worker会发送一个ACK去更新开始时间),这显然是不合理的。

下面我先定义几个关键的时间点(条件节点和依赖节点单独考虑):

master创建任务实例时间
worker接收任务实例时间
worker延迟执行开始时间
worker开始执行任务时间
worker执行任务完成时间

然后就每个时间点之间的时间作出定义:

1->2:在队列中的等待时间(包括提交到数据库、从数据库中取出、提交到任务队列、从任务队列中取出)
3->4:延迟执行时间
4->5:任务执行时间
我的导师的观点

他将TaskInstance中的startTime定义为时间点4,在该时间点之前这个值不应该为null,并且认为初始化为new 
Date()比较好。同时他认为应保持原先的设计,必要时可以对该值进行多次更新。

我的观点

我也将TaskInstance中的startTime定义为时间点4,但是我认为在该时间点之前这个值应该保持为null,并且在一个任务实例的生命周期中,这个值有且只有一次被赋值,赋值后不再被更新。

我的理由有以下几点:

一个变量尚未满足它的定义之前不应该被赋值;
多次更新将导致多次读取数据不一致,例如前面提到的查看任务已运行时间的例子;
TaskInstance中有一个成员submitTime,在原先的设计中它的值跟startTime非常接近,在系统负载较小的时候,它们俩几乎总是保持一致。所以在startTime初始化之前需要用到的地方可以使用submitTime代替;
但是,如果代码某处需要用到初始化之前的startTime,那么这个时间是否应该叫做startTime就有待商榷了;
使用IDEA进行代码分析,startTime很少被用到。

此外,我发现在查看甘特图的代码中,有一行代码是这么写的:

Date startTime = taskInstance.getStartTime() == null ? new Date() : 
taskInstance.getStartTime();

在这里就考虑到了startTime是可以为null的,这从侧面佐证了我的观点。

上面的问题总结起来就是:系统原先的设计在任务开始时间的定义上就是模棱两可的。

改进带来的影响

可能会用到尚未初始化的startTime的代码我发现两处:

dolphinscheduler-server/../server/zk/ZKMasterClient.java中的checkTaskAfterWorkerStart()
dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java中的getRemaintime()

对于第一处,由于我还不熟悉这部分代码,不知道增加一个判断能否完成原有的逻辑。但是对于第二处,由于worker可能会在时间点2到5之间的任意时刻奔溃,如果奔溃之前没有发送ACK将startTime初始化,master将无法判断是否超时并作出相应的动作。对于这个问题我暂时没有好的解决方案,欢迎大家提意见。

最后

对于taskInstance的startTime,大家觉得如何初始化比较好?是否需要按照我的观点进行改进?或者是否保持原有的设计?或者有其他的建议?

PS:

对于taskInstance的endTime也有同样的问题,worker在执行完任务之后,会sendResult更新这个时间,但master在waitTaskQuit之后也会去更新一次这个时间。

Reply via email to