Thank you for your reply and support, I will try my best to improve the code and keep the original functions unchanged. In addition, I submitted the same improved ISSUE in Github. You and others are welcome to discuss and make suggestions there.
https://github.com/apache/incubator-dolphinscheduler/issues/3352 在 2020-07-30 09:31:44,"wu shaoj" <[email protected]> 写道: >A variable should not be assigned before it meets its definition, which should >be a development principle. >Take your point. The start time should be assigned just once. Status should be >added to a task as an attribute which indicate current status, such as >PREPARE/ACKED/RUNNING/STOPED > > >On 2020/7/29, 18:24, "wang" <[email protected]> wrote: > > Discussion: Time setting > > I wrote a function to delay the execution of the task, but in my > discussion with my mentor, there was a disagreement on how to assign the > start time in the task instance. > > Problem Description > > In the original design, the start time of the task instance will be > initialized or changed in several places: > > In the createTaskInstance() method in > dolphinscheduler-server/.../server/master/runner/MasterExecThread.java > > dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.javawill > update the start time once the task is submitted > If a failed task tries to retry, it will be updated in the > submitTaskInstanecToDB() method in > dolphinscheduler-service/.../service/process/ProcessService.java > After the worker is dispatched, an ACK is sent in the process() method in > dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.javato > update the start time > > Most tasks (such as tasks that are dispatched to workers) have two start > times from the creation of an instance to the completion of execution. Let us > briefly analyze what problems will arise in this way. For example, there is a > function to view the running time of a task that uses this start time, and > then a task is created at 10:00, and a worker receives the task at 10:05. > According to the original design, I can see that the task has been running > for 4 minutes at 10:04, but I will see that the task has been running for 1 > minute at 10:06 (because the worker will send an ACK to update the start time > at 10:05), which is obviously unreasonable. > > Let me first define a few key time points (condition nodes and dependent > nodes are considered separately): > > Time when the master created the task instance > Worker receiving task instance time > Worker delay execution start time > Time when the worker starts to execute the task > Worker execution task completion time > > Then define the time between each time point: > > 1->2: Waiting time in the queue (including submission to the database, > removal from the database, submission to the task queue, and removal from the > task queue) > 3->4: Delay execution time > 4->5: Task execution time > My mentor's point of view > > He defines startTime in TaskInstance as time point 4. Before this time > point, this value should not be null, and thinks that it is better to > initialize to new Date(). At the same time, he believes that the original > design should be maintained, and the value can be updated multiple times if > necessary. > > My point of view > > I also defined the startTime in TaskInstance as time point 4, but I think > this value should remain nullbefore this time point, and this value is > assigned only once in the life cycle of a task instance. Will no longer be > updated afterwards. > > My reasons are as follows: > > A variable should not be assigned before it meets its definition; > Multiple updates will lead to inconsistencies in reading data multiple > times, such as the example of checking the running time of the task mentioned > earlier; > There is a member submitTime in TaskInstance. In the original design, its > value is very close to startTime. When the system load is small, they are > almost always the same. So where you need to use before startTime > initialization, you can use submitTime instead; > However, if the startTime before initialization needs to be used somewhere > in the code, then whether this time should be called startTime is open to > question; > Use IDEA for code analysis, startTime is rarely used. > > In addition, I found that in the code for viewing the Gantt chart, there > is a line of code that is written like this: > > Date startTime = taskInstance.getStartTime() == null? New Date(): > taskInstance.getStartTime(); > > It is considered here that startTime can be null, which supports my point > from the side. > > The above questions can be summed up as follows: The original design of > the system is ambiguous in the definition of the task start time. > > Impact of improvement > > The code of startTime that has not been initialized may be used. I found > two places: > > checkTaskAfterWorkerStart() in > dolphinscheduler-server/../server/zk/ZKMasterClient.java > getRemaintime() in > dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java > > For the first part, since I am not familiar with this part of the code, I > don't know whether adding a judgment can complete the original logic. But for > the second part, since the worker may crash at any time between time 2 and 5, > if the startTime is initialized without sending an ACK before the crash, the > master will not be able to determine whether it has timed out and take > corresponding actions. I do not have a good solution to this problem for the > time being, and welcome your comments. > > At last > > For the startTime of taskInstance, how do you think it is better to > initialize? Do I need to make improvements based on my point of view? Or > whether to keep the original design? Or have other suggestions? > > PS: > > The same problem exists for the endTime of taskInstance. After the worker > finishes executing the task, the sendResult will update this time, but the > master will also update this time after waitTaskQuit. > > 讨论:时间的设置 > > > 我编写了一个使任务延迟执行的功能,但是在我与我的导师的讨论中,就任务实例(taskInstance)中的开始时间(startTime)的如何赋值发生了分歧。 > > 问题描述 > > 在原先的设计中,任务实例的开始时间会在多处得到初始化或者变更: > > > dolphinscheduler-server/.../server/master/runner/MasterExecThread.java中的createTaskInstance()方法中 > > dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java会在提交任务后更新一次开始时间 > > 如果是失败任务尝试重试,会在dolphinscheduler-service/.../service/process/ProcessService.java中的submitTaskInstanecToDB()方法中得到更新 > > worker被派发任务后,dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java中process()方法中发送了一个ACK去更新开始时间 > > > 大多数任务(例如被派发到worker的任务)从创建实例到执行完成,会有两个开始时间。让我们简单分析一下这样会出现什么问题。比如现在有一个查看任务已运行时间的功能使用到了这个开始时间,然后有一个任务在10:00创建,在10:05时有一个worker接收了这个任务。按照原先的设计,我在10:04时能够查看到任务已运行时间为4分钟,但是我将在10:06时查看到任务已运行时间变短为1分钟(因为10:05时worker会发送一个ACK去更新开始时间),这显然是不合理的。 > > 下面我先定义几个关键的时间点(条件节点和依赖节点单独考虑): > > master创建任务实例时间 > worker接收任务实例时间 > worker延迟执行开始时间 > worker开始执行任务时间 > worker执行任务完成时间 > > 然后就每个时间点之间的时间作出定义: > > 1->2:在队列中的等待时间(包括提交到数据库、从数据库中取出、提交到任务队列、从任务队列中取出) > 3->4:延迟执行时间 > 4->5:任务执行时间 > 我的导师的观点 > > 他将TaskInstance中的startTime定义为时间点4,在该时间点之前这个值不应该为null,并且认为初始化为new > Date()比较好。同时他认为应保持原先的设计,必要时可以对该值进行多次更新。 > > 我的观点 > > > 我也将TaskInstance中的startTime定义为时间点4,但是我认为在该时间点之前这个值应该保持为null,并且在一个任务实例的生命周期中,这个值有且只有一次被赋值,赋值后不再被更新。 > > 我的理由有以下几点: > > 一个变量尚未满足它的定义之前不应该被赋值; > 多次更新将导致多次读取数据不一致,例如前面提到的查看任务已运行时间的例子; > > TaskInstance中有一个成员submitTime,在原先的设计中它的值跟startTime非常接近,在系统负载较小的时候,它们俩几乎总是保持一致。所以在startTime初始化之前需要用到的地方可以使用submitTime代替; > 但是,如果代码某处需要用到初始化之前的startTime,那么这个时间是否应该叫做startTime就有待商榷了; > 使用IDEA进行代码分析,startTime很少被用到。 > > 此外,我发现在查看甘特图的代码中,有一行代码是这么写的: > > Date startTime = taskInstance.getStartTime() == null ? new Date() : > taskInstance.getStartTime(); > > 在这里就考虑到了startTime是可以为null的,这从侧面佐证了我的观点。 > > 上面的问题总结起来就是:系统原先的设计在任务开始时间的定义上就是模棱两可的。 > > 改进带来的影响 > > 可能会用到尚未初始化的startTime的代码我发现两处: > > > dolphinscheduler-server/../server/zk/ZKMasterClient.java中的checkTaskAfterWorkerStart() > > dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java中的getRemaintime() > > > 对于第一处,由于我还不熟悉这部分代码,不知道增加一个判断能否完成原有的逻辑。但是对于第二处,由于worker可能会在时间点2到5之间的任意时刻奔溃,如果奔溃之前没有发送ACK将startTime初始化,master将无法判断是否超时并作出相应的动作。对于这个问题我暂时没有好的解决方案,欢迎大家提意见。 > > 最后 > > 对于taskInstance的startTime,大家觉得如何初始化比较好?是否需要按照我的观点进行改进?或者是否保持原有的设计?或者有其他的建议? > > PS: > > > 对于taskInstance的endTime也有同样的问题,worker在执行完任务之后,会sendResult更新这个时间,但master在waitTaskQuit之后也会去更新一次这个时间。
