Re:Re: [DISCUSS][SUMMER-2020] Time setting of the task instance

wang Wed, 29 Jul 2020 19:13:18 -0700

Thank you for your reply and support, I will try my best to improve the code 
and keep the original functions unchanged. In addition, I submitted the same 
improved ISSUE in Github. You and others are welcome to discuss and make 
suggestions there.



https://github.com/apache/incubator-dolphinscheduler/issues/3352





在 2020-07-30 09:31:44，"wu shaoj" <[email protected]> 写道：
>A variable should not be assigned before it meets its definition, which should 
>be a development principle.
>Take your point. The start time should be assigned just once. Status should be 
>added to a task as an attribute which indicate current status, such as 
>PREPARE/ACKED/RUNNING/STOPED
>
>
>On 2020/7/29, 18:24, "wang" <[email protected]> wrote:
>
>    Discussion: Time setting
>
>    I wrote a function to delay the execution of the task, but in my 
> discussion with my mentor, there was a disagreement on how to assign the 
> start time in the task instance.
>
>    Problem Description
>
>    In the original design, the start time of the task instance will be 
> initialized or changed in several places:
>
>    In the createTaskInstance() method in 
> dolphinscheduler-server/.../server/master/runner/MasterExecThread.java
>    
> dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.javawill
>  update the start time once the task is submitted
>    If a failed task tries to retry, it will be updated in the 
> submitTaskInstanecToDB() method in 
> dolphinscheduler-service/.../service/process/ProcessService.java
>    After the worker is dispatched, an ACK is sent in the process() method in 
> dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.javato
>  update the start time
>
>    Most tasks (such as tasks that are dispatched to workers) have two start 
> times from the creation of an instance to the completion of execution. Let us 
> briefly analyze what problems will arise in this way. For example, there is a 
> function to view the running time of a task that uses this start time, and 
> then a task is created at 10:00, and a worker receives the task at 10:05. 
> According to the original design, I can see that the task has been running 
> for 4 minutes at 10:04, but I will see that the task has been running for 1 
> minute at 10:06 (because the worker will send an ACK to update the start time 
> at 10:05), which is obviously unreasonable.
>
>    Let me first define a few key time points (condition nodes and dependent 
> nodes are considered separately):
>
>    Time when the master created the task instance
>    Worker receiving task instance time
>    Worker delay execution start time
>    Time when the worker starts to execute the task
>    Worker execution task completion time
>
>    Then define the time between each time point:
>
>    1->2: Waiting time in the queue (including submission to the database, 
> removal from the database, submission to the task queue, and removal from the 
> task queue)
>    3->4: Delay execution time
>    4->5: Task execution time
>    My mentor's point of view
>
>    He defines startTime in TaskInstance as time point 4. Before this time 
> point, this value should not be null, and thinks that it is better to 
> initialize to new Date(). At the same time, he believes that the original 
> design should be maintained, and the value can be updated multiple times if 
> necessary.
>
>    My point of view
>
>    I also defined the startTime in TaskInstance as time point 4, but I think 
> this value should remain nullbefore this time point, and this value is 
> assigned only once in the life cycle of a task instance. Will no longer be 
> updated afterwards.
>
>    My reasons are as follows:
>
>    A variable should not be assigned before it meets its definition;
>    Multiple updates will lead to inconsistencies in reading data multiple 
> times, such as the example of checking the running time of the task mentioned 
> earlier;
>    There is a member submitTime in TaskInstance. In the original design, its 
> value is very close to startTime. When the system load is small, they are 
> almost always the same. So where you need to use before startTime 
> initialization, you can use submitTime instead;
>    However, if the startTime before initialization needs to be used somewhere 
> in the code, then whether this time should be called startTime is open to 
> question;
>    Use IDEA for code analysis, startTime is rarely used.
>
>    In addition, I found that in the code for viewing the Gantt chart, there 
> is a line of code that is written like this:
>
>    Date startTime = taskInstance.getStartTime() == null? New Date(): 
> taskInstance.getStartTime();
>
>    It is considered here that startTime can be null, which supports my point 
> from the side.
>
>    The above questions can be summed up as follows: The original design of 
> the system is ambiguous in the definition of the task start time.
>
>    Impact of improvement
>
>    The code of startTime that has not been initialized may be used. I found 
> two places:
>
>    checkTaskAfterWorkerStart() in 
> dolphinscheduler-server/../server/zk/ZKMasterClient.java
>    getRemaintime() in 
> dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java
>
>    For the first part, since I am not familiar with this part of the code, I 
> don't know whether adding a judgment can complete the original logic. But for 
> the second part, since the worker may crash at any time between time 2 and 5, 
> if the startTime is initialized without sending an ACK before the crash, the 
> master will not be able to determine whether it has timed out and take 
> corresponding actions. I do not have a good solution to this problem for the 
> time being, and welcome your comments.
>
>    At last
>
>    For the startTime of taskInstance, how do you think it is better to 
> initialize? Do I need to make improvements based on my point of view? Or 
> whether to keep the original design? Or have other suggestions?
>
>    PS:
>
>    The same problem exists for the endTime of taskInstance. After the worker 
> finishes executing the task, the sendResult will update this time, but the 
> master will also update this time after waitTaskQuit.
>
>    讨论：时间的设置
>
>    
> 我编写了一个使任务延迟执行的功能，但是在我与我的导师的讨论中，就任务实例(taskInstance)中的开始时间(startTime)的如何赋值发生了分歧。
>
>    问题描述
>
>    在原先的设计中，任务实例的开始时间会在多处得到初始化或者变更：
>
>    
> dolphinscheduler-server/.../server/master/runner/MasterExecThread.java中的createTaskInstance()方法中
>    
> dolphinscheduler-server/.../server/master/runner/SubProcessTaskExecThread.java会在提交任务后更新一次开始时间
>    
> 如果是失败任务尝试重试，会在dolphinscheduler-service/.../service/process/ProcessService.java中的submitTaskInstanecToDB()方法中得到更新
>    
> worker被派发任务后，dolphinscheduler-server/.../server/worker/processor/TaskExecuteProcessor.java中process()方法中发送了一个ACK去更新开始时间
>
>    
> 大多数任务（例如被派发到worker的任务）从创建实例到执行完成，会有两个开始时间。让我们简单分析一下这样会出现什么问题。比如现在有一个查看任务已运行时间的功能使用到了这个开始时间，然后有一个任务在10:00创建，在10:05时有一个worker接收了这个任务。按照原先的设计，我在10:04时能够查看到任务已运行时间为4分钟，但是我将在10:06时查看到任务已运行时间变短为1分钟（因为10:05时worker会发送一个ACK去更新开始时间），这显然是不合理的。
>
>    下面我先定义几个关键的时间点（条件节点和依赖节点单独考虑）：
>
>    master创建任务实例时间
>    worker接收任务实例时间
>    worker延迟执行开始时间
>    worker开始执行任务时间
>    worker执行任务完成时间
>
>    然后就每个时间点之间的时间作出定义：
>
>    1->2：在队列中的等待时间（包括提交到数据库、从数据库中取出、提交到任务队列、从任务队列中取出）
>    3->4：延迟执行时间
>    4->5：任务执行时间
>    我的导师的观点
>
>    他将TaskInstance中的startTime定义为时间点4，在该时间点之前这个值不应该为null，并且认为初始化为new 
> Date()比较好。同时他认为应保持原先的设计，必要时可以对该值进行多次更新。
>
>    我的观点
>
>    
> 我也将TaskInstance中的startTime定义为时间点4，但是我认为在该时间点之前这个值应该保持为null，并且在一个任务实例的生命周期中，这个值有且只有一次被赋值，赋值后不再被更新。
>
>    我的理由有以下几点：
>
>    一个变量尚未满足它的定义之前不应该被赋值；
>    多次更新将导致多次读取数据不一致，例如前面提到的查看任务已运行时间的例子；
>    
> TaskInstance中有一个成员submitTime，在原先的设计中它的值跟startTime非常接近，在系统负载较小的时候，它们俩几乎总是保持一致。所以在startTime初始化之前需要用到的地方可以使用submitTime代替；
>    但是，如果代码某处需要用到初始化之前的startTime，那么这个时间是否应该叫做startTime就有待商榷了；
>    使用IDEA进行代码分析，startTime很少被用到。
>
>    此外，我发现在查看甘特图的代码中，有一行代码是这么写的：
>
>    Date startTime = taskInstance.getStartTime() == null ? new Date() : 
> taskInstance.getStartTime();
>
>    在这里就考虑到了startTime是可以为null的，这从侧面佐证了我的观点。
>
>    上面的问题总结起来就是：系统原先的设计在任务开始时间的定义上就是模棱两可的。
>
>    改进带来的影响
>
>    可能会用到尚未初始化的startTime的代码我发现两处：
>
>    
> dolphinscheduler-server/../server/zk/ZKMasterClient.java中的checkTaskAfterWorkerStart()
>    
> dolphinscheduler-server/.../server/master/runner/MasterTaskExecThread.java中的getRemaintime()
>
>    
> 对于第一处，由于我还不熟悉这部分代码，不知道增加一个判断能否完成原有的逻辑。但是对于第二处，由于worker可能会在时间点2到5之间的任意时刻奔溃，如果奔溃之前没有发送ACK将startTime初始化，master将无法判断是否超时并作出相应的动作。对于这个问题我暂时没有好的解决方案，欢迎大家提意见。
>
>    最后
>
>    对于taskInstance的startTime，大家觉得如何初始化比较好？是否需要按照我的观点进行改进？或者是否保持原有的设计？或者有其他的建议？
>
>    PS：
>
>    
> 对于taskInstance的endTime也有同样的问题，worker在执行完任务之后，会sendResult更新这个时间，但master在waitTaskQuit之后也会去更新一次这个时间。

Re:Re: [DISCUSS][SUMMER-2020] Time setting of the task instance

Reply via email to