Hi Jing,

Thanks for joining the discussion. The possible impact on the history server is a very good point to raise.

> 1. Does the improvement also cover the history server or just the Web UI?

As far as I know, most Web UI components are shared between the runtime and the history server, so the improvement is expected to cover both. We will make sure the changes proposed in this FLIP do not conflict with the ongoing FLIP-241, which is working on enhancing the completed job information.

> 2. How can we know whether the job contains speculative execution instances after the job has finished? Do we have to check each subtask of every vertex one by one?

When one attempt of a subtask finishes, all other concurrent attempts are canceled but are still treated as current executions. The speculative executions should therefore be presented almost the same way as when the job was running. Users can still find the executions folded in the subtask list page.

As we mentioned in the FLIP, all changes are expected to be transparent to users who don't use speculative execution. For users who do use speculative execution, the experience should be almost the same whether they are watching a running job or a completed job in the history server.

Best,
Gen

On Tue, Jul 12, 2022 at 8:41 PM Jing Zhang <beyond1...@gmail.com> wrote:

> Thanks for driving this discussion. It's a very helpful improvement.
> I only have two minor questions:
> 1. Does the improvement also cover the history server or just the Web UI?
> 2. How can we know whether the job contains speculative execution instances
> after the job has finished?
> Do we have to check each subtask of every vertex one by one?
>
> Best,
> Jing Zhang
>
> Gen Luo <luogen...@gmail.com> wrote on Mon, Jul 11, 2022 at 22:31:
>
> > Hi, everyone.
> >
> > Thanks for your feedback.
> > If there are no more concerns or comments, I will start the vote tomorrow.
> >
> > Gen Luo <luogen...@gmail.com> wrote on Mon, Jul 11, 2022 at 11:12:
> >
> > > Hi Lijie and Zhu,
> > >
> > > Thanks for the suggestion. I agree that the name "Blocked Free Slots" is
> > > clearer to users.
> > > I'll take the suggestion and update the FLIP.
> > >
> > > On Fri, Jul 8, 2022 at 9:12 PM Zhu Zhu <reed...@gmail.com> wrote:
> > >
> > >> I agree that it can be more useful to show the number of slots that are
> > >> free but blocked. Currently users infer the slots in use by subtracting
> > >> available slots from the total slots. With blocked slots introduced, this
> > >> can be achieved by subtracting available slots and blocked free slots
> > >> from the total slots.
> > >>
> > >> Therefore, +1 to show "Blocked Free Slots" on the resource card.
> > >>
> > >> Thanks,
> > >> Zhu
> > >>
> > >> Lijie Wang <wangdachui9...@gmail.com> wrote on Fri, Jul 8, 2022 at 17:39:
> > >> >
> > >> > Hi Gen & Zhu,
> > >> >
> > >> > -> 1. Can we also show "Blocked Slots" in the resource card, so that users
> > >> > can easily figure out how many slots are available/blocked/in-use?
> > >> >
> > >> > I think we should describe "available" and "blocked" more clearly. In
> > >> > my opinion, users should be interested in the number of slots in
> > >> > the following 3 states:
> > >> > 1. free and unblocked: I think it's OK to call this state "available".
> > >> > 2. free and blocked: I think it's not appropriate to call this "blocked"
> > >> > directly, because "blocked" should include both "free and blocked" and
> > >> > "in-use and blocked".
> > >> > 3. in-use
> > >> >
> > >> > And the sum of the above 3 kinds of slots should be the total number of
> > >> > slots in the cluster.
> > >> >
> > >> > WDYT?
> > >> >
> > >> > Best,
> > >> > Lijie
> > >> >
> > >> > Gen Luo <luogen...@gmail.com> wrote on Fri, Jul 8, 2022 at 16:14:
> > >> >
> > >> > > Hi Zhu,
> > >> > > Thanks for the feedback!
> > >> > >
> > >> > > 1. Good idea. Users should be more familiar with slots as the resource
> > >> > > units.
> > >> > >
> > >> > > 2. You remind me that the "speculative attempts" are execution attempts
> > >> > > started by the SpeculativeScheduler when slow tasks are detected, while the
> > >> > > current execution attempts other than the "most current" one are not really
> > >> > > the speculative attempts. I agree we should modify the field name.
> > >> > >
> > >> > > 3. ArchivedSpeculativeExecutionVertex seems to have been introduced with
> > >> > > speculative execution to handle the speculative attempts as a part of the
> > >> > > execution history. Since this FLIP handles the attempts in a more
> > >> > > proper way, I agree that we can remove
> > >> > > ArchivedSpeculativeExecutionVertex.
> > >> > >
> > >> > > Thanks again. I'll update the FLIP later according to these suggestions.
> > >> > >
> > >> > > On Thu, Jul 7, 2022 at 4:35 PM Zhu Zhu <reed...@gmail.com> wrote:
> > >> > >
> > >> > > > Thanks for writing this FLIP and initiating the discussion, Gen, Yun and
> > >> > > > Junhan!
> > >> > > > It will be very useful to have these improvements on the web UI for
> > >> > > > speculative execution users, allowing them to know what is happening.
> > >> > > > I just have a few comments regarding the design details:
> > >> > > >
> > >> > > > 1. Can we also show "Blocked Slots" in the resource card, so that users
> > >> > > > can easily figure out how many slots are available/blocked/in-use?
> > >> > > > 2. I think "speculative-attempts" is not accurate, because the
> > >> > > > root/fastest current can be a speculative execution attempt, and in
> > >> > > > this case "speculative-attempts" will contain the initial execution
> > >> > > > attempt. How about naming it "other-concurrent-attempts"?
> > >> > > > 3. I think ArchivedSpeculativeExecutionVertex is not necessarily
> > >> > > > needed.
> > >> > > > We can rework ArchivedExecutionVertex to contain a set of
> > >> > > > current execution attempts. The set will have only one element in
> > >> > > > non-speculative cases, though. In this way, we can have unified
> > >> > > > processing for ArchivedExecutionVertex in speculative/non-speculative
> > >> > > > cases.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Zhu
> > >> > > >
> > >> > > > Gen Luo <luogen...@gmail.com> wrote on Tue, Jul 5, 2022 at 15:10:
> > >> > > >
> > >> > > > > Hi everyone,
> > >> > > > >
> > >> > > > > The speculative execution for batch jobs has been proposed and accepted in
> > >> > > > > FLIP-168[1], as well as the related blocklist mechanism in FLIP-224[2]. As
> > >> > > > > a follow-up step, the Flink Web UI needs to be enhanced to display the
> > >> > > > > related information if the speculative execution mechanism is enabled.
> > >> > > > >
> > >> > > > > Junhan Yang, Yun Gao and I would like to start the discussion about the Web
> > >> > > > > UI enhancement and the corresponding REST API changes in FLIP-249[3],
> > >> > > > > including:
> > >> > > > > - show the speculative executions in the subtask list and the backpressure
> > >> > > > > page, where the fastest is shown directly while others are folded;
> > >> > > > > - show the number of the blocked task managers in the Task Managers and
> > >> > > > > Slots card, when the number is not 0;
> > >> > > > > - show the BLOCKED label in the task manager list and the task manager
> > >> > > > > detail page for the blocked task managers.
> > >> > > > >
> > >> > > > > All changes are expected to be transparent to users who don’t use speculative
> > >> > > > > execution.
> > >> > > > >
> > >> > > > > Please see the FLIP page[3] for more details. Looking forward to your
> > >> > > > > feedback.
> > >> > > > >
> > >> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job
> > >> > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-224%3A+Blocklist+Mechanism
> > >> > > > > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-249%3A+Flink+Web+UI+Enhancement+for+Speculative+Execution
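
For reference, the slot accounting discussed in this thread (available, blocked free, and in-use slots summing to the total) can be sketched roughly as below. This is only an illustration in Python: the class name SlotOverview and its field names are hypothetical and are not taken from Flink's REST API or from the FLIP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotOverview:
    """Hypothetical cluster slot counters; names are illustrative, not Flink's."""

    total: int         # all slots in the cluster
    available: int     # free and unblocked
    blocked_free: int  # free, but on a blocked task manager

    def in_use(self) -> int:
        # in-use = total - available - blocked free, as described in the thread
        used = self.total - self.available - self.blocked_free
        if used < 0:
            raise ValueError("available + blocked_free exceeds total")
        return used


overview = SlotOverview(total=20, available=12, blocked_free=3)
print(overview.in_use())  # 5
```

Without the blocked free count, subtracting only the available slots from the total would report 8 slots in use here instead of 5, which is the over-count the "Blocked Free Slots" field is meant to avoid.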