Thanks for starting this discussion, Gen!
I agree it is confusing, even troublesome, to show an attempt id that is
different from the corresponding attempt number in REST, metrics and logs.
It puts the burden on users to map between the two when troubleshooting.
That mapping is easy to get wrong, which wastes effort and can lead to
wrong conclusions.
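
To make the mismatch concrete, here is a rough sketch of the mapping as I understand it from the proposal. It is not the actual Web UI code: the `Subtask` interface and both function names are hypothetical, and only the `subtask.attempt + 1` calculation comes from Gen's description.

```typescript
// Hypothetical shape of a subtask entry; only `attempt` is taken from the proposal.
interface Subtask {
  subtask: number;
  attempt: number; // attempt number as used by the runtime, logs and metrics
}

// Behaviour described in the proposal: the frontend shows attempt + 1.
function displayedAttemptToday(s: Subtask): number {
  return s.attempt + 1;
}

// Proposed behaviour: show the runtime attempt number unchanged.
function displayedAttemptNumberProposed(s: Subtask): number {
  return s.attempt;
}

const example: Subtask = { subtask: 3, attempt: 0 };
console.log(displayedAttemptToday(example));          // differs from logs/metrics by one
console.log(displayedAttemptNumberProposed(example)); // same value as logs/metrics
```

With the first function a user has to subtract one before searching the logs; with the second no mapping is needed.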

Therefore, +1 for this proposal.

Thanks,
Zhu

Gen Luo <luogen...@gmail.com> wrote on Wed, Jul 20, 2022 at 15:24:
>
> Hi everyone,
>
> I'd like to propose a change to the Web UI: replace the Attempt column
> with an Attempt Number column on the subtask list page.
>
> From the very beginning, the attempt number shown has been calculated in
> the frontend as subtask.attempt + 1, which means the attempt number shown
> on the Web UI is not the same as the one in the runtime, the logs and the
> metrics. Users may get confused since they can't find logs or metrics
> of the subtask with the same attempt number.
>
> Fortunately, for now users don't need to care about the attempt number,
> since there can be only one attempt of each subtask. However, the confusion
> seems inevitable once speculative execution[1] or the attempt history
> is introduced, since multiple attempts of the same subtask can be executed
> or presented at the same time.
>
> I suggest that the attempt number shown on the Web UI be changed to
> align with that on the runtime side, which is used in logging and metrics
> reporting. To avoid confusion, the column should also be renamed to
> "Attempt Number". The changes should only affect the Web UI. No REST API
> needs to change. What do you think?
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job
>
> Best,
> Gen
