[jira] [Commented] (FLINK-6042) Display last n exceptions/causes for job restarts in Web UI

Till Rohrmann (Jira) Thu, 21 Jan 2021 08:12:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269409#comment-17269409
 ]


Till Rohrmann commented on FLINK-6042:
--------------------------------------

Following your argument, why is it better to add the exception information 
method to the {{ArchivedExecutionGraph}}, thereby making it accessible to all 
{{AbstractExecutionGraphHandler}} handlers? Wouldn't it make sense to give each 
handler access only to the information it needs? In our case, one could 
give the {{AccessExecutionGraph}} to those handlers which extract 
information from the {{ExecutionGraph}}, and maybe something like a 
{{FailureHistory}} to the {{JobExceptionsHandler}}. In the end, the 
{{ArchivedExecutionGraph}} might also implement {{FailureHistory}}, but I think 
the important bit is to segregate the interfaces.
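
To make the interface segregation idea concrete, here is a minimal sketch. All type and method names in it are hypothetical stand-ins chosen for illustration ({{GraphInfo}} plays the role of {{AccessExecutionGraph}}, {{ArchivedGraphSketch}} the role of {{ArchivedExecutionGraph}}); it is not Flink's actual API, and {{FailureHistory}} is only the name floated above, not an existing interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the read-only graph view.
interface GraphInfo {
    String jobName();
}

// Hypothetical narrow interface: only what an exceptions handler needs.
interface FailureHistory {
    List<String> failureCauses();
}

// Hypothetical stand-in for the archived graph: it may implement both views.
final class ArchivedGraphSketch implements GraphInfo, FailureHistory {
    private final String jobName;
    private final List<String> causes;

    ArchivedGraphSketch(String jobName, List<String> causes) {
        this.jobName = jobName;
        // Defensive copy so the archived view cannot be changed afterwards.
        this.causes = Collections.unmodifiableList(new ArrayList<>(causes));
    }

    @Override
    public String jobName() {
        return jobName;
    }

    @Override
    public List<String> failureCauses() {
        return causes;
    }
}

// Hypothetical exceptions handler: it depends on FailureHistory only.
final class ExceptionsHandlerSketch {
    static String render(FailureHistory history) {
        return String.join("; ", history.failureCauses());
    }
}

// Hypothetical handler that only needs graph information.
final class JobDetailsHandlerSketch {
    static String render(GraphInfo graph) {
        return "job=" + graph.jobName();
    }
}
```

With this split, the same object can be passed to both handlers, but each handler only sees the methods of the interface it declares, so the exception information does not leak into every graph handler.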

Thinking a step ahead, how would this work with the {{ArchivedExecutionGraph}} 
if we send multiple graphs because the graph changed over the job's lifetime? 
To which graph would the exception that ended a graph's lifetime be assigned?

> Display last n exceptions/causes for job restarts in Web UI
> -----------------------------------------------------------
>
>                 Key: FLINK-6042
>                 URL: https://issues.apache.org/jira/browse/FLINK-6042
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination, Runtime / Web Frontend
>    Affects Versions: 1.3.0
>            Reporter: Till Rohrmann
>            Assignee: Matthias
>            Priority: Major
>              Labels: pull-request-available
>
> Users have asked to see the last {{n}} exceptions that caused a job restart 
> in the Web UI. This would make it easier to debug and operate a job.
> We could store the root causes for failures similar to how prior executions 
> are stored in the {{ExecutionVertex}} using the {{EvictingBoundedList}} and 
> then serve this information via the Web UI.
> _-- Update: January 21, 2021 --_
> The UI can already handle multiple exceptions through the Exception History. 
> Right now, we list one or more exceptions which caused the job to fail. 
> Instead, we could adapt it in a way that the history contains not only the 
> exceptions of the most recent failure but one expandable entry per restart. 
> If more than one exception is connected to a single restart, we would 
> list their stacktraces within one expandable entry.
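
The "last n, one entry per restart" idea from the description could be sketched roughly as follows. This is an illustration under assumptions, not the proposed implementation: it uses a plain {{java.util.ArrayDeque}} rather than Flink's {{EvictingBoundedList}}, and {{RestartEntry}} and {{BoundedRestartHistory}} are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical entry: one per restart, possibly carrying several exceptions.
final class RestartEntry {
    final long timestamp;
    final List<Throwable> causes;

    RestartEntry(long timestamp, List<Throwable> causes) {
        this.timestamp = timestamp;
        this.causes = Collections.unmodifiableList(new ArrayList<>(causes));
    }
}

// Hypothetical bounded history that keeps only the most recent entries.
final class BoundedRestartHistory {
    private final int capacity;
    private final Deque<RestartEntry> entries = new ArrayDeque<>();

    BoundedRestartHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Records one restart; the oldest entry is evicted once the bound is reached.
    void recordRestart(long timestamp, List<Throwable> causes) {
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(new RestartEntry(timestamp, causes));
    }

    // Returns the retained entries, oldest first, as a copy for serving to the UI.
    List<RestartEntry> snapshot() {
        return new ArrayList<>(entries);
    }
}
```

A handler serving the Web UI would then render each element of {{snapshot()}} as one expandable entry and list the stacktraces of its {{causes}} inside it.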



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
