The addition of /jobs/:jobid/jobmanager/config and /jobs/:jobid/jobmanager/environment exclusively to the HS is a bit of a strange workaround.
How do you intend to document those (and test compatibility)?

Why not just add a general /jobs/:jobid/environment endpoint that works just like /jobmanager/environment?
To me that seems like a cleaner solution.
It is briefly mentioned as an alternative in the FLIP, but I don't understand what is supposed to be confusing about it.
After all, whether the job ID is actually used in the end isn't visible to the caller.

/jobmanager/config could be integrated into /jobs/:jobid/config.
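
To make the idea concrete, here is a rough sketch (hypothetical types and names, not the actual Flink REST handler classes) of what a job-scoped environment endpoint could look like; the job ID only appears in the route, and the payload is the same as for /jobmanager/environment:

import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not actual Flink code: a job-scoped environment endpoint
// that serves the same payload as /jobmanager/environment.
public class JobEnvironmentHandler {

    /** Placeholder for the environment payload (JVM info, classpath, env vars, ...). */
    public static final class EnvironmentInfo {
        public final Map<String, String> environmentVariables;

        public EnvironmentInfo(Map<String, String> environmentVariables) {
            this.environmentVariables = environmentVariables;
        }
    }

    private final Supplier<EnvironmentInfo> environmentSupplier;

    public JobEnvironmentHandler(Supplier<EnvironmentInfo> environmentSupplier) {
        this.environmentSupplier = environmentSupplier;
    }

    // Handles GET /jobs/:jobid/environment. The job id is only part of the route;
    // whether it is used to build the response is invisible to the caller.
    public EnvironmentInfo handleRequest(String jobId) {
        return environmentSupplier.get();
    }
}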

The same approach could maybe be used for logs; I'm not really sure yet (I'm not a fan of displaying logs in the HS in the first place).

On 23/06/2022 06:55, junhan yang wrote:
Hi all,

Thank you all for your feedback. As far as I can see, the discussion on this FLIP has converged.

I will start a new vote thread now.

Best regards,
Junhan

Yangze Guo <karma...@gmail.com> wrote on Fri, Jun 17, 2022 at 14:05:

Thanks for the input, Jiangang.

I think it's a valid need to distinguish completed jobs with the same name.
- If they are different jobs, I think users need to give them different,
meaningful names.
- If they are exactly the same job, IIUC, what you need is to figure out
the order. The ApplicationId in YARN might help, but in that case you can
just sort them by start time (a quick sketch below).
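
Just to illustrate what I mean, a minimal sketch with made-up field names (the actual overview payload will differ):

import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: runs of the same job can be told apart by ordering
// them by start time, newest first.
public class CompletedJobOrdering {

    /** Placeholder for the fields shown in the completed-jobs overview. */
    public static final class JobRun {
        public final String name;
        public final long startTimeMillis;

        public JobRun(String name, long startTimeMillis) {
            this.name = name;
            this.startTimeMillis = startTimeMillis;
        }
    }

    /** Sorts runs that share the same job name, newest first. */
    public static void sortByStartTime(List<JobRun> runsOfSameJob) {
        runsOfSameJob.sort(
                Comparator.comparingLong((JobRun r) -> r.startTimeMillis).reversed());
    }
}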

Best,
Yangze Guo

On Fri, Jun 17, 2022 at 12:13 PM Jiangang Liu <liujiangangp...@gmail.com> wrote:
Thanks for the FLIP. It is helpful to track detailed info for completed jobs.
I want to ask another question. In our environment, it is sometimes hard to
distinguish jobs, since the same job name may appear multiple times among
the completed jobs: a job may run multiple times, or different jobs may have
the same name. I wonder whether we can enhance the completed jobs display
with more information, such as the applicationId and application name in
YARN. Identifying a job may work differently in k8s.

Best
Jiangang Liu

Yangze Guo <karma...@gmail.com> wrote on Fri, Jun 17, 2022 at 11:40:

Thanks for the feedback, Aitozi and Jing.

Will each attempt of the TaskManager or JobManager pods (if a failure
occurs) be shown in the UI?

The info of the prior execution attempts will be archived; you can
refer to `ArchivedExecutionVertex$priorExecutions`.
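
Roughly speaking (placeholder types below, not the real ArchivedExecutionVertex API), the archive keeps the current attempt plus the prior attempts of each vertex, so failed attempts remain visible after the job finishes:

import java.util.List;

// Hypothetical sketch with placeholder types (the real classes live in
// flink-runtime): an archived vertex keeps its current attempt plus all
// prior attempts, so failed attempts can still be shown in the UI.
public class ArchivedAttemptsExample {

    public static final class ArchivedAttempt {
        public final int attemptNumber;
        public final String finalState;

        public ArchivedAttempt(int attemptNumber, String finalState) {
            this.attemptNumber = attemptNumber;
            this.finalState = finalState;
        }
    }

    public static final class ArchivedVertex {
        public final ArchivedAttempt currentAttempt;
        public final List<ArchivedAttempt> priorAttempts;

        public ArchivedVertex(ArchivedAttempt currentAttempt, List<ArchivedAttempt> priorAttempts) {
            this.currentAttempt = currentAttempt;
            this.priorAttempts = priorAttempts;
        }
    }

    /** Lists every attempt of a vertex, prior ones first. */
    public static void printAllAttempts(ArchivedVertex vertex) {
        for (ArchivedAttempt attempt : vertex.priorAttempts) {
            System.out.println("prior attempt " + attempt.attemptNumber + ": " + attempt.finalState);
        }
        System.out.println(
                "current attempt " + vertex.currentAttempt.attemptNumber
                        + ": " + vertex.currentAttempt.finalState);
    }
}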

It seems that most of these metrics are more interesting to batch jobs.
Does it make sense to calculate them for pure streaming jobs too?

All the proposed metrics will be calculated regardless of the job type.

Why is "duration is less interesting", as it is put in the FLIP?

As a first step, we mainly focus on the most interesting statuses during
the job lifecycle. The duration of final states like FINISHED and
CANCELED is meaningless, while abnormal conditions like CANCELING will
not be included at the moment.

Could you share your thoughts on "accumulated-busy-time"? It should
describe the time during which the task is working as expected, i.e. the
happy path. When do we need it for analytics or diagnosis?

A task could be busy or idle while it is working. Users may adjust the
parallelism or the partition key according to the ratio between them.
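
For example (made-up numbers and metric names, just to illustrate the idea):

// Hypothetical illustration: derive a busy ratio from the accumulated busy and
// idle times and use it as a hint for tuning parallelism or the partition key.
public class BusyRatioExample {

    /** Fraction of the running time the task spent busy, in [0, 1]. */
    public static double busyRatio(long accumulatedBusyTimeMillis, long accumulatedIdleTimeMillis) {
        long total = accumulatedBusyTimeMillis + accumulatedIdleTimeMillis;
        return total == 0 ? 0.0 : (double) accumulatedBusyTimeMillis / total;
    }

    public static void main(String[] args) {
        // e.g. 90s busy vs. 10s idle -> ratio 0.9: the task is close to saturated,
        // so more parallelism (or a less skewed partition key) may help.
        System.out.printf("busy ratio = %.2f%n", busyRatio(90_000, 10_000));
    }
}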

Best,
Yangze Guo

On Fri, Jun 17, 2022 at 5:08 AM Jing Ge <j...@ververica.com> wrote:
Hi Junhan,

This is must-have information for batch processing. Thanks for
bringing it up.

I have some comments:

1. It seems that most of these metrics are more interesting to batch jobs.
Does it make sense to calculate them for pure streaming jobs too?
2. Why is "duration is less interesting", as it is put in the FLIP?
3. Could you share your thoughts on "accumulated-busy-time"? It should
describe the time during which the task is working as expected, i.e. the
happy path. When do we need it for analytics or diagnosis?

BTW, you might want to fix the formatting of the FLIP. Some text runs
past the right border of the wiki page.

Best regards,
Jing

On Thu, Jun 16, 2022 at 4:40 PM Aitozi <gjying1...@gmail.com> wrote:

Thanks Junhan for driving this. It's a great improvement for batch jobs.
I'm looking forward to this feature in our internal use case. +1 for it.
One more question:

Will each attempt of the TaskManager or JobManager pods (if a failure
occurs) be shown in the UI?

Best,
Aitozi.

Yang Wang <danrtsey...@gmail.com> wrote on Thu, Jun 16, 2022 at 19:10:

Thanks Xintong for the explanation.

It makes sense to leave the discussion about the job result store to a
dedicated thread.


Best,
Yang

Xintong Song <tonysong...@gmail.com> wrote on Thu, Jun 16, 2022 at 13:40:

My impression of the JobResultStore is that it is more about fault
tolerance and high availability. Using it for providing information to
users sounds worth exploring. We probably need more time to think it
through.

Given that it doesn't conflict with what we have proposed in this FLIP,
I'd suggest considering it as a separate thread and excluding it from the
scope of this one.

Best,

Xintong



On Thu, Jun 16, 2022 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:
This is a very useful feature both for finished streaming and batch jobs.
Besides the WebUI & REST API improvements, I am curious whether we could
also integrate some critical information (e.g. the latest checkpoint) into
the job result store [1].
I feel this is also somehow related to "Completed Jobs Information
Enhancement".
And I think the history server is not necessary in all scenarios,
especially when users only want to check the job execution result.

[1].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore

Best,
Yang

Xintong Song <tonysong...@gmail.com> wrote on Wed, Jun 15, 2022 at 15:37:

Thanks Junhan,

+1 for the proposed improvements.

Best,

Xintong



On Wed, Jun 15, 2022 at 3:16 PM Yangze Guo <karma...@gmail.com> wrote:
Thanks for driving this, Junhan.

I think it's a valuable usability improvement for both streaming and
batch users. Looking forward to the community feedback.

Best,
Yangze Guo



On Wed, Jun 15, 2022 at 3:10 PM junhan yang <yangjunhan1...@gmail.com> wrote:
Hi all,

I would like to open a discussion on FLIP-241: Completed Jobs Information
Enhancement.

As far as we can tell, streaming and batch users have different interests
when probing a job. As Flink grows into a unified streaming & batch
processor and is adopted by more and more batch users, the experience of
inspecting completed jobs has become more and more important. After doing
some market research, we spotted several potential improvements.
The main reason for this discussion is that the proposal involves WebUI &
REST API changes, which should be openly discussed and voted on as FLIPs.
You can find more details in the FLIP-241 document [1]. Looking forward to
your feedback.

[1] https://cwiki.apache.org/confluence/x/dRD1D

Best regards,
Junhan

