The addition of /jobs/:jobid/jobmanager/config and /jobs/:jobid/jobmanager/environment exclusively to the HS is a bit of a strange workaround.
How do you intend to document those (and test compatibility)?

Why not just add a general /jobs/:jobid/environment endpoint that works just like /jobmanager/environment?
To me that seems like a cleaner solution.
It is briefly mentioned as an alternative in the FLIP, but I don't understand what is supposed to be confusing about it.
After all, whether the job ID is actually used in the end isn't visible to the caller.

/jobmanager/config could be integrated into /jobs/:jobid/config.
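
To make the idea concrete, here is a rough sketch (hypothetical types and names, not the actual Flink REST handler classes) of what a job-scoped environment endpoint could look like; the job ID only appears in the route, and the payload is the same as for /jobmanager/environment:

import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not actual Flink code: a job-scoped environment endpoint
// that serves the same payload as /jobmanager/environment.
public class JobEnvironmentHandler {

    /** Placeholder for the environment payload (JVM info, classpath, env vars, ...). */
    public static final class EnvironmentInfo {
        public final Map<String, String> environmentVariables;

        public EnvironmentInfo(Map<String, String> environmentVariables) {
            this.environmentVariables = environmentVariables;
        }
    }

    private final Supplier<EnvironmentInfo> environmentSupplier;

    public JobEnvironmentHandler(Supplier<EnvironmentInfo> environmentSupplier) {
        this.environmentSupplier = environmentSupplier;
    }

    // Handles GET /jobs/:jobid/environment. The job id is only part of the route;
    // whether it is used to build the response is invisible to the caller.
    public EnvironmentInfo handleRequest(String jobId) {
        return environmentSupplier.get();
    }
}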

The same approach could maybe be used for logs; I'm not really sure yet (I'm not a fan of displaying logs in the HS in the first place).

On 23/06/2022 06:55, junhan yang wrote:
Hi all,

Thank you all for your feedback. As far as I can see, the discussion on this FLIP has converged.

I will start a new vote thread now.

Best regards,
Junhan

Yangze Guo <karma...@gmail.com> wrote on Fri, Jun 17, 2022 at 14:05:

Thanks for the input, Jiangang.

I think it's a valid need to distinguish completed jobs with the same name.
- If they are different jobs, I think users need to give them different,
meaningful names.
- If they are exactly the same job, IIUC, what you need is to figure out
the order. The ApplicationId in YARN might help, but in that case you can
just sort them by start time (a quick sketch below).
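
Just to illustrate what I mean, a minimal sketch with made-up field names (the actual overview payload will differ):

import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: runs of the same job can be told apart by ordering
// them by start time, newest first.
public class CompletedJobOrdering {

    /** Placeholder for the fields shown in the completed-jobs overview. */
    public static final class JobRun {
        public final String name;
        public final long startTimeMillis;

        public JobRun(String name, long startTimeMillis) {
            this.name = name;
            this.startTimeMillis = startTimeMillis;
        }
    }

    /** Sorts runs that share the same job name, newest first. */
    public static void sortByStartTime(List<JobRun> runsOfSameJob) {
        runsOfSameJob.sort(
                Comparator.comparingLong((JobRun r) -> r.startTimeMillis).reversed());
    }
}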

Best,
Yangze Guo

On Fri, Jun 17, 2022 at 12:13 PM Jiangang Liu <liujiangangp...@gmail.com> wrote:
Thanks for the FLIP. It is helpful to track detailed info for completed jobs.
I want to ask another question. In our environment, it is sometimes hard to
distinguish jobs, since the same job name may appear multiple times among
the completed jobs: a job may run multiple times, or different jobs may have
the same name. I wonder whether we can enhance the completed jobs display
with more information, such as the applicationId and application name in
YARN. Identifying a job may work differently in k8s.

Best
Jiangang Liu

Yangze Guo <karma...@gmail.com> wrote on Fri, Jun 17, 2022 at 11:40:

Thanks for the feedback, Aitozi and Jing.

Will each attempt of the TaskManager or JobManager pods (if a failure
occurs) be shown in the UI?

The info of the prior execution attempts will be archived; you can
refer to `ArchivedExecutionVertex$priorExecutions`.
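
Roughly speaking (placeholder types below, not the real ArchivedExecutionVertex API), the archive keeps the current attempt plus the prior attempts of each vertex, so failed attempts remain visible after the job finishes:

import java.util.List;

// Hypothetical sketch with placeholder types (the real classes live in
// flink-runtime): an archived vertex keeps its current attempt plus all
// prior attempts, so failed attempts can still be shown in the UI.
public class ArchivedAttemptsExample {

    public static final class ArchivedAttempt {
        public final int attemptNumber;
        public final String finalState;

        public ArchivedAttempt(int attemptNumber, String finalState) {
            this.attemptNumber = attemptNumber;
            this.finalState = finalState;
        }
    }

    public static final class ArchivedVertex {
        public final ArchivedAttempt currentAttempt;
        public final List<ArchivedAttempt> priorAttempts;

        public ArchivedVertex(ArchivedAttempt currentAttempt, List<ArchivedAttempt> priorAttempts) {
            this.currentAttempt = currentAttempt;
            this.priorAttempts = priorAttempts;
        }
    }

    /** Lists every attempt of a vertex, prior ones first. */
    public static void printAllAttempts(ArchivedVertex vertex) {
        for (ArchivedAttempt attempt : vertex.priorAttempts) {
            System.out.println("prior attempt " + attempt.attemptNumber + ": " + attempt.finalState);
        }
        System.out.println(
                "current attempt " + vertex.currentAttempt.attemptNumber
                        + ": " + vertex.currentAttempt.finalState);
    }
}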

It seems that most of these metrics are more interesting to batch jobs.
Does it make sense to calculate them for pure streaming jobs too?

All the proposed metrics will be calculated regardless of the job type.

Why is "duration is less interesting", as it is put in the FLIP?

As a first step, we mainly focus on the most interesting statuses during
the job lifecycle. The duration of final states like FINISHED and
CANCELED is meaningless, while abnormal conditions like CANCELING will
not be included at the moment.

Could you share your thoughts on "accumulated-busy-time"? It should
describe the time during which the task is working as expected, i.e. the
happy path. When do we need it for analytics or diagnosis?

A task could be busy or idle while it is working. Users may adjust the
parallelism or the partition key according to the ratio between them.
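
For example (made-up numbers and metric names, just to illustrate the idea):

// Hypothetical illustration: derive a busy ratio from the accumulated busy and
// idle times and use it as a hint for tuning parallelism or the partition key.
public class BusyRatioExample {

    /** Fraction of the running time the task spent busy, in [0, 1]. */
    public static double busyRatio(long accumulatedBusyTimeMillis, long accumulatedIdleTimeMillis) {
        long total = accumulatedBusyTimeMillis + accumulatedIdleTimeMillis;
        return total == 0 ? 0.0 : (double) accumulatedBusyTimeMillis / total;
    }

    public static void main(String[] args) {
        // e.g. 90s busy vs. 10s idle -> ratio 0.9: the task is close to saturated,
        // so more parallelism (or a less skewed partition key) may help.
        System.out.printf("busy ratio = %.2f%n", busyRatio(90_000, 10_000));
    }
}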

Best,
Yangze Guo

On Fri, Jun 17, 2022 at 5:08 AM Jing Ge <j...@ververica.com> wrote:
Hi Junhan,

This is must-have information for batch processing. Thanks for
bringing it up.

I have some comments:

1. It seems that most of these metrics are more interesting to batch jobs.
Does it make sense to calculate them for pure streaming jobs too?
2. Why is "duration is less interesting", as it is put in the FLIP?
3. Could you share your thoughts on "accumulated-busy-time"? It should
describe the time during which the task is working as expected, i.e. the
happy path. When do we need it for analytics or diagnosis?

BTW, you might want to fix the formatting of the FLIP. Some text runs
past the right border of the wiki page.

Best regards,
Jing

On Thu, Jun 16, 2022 at 4:40 PM Aitozi <gjying1...@gmail.com> wrote:

Thanks Junhan for driving this. It's a great improvement for batch jobs.
I'm looking forward to this feature in our internal use case. +1 for it.
One more question:

Will each attempt of the TaskManager or JobManager pods (if a failure
occurs) be shown in the UI?

Best,
Aitozi.

Yang Wang <danrtsey...@gmail.com> wrote on Thu, Jun 16, 2022 at 19:10:

Thanks Xintong for the explanation.

It makes sense to leave the discussion about the job result store to a
dedicated thread.


Best,
Yang

Xintong Song <tonysong...@gmail.com> wrote on Thu, Jun 16, 2022 at 13:40:

My impression of the JobResultStore is that it is more about fault
tolerance and high availability. Using it for providing information to
users sounds worth exploring. We probably need more time to think it
through.

Given that it doesn't conflict with what we have proposed in this FLIP,
I'd suggest considering it as a separate thread and excluding it from the
scope of this one.

Best,

Xintong



On Thu, Jun 16, 2022 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:
This is a very useful feature both for finished streaming and batch jobs.
Besides the WebUI & REST API improvements, I am curious whether we could
also integrate some critical information (e.g. the latest checkpoint) into
the job result store [1].
I feel this is also somehow related to "Completed Jobs Information
Enhancement".
And I think the history server is not necessary in all scenarios,
especially when users only want to check the job execution result.

[1].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore

Best,
Yang

Xintong Song <tonysong...@gmail.com> wrote on Wed, Jun 15, 2022 at 15:37:

Thanks Junhan,

+1 for the proposed improvements.

Best,

Xintong



On Wed, Jun 15, 2022 at 3:16 PM Yangze Guo <karma...@gmail.com> wrote:
Thanks for driving this, Junhan.

I think it's a valuable usability improvement for both streaming and
batch users. Looking forward to the community feedback.

Best,
Yangze Guo



On Wed, Jun 15, 2022 at 3:10 PM junhan yang <yangjunhan1...@gmail.com> wrote:
Hi all,

I would like to open a discussion on FLIP-241: Completed Jobs Information
Enhancement.

As far as we can tell, streaming and batch users have different interests
when probing a job. As Flink grows into a unified streaming & batch
processor and is adopted by more and more batch users, the experience of
inspecting completed jobs has become more and more important. After doing
some market research, we spotted several potential improvements.
The main reason for this discussion is that the proposal involves WebUI &
REST API changes, which should be openly discussed and voted on as FLIPs.
You can find more details in the FLIP-241 document [1]. Looking forward to
your feedback.

[1] https://cwiki.apache.org/confluence/x/dRD1D

Best regards,
Junhan

