Hi Yangze, Hi Till,

thank you for working on this topic. I believe it will make debugging
large Apache Flink deployments much more feasible.

I was wondering whether it would make sense to allow the user to specify
the ResourceID in standalone setups? For example, many users still
run standalone clusters on Kubernetes (the native support is
still experimental), and in these cases it would be useful to set
the pod name as the ResourceID. What do you think?
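
To make the idea concrete, here is a minimal sketch of how such a lookup
could work. It is not Flink's implementation: the class, the method, and
the option name "taskmanager.resource-id" are hypothetical and only serve
as illustration. It also assumes that the HOSTNAME environment variable
carries the pod name, which is usually the case on Kubernetes because the
pod's hostname defaults to its name.

```java
import java.util.Collections;
import java.util.Map;
import java.util.UUID;

public class ResourceIdSketch {

    // Hypothetical option name, used here for illustration only.
    static final String RESOURCE_ID_KEY = "taskmanager.resource-id";

    /** Picks a ResourceID string: explicit setting, then hostname, then random. */
    static String resolveResourceId(Map<String, String> config, Map<String, String> env) {
        // 1. An explicitly configured ID wins, so users stay in control.
        String configured = config.get(RESOURCE_ID_KEY);
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        // 2. Assumption: HOSTNAME holds the pod name on Kubernetes.
        String hostname = env.get("HOSTNAME");
        if (hostname != null && !hostname.trim().isEmpty()) {
            return hostname.trim();
        }
        // 3. Otherwise fall back to a random, non-descriptive ID.
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // No explicit configuration: the result depends on the environment.
        System.out.println(resolveResourceId(Collections.emptyMap(), System.getenv()));
    }
}
```

With an explicit setting, the standalone deployment on Kubernetes could
pass the pod name into the configuration itself, and the hostname lookup
would only act as a convenience default.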

Cheers,

Konstantin

On Thu, Mar 26, 2020 at 6:49 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Yangze,
>
> thanks for creating this FLIP. I think it is a very good improvement
> that helps our users and ourselves better understand what's going on in
> Flink.
>
> Creating the ResourceIDs with host information/pod name is a good idea.
>
> Also deriving ExecutionGraph IDs from their superset ID is a good idea.
>
> The InstanceID is used for fencing purposes. I would not make it a
> composition of the ResourceID + a monotonically increasing number. The
> problem is that in case of a RM failure the InstanceIDs would start from 0
> again and this could lead to collisions.
>
> Logging more information on how the different runtime IDs are correlated is
> also a good idea.
>
> Two other ideas for simplifying the ids are the following:
>
> * The SlotRequestID was introduced because the SlotPool was a separate
> RpcEndpoint a while ago. With this no longer being the case I think we
> could remove the SlotRequestID and replace it with the AllocationID.
> * Instead of creating new SlotRequestIDs for multi task slots one could
> derive them from the SlotRequestID used for requesting the underlying
> AllocatedSlot.
>
> Given that the slot sharing logic will most likely be reworked with the
> pipelined region scheduling, we might be able to resolve these two points
> as part of the pipelined region scheduling effort.
>
> Cheers,
> Till
>
> On Thu, Mar 26, 2020 at 10:51 AM Yangze Guo <karma...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > We would like to start a discussion thread on "FLIP-118: Improve
> > Flink’s ID system"[1].
> >
> > This FLIP mainly discusses the following issues, aiming to enhance the
> > readability of IDs in logs and help users debug failures:
> >
> > - Enhance the readability of the string literals of IDs. Most of them
> > are hashcodes, e.g. ExecutionAttemptID, which do not provide much
> > meaningful information and are hard to recognize and compare for
> > users.
> > - Log the ID’s lineage information to make debugging more convenient.
> > Currently, the logs do not always show the lineage information
> > between IDs. Finding out the relationships between entities identified by
> > given IDs is a common need, e.g., which slot (AllocationID) was
> > assigned to satisfy which slot request (SlotRequestID). Without
> > such lineage information, it is impossible to track the end-to-end
> > lifecycle of an Execution or a Task, which makes debugging
> > difficult.
> >
> > Key changes proposed in the FLIP are as follows:
> >
> > - Add location information to distributed components
> > - Add topology information to graph components
> > - Log the ID’s lineage information
> > - Expose the identifiers of distributed components to users
> >
> > Please find more details in the FLIP wiki document [1]. Looking forward to
> > your feedback.
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148643521
> >
> > Best,
> > Yangze Guo
> >
>


-- 

Konstantin Knauf | Head of Product

+49 160 91394525

