Haibo Suen created FLINK-11256:
----------------------------------
Summary: Referencing StreamNode objects directly in StreamEdge
causes the sizes of JobGraph and TDD to become unnecessarily large
Key: FLINK-11256
URL: https://issues.apache.org/jira/browse/FLINK-11256
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 1.7.1, 1.7.0
Reporter: Haibo Suen
Assignee: Haibo Suen
When a job graph is generated from StreamGraph, StreamEdge(s) on the stream
graph are serialized to StreamConfig and stored into the job graph. After that,
the serialized bytes will be included in the TDD and distributed to TM. Because
StreamEdge directly reference to StreamNode objects including sourceVertex and
targetVertex, these objects are also written transitively on serializing
StreamEdge. But these StreamNode objects are not needed at runtime. For a large
size topology, this will causes JobGraph/TDD to become much larger than that
actually need, and more likely to occur rpc timeout when transmitted.
In Streamedge, only the ID of StreamNode should be stored to avoid this
situation.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)