xzw0223 created FLINK-31192:
-------------------------------
Summary: DataGen takes too long to initialize when using the sequence generator
Key: FLINK-31192
URL: https://issues.apache.org/jira/browse/FLINK-31192
Project: Flink
Issue Type: Improvement
Affects Versions: 1.16.1, 1.16.0
Reporter: xzw0223
Fix For: 1.16.1, 1.16.0
The SequenceGenerator preloads all sequence values in open(). When the total
number of elements is large, this initialization takes a very long time.
[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/datagen/SequenceGenerator.java#L91]
The cause is that the Deque has to enlarge its capacity every time it becomes
full, and each expansion requires copying the backing array, which is
time-consuming.
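A minimal sketch of the eager pattern described above, for illustration only. The class name EagerSequence is hypothetical and this is not the actual SequenceGenerator code. It only shows why open() costs O(n) time and memory: every value is boxed and pushed into an ArrayDeque before anything is emitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of eager preloading (not Flink's real class).
// Assumes end < Long.MAX_VALUE so the loop terminates.
class EagerSequence {
    private final long start;
    private final long end;
    private Deque<Long> valuesToEmit;

    EagerSequence(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Equivalent of open(): all work happens before the first value is emitted.
    // As the deque fills up, it repeatedly allocates a larger backing array
    // and copies the existing elements into it.
    void open() {
        valuesToEmit = new ArrayDeque<>();
        for (long v = start; v <= end; v++) {
            valuesToEmit.add(v); // boxes each value into a Long
        }
    }

    boolean hasNext() {
        return !valuesToEmit.isEmpty();
    }

    long next() {
        return valuesToEmit.poll(); // unboxes the stored Long
    }
}
```

With a range of hundreds of millions of values, both the loop in open() and the memory held by the boxed Longs grow linearly with the range size.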
Here is what I propose:
Do not preload all sequence values in open(). Instead, generate one value each
time next() is called, which removes the slow initialization caused by loading
the full range up front.
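A minimal sketch of the lazy alternative, assuming a single contiguous range [start, end] per generator (splitting the range across parallel subtasks is left out). The class LazySequence is a hypothetical name used only for illustration. open() would need no work at all, since each next() call derives the following value from a single counter.

```java
import java.util.NoSuchElementException;

// Hypothetical lazy generator: O(1) initialization, one value per next() call.
class LazySequence {
    private final long end;
    private long nextValue;    // value returned by the next call to next()
    private boolean exhausted; // explicit flag avoids overflow when end == Long.MAX_VALUE

    LazySequence(long start, long end) {
        this.end = end;
        this.nextValue = start;
        this.exhausted = start > end;
    }

    boolean hasNext() {
        return !exhausted;
    }

    long next() {
        if (exhausted) {
            throw new NoSuchElementException();
        }
        long value = nextValue;
        if (value == end) {
            exhausted = true;     // last value of the range
        } else {
            nextValue = value + 1;
        }
        return value;
    }
}
```

Memory use stays constant regardless of the range size, and the first value is available immediately.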
Record the current sequence position in each checkpoint, and after a failure
and restart, resume emitting from the recorded position to preserve fault
tolerance.
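A minimal sketch of the checkpointing idea in plain Java, without Flink dependencies. CheckpointedSequence, snapshotPosition() and restorePosition() are hypothetical names. In Flink, the snapshot value would presumably be written to operator state in CheckpointedFunction#snapshotState and read back in CheckpointedFunction#initializeState when the context reports a restore. The point is that the only state needed is a single long, not the remaining values.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch: the whole recoverable state is the next value to emit.
// Assumes end < Long.MAX_VALUE so nextValue can move one past end.
class CheckpointedSequence {
    private final long end;
    private long nextValue;

    CheckpointedSequence(long start, long end) {
        this.end = end;
        this.nextValue = start;
    }

    boolean hasNext() {
        return nextValue <= end;
    }

    long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return nextValue++;
    }

    // Taken at checkpoint time: a single long instead of all remaining values.
    long snapshotPosition() {
        return nextValue;
    }

    // Applied on recovery, before emission resumes.
    void restorePosition(long position) {
        this.nextValue = position;
    }
}
```

In a real source, emitting a value and advancing the position would need to happen atomically with respect to snapshots (for example under the source context's checkpoint lock); otherwise a checkpoint could record a position that does not match what was actually emitted.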
--
This message was sent by Atlassian Jira
(v8.20.10#820010)