[ 
https://issues.apache.org/jira/browse/FLINK-31192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31192:
-----------------------------------
    Labels: pull-request-available  (was: )

> dataGen takes too long to initialize under sequence
> ---------------------------------------------------
>
>                 Key: FLINK-31192
>                 URL: https://issues.apache.org/jira/browse/FLINK-31192
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.17.0, 1.15.3, 1.16.1
>            Reporter: xzw0223
>            Assignee: xzw0223
>            Priority: Major
>              Labels: pull-request-available
>
> SequenceGenerator preloads all sequence values in open(). If the total 
> number of elements is very large, initialization takes too long.
> [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/datagen/SequenceGenerator.java#L91]
> The reason is that the Deque doubles its capacity whenever it becomes full, 
> and each expansion requires an array copy, which is time-consuming.
>  
> Here is what I propose:
>  Do not preload the full sequence. Instead, generate one value each time 
> next is called, which avoids the slow initialization caused by loading all 
> values up front.
>  Record the current sequence position in the checkpoint and, after a 
> failure and restart, resume sending from the recorded position to preserve 
> fault tolerance.
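
The two proposed changes could look roughly like the sketch below. It is not
Flink's SequenceGenerator and not the code from the linked pull request. The
class LazySequence and its methods snapshotPosition and restorePosition are
hypothetical names chosen for illustration, and the sketch ignores how Flink
splits a sequence across parallel subtasks. It only shows the idea: keep a
single counter instead of a preloaded Deque, and treat that counter as the
state to save and restore.

```java
import java.util.NoSuchElementException;

/**
 * Hypothetical lazy sequence over [start, end], both inclusive.
 * Values are computed on demand, so construction costs O(1) time and memory
 * regardless of how many elements the range contains.
 */
class LazySequence {
    private final long start;
    private final long end;
    private long next; // next value to emit; the only state worth checkpointing

    LazySequence(long start, long end) {
        if (start > end) {
            throw new IllegalArgumentException("start must be <= end");
        }
        if (end == Long.MAX_VALUE) {
            // Keeps "end + 1" representable so the counter cannot overflow.
            throw new IllegalArgumentException("end must be < Long.MAX_VALUE");
        }
        this.start = start;
        this.end = end;
        this.next = start;
    }

    boolean hasNext() {
        return next <= end;
    }

    long next() {
        if (!hasNext()) {
            throw new NoSuchElementException("sequence exhausted");
        }
        return next++;
    }

    /** Value that would be stored in a checkpoint (assumed mechanism). */
    long snapshotPosition() {
        return next;
    }

    /** Resumes from a previously stored position after a restart. */
    void restorePosition(long position) {
        if (position < start || position > end + 1) {
            throw new IllegalArgumentException("position outside sequence range");
        }
        this.next = position;
    }
}
```

With this shape, a range such as 0 to Long.MAX_VALUE - 1 is ready to use
immediately, because nothing is materialized before the first call to next.
After a restart, a fresh instance built with the same bounds and given the
saved position continues with the first value that had not yet been emitted.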



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
