[jira] [Updated] (FLINK-9352) In Standalone checkpoint recover mode many jobs with same checkpoint interval cause IO pressure

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-9352:
--
Labels: pull-request-available  (was: )

> In Standalone checkpoint recover mode many jobs with same checkpoint interval 
> cause IO pressure
> ---
>
>                 Key: FLINK-9352
>                 URL: https://issues.apache.org/jira/browse/FLINK-9352
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0, 1.4.2, 1.6.0
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the periodic checkpoint coordinator's startCheckpointScheduler uses 
> *baseInterval* as the initialDelay parameter. *baseInterval* is also the 
> checkpoint interval. 
> In standalone checkpoint mode, many jobs are configured with the same 
> checkpoint interval. When all jobs are recovered at once (after a cluster 
> restart or a JobManager leadership change), their checkpoint periods tend to 
> align: every job's CheckpointCoordinator starts and triggers checkpoints at 
> approximately the same time.
> This caused high IO load during the same time window in our production 
> scenario.
> I suggest exposing the initial delay parameter of scheduleAtFixedRate as an 
> API config option, so that users can scatter checkpoints in this scenario.
>  
> cc [~StephanEwen] [~Zentol]
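
The proposal above can be illustrated with a minimal sketch. It is not Flink's CheckpointCoordinator and not the change made for this issue. The class name StaggeredCheckpointScheduler, the helper randomInitialDelay, and the minDelayMillis parameter are hypothetical names chosen for illustration. Only java.util.concurrent's scheduleAtFixedRate is a real API. The idea is that each job picks its own initial delay instead of reusing the checkpoint interval, so jobs recovered at the same moment do not all trigger their first checkpoint together.

```java
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; all names here are hypothetical, not Flink code. */
public class StaggeredCheckpointScheduler {

    /**
     * Picks an initial delay in [minDelayMillis, baseIntervalMillis] so that jobs
     * restored at the same time spread their first checkpoint over the interval.
     */
    public static long randomInitialDelay(
            long minDelayMillis, long baseIntervalMillis, Random random) {
        if (minDelayMillis < 0 || baseIntervalMillis < minDelayMillis) {
            throw new IllegalArgumentException("need 0 <= minDelay <= baseInterval");
        }
        long span = baseIntervalMillis - minDelayMillis + 1;
        // nextDouble() is in [0, 1), so the offset is in [0, span - 1].
        return minDelayMillis + (long) (random.nextDouble() * span);
    }

    /** Schedules a periodic trigger whose first run is offset by initialDelayMillis. */
    public static ScheduledFuture<?> startCheckpointScheduler(
            ScheduledExecutorService timer,
            Runnable triggerCheckpoint,
            long initialDelayMillis,
            long baseIntervalMillis) {
        return timer.scheduleAtFixedRate(
                triggerCheckpoint, initialDelayMillis, baseIntervalMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(1);
        long baseInterval = 200L; // same checkpoint interval for every job
        Random random = new Random();
        for (int job = 0; job < 3; job++) {
            final int id = job;
            long delay = randomInitialDelay(0L, baseInterval, random);
            System.out.println("job " + id + ": first checkpoint after " + delay + " ms");
            startCheckpointScheduler(
                    timer, () -> System.out.println("checkpoint for job " + id), delay, baseInterval);
        }
        Thread.sleep(500L); // let a few checkpoints fire, then stop
        timer.shutdownNow();
    }
}
```

A user-supplied fixed delay per job would serve the same purpose as the random choice shown here. Which of the two fits better depends on whether operators want deterministic offsets.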



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9352) In Standalone checkpoint recover mode many jobs with same checkpoint interval cause IO pressure

2018-05-28 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9352:
-
Affects Version/s: 1.6.0
   1.5.0
   1.4.2

> In Standalone checkpoint recover mode many jobs with same checkpoint interval 
> cause IO pressure
> ---
>
>                 Key: FLINK-9352
>                 URL: https://issues.apache.org/jira/browse/FLINK-9352
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0, 1.4.2, 1.6.0
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major





[jira] [Updated] (FLINK-9352) In Standalone checkpoint recover mode many jobs with same checkpoint interval cause IO pressure

2018-05-28 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9352:
-
Issue Type: Improvement  (was: Bug)

> In Standalone checkpoint recover mode many jobs with same checkpoint interval 
> cause IO pressure
> ---
>
>                 Key: FLINK-9352
>                 URL: https://issues.apache.org/jira/browse/FLINK-9352
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.4.2
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Critical





[jira] [Updated] (FLINK-9352) In Standalone checkpoint recover mode many jobs with same checkpoint interval cause IO pressure

2018-05-28 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9352:
-
Priority: Major  (was: Critical)

> In Standalone checkpoint recover mode many jobs with same checkpoint interval 
> cause IO pressure
> ---
>
>                 Key: FLINK-9352
>                 URL: https://issues.apache.org/jira/browse/FLINK-9352
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0, 1.4.2, 1.6.0
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major


