Re: [SURVEY] How many people are using customized RestartStrategy(s)
We will then keep the decision not to support customized restart strategies in Flink 1.10.

Thanks, Steven, for the input!

Thanks,
Zhu Zhu

Steven Wu wrote on Thu, Sep 26, 2019 at 12:13 AM:

> Zhu Zhu, that is correct.
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Zhu Zhu, that is correct.

On Tue, Sep 24, 2019 at 8:04 PM Zhu Zhu wrote:

> Hi Steven,
>
> To conclude: since we will have a meter metric [1] for restarts, a
> customized restart strategy is not needed in your case.
> Is that right?
>
> [1] https://issues.apache.org/jira/browse/FLINK-14164
>
> Thanks,
> Zhu Zhu
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Hi Steven,

To conclude: since we will have a meter metric [1] for restarts, a customized restart strategy is not needed in your case.
Is that right?

[1] https://issues.apache.org/jira/browse/FLINK-14164

Thanks,
Zhu Zhu

Steven Wu wrote on Wed, Sep 25, 2019 at 2:30 AM:

> Zhu Zhu,
>
> Sorry, I was using different terminology. Yes, a Flink Meter is what I was
> talking about regarding "fullRestarts" for threshold-based alerting.
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Zhu Zhu,

Sorry, I was using different terminology. Yes, a Flink Meter is what I was talking about regarding "fullRestarts" for threshold-based alerting.

On Mon, Sep 23, 2019 at 7:46 PM Zhu Zhu wrote:

> Steven,
>
> As I understand it, a Flink Counter only stores its accumulated count and
> reports that value. Are you using an external counter directly?
> Maybe a Flink Meter/MeterView is what you need? It stores the count and
> calculates the rate, and it reports both its "count" and its "rate" to
> external metric services.
>
> The counter "task_failures" only works if the individual failover strategy
> is enabled. However, it is not a public interface and its use is not
> recommended, as fine-grained recovery (region failover) now supersedes it.
> I've opened a ticket [1] to add a metric that shows failovers and respects
> fine-grained recovery.
>
> [1] https://issues.apache.org/jira/browse/FLINK-14164
>
> Thanks,
> Zhu Zhu
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Steven,

As I understand it, a Flink Counter only stores its accumulated count and reports that value. Are you using an external counter directly?
Maybe a Flink Meter/MeterView is what you need? It stores the count and calculates the rate, and it reports both its "count" and its "rate" to external metric services.

The counter "task_failures" only works if the individual failover strategy is enabled. However, it is not a public interface and its use is not recommended, as fine-grained recovery (region failover) now supersedes it.
I've opened a ticket [1] to add a metric that shows failovers and respects fine-grained recovery.

[1] https://issues.apache.org/jira/browse/FLINK-14164

Thanks,
Zhu Zhu

Steven Wu wrote on Tue, Sep 24, 2019 at 6:41 AM:

> When we set up an alert like "fullRestarts > 1" for some rolling window, we
> want to use a counter. If it is a Gauge, "fullRestarts" will never go below 1
> after the first full restart, so the alert condition will always be true
> after the first job restart. If we can apply a derivative to the Gauge value,
> I guess the alert can probably work. I can explore whether that is an option.
>
> Yeah, understood that "fullRestart" won't increment when fine-grained
> recovery happens. I think a "task_failures" counter already exists in Flink.
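
The Counter versus Meter distinction described in the message above can be sketched in plain Java. This is a minimal illustration, assuming the caller triggers a periodic update; `SimpleMeter` and its `update(intervalSeconds)` method are hypothetical names for this sketch and are not Flink's `MeterView` implementation.

```java
/**
 * Hypothetical stand-alone meter, not Flink code: it keeps an accumulated
 * count (what a plain counter would report) and additionally derives a
 * rate from the change in that count between two updates.
 */
final class SimpleMeter {
    private long count;             // accumulated number of events, never reset
    private long countAtLastUpdate; // value of count when update() last ran
    private double rate;            // events per second over the last interval

    /** Records one event, e.g. one restart. */
    void markEvent() {
        count++;
    }

    /** Returns the accumulated count. */
    long getCount() {
        return count;
    }

    /** Returns the rate computed by the most recent update() call. */
    double getRate() {
        return rate;
    }

    /**
     * Recomputes the rate. The caller is assumed to invoke this periodically
     * and to pass the number of seconds since the previous invocation.
     */
    void update(long intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("intervalSeconds must be positive");
        }
        rate = (double) (count - countAtLastUpdate) / intervalSeconds;
        countAtLastUpdate = count;
    }
}
```

After an interval without events the rate falls back to zero while the count stays where it was, which is the property a threshold alert on restarts needs.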
Re: [SURVEY] How many people are using customized RestartStrategy(s)
When we set up an alert like "fullRestarts > 1" for some rolling window, we want to use a counter. If it is a Gauge, "fullRestarts" will never go below 1 after the first full restart, so the alert condition will always be true after the first job restart. If we can apply a derivative to the Gauge value, I guess the alert can probably work. I can explore whether that is an option.

Yeah, understood that "fullRestart" won't increment when fine-grained recovery happens. I think a "task_failures" counter already exists in Flink.

On Sun, Sep 22, 2019 at 7:59 PM Zhu Zhu wrote:

> Steven,
>
> Thanks for the information. If we can determine that this is a common
> issue, we can solve it in Flink core.
> To get to that state, I have two questions which need your help:
> 1. Why is a gauge not good for alerting? The metric "fullRestart" is a
> Gauge. Does the metric reporter you use report Counter and Gauge to
> external services in different ways? Or can anything else differ due to
> the metric type?
> 2. Is the "number of restarts" what you actually need, rather than the
> "fullRestart" count? If so, I believe we will have such a counter metric
> in 1.10, since the previous "fullRestart" metric value is not the number
> of restarts when fine-grained recovery (a feature added in 1.9.0) is
> enabled.
> "fullRestart" reveals how many times the entire job graph has been
> restarted. If fine-grained recovery is enabled, the graph is not restarted
> when task failures happen, and the "fullRestart" value does not increment
> in such cases.
>
> I'd appreciate it if you could help with these questions so that we can
> make better decisions for Flink.
>
> Thanks,
> Zhu Zhu
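
The alerting problem described above, and the "derivative" workaround, can be sketched in plain Java. This is a minimal illustration under stated assumptions: the metric is sampled at a fixed interval, the window is a fixed number of samples, and `RollingWindowAlert` is a hypothetical helper that belongs neither to Flink nor to any particular alerting system.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper, not part of Flink: compares a naive threshold rule on
 * a cumulative value with a rule on the increase inside a rolling window.
 */
final class RollingWindowAlert {
    private final int windowSize;   // number of most recent samples retained
    private final long threshold;   // alert when the increase exceeds this
    private final Deque<Long> samples = new ArrayDeque<>();

    RollingWindowAlert(int windowSize, long threshold) {
        if (windowSize < 2) {
            throw new IllegalArgumentException("windowSize must be at least 2");
        }
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    /** Naive rule on the raw cumulative value: stays true once exceeded. */
    static boolean rawValueAlert(long cumulativeValue, long threshold) {
        return cumulativeValue > threshold;
    }

    /**
     * Records one sample of the cumulative value and reports whether the
     * increase between the oldest and newest retained sample exceeds the
     * threshold.
     */
    boolean observe(long cumulativeValue) {
        samples.addLast(cumulativeValue);
        if (samples.size() > windowSize) {
            samples.removeFirst();
        }
        long increase = samples.peekLast() - samples.peekFirst();
        return increase > threshold;
    }
}
```

With a window of three samples and a threshold of 1, feeding the values 0, 0, 2, 2, 2 makes the windowed rule fire only while the jump from 0 to 2 is still inside the window, whereas the raw-value rule keeps firing for every later sample.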
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Steven,

Thanks for the information. If we can determine that this is a common issue, we can solve it in Flink core.
To get to that state, I have two questions which need your help:
1. Why is a gauge not good for alerting? The metric "fullRestart" is a Gauge. Does the metric reporter you use report Counter and Gauge to external services in different ways? Or can anything else differ due to the metric type?
2. Is the "number of restarts" what you actually need, rather than the "fullRestart" count? If so, I believe we will have such a counter metric in 1.10, since the previous "fullRestart" metric value is not the number of restarts when fine-grained recovery (a feature added in 1.9.0) is enabled.
"fullRestart" reveals how many times the entire job graph has been restarted. If fine-grained recovery is enabled, the graph is not restarted when task failures happen, and the "fullRestart" value does not increment in such cases.

I'd appreciate it if you could help with these questions so that we can make better decisions for Flink.

Thanks,
Zhu Zhu

Steven Wu wrote on Sun, Sep 22, 2019 at 3:31 AM:

> Zhu Zhu,
>
> The Flink fullRestart metric is a Gauge, which is not good for alerting on.
> We publish an equivalent Counter metric for alerting purposes.
>
> Thanks,
> Steven
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Zhu Zhu,

Flink's fullRestarts metric is a Gauge, which is not well suited to alerting. We publish an equivalent Counter metric for alerting purposes.

Thanks,
Steven

On Thu, Sep 19, 2019 at 7:45 PM Zhu Zhu wrote:
> Thanks Steven for the feedback!
> Could you share more information about the metrics you add in your
> customized restart strategy?
>
> Thanks,
> Zhu Zhu
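To illustrate Steven's point about alerting on a Gauge versus a Counter, here is a minimal sketch in plain Java. It assumes the monitoring side only sees periodic samples of a cumulative restart count. The class and method names are hypothetical and are not part of Flink or of any monitoring system. A threshold rule on the raw cumulative value stays true forever once crossed, while a rule on the increase within a rolling window clears after a quiet period.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not Flink's or any monitoring system's API.
final class RestartAlertSketch {

    // Naive rule on the raw cumulative value: once the total exceeds the
    // threshold, the condition never becomes false again.
    static boolean alertOnCumulative(long cumulativeRestarts, long threshold) {
        return cumulativeRestarts > threshold;
    }

    // Rule on the increase over the last `window` sampling steps.
    static boolean alertOnWindowDelta(List<Long> samples, int window, long threshold) {
        if (samples.isEmpty()) {
            return false;
        }
        int last = samples.size() - 1;
        int first = Math.max(0, last - window);
        long delta = samples.get(last) - samples.get(first);
        return delta > threshold;
    }

    public static void main(String[] args) {
        // One sample per minute of a cumulative restart count:
        // two restarts early on, then a long quiet period.
        List<Long> samples = Arrays.asList(0L, 1L, 2L, 2L, 2L, 2L, 2L, 2L);
        long latest = samples.get(samples.size() - 1);
        System.out.println(alertOnCumulative(latest, 1));      // true
        System.out.println(alertOnWindowDelta(samples, 3, 1)); // false
    }
}
```

Whether the delta is computed by the metrics backend or published directly by the job is a deployment choice; the sketch only shows why the raw cumulative value is awkward for threshold alerts.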
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Thanks Steven for the feedback!
Could you share more information about the metrics you add in your customized restart strategy?

Thanks,
Zhu Zhu

On Fri, Sep 20, 2019 at 7:11 AM Steven Wu wrote:
> We do use a config like "restart-strategy:
> org.foobar.MyRestartStrategyFactoryFactory", mainly to add metrics
> beyond the ones Flink provides.
Re: [SURVEY] How many people are using customized RestartStrategy(s)
We do use a config like "restart-strategy: org.foobar.MyRestartStrategyFactoryFactory", mainly to add metrics beyond the ones Flink provides.

On Thu, Sep 19, 2019 at 4:50 AM Zhu Zhu wrote:
> Thanks everyone for the input.
>
> The RestartStrategy customization is not recognized as a public interface,
> as it is not explicitly documented.
> Since the feedback from this survey does not show it being used, I'll
> conclude that we do not need to support customized RestartStrategy for the
> new scheduler in Flink 1.10.
>
> Other usages are still supported, including all the strategies and
> configuration methods described in
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/task_failure_recovery.html#restart-strategies
> .
>
> Feel free to share in this thread if you have any concerns about this.
>
> Thanks,
> Zhu Zhu
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Thanks everyone for the input.

The RestartStrategy customization is not recognized as a public interface, as it is not explicitly documented.
Since the feedback from this survey does not show it being used, I'll conclude that we do not need to support customized RestartStrategy for the new scheduler in Flink 1.10.

Other usages are still supported, including all the strategies and configuration methods described in https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/task_failure_recovery.html#restart-strategies .

Feel free to share in this thread if you have any concerns about this.

Thanks,
Zhu Zhu

On Thu, Sep 12, 2019 at 10:33 PM Zhu Zhu wrote:
> Thanks Oytun for the reply!
>
> Sorry for not having stated it clearly. By "customized RestartStrategy",
> we mean that users implement an
> *org.apache.flink.runtime.executiongraph.restart.RestartStrategy*
> themselves and use it through a configuration like "restart-strategy:
> org.foobar.MyRestartStrategyFactoryFactory".
>
> The usage of restart strategies you mentioned will keep working with the
> new scheduler.
>
> Thanks,
> Zhu Zhu
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Thanks Oytun for the reply!

Sorry for not having stated it clearly. By "customized RestartStrategy", we mean that users implement an *org.apache.flink.runtime.executiongraph.restart.RestartStrategy* themselves and use it through a configuration like "restart-strategy: org.foobar.MyRestartStrategyFactoryFactory".

The usage of restart strategies you mentioned will keep working with the new scheduler.

Thanks,
Zhu Zhu

On Thu, Sep 12, 2019 at 10:05 PM Oytun Tez wrote:
> Hi Zhu,
>
> We are using a custom restart strategy like this:
>
> environment.setRestartStrategy(failureRateRestart(2, Time.minutes(1),
> Time.minutes(10)));
>
> ---
> Oytun Tez
Re: [SURVEY] How many people are using customized RestartStrategy(s)
Hi Zhu,

We are using a custom restart strategy like this:

environment.setRestartStrategy(failureRateRestart(2, Time.minutes(1), Time.minutes(10)));

---
Oytun Tez

*M O T A W O R D*
The World's Fastest Human Translation Platform.
oy...@motaword.com — www.motaword.com

On Thu, Sep 12, 2019 at 7:11 AM Zhu Zhu wrote:
> Hi everyone,
>
> I wanted to reach out to you and ask how many of you are using a
> customized RestartStrategy[1] in production jobs.
>
> We are currently developing the new Flink scheduler[2], which interacts
> with restart strategies in a different way. We have to redesign the
> interfaces for the new restart strategies (so-called
> RestartBackoffTimeStrategy). Existing customized RestartStrategy
> implementations will no longer work with the new scheduler.
>
> We want to know whether we should keep a way to customize
> RestartBackoffTimeStrategy so that existing customized RestartStrategy
> implementations can be migrated.
>
> I'd appreciate it if you could share your status if you are using a
> customized RestartStrategy. That will be valuable for us in making
> decisions.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/task_failure_recovery.html#restart-strategies
> [2] https://issues.apache.org/jira/browse/FLINK-10429
>
> Thanks,
> Zhu Zhu
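For readers unfamiliar with the failureRateRestart(2, Time.minutes(1), Time.minutes(10)) call in Oytun's message: the three arguments are the maximum number of failures per measuring interval, the measuring interval, and the delay between restart attempts. The sketch below shows one way such a policy can be modelled in plain Java. It is an illustration under stated assumptions, not Flink's implementation: the class name, the method name, the -1 sentinel, and the exact window boundary handling are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a failure-rate restart policy; not Flink's implementation.
final class FailureRatePolicySketch {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    private final long delayMillis;
    // Timestamps of recent failures, oldest first.
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    FailureRatePolicySketch(int maxFailuresPerInterval, long intervalMillis, long delayMillis) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = intervalMillis;
        this.delayMillis = delayMillis;
    }

    // Records a failure at nowMillis and returns the delay before the next
    // restart attempt, or -1 if the allowed failure rate has been exceeded.
    long onFailure(long nowMillis) {
        // Drop failures that have fallen out of the measuring interval.
        while (!failureTimes.isEmpty() && nowMillis - failureTimes.peekFirst() >= intervalMillis) {
            failureTimes.removeFirst();
        }
        failureTimes.addLast(nowMillis);
        return failureTimes.size() > maxFailuresPerInterval ? -1 : delayMillis;
    }

    public static void main(String[] args) {
        // Same shape as failureRateRestart(2, Time.minutes(1), Time.minutes(10)).
        // The timestamps are synthetic and do not simulate the restart delay.
        FailureRatePolicySketch policy = new FailureRatePolicySketch(2, 60_000L, 600_000L);
        System.out.println(policy.onFailure(0L));      // 600000
        System.out.println(policy.onFailure(10_000L)); // 600000
        System.out.println(policy.onFailure(20_000L)); // -1
    }
}
```

This is the kind of built-in, configuration-driven usage that Zhu Zhu's reply says keeps working with the new scheduler, as opposed to a user-implemented RestartStrategy class.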
[SURVEY] How many people are using customized RestartStrategy(s)
Hi everyone,

I wanted to reach out to you and ask how many of you are using a customized RestartStrategy[1] in production jobs.

We are currently developing the new Flink scheduler[2], which interacts with restart strategies in a different way. We have to redesign the interfaces for the new restart strategies (so-called RestartBackoffTimeStrategy). Existing customized RestartStrategy implementations will no longer work with the new scheduler.

We want to know whether we should keep a way to customize RestartBackoffTimeStrategy so that existing customized RestartStrategy implementations can be migrated.

I'd appreciate it if you could share your status if you are using a customized RestartStrategy. That will be valuable for us in making decisions.

[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/task_failure_recovery.html#restart-strategies
[2] https://issues.apache.org/jira/browse/FLINK-10429

Thanks,
Zhu Zhu