[jira] [Commented] (FLINK-20390) Programmatic access to the back-pressure

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17336002#comment-17336002
 ] 

Flink Jira Bot commented on FLINK-20390:


This issue was labeled "stale-major" 7 ago and has not received any updates so 
it is being deprioritized. If this ticket is actually Major, please raise the 
priority and ask a committer to assign you the issue or revive the public 
discussion.


> Programmatic access to the back-pressure
> 
>
> Key: FLINK-20390
> URL: https://issues.apache.org/jira/browse/FLINK-20390
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Core
>Reporter: Gaël Renoux
>Priority: Major
>  Labels: stale-major
>
> It would be useful to access the back-pressure monitoring from within 
> functions.
> Here is our use case: we have a real-time Flink job, which takes decisions 
> based on input data. Sometimes, we have traffic spikes on the input and the 
> decisions process cannot processe records fast enough. Back-pressure starts 
> mounting, all the way back to the Source. What we want to do is to start 
> dropping records in this case, because it's better to make decisions based on 
> just a sample of the data rather than accumulate too much lag.
> Right now, the only way is to have a filter with a hard-limit on the number 
> of records per-interval-of-time, and to drop records once we are over this 
> limit. However, this requires a lot of tuning to find out what the correct 
> limit is, especially since it may depend on the nature of the inputs (some 
> decisions take longer to make than others). It's also heavily dependent on 
> the buffers: the limit needs to be low enough that all records that pass the 
> limit can fit in the downstream buffers, or the back-pressure will will go 
> back past the filtering task and we're back to square one. Finally, it's not 
> very resilient to change: whenever we scale the infrastructure up, we need to 
> redo the whole tuning thing.
> With programmatic access to the back-pressure, we could simply start dropping 
> records based on its current level. No tuning, and adjusted to the actual 
> issue. For performance, I assume it would be better if it reused the existing 
> back-pressure monitoring mechanism, rather than looking directly into the 
> buffer. A sampling of the back-pressure should be enough, and if more 
> precision is needed you can simply change the existing back-pressure 
> configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20390) Programmatic access to the back-pressure

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17327505#comment-17327505
 ] 

Flink Jira Bot commented on FLINK-20390:


This major issue is unassigned and itself and all of its Sub-Tasks have not 
been updated for 30 days. So, it has been labeled "stale-major". If this ticket 
is indeed "major", please either assign yourself or give an update. Afterwards, 
please remove the label. In 7 days the issue will be deprioritized.

> Programmatic access to the back-pressure
> 
>
> Key: FLINK-20390
> URL: https://issues.apache.org/jira/browse/FLINK-20390
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Core
>Reporter: Gaël Renoux
>Priority: Major
>  Labels: stale-major
>
> It would be useful to access the back-pressure monitoring from within 
> functions.
> Here is our use case: we have a real-time Flink job, which takes decisions 
> based on input data. Sometimes, we have traffic spikes on the input and the 
> decisions process cannot processe records fast enough. Back-pressure starts 
> mounting, all the way back to the Source. What we want to do is to start 
> dropping records in this case, because it's better to make decisions based on 
> just a sample of the data rather than accumulate too much lag.
> Right now, the only way is to have a filter with a hard-limit on the number 
> of records per-interval-of-time, and to drop records once we are over this 
> limit. However, this requires a lot of tuning to find out what the correct 
> limit is, especially since it may depend on the nature of the inputs (some 
> decisions take longer to make than others). It's also heavily dependent on 
> the buffers: the limit needs to be low enough that all records that pass the 
> limit can fit in the downstream buffers, or the back-pressure will will go 
> back past the filtering task and we're back to square one. Finally, it's not 
> very resilient to change: whenever we scale the infrastructure up, we need to 
> redo the whole tuning thing.
> With programmatic access to the back-pressure, we could simply start dropping 
> records based on its current level. No tuning, and adjusted to the actual 
> issue. For performance, I assume it would be better if it reused the existing 
> back-pressure monitoring mechanism, rather than looking directly into the 
> buffer. A sampling of the back-pressure should be enough, and if more 
> precision is needed you can simply change the existing back-pressure 
> configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20390) Programmatic access to the back-pressure

2021-01-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/FLINK-20390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267392#comment-17267392
 ] 

Gaël Renoux commented on FLINK-20390:
-

Most of the time it's a Kafka input. However, you could also want to make the 
decision to drop messages or not further down the line, not necessarily in the 
source itself, so you wouldn't have access to the lag anymore.

> Programmatic access to the back-pressure
> 
>
> Key: FLINK-20390
> URL: https://issues.apache.org/jira/browse/FLINK-20390
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Core
>Reporter: Gaël Renoux
>Priority: Major
>
> It would be useful to access the back-pressure monitoring from within 
> functions.
> Here is our use case: we have a real-time Flink job, which takes decisions 
> based on input data. Sometimes, we have traffic spikes on the input and the 
> decisions process cannot processe records fast enough. Back-pressure starts 
> mounting, all the way back to the Source. What we want to do is to start 
> dropping records in this case, because it's better to make decisions based on 
> just a sample of the data rather than accumulate too much lag.
> Right now, the only way is to have a filter with a hard-limit on the number 
> of records per-interval-of-time, and to drop records once we are over this 
> limit. However, this requires a lot of tuning to find out what the correct 
> limit is, especially since it may depend on the nature of the inputs (some 
> decisions take longer to make than others). It's also heavily dependent on 
> the buffers: the limit needs to be low enough that all records that pass the 
> limit can fit in the downstream buffers, or the back-pressure will will go 
> back past the filtering task and we're back to square one. Finally, it's not 
> very resilient to change: whenever we scale the infrastructure up, we need to 
> redo the whole tuning thing.
> With programmatic access to the back-pressure, we could simply start dropping 
> records based on its current level. No tuning, and adjusted to the actual 
> issue. For performance, I assume it would be better if it reused the existing 
> back-pressure monitoring mechanism, rather than looking directly into the 
> buffer. A sampling of the back-pressure should be enough, and if more 
> precision is needed you can simply change the existing back-pressure 
> configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20390) Programmatic access to the back-pressure

2020-11-27 Thread Steven Zhen Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239806#comment-17239806
 ] 

Steven Zhen Wu commented on FLINK-20390:


not sure what is the input/source. If it is Kafka, would Kafka consumer lag 
(either in the number of msgs or in the wall clock time) be a good trigger. if 
it is lagging too much, it means that the job can't keep up with the input 
load. Then start to drop/sample msgs.

> Programmatic access to the back-pressure
> 
>
> Key: FLINK-20390
> URL: https://issues.apache.org/jira/browse/FLINK-20390
> Project: Flink
>  Issue Type: New Feature
>  Components: API / Core
>Reporter: Gaël Renoux
>Priority: Major
>
> It would be useful to access the back-pressure monitoring from within 
> functions.
> Here is our use case: we have a real-time Flink job, which takes decisions 
> based on input data. Sometimes, we have traffic spikes on the input and the 
> decisions process cannot processe records fast enough. Back-pressure starts 
> mounting, all the way back to the Source. What we want to do is to start 
> dropping records in this case, because it's better to make decisions based on 
> just a sample of the data rather than accumulate too much lag.
> Right now, the only way is to have a filter with a hard-limit on the number 
> of records per-interval-of-time, and to drop records once we are over this 
> limit. However, this requires a lot of tuning to find out what the correct 
> limit is, especially since it may depend on the nature of the inputs (some 
> decisions take longer to make than others). It's also heavily dependent on 
> the buffers: the limit needs to be low enough that all records that pass the 
> limit can fit in the downstream buffers, or the back-pressure will will go 
> back past the filtering task and we're back to square one. Finally, it's not 
> very resilient to change: whenever we scale the infrastructure up, we need to 
> redo the whole tuning thing.
> With programmatic access to the back-pressure, we could simply start dropping 
> records based on its current level. No tuning, and adjusted to the actual 
> issue. For performance, I assume it would be better if it reused the existing 
> back-pressure monitoring mechanism, rather than looking directly into the 
> buffer. A sampling of the back-pressure should be enough, and if more 
> precision is needed you can simply change the existing back-pressure 
> configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)