[ https://issues.apache.org/jira/browse/FLINK-20390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239806#comment-17239806 ]
Steven Zhen Wu commented on FLINK-20390: ---------------------------------------- not sure what is the input/source. If it is Kafka, would Kafka consumer lag (either in the number of msgs or in the wall clock time) be a good trigger. if it is lagging too much, it means that the job can't keep up with the input load. Then start to drop/sample msgs. > Programmatic access to the back-pressure > ---------------------------------------- > > Key: FLINK-20390 > URL: https://issues.apache.org/jira/browse/FLINK-20390 > Project: Flink > Issue Type: New Feature > Components: API / Core > Reporter: Gaël Renoux > Priority: Major > > It would be useful to access the back-pressure monitoring from within > functions. > Here is our use case: we have a real-time Flink job, which takes decisions > based on input data. Sometimes, we have traffic spikes on the input and the > decisions process cannot processe records fast enough. Back-pressure starts > mounting, all the way back to the Source. What we want to do is to start > dropping records in this case, because it's better to make decisions based on > just a sample of the data rather than accumulate too much lag. > Right now, the only way is to have a filter with a hard-limit on the number > of records per-interval-of-time, and to drop records once we are over this > limit. However, this requires a lot of tuning to find out what the correct > limit is, especially since it may depend on the nature of the inputs (some > decisions take longer to make than others). It's also heavily dependent on > the buffers: the limit needs to be low enough that all records that pass the > limit can fit in the downstream buffers, or the back-pressure will will go > back past the filtering task and we're back to square one. Finally, it's not > very resilient to change: whenever we scale the infrastructure up, we need to > redo the whole tuning thing. > With programmatic access to the back-pressure, we could simply start dropping > records based on its current level. No tuning, and adjusted to the actual > issue. For performance, I assume it would be better if it reused the existing > back-pressure monitoring mechanism, rather than looking directly into the > buffer. A sampling of the back-pressure should be enough, and if more > precision is needed you can simply change the existing back-pressure > configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)