wsry opened a new pull request #10083: [FLINK-14472][runtime] Implement back-pressure monitor with non-blocking outputs.
URL: https://github.com/apache/flink/pull/10083

## What is the purpose of the change

Currently, the back-pressure monitor relies on detecting task threads that are stuck in `requestBufferBuilderBlocking`. At the moment, two cases cause back pressure:

- There are no available buffers in `LocalBufferPool`, and the quota granted by the global pool is also exhausted. The task then has to wait for buffers to be recycled to `LocalBufferPool`.
- There are no available buffers in `LocalBufferPool`, but the quota has not been used up. The request to the global pool then blocks because the global pool has no available buffers, and the task has to wait for buffers to be recycled to the global pool.

Non-blocking network output is being implemented in FLINK-14396, so the back-pressure monitor needs to be adjusted accordingly once the non-blocking output is used in practice.

This PR implements a new back-pressure monitor that detects task back pressure by checking the availability of `ResultPartitionWriter`, e.g. whether there are free buffers available in the `BufferPool` of the `ResultPartition`s for output.

## Brief change log

- A new back-pressure tracker was implemented. It monitors task back pressure by checking the availability of `ResultPartitionWriter`, e.g. whether there are free buffers available in the `BufferPool` of the `ResultPartition`s for output.
- The old back-pressure tracker implementation based on stack sampling, and the code related to it, were removed.
- New test cases were added to verify the changes.

## Verifying this change

Several new test cases were added to verify the changes, including:

- `BackPressureStatsTrackerImplTest`
- `BackPressureSampleCoordinatorTest`
- `TaskBackPressureSampleServiceTest`
- `TaskTest#testNoBackPressureIfTaskNotStarted`
- `TaskExecutorSubmissionTest#testSampleTaskBackPressure`
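To illustrate the availability-based approach described in the change log, here is a minimal sketch of how a back-pressure ratio could be derived by polling an output's availability instead of inspecting thread stacks. It is not the code in this PR: `BackPressureSketch`, `OutputAvailability`, `sampleBackPressureRatio`, and `replay` are hypothetical names chosen for illustration, and the sketch omits the delay between samples that a real sampler would likely use.

```java
import java.util.Iterator;
import java.util.List;

public class BackPressureSketch {

    /** Hypothetical stand-in for a writer whose availability can be polled without blocking. */
    public interface OutputAvailability {
        boolean isAvailable();
    }

    /**
     * Polls the output numSamples times and returns the fraction of samples
     * in which the output was NOT available (i.e. the task would be back-pressured).
     */
    public static double sampleBackPressureRatio(OutputAvailability output, int numSamples) {
        if (numSamples <= 0) {
            throw new IllegalArgumentException("numSamples must be positive");
        }
        int backPressured = 0;
        for (int i = 0; i < numSamples; i++) {
            if (!output.isAvailable()) {
                backPressured++;
            }
        }
        return (double) backPressured / numSamples;
    }

    /** Illustration helper: an output that replays a fixed sequence of availability states. */
    public static OutputAvailability replay(List<Boolean> states) {
        Iterator<Boolean> it = states.iterator();
        return () -> it.next();
    }

    public static void main(String[] args) {
        // Available on the first sample only, unavailable on the remaining three.
        OutputAvailability output = replay(List.of(true, false, false, false));
        System.out.println(sampleBackPressureRatio(output, 4)); // prints 0.75
    }
}
```

Because the check is a plain boolean poll, the sampler never has to interrupt or inspect the task thread, which is the property that matters once outputs no longer block.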

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)

## Documentation

- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)