[ https://issues.apache.org/jira/browse/FLINK-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529727#comment-16529727 ]
ASF GitHub Bot commented on FLINK-9636:
---------------------------------------

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/6238

[FLINK-9636][network] fix inconsistency with failed buffer redistribution

## What is the purpose of the change

If an exception is thrown in `NetworkBufferPool#requestMemorySegments()`'s first call to `redistributeBuffers()`, the accounting for `numTotalRequiredBuffers` is wrong for future uses of this buffer pool.

## Brief change log

- fix accounting of `NetworkBufferPool#numTotalRequiredBuffers` during failures in `NetworkBufferPool#requestMemorySegments()`
- fix some checkstyle warnings
- add a few more checks around buffer/memory segment recycling

## Verifying this change

This change added tests and can be verified as follows:

- added `NetworkBufferPoolTest#testRequestMemorySegmentsExceptionDuringBufferRedistribution()`

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
- The S3 file system connector: **no**

## Documentation

- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? **JavaDocs**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-9636

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6238

----
commit b3e03fc93f8daa34d81b822a7cd56c353e88c4f7
Author: Nico Kruber <nico@...>
Date:   2018-07-02T09:50:47Z

    [hotfix][network] checkstyle

commit abc724d2ea3acc8ff1e5bf4137ed1fccb6d61372
Author: Nico Kruber <nico@...>
Date:   2018-07-02T09:51:09Z

    [FLINK-9636][network] fix inconsistency with failed buffer redistribution

commit 702b664d1b5b7525a620fbc0ef8121f4df882279
Author: Nico Kruber <nico@...>
Date:   2018-07-02T10:42:40Z

    [hotfix][network] add a few more checks and tags

----

> Network buffer leaks in requesting a batch of segments during canceling
> -----------------------------------------------------------------------
>
>                 Key: FLINK-9636
>                 URL: https://issues.apache.org/jira/browse/FLINK-9636
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.1
>
>
> In {{NetworkBufferPool#requestMemorySegments}}, {{numTotalRequiredBuffers}}
> is increased by {{numRequiredBuffers}} first.
> If {{InterruptedException}} is thrown during polling segments from the
> available queue, the requested segments will be recycled back to
> {{NetworkBufferPool}}, {{numTotalRequiredBuffers}} is decreased by the number
> of polled segments which is now inconsistent with {{numRequiredBuffers}}. So
> {{numTotalRequiredBuffers}} in {{NetworkBufferPool}} leaks in this case, and
> we can also decrease {{numRequiredBuffers}} to fix this bug.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
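
The accounting problem described in the issue can be illustrated with a toy model. The sketch below is not Flink's `NetworkBufferPool`: the class `ToyBufferPool`, its `releaseFullReservation` flag, and the `failAfter` parameter are hypothetical names introduced only for illustration, and a simulated `IllegalStateException` stands in for the `InterruptedException` mentioned in the report. It reserves the full request up front, then either undoes only the polled count on failure (the reported behaviour) or undoes the whole reservation (the fix suggested in the issue).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reservation counter only; hypothetical, not Flink code. */
final class ToyBufferPool {
    private final ArrayDeque<Object> available = new ArrayDeque<>();
    private final boolean releaseFullReservation;
    private int numTotalRequiredBuffers;

    ToyBufferPool(int numSegments, boolean releaseFullReservation) {
        for (int i = 0; i < numSegments; i++) {
            available.add(new Object());
        }
        this.releaseFullReservation = releaseFullReservation;
    }

    /**
     * Requests numRequired segments. If failAfter >= 0, a failure is simulated
     * once that many segments have been polled (assumption for illustration).
     */
    List<Object> requestSegments(int numRequired, int failAfter) {
        // The full request is reserved before any segment is polled.
        numTotalRequiredBuffers += numRequired;
        List<Object> polled = new ArrayList<>(numRequired);
        try {
            while (polled.size() < numRequired) {
                if (polled.size() == failAfter) {
                    throw new IllegalStateException("simulated interruption");
                }
                Object segment = available.poll();
                if (segment == null) {
                    throw new IllegalStateException("pool exhausted");
                }
                polled.add(segment);
            }
            return polled;
        } catch (RuntimeException e) {
            // Recycle whatever was polled so far.
            available.addAll(polled);
            // Undoing only polled.size() leaves part of the reservation behind;
            // undoing numRequired restores the counter exactly.
            numTotalRequiredBuffers -= releaseFullReservation ? numRequired : polled.size();
            throw e;
        }
    }

    int getNumTotalRequiredBuffers() {
        return numTotalRequiredBuffers;
    }

    int getNumAvailable() {
        return available.size();
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            ToyBufferPool pool = new ToyBufferPool(10, fixed);
            try {
                pool.requestSegments(4, 1); // fail after one segment was polled
            } catch (IllegalStateException expected) {
                // the failure is the point of the demonstration
            }
            System.out.println("fixed=" + fixed
                + " leftover reservation=" + pool.getNumTotalRequiredBuffers());
        }
    }
}
```

With a request for 4 segments that fails after 1 has been polled, the first variant leaves 3 buffers counted as required although none are held, while the second leaves the counter at 0. In both variants all segments return to the available queue, which is why the leak shows up only in the counter.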