GitHub user YuvalItzchakov opened a pull request: https://github.com/apache/spark/pull/21997

    [SPARK-24987][SS] - Fix Kafka consumer leak when no new offsets for TopicPartition

    ## What changes were proposed in this pull request?

    This small fix adds a `consumer.release()` call to `KafkaSourceRDD` for the case where we've retrieved offsets from Kafka but `fromOffset` is equal to `lastOffset`, meaning there is no new data to read for a particular topic partition. Until now, we'd just return an empty iterator without releasing the consumer, which caused a file descriptor (FD) leak.

    If accepted, this pull request should be merged into master as well.

    ## How was this patch tested?

    I haven't run any specific tests; I would love help on how to test methods running inside `RDD.compute`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/YuvalItzchakov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21997

----
commit c20bd14a4bed34644efc11de420a1caeccea329e
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date:   2017-08-26T15:21:17Z

    Avoid using "return" inside `CachedKafkaConsumer.get` as it is passed to `org.apache.spark.util.UninterruptibleThread.runUninterruptibly` as a function type which causes a NonLocalReturnControl to be called for every call

commit 18b9301553427a7b6c038e144f1be52949d82eb9
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date:   2017-08-27T07:58:01Z

    Comments after code review

commit 46af335ed42371c4bd200e63c9bec351ddcb112e
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date:   2018-08-02T17:51:41Z

    Merge remote-tracking branch 'upstream/master'

commit 059b47a9a62a4630cfd1f43d4e3de41989adfd1b
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date:   2018-08-04T07:23:39Z

    Merge remote-tracking branch 'origin/master'

commit 2b43146ff0155301cad403605f15171a8c6a9149
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date:   2018-08-04T07:24:24Z

    Fixes SPARK-24987. Kafka consumer wasn't released when `fromOffset` was equal to `toOffset`.

commit 7558d422ae24daf9d3cffc43b5ef3d975c4c9d3a
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date:   2018-08-04T07:28:26Z

    Merge remote-tracking branch 'upstream/master'

----
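
For readers who want to see the shape of the fix described in this pull request, here is a minimal, self-contained sketch of the pattern: release the consumer on the empty-range path instead of only when the iterator is exhausted. It does not use Spark or Kafka classes; `FakeConsumer`, `OffsetRange`, and `ReleaseOnEmptyRange.compute` are hypothetical names invented for illustration and are not the actual `KafkaSourceRDD` code.

```scala
// Hypothetical stand-in for a cached Kafka consumer (NOT Spark's real class).
final class FakeConsumer {
  private var released = false
  def release(): Unit = { released = true }
  def isReleased: Boolean = released
  // Pretend to fetch the record stored at the given offset.
  def read(offset: Long): String = s"record-$offset"
}

// Hypothetical offset range for one topic partition; untilOffset is exclusive.
final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

object ReleaseOnEmptyRange {
  // Sketch of a compute-style method: acquire a consumer, then either
  // return an empty iterator or an iterator over the offset range.
  def compute(range: OffsetRange, acquire: () => FakeConsumer): Iterator[String] = {
    val consumer = acquire()
    if (range.fromOffset == range.untilOffset) {
      // No new data for this partition. Without this call, the consumer
      // acquired above would never be released on this code path.
      consumer.release()
      Iterator.empty
    } else {
      new Iterator[String] {
        private var nextOffset = range.fromOffset

        override def hasNext: Boolean = {
          val more = nextOffset < range.untilOffset
          // Release once the range is exhausted; release() is idempotent here.
          if (!more) consumer.release()
          more
        }

        override def next(): String = {
          if (!hasNext) throw new NoSuchElementException("offset range exhausted")
          val record = consumer.read(nextOffset)
          nextOffset += 1
          record
        }
      }
    }
  }
}
```

A sketch like this also hints at one way to approach the testing question raised in the description: if consumer acquisition is passed in as a function, the empty-range path can be exercised with a fake consumer and checked for release without running a full RDD computation. Whether that fits the real code is an assumption, not something stated in the pull request.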