GitHub user YuvalItzchakov opened a pull request:
https://github.com/apache/spark/pull/21983
SPARK-24987 - Fix Kafka consumer leak when no new offsets for TopicPartition
## What changes were proposed in this pull request?
This small fix adds a `consumer.release()` call to `KafkaSourceRDD` for the
case where we've retrieved offsets from Kafka but the `fromOffset` is equal to
the `lastOffset`, meaning there is no new data to read for a particular topic
partition. Up until now, we'd just return an empty iterator without closing
the consumer, which would cause a file descriptor (FD) leak.
If accepted, this pull request should be merged into master as well.
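The leak and the fix can be illustrated with a simplified, self-contained sketch (this is not the actual Spark code; `FakeConsumer` and the `compute` signature here are hypothetical stand-ins for `CachedKafkaConsumer` and `KafkaSourceRDD.compute`):

```scala
// Hypothetical stand-in for the pooled Kafka consumer; release() models
// closing the underlying network socket / file descriptor.
class FakeConsumer(val topicPartition: String) {
  var released = false
  def release(): Unit = { released = true }
}

// Sketch of the compute path: when fromOffset == untilOffset there is no
// data to read, and the consumer must be released before returning.
def compute(fromOffset: Long, untilOffset: Long,
            consumer: FakeConsumer): Iterator[Long] = {
  if (fromOffset >= untilOffset) {
    // Previously the consumer was left open on this path, leaking one FD
    // per empty partition; the fix releases it before returning.
    consumer.release()
    Iterator.empty
  } else {
    // The real code would poll Kafka here and release the consumer when
    // the task completes.
    (fromOffset until untilOffset).iterator
  }
}
```

For example, `compute(5L, 5L, c)` on a fresh `FakeConsumer` returns an empty iterator and leaves `c.released` set to `true`.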
## How was this patch tested?
I haven't run any specific tests; I'd appreciate guidance on how to test
methods running inside `RDD.compute`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/YuvalItzchakov/spark branch-2.3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21983.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21983
----
commit 4e366f8be24373de98908465070f718cbabd3790
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date: 2018-08-02T18:12:08Z
Fixes SPARK-24987. Kafka consumer wasn't released when `fromOffset` was
equal to `toOffset`.
commit e5db69f291bd099bee38d3b555b0c040ef942f29
Author: Yuval Itzchakov <yuval.itzchakov@...>
Date: 2018-08-03T09:41:47Z
Merge remote-tracking branch 'upstream/branch-2.3' into branch-2.3
----