cloventt opened a new pull request, #12842:
URL: https://github.com/apache/druid/pull/12842
### Description
The Kafka lookup extractor has to consume an entire topic from the beginning
in order to build the internal lookup map. Previously, the extractor would
always use a randomly generated Kafka `group.id`. This meant that the service
would register a new consumer group every time it started, essentially
"forgetting" its previously committed consumer offsets. This guaranteed that
the service would always consume the entire topic.
This had the unintended side effect of also leaving a lot of "ghost"
consumers registered with the Kafka cluster. These consumer groups will never
be used again and so they just hang around on the broker until Kafka decides to
delete them (by default, after 2 days). This needlessly adds bloat to the Kafka
broker.
This has been fixed by setting the Kafka consumer config
`enable.auto.commit` to `false`. This means that the consumer never attempts to
commit offsets, achieving the same result as before without leaving a bunch of
"ghost" consumer groups registered on the broker.
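For illustration, here is a minimal sketch of the kind of consumer properties this describes. It is not the code in `KafkaLookupExtractorFactory`: the class name, the `buildConsumerProperties` helper, the broker address and the group name are all hypothetical, and `auto.offset.reset=earliest` is an assumption about how the consumer is pointed at the start of the topic when no committed offsets exist. Only plain `java.util.Properties` and standard Kafka consumer config keys are used.

```java
import java.util.Properties;

public class LookupConsumerConfigSketch {

    // Hypothetical helper for illustration only; not the Druid implementation.
    static Properties buildConsumerProperties(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Never commit offsets, so there is nothing to "remember" between restarts.
        props.setProperty("enable.auto.commit", "false");
        // Assumption: with no committed offsets, start from the beginning of the topic.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and group name.
        Properties props = buildConsumerProperties("localhost:9092", "druid-lookup-example");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Because no offsets are ever committed, a restart with the same `group.id` should still read the topic from the start under these assumptions.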
I also took the chance to flesh out the documentation a whole bunch.
<hr>
##### Key changed/added classes in this PR
* `org.apache.druid.query.lookup.KafkaLookupExtractorFactory`
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code
wherever it would not be obvious to an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.