Re: What are the biggest issues with Apache Kafka?

Divij Vaidya Fri, 11 Aug 2023 03:11:09 -0700

Hey Liam

Thanks for asking this question. I have been meaning to write a post to the
community for a long time about open areas where newcomers can contribute,
but it never made it to the top of my to-do list.

In addition to what others mentioned above, here are a few options to
pick from. It's not an exhaustive list, and I would be able to help more if
you tell me what you folks are interested in working on (e.g. server,
client side, streams etc.) and how familiar you currently are with the
Kafka code base. I can personally provide rapid reviews for option 1 and
option 3, since those are the ones I feel most passionate about, but I
can't promise a time commitment from my side for the other options.

*Option 1: KIP-405 (Tiered Storage) related work*

We are targeting an early access [1] release of KIP-405 [2] (tiered
storage in Kafka) in the upcoming 3.6 release. There is a lot of work
left to polish this feature and make it production ready. If you like, you
can help over there. You can pick up any "unassigned" ticket from
https://issues.apache.org/jira/browse/KAFKA-7739 OR pick up a ticket where
the assignee hasn't provided an update in the last month.

*Option 2: Metrics related work*

We currently use two different ways of capturing metrics on the
broker/server. Historically we started with Yammer, then moved to
KafkaMetrics, first on the clients and more recently on the broker too.
Currently the majority of broker metrics use Yammer (which has its own set
of problems, such as relying on a 10-year-old library), but the
alternative, KafkaMetrics, has a slow histogram [2].
Here are two recent discussions about this:
https://lists.apache.org/thread/jww851jcyjtsq010bbt81b5dgwzqrgwx and
https://lists.apache.org/thread/f5wknqhmoo5lml99np7ksocz7fyk3m0r. You will
find that on the broker, KafkaRaftMetrics uses KafkaMetrics but
QuorumControllerMetrics uses Yammer metrics. We need someone in the
community to pick up unifying this so that we can use only one
methodology going forward. My recommendation would be to upgrade the
Yammer library to the latest Dropwizard library as proposed in
https://cwiki.apache.org/confluence/display/KAFKA/KIP-510%3A+Metrics+library+upgrade
but there are backward compatibility problems associated with it. My
colleague Christo has done some digging in the past on this and found that
the major obstacle to completing KIP-510 comes from the usage of
https://github.com/xvrl/kafka/blob/01208fd218286d2cd318a891f2cb5883422283b1/core/src/main/java/kafka/metrics/FilteringJmxReporter.java
introduced in KIP-544. This functionality is no longer directly available
in Dropwizard 4.2.0.
Can you dig more into this and see if there is a way to upgrade without
impacting backward compatibility?
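
To make the compatibility concern a bit more concrete, here is a minimal
sketch of the kind of behaviour a filtering reporter provides, assuming its
job is to expose only those metrics whose names pass a configurable
predicate and to re-evaluate them when the predicate changes. It uses plain
Java collections as a stand-in for JMX registration. The class name, method
names and metric names below are illustrative assumptions, not the actual
Kafka, Yammer or Dropwizard API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical stand-in for a JMX reporter that only exposes metrics
// whose names pass a predicate. Not Kafka's FilteringJmxReporter.
public class FilteringReporterSketch {
    // Every metric known to the registry, keyed by a JMX-style name.
    private final Map<String, Long> registry = new ConcurrentHashMap<>();
    // Names currently exposed (stand-in for MBeans registered with JMX).
    private final Set<String> exposed = ConcurrentHashMap.newKeySet();
    private volatile Predicate<String> filter;

    public FilteringReporterSketch(Predicate<String> filter) {
        this.filter = filter;
    }

    // Called when the registry learns about a new metric.
    public void onMetricAdded(String name, long value) {
        registry.put(name, value);
        if (filter.test(name)) {
            exposed.add(name);
        }
    }

    // Called when a metric goes away; it is dropped from both views.
    public void onMetricRemoved(String name) {
        registry.remove(name);
        exposed.remove(name);
    }

    // Swap the filter at runtime and re-evaluate every known metric.
    public void updateFilter(Predicate<String> newFilter) {
        this.filter = newFilter;
        for (String name : registry.keySet()) {
            if (newFilter.test(name)) {
                exposed.add(name);
            } else {
                exposed.remove(name);
            }
        }
    }

    // Sorted snapshot of the names that would be visible to a JMX client.
    public Set<String> exposedNames() {
        return new TreeSet<>(exposed);
    }

    public static void main(String[] args) {
        FilteringReporterSketch reporter =
            new FilteringReporterSketch(n -> n.startsWith("kafka.server"));
        reporter.onMetricAdded("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec", 1L);
        reporter.onMetricAdded("kafka.network:type=RequestMetrics,name=RequestsPerSec", 2L);
        System.out.println("exposed with filter: " + reporter.exposedNames());
        reporter.updateFilter(n -> true);
        System.out.println("exposed without filter: " + reporter.exposedNames());
    }
}
```

Any upgrade path would presumably need an equivalent hook in the newer
library, i.e. a way to intercept registration and to re-register metrics
when the filter is updated.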

To summarise option 2, we have the following problems:
1. We use a 10-year-old version of a library for capturing Yammer metrics.
2. Histogram calculation in metrics is very expensive. See:
https://issues.apache.org/jira/browse/KAFKA-15192?focusedCommentId=17744169&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17744169
3. The KafkaMetrics library and Yammer metrics both have downsides, as captured
in https://issues.apache.org/jira/browse/KAFKA-15058,
https://issues.apache.org/jira/browse/KAFKA-15154 and

*Option 3: Zero copy over SSL*

This is more of a personal project which I am not getting time to finish
up. Today, zero copy is not available for SSL connections in Kafka.
However, there is a path forward on newer Linux kernels by using kTLS. My
idea is to have Kafka use dynamically linked OpenSSL (>=3.0) via
netty-tcnative. OpenSSL 3.0 and above can be compiled with kTLS support
enabled. Hence, it should be possible to use Kafka + netty-tcnative +
OpenSSL compiled with the kTLS flag on the OS to enable zero copy even for
SSL workloads. I can fill you in if this is something that you are
interested in pursuing.
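
For anyone new to the topic, here is a rough sketch of the difference
between the two data paths, using only the JDK. The first method uses
FileChannel.transferTo, which lets the kernel move bytes to the target
without passing them through a user-space buffer where the OS supports it.
The second method reads into a user-space buffer so the bytes can be
transformed before being written, which is roughly what encryption in the
JVM forces today. The class name is hypothetical, the XOR "fakeEncrypt"
step is only a stand-in for real TLS, and a file is used as the target in
place of a socket so the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration; not Kafka code.
public class ZeroCopySketch {

    // Plaintext path: ask the kernel to move the bytes directly.
    static long sendZeroCopy(FileChannel source, FileChannel target) throws IOException {
        long position = 0;
        long size = source.size();
        while (position < size) {
            long sent = source.transferTo(position, size - position, target);
            if (sent <= 0) {
                break; // nothing more could be transferred
            }
            position += sent;
        }
        return position;
    }

    // SSL-like path: bytes are copied into user space, transformed, then written.
    static long sendViaUserSpace(FileChannel source, FileChannel target) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        long total = 0;
        while (source.read(buffer) != -1) {
            buffer.flip();
            fakeEncrypt(buffer);
            while (buffer.hasRemaining()) {
                total += target.write(buffer);
            }
            buffer.clear();
        }
        return total;
    }

    // Stand-in for encryption: XOR every readable byte in place.
    static void fakeEncrypt(ByteBuffer buffer) {
        for (int i = buffer.position(); i < buffer.limit(); i++) {
            buffer.put(i, (byte) (buffer.get(i) ^ 0x5A));
        }
    }

    // Copies a temp file through both paths and checks the results match the source.
    static boolean runDemo() {
        try {
            Path src = Files.createTempFile("segment", ".log");
            Path plain = Files.createTempFile("plain", ".out");
            Path tls = Files.createTempFile("tls", ".out");
            byte[] payload = new byte[20_000];
            for (int i = 0; i < payload.length; i++) {
                payload[i] = (byte) i;
            }
            Files.write(src, payload);

            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(plain, StandardOpenOption.WRITE)) {
                sendZeroCopy(in, out);
            }
            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(tls, StandardOpenOption.WRITE)) {
                sendViaUserSpace(in, out);
            }

            byte[] viaUserSpace = Files.readAllBytes(tls);
            for (int i = 0; i < viaUserSpace.length; i++) {
                viaUserSpace[i] ^= 0x5A; // undo the stand-in "encryption"
            }
            boolean ok = Arrays.equals(payload, Files.readAllBytes(plain))
                && Arrays.equals(payload, viaUserSpace);

            Files.deleteIfExists(src);
            Files.deleteIfExists(plain);
            Files.deleteIfExists(tls);
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("copies match: " + runDemo());
    }
}
```

The idea behind kTLS is that the encryption step moves into the kernel, so
the first path could be used even when the connection is encrypted.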

*Option 4: Getting rid of EasyMock & PowerMock dependencies from Kafka*

We have been making slow and steady progress towards achieving this goal
and it is being tracked in https://issues.apache.org/jira/browse/KAFKA-7438.
Progress has been slow, either because of limited code reviewer bandwidth
or because of a lack of folks migrating the tests. We could use your help
in bringing it across the finish line.

[1]
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
[2]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage

--
Divij Vaidya

On Fri, Aug 11, 2023 at 4:55 AM ziming deng <dengziming1...@gmail.com>
wrote:

> Hi Liam,
>
> The Apache Kafka project has several modules; I think you should first
> select a module you are interested in.
>
> For example, we are currently working on KIP-500 related features, which
> includes
> 1. KIP-856: KRaft Disk Failure Recovery,
> 2. KIP-642: Dynamic quorum reassignment,
> 3. kafka-metadata-shell.sh,
> 4. KIP-866: ZooKeeper to KRaft Migration,
> 5. KIP-858: Handle JBOD broker disk failure in KRaft,
> 6. Migrating test cases to support KRaft mode,
> 7. KRaft transactions
>
> We even have the idea of implementing multi-Raft and using it to replace
> the Kafka replica protocol. Apart from KRaft, you can also explore tiered
> storage, Kafka Streams, Kafka Connect, the group coordinator and the
> transaction coordinator, which are also under rapid iteration.
>
> --,
> Best,
> Ziming
>
>
> > On Aug 11, 2023, at 08:16, Liam Hodges <lhodg...@binghamton.edu.INVALID> wrote:
> >
> > I'm working with a small team of engineers looking to contribute to the
> > open source tools for Apache Kafka. What is missing in the Kafka community
> > right now? Are there any problems an open source project could solve for
> > its developers? Appreciate all feedback.
>
>
