Hey Liam,

Thanks for asking this question. For a long time I have been meaning to write a post to the community about open areas where newcomers can contribute, but it never became a priority on my to-do list.
In addition to what others mentioned above, here are a few options to pick from. It's not an exhaustive list, and I could help more if you tell me what you folks are interested in working on (e.g. server side, client side, Streams etc.) and how familiar you are with the Kafka code base. I can personally provide rapid reviews for option 1 and option 3, since those are the ones I feel most passionate about, but I can't promise a time commitment from my side for the other options.

*Option 1: KIP-405 (Tiered Storage) related work*
We are targeting an early access [1] release of KIP-405 [2] (tiered storage in Kafka) for the upcoming 3.6 release. There is a lot of work left to polish this feature and make it production ready. If you like, you can help there. You can pick up any "unassigned" ticket from https://issues.apache.org/jira/browse/KAFKA-7739, or pick up a ticket whose assignee hasn't provided an update in the last month.

*Option 2: Metrics related work*
We currently use two different ways of capturing metrics on the broker/server. Historically, we started with Yammer, then adopted KafkaMetrics, first on the clients and more recently on the broker as well. Currently, the majority of broker metrics use Yammer (which has its own set of problems, such as relying on a 10-year-old library), but the alternative, KafkaMetrics, has a slow histogram (see point 2 of the summary below). Here are two recent discussions about this: https://lists.apache.org/thread/jww851jcyjtsq010bbt81b5dgwzqrgwx and https://lists.apache.org/thread/f5wknqhmoo5lml99np7ksocz7fyk3m0r. For example, on the broker, KafkaRaftMetrics uses KafkaMetrics but QuorumControllerMetrics uses Yammer metrics. We need someone in the community to pick up unifying this so that we can use only one approach going forward.
My recommendation would be to upgrade the Yammer library to the latest Dropwizard Metrics library, as proposed in https://cwiki.apache.org/confluence/display/KAFKA/KIP-510%3A+Metrics+library+upgrade, but there are backward compatibility problems associated with it. My colleague Christo did some digging on this in the past and found that the main obstacle to completing KIP-510 is the use of https://github.com/xvrl/kafka/blob/01208fd218286d2cd318a891f2cb5883422283b1/core/src/main/java/kafka/metrics/FilteringJmxReporter.java, introduced in KIP-544. This functionality is no longer directly available in Dropwizard 4.2.0. Could you dig further into this and see whether there is a way to upgrade without breaking backward compatibility?

To summarise option 2, we have the following problems:
1. We use a 10-year-old version of a library for capturing Yammer metrics.
2. Histogram calculation in metrics is very expensive. See: https://issues.apache.org/jira/browse/KAFKA-15192?focusedCommentId=17744169&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17744169
3. Both the KafkaMetrics library and Yammer metrics have downsides, as captured in https://issues.apache.org/jira/browse/KAFKA-15058, https://issues.apache.org/jira/browse/KAFKA-15154 and

*Option 3: Zero copy over SSL*
This is more of a personal project that I haven't found time to finish. Today, Kafka cannot use zero copy when SSL is enabled. However, there is a path forward on newer Linux kernels using kTLS (kernel TLS). My idea is to have Kafka use a dynamically linked OpenSSL (>= 3.0) via netty-tcnative. OpenSSL 3.0 and above can be compiled with kTLS support enabled. Hence, it should be possible to combine Kafka + netty-tcnative + an OpenSSL build compiled with the kTLS flag on the OS to enable zero copy even for SSL workloads. I can fill you in if this is something you are interested in pursuing.
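To illustrate why SSL loses zero copy, here is a minimal JDK-only sketch. The class and method names (`ZeroCopySketch`, `zeroCopy`, `bufferedCopy`) are hypothetical and this is not Kafka code. The plaintext-style path uses `FileChannel.transferTo`, which the JDK can hand to the OS so the bytes never land in a JVM buffer. A TLS engine running in user space, however, has to read the bytes into a buffer to encrypt them. A temp file stands in for the socket, and no real encryption is performed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {

    // Plaintext-style path: ask the JDK to move bytes directly from the file
    // to the target channel. The OS may do this without copying them into a
    // JVM buffer (e.g. sendfile on Linux when the target is a socket).
    static long zeroCopy(FileChannel source, WritableByteChannel target) throws IOException {
        long size = source.size();
        long position = 0;
        // transferTo may move fewer bytes than requested, so loop until done.
        // Assumes a blocking target channel.
        while (position < size) {
            position += source.transferTo(position, size - position, target);
        }
        return position;
    }

    // TLS-style path: bytes are read into a heap buffer first, because a
    // user-space TLS engine must see the plaintext in order to encrypt it.
    static long bufferedCopy(FileChannel source, WritableByteChannel target) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        long total = 0;
        long position = 0;
        int read;
        while ((read = source.read(buffer, position)) != -1) {
            position += read;
            buffer.flip();
            // Placeholder: a real TLS layer would encrypt the buffer contents here.
            while (buffer.hasRemaining()) {
                total += target.write(buffer);
            }
            buffer.clear();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("segment", ".log");
        Path zeroOut = Files.createTempFile("zero-copy", ".out");
        Path bufferedOut = Files.createTempFile("buffered", ".out");
        try {
            Files.write(source, "hello kafka".getBytes(StandardCharsets.UTF_8));
            try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
                 FileChannel a = FileChannel.open(zeroOut, StandardOpenOption.WRITE);
                 FileChannel b = FileChannel.open(bufferedOut, StandardOpenOption.WRITE)) {
                System.out.println("zero copy bytes: " + zeroCopy(in, a));
                System.out.println("buffered bytes:  " + bufferedCopy(in, b));
            }
        } finally {
            // Clean up the temporary files used for the demo.
            Files.deleteIfExists(source);
            Files.deleteIfExists(zeroOut);
            Files.deleteIfExists(bufferedOut);
        }
    }
}
```

Running `main` should report 11 bytes for both paths. The point of kTLS is to move the encryption step into the kernel, which is what would let an SSL connection go back to the `transferTo` style of sending.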
*Option 4: Getting rid of EasyMock & PowerMock dependencies from Kafka*
We have been making slow and steady progress towards this goal, and it is being tracked in https://issues.apache.org/jira/browse/KAFKA-7438. However, progress has been slow, either because of limited code reviewer bandwidth or because of a lack of folks rewriting the tests. We could use your help in bringing it across the finish line.

[1] https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
[2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage

--
Divij Vaidya

On Fri, Aug 11, 2023 at 4:55 AM ziming deng <dengziming1...@gmail.com> wrote:

> Hi Liam,
>
> The Apache Kafka project has several modules; I think you should first
> select a module you are interested in.
>
> For example, we are currently working on KIP-500 related features, which
> include:
> 1. KIP-856: KRaft Disk Failure Recovery
> 2. KIP-642: Dynamic quorum reassignment
> 3. kafka-metadata-shell.sh
> 4. KIP-866: ZooKeeper to KRaft Migration
> 5. KIP-858: Handle JBOD broker disk failure in KRaft
> 6. Migrating test cases to support KRaft mode
> 7. KRaft transactions
>
> We even have the idea of implementing multi-Raft and using it to replace
> the Kafka replica protocol. Apart from KRaft, you can also explore tiered
> storage, Kafka Streams, Kafka Connect, the group coordinator and the
> transaction coordinator, which are also in rapid iteration.
>
> --
> Best,
> Ziming
>
> > On Aug 11, 2023, at 08:16, Liam Hodges <lhodg...@binghamton.edu.INVALID> wrote:
> >
> > I'm working with a small team of engineers looking to contribute to the
> > open source tools for Apache Kafka. What is missing in the Kafka community
> > right now? Are there any problems an open source project could solve for
> > its developers? Appreciate all feedback.