Hi Assane,

I share Greg's concern that the KIP is not friendly to the Kafka
ecosystem.
It would also create a tight coupling between the Kafka client and broker:
once you use the pluggable compression interface, the producer must be a
Java client.
This seems to go against Kafka's original design.

If the proposal can support all kinds of clients, that would be great.

Thanks.
Luke

On Tue, Feb 27, 2024 at 7:44 AM Diop, Assane <assane.d...@intel.com> wrote:

> Hi Greg,
>
> Thanks for taking the time to give some feedback. It was very insightful.
>
> I have some answers:
>
> 1. The current proposal is Java-centric. We want to work it out in Java
> first and then incorporate other languages later. We will get there.
>
> 2. The question of where the plugins would live is an important one, and
> I would like the community's input on it.
>    Officially supported plugins could be part of Kafka, and others could
> live in a plugin repository. Is there currently a way to store plugins in
> Kafka and load them onto the classpath? If such a location existed, it
> would provide a standard way of installing officially supported plugins.
>    In OpenSearch, for example, there is a plugin utility that takes the jar
> and installs it across the cluster, and an admin can grant privileges. A
> similar utility could be implemented in Kafka.
>
> 3. There are many ways to look at this. We could, for example, define the
> message format that uses the pluggable interface as v3 and synchronize
> against that.
>    To use the pluggable codec, you would then have to be at message
> format version 3.
>
> 4. Passing the class name as metadata is one way for the producer to tell
> the broker which plugin to use. However, there could be other
> implementations where everything the broker needs to know is set on the
> topic using topic-level compression. In that case, for example, a rule
> could be that to use the pluggable interface, you must use topic-level
> compression.
>
> I would value your input on this!
>
> Thanks in advance,
> Assane
>
> -----Original Message-----
> From: Greg Harris <greg.har...@aiven.io.INVALID>
> Sent: Wednesday, February 14, 2024 2:36 PM
> To: dev@kafka.apache.org
> Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to Kafka
>
> Hi Assane,
>
> Thanks for the KIP!
> Looking back, it appears that the project has only ever added compression
> types twice: lz4 in 2014 and zstd in 2018, and perhaps Kafka has fallen
> behind the state-of-the-art compression algorithms.
> Thanks for working to fix that!
>
> I do have some concerns:
>
> 1. I think this is a very "Java-centric" proposal, and it doesn't take
> non-Java clients into enough consideration. librdkafka [1] is a great
> example of an implementation of the Kafka protocol which doesn't have the
> same classloading and plugin infrastructure that Java has, which would make
> implementing this feature much more difficult.
>
> 2. By making the interface pluggable, it puts the burden of maintaining
> individual compression codecs onto external developers, who may not be
> willing to maintain a codec for the service-lifetime of such a codec.
> An individual developer can easily implement a plugin to allow them to use
> a cutting-edge compression algorithm without consulting the Kafka project,
> but as soon as data is compressed using that algorithm, they are on the
> hook to the organizations using their implementation to support that
> plugin going forward.
> Part of the collective benefits of the Kafka project is to ensure the
> ongoing maintenance of such codecs, and provide a long deprecation window
> when a codec reaches EOL. I think the Kafka project is well-equipped to
> evaluate the maturity and properties of compression codecs and then
> maintain them going forward.
>
> 3. Also by making the interface pluggable, it reduces the scope of
> individual compression codecs. No longer is there a single lineage of Kafka
> protocols, where vN+1 of a protocol supports a codec that vN does not. Now
> there will be "flavors" of the protocol, and operators will need to ensure
> that their servers and their clients support the same "flavors" or else
> encounter errors.
> This is the sort of protocol forking which can be dangerous to the Kafka
> community going forward. If there is a single lineage of codecs such that
> the upstream Kafka vX.Y supports codec Z, it is much simpler for other
> implementations to check and specify "Kafka vX.Y compatible", than it is to
> check & specify "Kafka vX.Y & Z compatible".
>
> 4. The Java class namespace is distributed, as anyone can name their class
> anything. It achieves this by being very verbose, with long fully-qualified
> names for classes. This is in conflict with a binary protocol, where it is
> desirable for the overhead to be as small as possible.
> This may incentivise developers to keep their class names short, which
> also makes conflict more likely. If you have the option of naming your
> class "B" instead of "org.example.blah.BrotlCompressionCodecVersionOne",
> and meaningfully save a flat 47 bytes on every request, somebody/everybody
> is going to do that.
> This increases the likelihood of conflict, as perhaps two developers
> want the same short name. Yes, there are 52 one-letter class names, but to
> ensure that no two codecs ever conflict requires global coordination that a
> pluggable interface tries to avoid.
> Operators then take on the burden of ensuring that the "B" codec on the
> other machine is indeed the "B" codec that they have installed on their
> machines, or else encounter errors.
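The arithmetic in point 4 can be checked with a minimal Java sketch. It assumes the plugin identifier travels as a plain UTF-8 class name with no other framing difference. That is an assumption for illustration, not the encoding the KIP specifies.

```java
import java.nio.charset.StandardCharsets;

public class ClassNameOverhead {
    // Byte cost of sending a plugin identifier as a UTF-8 string
    // (assumes no length prefix or other framing differs between names).
    static int utf8Length(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String verbose = "org.example.blah.BrotlCompressionCodecVersionOne";
        String terse = "B";
        int saved = utf8Length(verbose) - utf8Length(terse);
        // prints "48 - 1 = 47": bytes saved on every request carrying the name
        System.out.println(utf8Length(verbose) + " - " + utf8Length(terse) + " = " + saved);
        // Distinct one-letter names drawn from A-Z and a-z
        int oneLetterNames = 26 * 2;
        // prints "one-letter ASCII names: 52"
        System.out.println("one-letter ASCII names: " + oneLetterNames);
    }
}
```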
>
> I think that having contributors propose that Kafka support their favorite
> compression type in order to get assigned a globally unique number is much
> healthier for the ecosystem than making this a pluggable interface and
> leaving the namespace to be wrangled by operators and client libraries.
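For comparison, the globally-unique-number scheme described above is how codecs are identified today. The Kafka protocol documentation for the v2 record batch describes bits 0-2 of the int16 `attributes` field as the codec (0 none, 1 gzip, 2 snappy, 3 lz4, 4 zstd), and bit 4 as the transactional flag. The sketch below decodes that field. The class, enum, and method names are illustrative only and are not Kafka's own `CompressionType` API. With just three bits, there are only eight possible IDs, which is one reason each assignment needs project-wide coordination.

```java
import java.util.Optional;

public class CodecIdDecoder {
    // Illustrative mirror of the codec IDs in the v2 batch attributes
    // (not Kafka's CompressionType class).
    enum Codec {
        NONE(0), GZIP(1), SNAPPY(2), LZ4(3), ZSTD(4);

        final int id;

        Codec(int id) { this.id = id; }

        static Optional<Codec> forId(int id) {
            for (Codec c : values()) {
                if (c.id == id) return Optional.of(c);
            }
            return Optional.empty(); // ID not assigned to any known codec
        }
    }

    // Bits 0-2 of the attributes field carry the codec ID.
    static final int CODEC_MASK = 0x07;

    static Optional<Codec> codecOf(short attributes) {
        return Codec.forId(attributes & CODEC_MASK);
    }

    public static void main(String[] args) {
        short zstdTransactional = (short) (0x10 | 4); // bit 4 set, codec ID 4
        System.out.println(codecOf(zstdTransactional)); // prints "Optional[ZSTD]"
        System.out.println(codecOf((short) 5));         // prints "Optional.empty"
    }
}
```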
>
> Thanks,
> Greg
>
> [1] https://github.com/confluentinc/librdkafka
> [2] https://github.com/apache/kafka/blob/e8c70fce26626ed2ab90f2728a45f6e55e907ec1/clients/src/main/java/org/apache/kafka/common/record/DefaultRecordBatch.java#L130
>
> On Wed, Feb 14, 2024 at 12:59 PM Diop, Assane <assane.d...@intel.com>
> wrote:
> >
> > Hi Divij, Mickael,
> > Since Mickael's KIP-390 was accepted, I did not want to respond in that
> > thread and confuse the two efforts.
> >
> > As mentioned in the thread, KIP-390 and KIP-984 do not supersede each
> > other; however, the scope of KIP-984 goes beyond that of KIP-390. The
> > pluggable compression interface is added as a new codec, and the codecs
> > already implemented are not affected by this change. I believe these two
> > KIPs are not the same but complement each other.
> >
> > As I stated before, the motivation is to give users the ability to use
> > different compressors without requiring future changes in Kafka.
> > Kafka currently supports zstd, snappy, gzip and lz4. However, other
> > open-source compression projects like the Brotli algorithm are also
> > gaining traction. For example, the HTTP servers Apache and nginx offer
> > Brotli compression as an option. With a pluggable interface, any Kafka
> > developer could integrate and test Brotli with Kafka simply by writing a
> > plugin. The same motivation applies to any other compression algorithm,
> > including hardware-accelerated compression. Hardware companies, including
> > Intel and AMD, are working on accelerating compression.
> >
> > The main change itself is an update to the message format that allows
> > metadata indicating which plugin to use to be passed to the broker. This
> > only happens if the user selects the pluggable codec. The metadata adds
> > an additional 52 bytes to the message format.
> >
> > Broker recompression is taken care of when the producer and broker use
> > different codecs, because as far as Kafka is concerned it is just another
> > codec being added.
> > I have added more information to the KIP page at
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka
> > and am ready to open a PR if this KIP gets accepted.
> >
> > Assane
> >
> > -----Original Message-----
> > From: Diop, Assane <assane.d...@intel.com>
> > Sent: Wednesday, January 31, 2024 10:24 AM
> > To: dev@kafka.apache.org
> > Subject: RE: DISCUSS KIP-984 Add pluggable compression interface to
> > Kafka
> >
> > Hi Divij,
> > Thank you for your response!
> >
> > Although compression is not a new problem, it has continued to be an
> important research topic.
> > Integrating and testing new compression algorithms in Kafka currently
> > requires significant code changes and a rebuild of the Kafka distribution
> > package.
> > This KIP would allow any compression algorithm to be integrated into Kafka
> > seamlessly by writing a plugin that binds into Kafka's wrapForInput and
> > wrapForOutput methods.
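The message above says only that a plugin would bind into wrapForInput and wrapForOutput. The following is a minimal sketch of that idea using a hypothetical `CompressionPlugin` interface that wraps raw streams. It is not Kafka's actual API, whose internal signatures differ and have changed across releases. The example plugin uses the JDK's built-in gzip streams. A Brotli or hardware-accelerated plugin would implement the same two methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PluggableCodecSketch {
    // Hypothetical plugin contract (not a Kafka interface).
    interface CompressionPlugin {
        String name(); // identifier the producer would have to convey to the broker
        OutputStream wrapForOutput(OutputStream out) throws IOException;
        InputStream wrapForInput(InputStream in) throws IOException;
    }

    // Example plugin backed by java.util.zip gzip streams.
    static final class GzipPlugin implements CompressionPlugin {
        public String name() { return "example-gzip"; }

        public OutputStream wrapForOutput(OutputStream out) throws IOException {
            return new GZIPOutputStream(out);
        }

        public InputStream wrapForInput(InputStream in) throws IOException {
            return new GZIPInputStream(in);
        }
    }

    static byte[] compress(CompressionPlugin plugin, byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = plugin.wrapForOutput(buffer)) {
            out.write(data);
        } // closing the wrapper writes the gzip trailer into buffer
        return buffer.toByteArray();
    }

    static byte[] decompress(CompressionPlugin plugin, byte[] data) throws IOException {
        try (InputStream in = plugin.wrapForInput(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        CompressionPlugin plugin = new GzipPlugin();
        byte[] original = "hello kafka ".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(plugin, original);
        byte[] unpacked = decompress(plugin, packed);
        System.out.println(plugin.name() + ": " + original.length + " -> " + packed.length
                + " bytes, round trip ok: " + Arrays.equals(original, unpacked));
    }
}
```

Wrapping streams keeps the codec independent of the batch format. The codec's identity, returned here by `name()`, still has to reach the broker somehow, which is the concern Greg raises in his point 4.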
> >
> > As you mentioned, Kafka currently supports zstd, snappy, gzip and lz4.
> > However, other open-source compression projects like the Brotli algorithm
> > are also gaining traction. For example, the HTTP servers Apache and nginx
> > offer Brotli compression as an option. With a pluggable interface, any
> > Kafka developer could integrate and test Brotli with Kafka simply by
> > writing a plugin. The same motivation applies to any other compression
> > algorithm, including hardware-accelerated compression. Hardware
> > companies, including Intel and AMD, are working on accelerating
> > compression.
> >
> > This KIP would certainly complement the current work in
> > https://issues.apache.org/jira/browse/KAFKA-7632 by adding even more
> > flexibility for users.
> > A plugin could be tailored to arbitrary datasets in response to a user's
> > specific resource requirements.
> >
> > For reference, other open-source projects have already implemented, or
> > started implementing, this type of plugin technology, such as:
> >         1. Cassandra, which has implemented the same concept of a
> > pluggable interface.
> >         2. OpenSearch, which is also working on enabling the same type of
> > plugin framework.
> >
> > With respect to message recompression, the plugin interface would handle
> > this use case on the broker side, similarly to the current recompression
> > process.
> >
> > Assane
> >
> > -----Original Message-----
> > From: Divij Vaidya <divijvaidy...@gmail.com>
> > Sent: Friday, December 22, 2023 2:27 AM
> > To: dev@kafka.apache.org
> > Subject: Re: DISCUSS KIP-984 Add pluggable compression interface to
> > Kafka
> >
> > Thank you for writing the KIP Assane.
> >
> > In general, exposing a "pluggable" interface is not a decision made
> lightly because it limits our ability to remove / change that interface in
> future.
> > Any future changes to the interface will have to remain compatible with
> > existing plugins, which limits the flexibility of changes we can make
> > inside Kafka. Hence, we need a strong motivation for adding a pluggable
> > interface.
> >
> > 1\ May I ask the motivation for this KIP? Are the current compression
> > codecs (zstd, gzip, lz4, snappy) not sufficient for your use case?
> > Would providing fine-grained compression options as proposed in
> > https://issues.apache.org/jira/browse/KAFKA-7632 and
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Support+Compression+Level
> > address your use case?
> > 2\ "This option impacts the following processes" -> This should also
> > include the decompression and compression that occurs during message
> > version transformation, i.e. when a client sends a message in V1 and the
> > broker expects V2, we convert the message and recompress it.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Mon, Dec 18, 2023 at 7:22 PM Diop, Assane <assane.d...@intel.com>
> wrote:
> >
> > > I would like to bring some attention to this KIP. We have added an
> > > interface to the compression code that allows anyone to build their
> > > own compression plugin and easily integrate it back into Kafka.
> > >
> > > Assane
> > >
> > > -----Original Message-----
> > > From: Diop, Assane <assane.d...@intel.com>
> > > Sent: Monday, October 2, 2023 9:27 AM
> > > To: dev@kafka.apache.org
> > > Subject: DISCUSS KIP-984 Add pluggable compression interface to
> > > Kafka
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-984%3A+Add+pluggable+compression+interface+to+Kafka
> > >
>
