Hi Lukas,

Thanks for your interest in Tiered Storage!

Disclaimer: I work for, or have worked for, both of the vendors mentioned.

There are two different Tiered Storage implementations:
* Confluent's closed-source implementation [1]
* Apache Kafka's open-source implementation [2]

You can read the documentation for each implementation to find out
more, but at a high level the two implementations:
1. Solve the same problem of offloading storage from brokers' disks
2. Differ in the features they offer, such as support for compacted topics
3. Are distributed under different licenses

The Apache Kafka implementation has a pluggable architecture: a storage
backend plugin is required, but the Kafka project does not provide one.
Aiven develops and distributes such plugins [3], which, when combined
with the Apache Kafka open-source implementation, provide the Tiered
Storage functionality.
You can also avoid depending on either vendor by writing your own
plugin or using an alternative one with the Apache Kafka implementation.
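
To give a rough idea of what "pluggable" means in practice, here is a
small Java sketch that only collects the broker and topic settings
involved. The property names come from KIP-405 [2]. The plugin class
name is taken from the Aiven project [3] and should be treated as an
assumption, and the class path and listener name are hypothetical
placeholders. The plugin's own backend settings (bucket, credentials,
caching) are not shown. Please check every value against the
documentation for your Kafka version and plugin.

```java
import java.util.Map;
import java.util.Properties;

public class TieredStorageConfigSketch {

    // Broker-level settings for the Apache Kafka (KIP-405) implementation.
    static Properties brokerProperties() {
        Properties props = new Properties();
        // Turn on the remote log subsystem on this broker.
        props.setProperty("remote.log.storage.system.enable", "true");
        // The storage backend plugin Kafka should load. Assumption: Aiven's
        // class name; verify it against the plugin README.
        props.setProperty("remote.log.storage.manager.class.name",
                "io.aiven.kafka.tieredstorage.RemoteStorageManager");
        // Hypothetical location of the plugin jars on the broker host.
        props.setProperty("remote.log.storage.manager.class.path", "/opt/tiered-storage/*");
        // Kafka's built-in metadata manager, which keeps remote segment
        // metadata in an internal topic.
        props.setProperty("remote.log.metadata.manager.class.name",
                "org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager");
        // Hypothetical listener the metadata manager uses to reach the cluster.
        props.setProperty("remote.log.metadata.manager.listener.name", "PLAINTEXT");
        return props;
    }

    // Topic-level settings: tiering is enabled per topic.
    static Map<String, String> topicConfigs() {
        return Map.of(
                "remote.storage.enable", "true",
                // Keep about one day of data on local disk; older segments
                // remain readable from remote storage.
                "local.retention.ms", "86400000");
    }

    public static void main(String[] args) {
        brokerProperties().forEach((k, v) -> System.out.println(k + "=" + v));
        topicConfigs().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real deployment these would live in server.properties and in the
topic's configuration rather than in Java code.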

You will need to decide for yourself which implementation suits your
use case.

To directly answer your questions:
1. The Confluent implementation is mature and ready for production
[1]. The Apache Kafka implementation is still considered early-access
[4].
2. Using one of the vendors is not required, but may save you the
development and maintenance costs associated with implementing a
solution yourself.
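3. The broker reads offloaded data from remote storage on demand; it is
not copied back into the broker's log directories before being served.
The Confluent implementation uses an in-memory cache [5]. The Apache
Kafka implementation reads directly through the plugin without caching
the data itself [6], and the Aiven plugin supports both in-memory and
on-disk caching [7].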
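
To picture the read path from question 3, here is a toy Java sketch of
a read-through cache sitting in front of remote storage. It is a
generic LRU illustration with hypothetical names (RemoteReadSketch,
remoteFetch, the chunk keys), not the Confluent or Aiven code. It only
shows the idea that offloaded data is fetched on demand and optionally
cached, rather than being restored to the broker's disks first.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class RemoteReadSketch {

    // Bounded LRU map: with accessOrder=true, LinkedHashMap keeps the least
    // recently used entry first, and removeEldestEntry evicts it once the
    // size limit is exceeded.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    private final LruCache<String, byte[]> cache;
    // Hypothetical stand-in for a remote read, e.g. a ranged GET against S3.
    private final Function<String, byte[]> remoteFetch;
    int remoteFetches = 0;

    RemoteReadSketch(int maxCachedChunks, Function<String, byte[]> remoteFetch) {
        this.cache = new LruCache<>(maxCachedChunks);
        this.remoteFetch = remoteFetch;
    }

    // Serve a chunk of an offloaded segment from the cache if present,
    // otherwise straight from remote storage. Nothing is written to the
    // broker's log directory.
    byte[] read(String chunkKey) {
        byte[] cached = cache.get(chunkKey);
        if (cached != null) {
            return cached;
        }
        remoteFetches++;
        byte[] data = remoteFetch.apply(chunkKey);
        cache.put(chunkKey, data);
        return data;
    }

    public static void main(String[] args) {
        RemoteReadSketch reader = new RemoteReadSketch(2,
                key -> ("bytes of " + key).getBytes(StandardCharsets.UTF_8));
        reader.read("segment-0/chunk-0");
        reader.read("segment-0/chunk-0"); // cache hit, no remote fetch
        reader.read("segment-0/chunk-1");
        System.out.println("remote fetches: " + reader.remoteFetches); // prints "remote fetches: 2"
    }
}
```

Without a cache, every read would go to remote storage, which is
roughly the trade-off between the options described above.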

Hope this clears things up!
Greg Harris

[1] https://docs.confluent.io/platform/current/clusters/tiered-storage.html
[2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage
[3] https://github.com/Aiven-Open/tiered-storage-for-apache-kafka
[4] https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
[5] https://developer.confluent.io/courses/architecture/tiered-storage/
[6] https://github.com/apache/kafka/blob/b9a5b4a8053c1fa65e27a9f93440194b0dd5eec4/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1367
[7] https://github.com/Aiven-Open/tiered-storage-for-apache-kafka?tab=readme-ov-file#local-cache



On Fri, Mar 8, 2024 at 10:21 AM Lukas Zimmerman <lukaszi...@gmail.com> wrote:
>
> Hey!
>
> I came across the Tiered Storage feature in Kafka and found this feature
> quite exciting! It seems like it could help with dealing with the retention
> of large amounts of data in Kafka topics.
>
> I have several questions:
>
> 1/ Is it still in early access? When will it be considered ready for
> production?
> 2/ It appears this might require a module from a vendor, such as Aiven or
> Confluent. Am I mistaken?
> 3/ Assuming I've offloaded data to S3, when I need to read the offloaded
> data back using a Kafka consumer, does the broker read the data directly
> from S3, or is the data first moved back to the broker's disks before being
> read?
>
> Thank you very much for your help :-)
