Radoslaw Gruchalski created KAFKA-3726:
------------------------------------------
Summary: Enable cold storage option
Key: KAFKA-3726
URL: https://issues.apache.org/jira/browse/KAFKA-3726
Project: Kafka
Issue Type: Wish
Reporter: Radoslaw Gruchalski
This JIRA builds on the cold storage article I published on Medium. A copy of
the article is attached here.
The need for cold storage, or an "indefinite" log, is discussed quite often on
the user mailing list.
The cold storage idea would let the operator keep the raw Kafka log segment
files in third-party storage and retrieve the data later for re-consumption.
The two possible options for enabling such functionality are, from the article:
First approach: if Kafka provided a notification mechanism and could trigger a
program when a segment file is to be discarded, it would become feasible to
provide a standard method of moving data to cold storage in reaction to those
events. Once the program finishes backing the segments up, it could tell Kafka
“it is now safe to delete these segments”.
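
To make the first approach concrete, here is a minimal Python sketch of the
program Kafka would trigger. Kafka has no such notification today, so
SegmentDiscardEvent, handle_event and the acknowledge callback are all
hypothetical names, and the "cold storage" is just a local directory standing
in for a real third-party store.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SegmentDiscardEvent:
    """Hypothetical notification: Kafka is about to discard these files."""
    topic: str
    partition: int
    segment_files: tuple  # paths of the files that make up the segment


def back_up_segment(event, cold_root):
    """Copy every file of the segment to cold storage; return the new paths."""
    target_dir = Path(cold_root) / f"{event.topic}-{event.partition}"
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in event.segment_files:
        dst = target_dir / Path(src).name
        shutil.copy2(src, dst)
        # Cheap sanity check before the copy is reported as successful.
        if dst.stat().st_size != Path(src).stat().st_size:
            raise IOError(f"incomplete copy of {src}")
        copied.append(dst)
    return copied


def handle_event(event, cold_root, acknowledge):
    """Back the segment up first, and only then tell Kafka it may delete it."""
    copied = back_up_segment(event, cold_root)
    # Hypothetical callback: "it is now safe to delete these segments".
    acknowledge(event)
    return copied
```

The point of the ordering is that the acknowledgement is sent only after the
copy has succeeded; if the backup raises, Kafka is never told to delete.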
The second option is to provide an additional value for the
{{log.cleanup.policy}} setting, call it {{cold-storage}}. With this value,
Kafka would move the segment files, which would otherwise be deleted, to
another destination on the server. They can be picked up from there and moved
to the cold storage.
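
As a rough illustration of the "picked up from there" step, the Python sketch
below sweeps a staging directory and moves whole segments to cold storage. It
assumes the hypothetical cold-storage policy keeps Kafka's usual on-disk
layout in the staging area (one "<topic>-<partition>" directory per partition,
files named by zero-padded base offset); the function names and both
directories are made up for the example.

```python
import shutil
from collections import defaultdict
from pathlib import Path

# File types that belong to one log segment in this sketch.
SEGMENT_SUFFIXES = {".log", ".index"}


def group_segments(staging_dir):
    """Group staged files by (partition directory, base offset)."""
    groups = defaultdict(list)
    for path in sorted(Path(staging_dir).glob("*/*")):
        if (path.is_file() and path.suffix in SEGMENT_SUFFIXES
                and path.stem.isdigit()):
            groups[(path.parent.name, int(path.stem))].append(path)
    return dict(groups)


def ship_to_cold_storage(staging_dir, cold_dir):
    """Move each staged segment to cold_dir; return what was shipped."""
    shipped = []
    for (partition_dir, base_offset), files in sorted(
            group_segments(staging_dir).items()):
        target = Path(cold_dir) / partition_dir
        target.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.move(str(f), str(target / f.name))
        shipped.append((partition_dir, base_offset))
    return shipped
```

A program like this could run periodically on every broker; note that it does
nothing about the same data being staged by several replicas.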
Both have their limitations. The former is simply a mechanism exposed to let
the operator build the tooling necessary to enable this. Events could be
published in a manner similar to the Marathon Event Bus
(https://mesosphere.github.io/marathon/docs/event-bus.html), or Kafka itself
could provide a control topic on which such information would be published.
Either way, the operator can subscribe to the events and get notified about at
least two of them:
- log segment is complete and can be backed up
- partition leader changed
These two, together with an option to keep the log segment safe from compaction
for a certain amount of time, would be sufficient to reliably implement cold
storage.
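
A subscriber for the two events listed above could look roughly like the
Python sketch below. The event names and fields are invented for the example,
since no such events exist in Kafka, and backing up only segments reported by
the current partition leader is my assumption about how an operator might
avoid storing every replica's copy.

```python
from dataclasses import dataclass, field


@dataclass
class BackupCoordinator:
    """Tracks partition leaders and queues completed segments for backup."""
    leaders: dict = field(default_factory=dict)   # (topic, partition) -> broker
    backed_up: set = field(default_factory=set)   # segments already queued
    queue: list = field(default_factory=list)     # segments awaiting backup

    def on_event(self, event):
        tp = (event["topic"], event["partition"])
        if event["type"] == "leader_changed":
            self.leaders[tp] = event["leader"]
        elif event["type"] == "segment_complete":
            key = tp + (event["base_offset"],)
            # Only the leader's copy is queued, and each segment only once.
            # If the leader is not known yet the event is skipped; a real
            # tool would have to look the leader up instead.
            if self.leaders.get(tp) == event["broker"] and key not in self.backed_up:
                self.backed_up.add(key)
                self.queue.append(key)
        # Any other event type is ignored.
```

The queue would then be drained by whatever copies the files, within the time
window during which the segment is protected from removal.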
The latter option, a new {{log.cleanup.policy}} value, would be a more
complete feature, but it is also much more difficult to implement. All brokers
would have to keep a backup of the data in the cold storage, significantly
increasing the size requirements. In addition, de-duplication of the
replicated data would be left completely to the operator.
In any case, the thing to stay away from is having Kafka deal with the
physical aspect of moving the data to and from the cold storage. This is not
Kafka's task. The intent is to provide a method for reliable cold storage.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)