Werner Daehn created KAFKA-6563:
-----------------------------------
Summary: Kafka online backup
Key: KAFKA-6563
URL: https://issues.apache.org/jira/browse/KAFKA-6563
Project: Kafka
Issue Type: Improvement
Components: core
Reporter: Werner Daehn
If you consider Kafka to be a "database" with "transaction logs", you need a
way to back it up and recover it, just like databases do.
The beauty of such a solution would be to enable Kafka for smaller scenarios
where you do not want to have a large cluster. You could even use a single-node
Kafka. In the worst case you lose all data written since the last backup and
have to ask the sources to send that data again; for most sources that is
possible.
Currently you have multiple options, none of which are good.
# Set up Kafka as fault tolerant, with replication factors: Needs a larger
server and does not prevent many types of problems, such as software bugs or
deleting a topic by accident.
# Mirror Kafka: Very expensive.
# Shut down Kafka, copy the disk, start Kafka up again.
# Add a database in front of Kafka as the primary persistence: Very expensive,
and it defeats the purpose of Kafka.
I wonder what really is needed for an online backup strategy. If I am not
mistaken it is relatively little.
* A command that causes Kafka to switch to new files, so that the files
containing all past data no longer change.
* Export of the current Zookeeper values, unless they can be recreated from
the transaction log files anyhow.
* Then you can back up the Kafka files.
* A command that tells Kafka the backup is finished, so that it can clean up.
* Later, a way to merge the recovered backup instance with the Kafka log
written since then, up to a certain point. Example: The backup was taken at
midnight and a topic was deleted at 11:00. You start with the backup, apply the
logs until 10:59, and then you bring Kafka fully online again.
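
To make the steps above a bit more concrete, here is a rough Python sketch of
two of them: picking the files that are safe to copy, and replaying later
records up to a cutoff time. It is only an illustration under assumptions, not
an existing Kafka feature. The names Record, closed_segments and replay_until
are hypothetical. The one Kafka detail it relies on is that segment files in a
partition directory are named after their zero-padded base offset, so the
file that sorts last is the one still being written. It also assumes that
retention or compaction does not delete or rewrite the older segments while
the backup runs, which the proposed "switch to new files" command would have
to guarantee.

```python
import os
from datetime import datetime
from typing import Iterable, Iterator, List, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for one entry of the log written after the backup."""
    timestamp: datetime  # when the entry was written
    topic: str
    action: str          # illustrative values: "append" or "delete_topic"


def closed_segments(partition_dir: str) -> List[str]:
    """Return the .log segment files that are no longer appended to.

    Segment names are zero-padded base offsets, so sorting by name orders
    them by offset; the last one is the active segment and is skipped.
    """
    logs = sorted(f for f in os.listdir(partition_dir) if f.endswith(".log"))
    return logs[:-1]


def replay_until(records: Iterable[Record], cutoff: datetime) -> Iterator[Record]:
    """Yield, in time order, the records written strictly before the cutoff."""
    for record in sorted(records, key=lambda r: r.timestamp):
        if record.timestamp >= cutoff:
            break
        yield record
```

With the example from the last bullet, calling replay_until with a cutoff of
11:00 yields the record written at 10:59 and stops before the topic deletion.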
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)