[
https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733244#comment-14733244
]
Gwen Shapira commented on KAFKA-2510:
-------------------------------------
No, the issue I'm trying to prevent is definitely not resolved by controlled
shutdown.
Here's the scenario (I realize it sounds a bit contrived, but I've seen it
happen twice):
* Shut down an entire Kafka cluster for maintenance (Kafka upgrade, OS upgrade,
hardware upgrade, whatever)
* Sysadmin deploys a configuration change via automated tool. The tool replaces
all the configuration on the machine, including server.properties.
* Unfortunately, the new server.properties has a typo in the log.dirs
parameter, pointing to the wrong location.
* Bring up the cluster. Everything looks normal for a while, but all historical
data is gone. By the time you realize what went wrong, you face the choice of
either getting your old data back and losing the last few hours / days of new
data, or saying goodbye to your history.
It is obviously the sysadmin's fault for misconfiguring, but most other
datastores would refuse to start under similar scenarios (i.e., they have
multiple sources of truth regarding the existing data and will not start under
mismatches). It looks like we have the ability to make Kafka safer for our
users, and I don't see a reason not to do so.
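As a rough illustration of the kind of startup check being discussed, the
sketch below compares the partitions a broker is expected to hold against the
partition directories actually present under its log directories. It is not
Kafka code: LogDirSanityCheck and both of its methods are hypothetical names,
and the caller is assumed to supply the expected set (for example, derived
from the ISR information in ZooKeeper) as directory names of the form
<topic>-<partition>.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Kafka.
final class LogDirSanityCheck {

    // Collects the names of all subdirectories found under the given log dirs.
    // A log dir that is missing or unreadable simply contributes nothing.
    static Set<String> partitionDirsOnDisk(List<String> logDirs) {
        Set<String> present = new HashSet<>();
        for (String dir : logDirs) {
            File[] children = new File(dir).listFiles(File::isDirectory);
            if (children == null) {
                continue;
            }
            for (File child : children) {
                present.add(child.getName());
            }
        }
        return present;
    }

    // Returns, in sorted order, the expected partition directory names
    // (assumed format: <topic>-<partition>) that are absent from disk.
    static List<String> missingPartitions(Set<String> expectedInIsr,
                                          Set<String> presentOnDisk) {
        Set<String> missing = new TreeSet<>(expectedInIsr);
        missing.removeAll(presentOnDisk);
        return new ArrayList<>(missing);
    }
}
```

Under these assumptions, a non-empty result from missingPartitions would be
the signal to log an ERROR and refuse to serve the affected partitions, rather
than treating the empty directory as the true state of the data.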
> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -----------------------------------------------------------------------------
>
> Key: KAFKA-2510
> URL: https://issues.apache.org/jira/browse/KAFKA-2510
> Project: Kafka
> Issue Type: Bug
> Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data
> directory, one of the following can happen:
> 1. The broker will replicate a bunch of partitions and take over the network.
> 2. If you did this to enough brokers, you can lose entire topics and
> partitions.
> We have information about existing topics, partitions and their ISR in
> ZooKeeper.
> We need a mode in which, if a broker starts, is in the ISR for a partition,
> and doesn't have any data or directory for that partition, the broker will
> issue a huge ERROR in the log and refuse to do anything for the partition.
> [~fpj] worked on the problem for ZK and had some ideas on what is required
> here.