[jira] [Updated] (KAFKA-9118) LogDirFailureHandler shouldn't use Zookeeper

2021-04-17 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-9118:

Parent: (was: KAFKA-9119)
Issue Type: Improvement  (was: Sub-task)

> LogDirFailureHandler shouldn't use Zookeeper
> 
>
> Key: KAFKA-9118
> URL: https://issues.apache.org/jira/browse/KAFKA-9118
> Project: Kafka
> Issue Type: Improvement
> Reporter: Viktor Somogyi-Vass
> Assignee: Viktor Somogyi-Vass
> Priority: Major
>
> As described in 
> [KIP-112|https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD#KIP-112:HandlediskfailureforJBOD-Zookeeper]:
> {noformat}
> 2. A log directory stops working on a broker during runtime
> - The controller watches the path /log_dir_event_notification for new znode.
> - The broker detects offline log directories during runtime.
> - The broker takes actions as if it has received StopReplicaRequest for this 
> replica. More specifically, the replica is no longer considered leader and is 
> removed from any replica fetcher thread. (The clients will receive a 
> UnknownTopicOrPartitionException at this point)
> - The broker notifies the controller by creating a sequential znode under 
> path /log_dir_event_notification with data of the format {"version" : 1, 
> "broker" : brokerId, "event" : LogDirFailure}.
> - The controller reads the znode to get the brokerId and finds that the event 
> type is LogDirFailure.
> - The controller deletes the notification znode
> - The controller sends LeaderAndIsrRequest to that broker to query the state 
> of all topic partitions on the broker. The LeaderAndIsrResponse from this 
> broker will specify KafkaStorageException for those partitions that are on 
> the bad log directories.
> - The controller updates the information of offline replicas in memory and 
> trigger leader election as appropriate.
> - The controller removes offline replicas from ISR in the ZK and sends 
> LeaderAndIsrRequest with updated ISR to be used by partition leaders.
> - The controller propagates the information of offline replicas to brokers by 
> sending UpdateMetadataRequest.
> {noformat}
> Instead of the notification znode, we should use a Kafka protocol request 
> that notifies the controller of the offline partitions. The controller then 
> updates the information of offline replicas in memory and triggers leader 
> election, then removes the replicas from the ISR in ZK and sends a 
> LeaderAndIsrRequest and an UpdateMetadataRequest.
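
The controller-side handling proposed above could look roughly like the following Java sketch. It is not Kafka's controller code: the class ControllerSketch, its PartitionState type, the onLogDirFailure method, and the string partition keys are all hypothetical names chosen for illustration. Persisting the shrunken ISR to ZK and sending real requests are reduced to in-memory updates and a list of request names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these types exist in Kafka.
final class ControllerSketch {
    // Per-partition state the controller keeps in memory.
    static final class PartitionState {
        int leader;               // -1 means "no leader available"
        final List<Integer> isr;  // broker ids currently in sync

        PartitionState(int leader, List<Integer> isr) {
            this.leader = leader;
            this.isr = new ArrayList<>(isr);
        }
    }

    final Map<String, PartitionState> partitions = new HashMap<>();
    // Stand-in for the requests the controller would send afterwards.
    final List<String> sentRequests = new ArrayList<>();

    // Handles the proposed notification: brokerId reports that its replicas
    // of the given partitions are offline.
    void onLogDirFailure(int brokerId, List<String> offlinePartitions) {
        for (String partition : offlinePartitions) {
            PartitionState state = partitions.get(partition);
            if (state == null) {
                continue; // unknown partition, nothing to update
            }
            // Remove the offline replica from the ISR (in-memory only here).
            state.isr.remove(Integer.valueOf(brokerId));
            // Elect a new leader if the offline replica was the leader.
            // Picking the first remaining ISR member is an assumption.
            if (state.leader == brokerId) {
                state.leader = state.isr.isEmpty() ? -1 : state.isr.get(0);
            }
        }
        sentRequests.add("LeaderAndIsrRequest");
        sentRequests.add("UpdateMetadataRequest");
    }
}
```

In this sketch the broker supplies the offline partitions directly, so the controller does not have to send a LeaderAndIsrRequest first just to discover which partitions sit on the failed log directory.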



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9118) LogDirFailureHandler shouldn't use Zookeeper

2019-10-30 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe updated KAFKA-9118:

Parent: KAFKA-9119
Issue Type: Sub-task  (was: Improvement)

> LogDirFailureHandler shouldn't use Zookeeper
> 
>
> Key: KAFKA-9118
> URL: https://issues.apache.org/jira/browse/KAFKA-9118
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Viktor Somogyi-Vass
> Assignee: Viktor Somogyi-Vass
> Priority: Major





[jira] [Updated] (KAFKA-9118) LogDirFailureHandler shouldn't use Zookeeper

2019-10-30 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass updated KAFKA-9118:
Description: 
As described in 
[KIP-112|https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD#KIP-112:HandlediskfailureforJBOD-Zookeeper]:

{noformat}
2. A log directory stops working on a broker during runtime

- The controller watches the path /log_dir_event_notification for new znode.
- The broker detects offline log directories during runtime.
- The broker takes actions as if it has received StopReplicaRequest for this 
replica. More specifically, the replica is no longer considered leader and is 
removed from any replica fetcher thread. (The clients will receive a 
UnknownTopicOrPartitionException at this point)
- The broker notifies the controller by creating a sequential znode under path 
/log_dir_event_notification with data of the format {"version" : 1, "broker" : 
brokerId, "event" : LogDirFailure}.
- The controller reads the znode to get the brokerId and finds that the event 
type is LogDirFailure.
- The controller deletes the notification znode
- The controller sends LeaderAndIsrRequest to that broker to query the state of 
all topic partitions on the broker. The LeaderAndIsrResponse from this broker 
will specify KafkaStorageException for those partitions that are on the bad log 
directories.
- The controller updates the information of offline replicas in memory and 
trigger leader election as appropriate.
- The controller removes offline replicas from ISR in the ZK and sends 
LeaderAndIsrRequest with updated ISR to be used by partition leaders.
- The controller propagates the information of offline replicas to brokers by 
sending UpdateMetadataRequest.
{noformat}
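
The notification data described in the steps above can be sketched in Java as follows. This is a minimal illustration, not Kafka's implementation: the class LogDirEventPayload and its methods are hypothetical, and representing the LogDirFailure event as the integer 1 is an assumption, since the quoted text only gives the symbolic name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a Kafka class.
final class LogDirEventPayload {
    static final int VERSION = 1;
    // Assumption: the LogDirFailure event type is encoded as the integer 1.
    static final int LOG_DIR_FAILURE = 1;

    private static final Pattern BROKER = Pattern.compile("\"broker\"\\s*:\\s*(\\d+)");
    private static final Pattern EVENT = Pattern.compile("\"event\"\\s*:\\s*(\\d+)");

    // Broker side: build the data stored in the sequential znode.
    static String encode(int brokerId) {
        return "{\"version\":" + VERSION
            + ",\"broker\":" + brokerId
            + ",\"event\":" + LOG_DIR_FAILURE + "}";
    }

    // Controller side: read the broker id out of the znode data.
    static int brokerId(String json) {
        return extract(BROKER, json);
    }

    // Controller side: check that the event type is LogDirFailure.
    static boolean isLogDirFailure(String json) {
        return extract(EVENT, json) == LOG_DIR_FAILURE;
    }

    private static int extract(Pattern pattern, String json) {
        Matcher m = pattern.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing field in: " + json);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

The payload carries only the broker id and the event type, which is why the controller in the quoted flow must follow up with a LeaderAndIsrRequest to learn which partitions are affected.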

Instead of the notification znode, we should use a Kafka protocol request 
that notifies the controller of the offline partitions. The controller then 
updates the information of offline replicas in memory and triggers leader 
election, then removes the replicas from the ISR in ZK and sends a 
LeaderAndIsrRequest and an UpdateMetadataRequest.

  was:
As described in 
[KIP-112|https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD#KIP-112:HandlediskfailureforJBOD-Zookeeper]:




> LogDirFailureHandler shouldn't use Zookeeper
> 
>
> Key: KAFKA-9118
> URL: https://issues.apache.org/jira/browse/KAFKA-9118
> Project: Kafka
> Issue Type: Improvement
> Reporter: Viktor Somogyi-Vass
> Assignee: Viktor Somogyi-Vass
> Priority: Major


