[ https://issues.apache.org/jira/browse/KAFKA-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Yu updated KAFKA-8516:
------------------------------
    Description:
Currently, in Kafka internals, a leader is responsible for all the read and write operations requested by the user. This naturally creates a bottleneck, since one replica, as the leader, experiences a significantly heavier workload than the other replicas, and it also means that all client commands must pass through a chokepoint. If a leader fails, all processing effectively comes to a halt until another leader is elected. To help solve this problem, we could think about redesigning Kafka core so that any replica is able to do read and write operations as well. That is, the system would be changed so that _all_ replicas have read/write permissions.

This has multiple positives. Notably the following:
* Workload can be more evenly distributed, since leader replicas currently carry more weight than follower replicas (in this new design, all replicas are equal)
* Some failures would not be as catastrophic as in the leader-follower paradigm. There is no one single "leader". If one replica goes down, others are still able to read/write as needed. Processing could continue without interruption.
The implementation of such a change would be very extensive, and discussion would be needed to decide whether the improvement described above warrants such a drastic redesign of Kafka internals.

  was:
Currently, in Kafka internals, a leader is responsible for all the read and write operations requested by the user. This naturally creates a bottleneck, since one replica, as the leader, experiences a significantly heavier workload than the other replicas, and it also means that all client commands must pass through a chokepoint. If a leader fails, all processing effectively comes to a halt until another leader is elected. To help solve this problem, we could think about redesigning Kafka core so that any replica is able to do read and write operations as well.
That is, the system would be changed so that _all_ replicas have read/write permissions.

This has multiple positives. Notably the following:
* Workload can be more evenly distributed, since leader replicas currently carry more weight than follower replicas (in this new design, all partitions are equal)
* Some failures would not be as catastrophic as in the leader-follower paradigm. There is no one single "leader". If one replica goes down, others are still able to read/write as needed. Processing could continue without interruption.
The implementation of such a change would be very extensive, and discussion would be needed to decide whether the improvement described above warrants such a drastic redesign of Kafka internals.


> Consider allowing all replicas to have read/write permissions
> -------------------------------------------------------------
>
>                 Key: KAFKA-8516
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8516
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Richard Yu
>            Priority: Major
>
> Currently, in Kafka internals, a leader is responsible for all the read and
> write operations requested by the user. This naturally creates a bottleneck,
> since one replica, as the leader, experiences a significantly heavier
> workload than the other replicas, and it also means that all client commands
> must pass through a chokepoint. If a leader fails, all processing effectively
> comes to a halt until another leader is elected. To help solve this problem,
> we could think about redesigning Kafka core so that any replica is able to
> do read and write operations as well. That is, the system would be changed
> so that _all_ replicas have read/write permissions.
>
> This has multiple positives. Notably the following:
> * Workload can be more evenly distributed, since leader replicas currently
> carry more weight than follower replicas (in this new design, all replicas
> are equal)
> * Some failures would not be as catastrophic as in the leader-follower
> paradigm.
> There is no one single "leader". If one replica goes down, others
> are still able to read/write as needed. Processing could continue without
> interruption.
> The implementation of such a change would be very extensive, and
> discussion would be needed to decide whether the improvement described
> above warrants such a drastic redesign of Kafka internals.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)