[
https://issues.apache.org/jira/browse/KAFKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598615#comment-14598615
]
Allen Wang commented on KAFKA-1215:
-----------------------------------
We now have a working solution for rack-aware assignment. It is based on the
current patch for this JIRA, with some improvements. The key ideas of the
solution are:
- Rack ID is a String instead of an integer
- For replica assignment, add to the assignReplicasToBrokers() method an extra
parameter of type Map[Int, String] that maps broker ID to rack ID
- Before doing the rack-aware assignment, sort the broker list so that brokers
are interlaced by rack. In other words, adjacent brokers should not be in the
same rack if possible. For example, assuming 6 brokers mapping to 3 racks:
0 -> "rack1", 1 -> "rack1", 2 -> "rack2", 3 -> "rack2", 4 -> "rack3", 5 ->
"rack3"
The sorted broker list could be (0, 2, 4, 1, 3, 5)
- Apply the same assignment algorithm to assign replicas, with the addition of
skipping a broker if its rack is already used for the same partition (similar
to what has been done in the current patch)
The benefit of this approach is that replica distribution is kept as even as
possible across all the racks and brokers.
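As an illustration only (this is not the code from the patch), the interlacing
and rack-skipping steps above could be sketched in Python as follows. The names
interlace_by_rack and assign_replicas are hypothetical, the plain round-robin
walk is a simplified stand-in for the existing assignment algorithm, and the
sketch assumes there are at least as many racks as replicas per partition.

```python
from collections import defaultdict
from itertools import zip_longest


def interlace_by_rack(broker_rack):
    """Order broker IDs so adjacent brokers sit in different racks where possible.

    broker_rack maps broker ID -> rack ID (the Map[Int, String] from the text).
    This is a simple round-robin over racks; with very uneven racks the tail of
    the list can still contain neighbours from the same rack.
    """
    by_rack = defaultdict(list)
    for broker in sorted(broker_rack):
        by_rack[broker_rack[broker]].append(broker)
    # Take one broker from each rack in turn.
    rounds = zip_longest(*(by_rack[rack] for rack in sorted(by_rack)))
    return [b for group in rounds for b in group if b is not None]


def assign_replicas(broker_rack, num_partitions, replication_factor, start_index=0):
    """Assign replicas round-robin over the interlaced list, skipping used racks."""
    brokers = interlace_by_rack(broker_rack)
    if replication_factor > len(set(broker_rack.values())):
        # Assumption of this sketch only: one distinct rack per replica.
        raise ValueError("sketch requires at least one rack per replica")
    assignment = {}
    for partition in range(num_partitions):
        replicas, used_racks = [], set()
        position = (start_index + partition) % len(brokers)
        while len(replicas) < replication_factor:
            broker = brokers[position]
            # Skip a broker if its rack already holds a replica of this partition.
            if broker_rack[broker] not in used_racks:
                replicas.append(broker)
                used_racks.add(broker_rack[broker])
            position = (position + 1) % len(brokers)
        assignment[partition] = replicas
    return assignment
```

With the six-broker mapping from the example, interlace_by_rack returns
[0, 2, 4, 1, 3, 5], and assign_replicas with 6 partitions and 3 replicas places
every partition on three different racks.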
With regard to KAFKA-1792, an easy solution is to restrict replica movement
to within the same rack, which I think should work in most practical cases. It
also has the added benefit that replicas usually move faster within a rack.
So basically we can apply the algorithm described in KAFKA-1792 separately to
each rack. For example, if there are three racks, then apply the algorithm three
times, each time with the broker list and assignment for that specific rack.
Again, we assume the broker-to-rack mapping will be available in the method
signature.
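A rough sketch of the per-rack idea, in Python for illustration only. The
function rebalance_per_rack and the rebalance callback are hypothetical; the
callback stands in for whatever algorithm KAFKA-1792 settles on and is assumed
to return, for each partition, the same number of replicas it was given, all on
brokers from the list it received.

```python
from collections import defaultdict


def rebalance_per_rack(assignment, broker_rack, rebalance):
    """Run a rebalancing step once per rack so replicas never change racks.

    assignment:  {partition: [broker IDs]}, the current replica assignment
    broker_rack: {broker ID: rack ID}
    rebalance:   callable(brokers_in_rack, sub_assignment) -> new sub_assignment
                 (hypothetical placeholder for the KAFKA-1792 algorithm)
    """
    brokers_by_rack = defaultdict(list)
    for broker, rack in sorted(broker_rack.items()):
        brokers_by_rack[rack].append(broker)

    result = {p: list(replicas) for p, replicas in assignment.items()}
    for rack, brokers in brokers_by_rack.items():
        # Restrict the current assignment to replicas hosted in this rack.
        sub = {
            p: [b for b in replicas if broker_rack[b] == rack]
            for p, replicas in assignment.items()
        }
        new_sub = rebalance(brokers, sub)
        # Write the in-rack result back into the same replica positions.
        for p in result:
            replacements = iter(new_sub[p])
            result[p] = [
                next(replacements) if broker_rack[b] == rack else b
                for b in result[p]
            ]
    return result
```

Because each call only sees the brokers and replicas of one rack, the rack
spread of every partition is unchanged by construction.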
The open question is how to obtain the broker-to-rack mapping. The information
can be supplied when Kafka registers the broker with ZooKeeper, which means some
information has to be added to ZooKeeper. However, it could be that the rack
information is already available in a deployment-independent way. For example,
in some deployments, the rack information may be available in a database. What
we can do is abstract the API required to obtain rack information into an
interface and allow users to supply an implementation on the command line or at
broker startup (to handle auto topic creation).
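To make the interface idea concrete, here is a minimal sketch in Python, for
illustration only. RackLocator, rack_mapping and StaticRackLocator are
hypothetical names, not part of Kafka or of the patch; a real implementation
would be written against Kafka's own code base and could read from ZooKeeper or
a database instead of an in-memory dictionary.

```python
from abc import ABC, abstractmethod


class RackLocator(ABC):
    """Hypothetical pluggable source of broker-to-rack information."""

    @abstractmethod
    def rack_mapping(self, broker_ids):
        """Return {broker ID: rack ID} for the given brokers."""


class StaticRackLocator(RackLocator):
    """Example implementation backed by a fixed dictionary.

    Stands in for a deployment-specific lookup such as a database query.
    """

    def __init__(self, known_racks):
        self._known_racks = dict(known_racks)

    def rack_mapping(self, broker_ids):
        # Brokers with no known rack are simply left out of the result.
        return {
            broker: self._known_racks[broker]
            for broker in broker_ids
            if broker in self._known_racks
        }
```

The assignment code would then depend only on RackLocator, and the concrete
class would be chosen by the user on the command line or at broker startup.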
> Rack-Aware replica assignment option
> ------------------------------------
>
> Key: KAFKA-1215
> URL: https://issues.apache.org/jira/browse/KAFKA-1215
> Project: Kafka
> Issue Type: Improvement
> Components: replication
> Affects Versions: 0.8.0
> Reporter: Joris Van Remoortere
> Assignee: Jun Rao
> Fix For: 0.9.0
>
> Attachments: rack_aware_replica_assignment_v1.patch,
> rack_aware_replica_assignment_v2.patch
>
>
> Adding a rack-id to kafka config. This rack-id can be used during replica
> assignment by using the max-rack-replication argument in the admin scripts
> (create topic, etc.). By default the original replication assignment
> algorithm is used because max-rack-replication defaults to -1.
> max-rack-replication > -1 is not honored if you are doing manual replica
> assignment (preferred).
> If this looks good I can add some test cases specific to the rack-aware
> assignment.
> I can also port this to trunk. We are currently running 0.8.0 in production
> and need this, so I wrote the patch against that.