Hi,

Ran into this as I was going through the new config files for data centers
and racks.
(I may have some comments on those configuration models but will send them
later.)

Turning to RackAwareStrategy.java:

The comment at the top of RackAwareStrategy says:

/*
 * This Replication Strategy returns the nodes responsible for a given
 * key but respects rack awareness. It places one replica in a
 * different data center from the first (if there is any such data center),
 * and remaining replicas in different racks in the same datacenter as
 * the first.
 */

However, the code -- as it is written today -- *seems* to be actually doing
something like the following:

/*
 * This Replication Strategy returns the nodes responsible for a given
 * key but "respects" rack awareness. It places one replica in a
 * different data center from the first (if there is any such data center),
 * and *one* replica in different rack but in the same data center
 * (if there is any such rack), and it spreads the remaining replicas
 * on nodes along the ring, distinct from the first two non-primary
 * replicas.
 */
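
To make the difference concrete, here is a minimal sketch of the behavior I
think I am seeing. It is not the actual RackAwareStrategy code: Node, place()
and the five-node ring are hypothetical names and data made up for
illustration, and the two-pass structure is only my reading of the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PlacementSketch {
    /** Hypothetical node descriptor; not a class from the real code base. */
    static final class Node {
        final String name, dc, rack;
        Node(String name, String dc, String rack) {
            this.name = name;
            this.dc = dc;
            this.rack = rack;
        }
    }

    /** Returns the names of the chosen replicas, primary first. */
    static List<String> place(List<Node> ring, int primaryIdx, int replicas) {
        List<Node> chosen = new ArrayList<>();
        Node primary = ring.get(primaryIdx);
        chosen.add(primary);
        boolean otherDc = false, otherRack = false;

        // Pass 1: walk the ring once, taking at most ONE node from another
        // data center and at most ONE node from another rack in the same DC.
        for (int i = 1; i < ring.size() && chosen.size() < replicas; i++) {
            Node n = ring.get((primaryIdx + i) % ring.size());
            if (!otherDc && !n.dc.equals(primary.dc)) {
                chosen.add(n);
                otherDc = true;
            } else if (!otherRack && n.dc.equals(primary.dc)
                    && !n.rack.equals(primary.rack)) {
                chosen.add(n);
                otherRack = true;
            }
            if (otherDc && otherRack) break;
        }

        // Pass 2: fill the remaining slots with the next unused nodes along
        // the ring, ignoring rack and data center entirely.
        for (int i = 1; i < ring.size() && chosen.size() < replicas; i++) {
            Node n = ring.get((primaryIdx + i) % ring.size());
            if (!chosen.contains(n)) chosen.add(n);
        }

        List<String> names = new ArrayList<>();
        for (Node n : chosen) names.add(n.name);
        return names;
    }

    public static void main(String[] args) {
        List<Node> ring = Arrays.asList(
            new Node("A", "dc1", "r1"), new Node("B", "dc1", "r1"),
            new Node("C", "dc1", "r2"), new Node("D", "dc2", "r1"),
            new Node("E", "dc1", "r3"));
        System.out.println(place(ring, 0, 4)); // prints [A, C, D, B]
    }
}
```

With this ring the fourth replica lands on B, which shares a rack with the
primary A. Going by the header comment I would have expected E, which sits on
a third rack in the same data center.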

It may make sense to clarify this important semantic difference by updating
the comment (along the above lines) to better reflect the code.

Alternatively, the code inside the first while loop in
calculateNaturalEndpoints can be changed to implement some other semantics
that would be more suitable.

In general, with the introduction of data center configurations, the
semantics of this class need to be clarified so the strategy for placing
endpoints on "Set<InetAddress> endpoints" can be implemented accordingly.

There are other issues to think about. For example, for a quorum write
(consistency.quorum) to complete faster, shouldn't the first replicas be as
close as possible (i.e. on the same rack)?  The whole point of choosing this
level of consistency is to improve performance. Right?
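
A rough way to see the point, assuming writes go to all replicas in parallel
and a quorum is a simple majority (N/2 + 1): the write returns when the
quorum-th acknowledgement arrives. The class name and the latency numbers
below are invented purely for illustration.

```java
import java.util.Arrays;

public class QuorumLatencySketch {
    /** Simple-majority quorum size for a given replica count. */
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    /** Time at which the quorum-th acknowledgement arrives. */
    static int quorumLatency(int[] ackMillis) {
        int[] sorted = ackMillis.clone();
        Arrays.sort(sorted);
        return sorted[quorum(sorted.length) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies: same-rack ~1-2 ms, other DC ~40 ms.
        int[] twoNearOneFar = {1, 2, 40};
        int[] oneNearTwoFar = {1, 40, 45};
        System.out.println(quorumLatency(twoNearOneFar)); // prints 2
        System.out.println(quorumLatency(oneNearTwoFar)); // prints 40
    }
}
```

So with three replicas, the second-closest replica is what decides the
latency of a quorum write.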

I hope this helps, and I hope I've not missed something completely obvious.

Best regards,
- m.
