Re: Reviewing . . . RackAwareStrategy.java . . . ( rev 954657 )

Masood Mortazavi Tue, 15 Jun 2010 12:48:30 -0700

On Tue, Jun 15, 2010 at 10:55 AM, Jonathan Ellis <[email protected]> wrote:

> On Mon, Jun 14, 2010 at 11:03 PM, Masood Mortazavi
> <[email protected]> wrote:
> > The comment on the top of RackAwareStrategy says:
>
> You are correct.  RAS sort of works under other conditions but it is
> primarily intended for 2 DCs and RF=3.  I will update the comment in
> question.
>

An orthogonal but related problem is the following . . .

Currently, each replica placement strategy involves its own configuration
extensions, along with a great deal of repeated and intertwined code among
the strategies. (For example, all "strategies" currently need to iterate
through nodes. This is common functionality.)

The current approach not only affects construction of replica placement
strategies but also complicates their semantics.

It may be possible to refactor the code as follows:

(1) Each node has a set of properties assigned to it through the
configuration (right now, in the trunk, those properties are the "rack" and
"DC" position of a node but it should be possible to add any number of other
properties, and they should really all be in the same configuration file,
not separated as they are, today, in two or more separate files).

(2) Once these physical properties are assigned/defined for each node, a
pluggability architecture would allow whoever extends the node properties to
plug in a node "Examiner" as a complement to any additional properties.

(3) In the iteration that's common to all replica placement search logic,
the "Examiner" will either "pass" or "fail" an (iterated) node as a replica
location for a given primary, based on the properties of that node.

Although such refactoring is not entirely trivial, it will lead to less
repetition across "strategies", better factoring of concerns and more
reliable code, I believe.

It will also make maintenance and extension of strategies much easier . . .

>
> > There are other issues to think about. For example, for quorum write
> > (consistency.quorum) to work faster, shouldn't the first replicas be as
> > close as possible (i.e. on the same rack)?  The whole point of choosing
> > this level of consistency is to improve performance. Right?
>
> No, the point is to improve reliability (there are a number of failure
> scenarios that will result in losing an entire rack at once).
>

Yes, I understand that.

What I was trying to say is that, if we agree to the above, we should select
the other-DC and other-Rack replica after we have selected all "near"
replicas.

(I imagine that, during actual replication, the replica placement list is
iterated sequentially and that the first replica will have to be the nearest,
and then farther and farther replicas are chosen and put on the list.)
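
As an illustration of the ordering I mean, and not of what the trunk does
today, the list could be arranged nearest-first once the replicas have been
chosen. PlacedNode and ProximityOrder are made-up names, and the three-level
distance (same rack, same DC, other DC) is my own assumption:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical replica entry carrying only the two properties used today. */
final class PlacedNode {
    final String name;
    final String dc;
    final String rack;

    PlacedNode(String name, String dc, String rack) {
        this.name = name;
        this.dc = dc;
        this.rack = rack;
    }
}

final class ProximityOrder {
    /** Assumed distance: 0 = same rack, 1 = same DC but other rack, 2 = other DC. */
    static int distance(PlacedNode primary, PlacedNode other) {
        if (!primary.dc.equals(other.dc)) {
            return 2;
        }
        return primary.rack.equals(other.rack) ? 0 : 1;
    }

    /** Returns a copy of the list ordered nearest-first; ties keep their original order. */
    static List<PlacedNode> nearestFirst(PlacedNode primary, List<PlacedNode> replicas) {
        List<PlacedNode> ordered = new ArrayList<>(replicas);
        ordered.sort(Comparator.comparingInt((PlacedNode r) -> distance(primary, r)));
        return ordered;
    }
}
```

This leaves the choice of replicas (and hence the reliability argument)
untouched; it only changes the order in which they appear on the list.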

Thanks,
- m.

>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
