Here's a summary of my earlier comments: In a more flexible architecture, one must be able to provide or plug in configuration elements along with one's replica placement strategy plug-in, in order to extend or complement the availability semantics of one's plug-in code.
As the base of the plug-in, Cassandra can provide the iterators of tokens/nodes that the strategy must walk to find the "place" of replicas for a given token/node. Cassandra will also need to provide the strategy code with a handle to "private" configuration files.

- m.

On Tue, Jun 15, 2010 at 12:47 PM, Masood Mortazavi <masoodmortaz...@gmail.com> wrote:

> On Tue, Jun 15, 2010 at 10:55 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> On Mon, Jun 14, 2010 at 11:03 PM, Masood Mortazavi
>> <masoodmortaz...@gmail.com> wrote:
>> > The comment on the top of RackAwareStrategy says:
>>
>> You are correct. RAS sort of works under other conditions, but it is
>> primarily intended for 2 DCs and RF=3. I will update the comment in
>> question.
>
> An orthogonal but related problem is the following . . .
>
> Currently, each replica placement strategy involves its own configuration
> extensions, along with a great deal of repeated and intertwined code among
> the strategies. (For example, all "strategies" currently need to iterate
> through nodes. This is common functionality.)
>
> The current approach not only affects the construction of replica
> placement strategies but also complicates their semantics.
>
> It may be possible to refactor the code as follows:
>
> (1) Each node has a set of properties assigned to it through the
> configuration. (Right now, in the trunk, those properties are the "rack"
> and "DC" position of a node, but it should be possible to add any number
> of other properties, and they should really all be in the same
> configuration file, not separated, as they are today, across two or more
> files.)
>
> (2) Once these physical properties are assigned/defined for each node, a
> pluggability architecture would allow whoever extends the node properties
> to plug in a node "Examiner" as a complement to any additional properties.
> (3) In the iteration that's common to all replica placement search logic,
> the "Examiner" will either "pass" or "fail" an (iterated) node as a
> replica place for a given primary, based on the properties of that node.
>
> Although such refactoring is not entirely trivial, it will lead to less
> repetition across "strategies", a better separation of concerns, and more
> reliable code, I believe.
>
> It will also make maintenance and extension of strategies much easier . . .
>
>> > There are other issues to think about. For example, for quorum write
>> > (consistency.quorum) to work faster, shouldn't the first replicas be
>> > as close as possible (i.e. on the same rack)? The whole point of
>> > choosing this level of consistency is to improve performance. Right?
>>
>> No, the point is to improve reliability (there are a number of failure
>> scenarios that will result in losing an entire rack at once).
>
> Yes, I understand that.
>
> What I was trying to say is that, if we agree on the above, we should
> select the other-DC and other-rack replicas after we have selected all
> "near" replicas.
>
> (I imagine that, during actual replication, the replica placement list is
> iterated sequentially and that the first replica will have to be the
> nearest, and then farther and farther replicas are chosen and put on the
> list.)
>
> Thanks,
> - m.
>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
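To make the proposal above concrete, here is a minimal sketch of points (1)-(3) in Java. All names here (Node, NodeExaminer, PlacementWalker) are invented for illustration and are not Cassandra's actual API: each node carries a property map from configuration, and a pluggable "Examiner" passes or fails each candidate during the common ring walk.

```java
import java.util.*;

// Hypothetical sketch, not Cassandra's real classes.
// Point (3): the pluggable predicate that passes or fails a candidate node.
interface NodeExaminer {
    // true if 'candidate' may hold a replica for data whose primary is 'primary'
    boolean accept(Node primary, Node candidate, List<Node> chosenSoFar);
}

// Point (1): a node plus the properties assigned to it via configuration.
final class Node {
    final String address;
    final Map<String, String> properties; // e.g. "rack", "DC", or any extension

    Node(String address, Map<String, String> properties) {
        this.address = address;
        this.properties = properties;
    }
}

// Point (2): the iteration common to all strategies, factored out once.
final class PlacementWalker {
    // Walk the nodes in ring order starting at the primary, letting the
    // examiner pass/fail each candidate until the replication factor is met.
    static List<Node> place(List<Node> ringOrder, Node primary,
                            int replicationFactor, NodeExaminer examiner) {
        List<Node> replicas = new ArrayList<>();
        replicas.add(primary); // the primary is always the first replica
        for (Node candidate : ringOrder) {
            if (replicas.size() >= replicationFactor) break;
            if (candidate == primary) continue;
            if (examiner.accept(primary, candidate, replicas)) {
                replicas.add(candidate);
            }
        }
        return replicas;
    }
}
```

A rack-diversity strategy then reduces to a one-line examiner (again, an illustrative assumption, not the real RackAwareStrategy):

```java
NodeExaminer rackDiverse = (primary, candidate, chosen) ->
    chosen.stream().noneMatch(n ->
        n.properties.get("rack").equals(candidate.properties.get("rack")));
```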