Re: Reviewing . . . RackAwareStrategy.java . . . ( rev 954657 )
On Tue, Jun 15, 2010 at 10:55 AM, Jonathan Ellis jbel...@gmail.com wrote:

On Mon, Jun 14, 2010 at 11:03 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote:

The comment on the top of RackAwareStrategy says:

You are correct. RAS sort of works under other conditions but it is primarily intended for 2 DCs and RF=3. I will update the comment in question.

An orthogonal but related problem is the following . . .

Currently, each replica placement strategy involves its own configuration extensions, along with a great deal of repeated and intertwined code among the strategies. (For example, all strategies currently need to iterate through nodes. This is common functionality.) The current approach not only affects construction of replica placement strategies but also complicates their semantics.

It may be possible to refactor the code as follows:

(1) Each node has a set of properties assigned to it through the configuration. (Right now, in the trunk, those properties are the rack and DC position of a node, but it should be possible to add any number of other properties, and they should really all be in the same configuration file, not separated as they are today in two or more files.)

(2) Once these physical properties are assigned/defined for each node, a pluggability architecture would allow whoever extends the node properties to plug in a node Examiner as a complement to any additional properties.

(3) In the iteration that's common to all replica placement search logic, the Examiner will either pass or fail an (iterated) node as a replica place for a given primary, based on the properties of that node.

Although such refactoring is not entirely trivial, it will lead to less repetition across strategies, better factoring of concerns and more reliable code, I believe. It will also make maintenance and extension of strategies much easier . . .

There are other issues to think about. For example, for quorum write (consistency.quorum) to work faster, shouldn't the first replicas be as close as possible (i.e. on the same rack)? The whole point of choosing this level of consistency is to improve performance. Right?

No, the point is to improve reliability (there are a number of failure scenarios that will result in losing an entire rack at once).

Yes, I understand that. What I was trying to say is that, if we agree to the above, we should select the other-DC and other-rack replicas after we have selected all near replicas. (I imagine that, during actual replication, the replica placement list is iterated sequentially, and that the first replica will have to be the nearest, with farther and farther replicas then chosen and put on the list.)

Thanks,
- m.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
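The three-step Examiner proposal in the message above could look roughly like the following. This is only a sketch of the idea, not code from the Cassandra tree: the names `Node`, `Examiner`, `selectReplicas` and `DISTINCT_RACK` are all hypothetical, and the property keys "dc" and "rack" are assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of the proposed refactoring; none of these types exist in Cassandra. */
final class ExaminerSketch {

    /** Step (1): a node plus the properties assigned to it in configuration. */
    static final class Node {
        final String name;
        final Map<String, String> properties; // assumed keys: "dc", "rack"

        Node(String name, Map<String, String> properties) {
            this.name = name;
            this.properties = properties;
        }
    }

    /** Step (2): pluggable check that passes or fails a candidate replica location. */
    interface Examiner {
        boolean accept(Node primary, List<Node> chosenSoFar, Node candidate);
    }

    /**
     * Step (3): the iteration shared by all strategies. Walks the ring once,
     * starting after the primary, and keeps each node the examiner passes.
     * May return fewer than replicationFactor nodes if the examiner is too
     * strict; a real strategy would need a fallback pass.
     */
    static List<Node> selectReplicas(List<Node> ring, int primaryIndex,
                                     int replicationFactor, Examiner examiner) {
        Node primary = ring.get(primaryIndex);
        List<Node> replicas = new ArrayList<>();
        replicas.add(primary);
        for (int i = 1; i < ring.size() && replicas.size() < replicationFactor; i++) {
            Node candidate = ring.get((primaryIndex + i) % ring.size());
            if (examiner.accept(primary, replicas, candidate)) {
                replicas.add(candidate);
            }
        }
        return replicas;
    }

    /** Example examiner: accept only nodes on a rack not used by any replica chosen so far. */
    static final Examiner DISTINCT_RACK = (primary, chosen, candidate) ->
        chosen.stream().noneMatch(n ->
            Objects.equals(n.properties.get("rack"), candidate.properties.get("rack")));
}
```

With this shape, a new placement rule would only need a new `Examiner`; the ring walk itself is written once. Ordering concerns such as "near replicas first" would still need to be handled separately, for instance by sorting the resulting list.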
Re: Secondary indexing and 0.6/0.7 integration with Datanucleus
What issue were you trying to link? :)

On Tue, Jun 15, 2010 at 6:56 PM, Todd Nine t...@spidertracks.co.nz wrote:

Hi all,

I'm implementing a Datanucleus plugin for Cassandra. I'm finished with the basic functionality, and everything seems to work pretty well. Now my issue is performing secondary indexing on fields within my data. I have outlined some of the issues I'm facing in this post.

http://www.datanucleus.org/servlet/forum/viewthread_thread,6087_lastpage,yes#32610

Essentially, for each operand the user specifies, I will need to make a trip to Cassandra, load the key columns, then perform an intersection with the result from my previous read. Eventually, at the end of all the intersections, I will have a list of keys I will then load. This obviously requires several trips to Cassandra, where from my understanding of secondary indexing, I would only need to make one trip for multiple operands over a column family.

I've read over this issue.

http://issues.apache.org/jira/browse/CASSANDRA-32610

And it seems to solve a lot of my woes. Is it possible/recommended to patch the current code base of 0.6.2 to perform this functionality?

Thanks,
Todd

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Secondary indexing and 0.6/0.7 integration with Datanucleus
No chance that 749 can be backported to 0.6, sorry.

On Tue, Jun 15, 2010 at 10:35 PM, Todd Nine t...@spidertracks.co.nz wrote:

Let's try that again. This is the intended issue.

https://issues.apache.org/jira/browse/CASSANDRA-749

thanks,
Todd

On Tue, 2010-06-15 at 20:02 -0500, Jonathan Ellis wrote:

What issue were you trying to link? :)

On Tue, Jun 15, 2010 at 6:56 PM, Todd Nine t...@spidertracks.co.nz wrote:

Hi all,

I'm implementing a Datanucleus plugin for Cassandra. I'm finished with the basic functionality, and everything seems to work pretty well. Now my issue is performing secondary indexing on fields within my data. I have outlined some of the issues I'm facing in this post.

http://www.datanucleus.org/servlet/forum/viewthread_thread,6087_lastpage,yes#32610

Essentially, for each operand the user specifies, I will need to make a trip to Cassandra, load the key columns, then perform an intersection with the result from my previous read. Eventually, at the end of all the intersections, I will have a list of keys I will then load. This obviously requires several trips to Cassandra, where from my understanding of secondary indexing, I would only need to make one trip for multiple operands over a column family.

I've read over this issue.

http://issues.apache.org/jira/browse/CASSANDRA-32610

And it seems to solve a lot of my woes. Is it possible/recommended to patch the current code base of 0.6.2 to perform this functionality?

Thanks,
Todd

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Secondary indexing and 0.6/0.7 integration with Datanucleus
No problem, I didn't want to implement my own solution if an existing one could easily be applied. Since I'll be creating CFs that represent secondary indexes, I'll need to perform range scans over the keys of those secondary index CFs. The column names within those CFs are the row keys of the primary table. Is there a way I can get the intersection of all of the column names from multiple range scans over different column families in one result set? Otherwise I'll need to make multiple trips and create the intersection myself in my plugin.

Here is an example of what I'm trying to do.

CF: Person
    key1: { firstName: John  lastName: Smith  email: smi...@foo.com }
    key2: { firstName: Jane  lastName: Smith  email: smi...@foo.com }
    key3: { firstName: Jane  lastName: Doe    email: smi...@foo.com }

My secondary index tables would be the following:

CF: Person_LastName
    Smith: { key1: 0x00  key2: 0x00 }
    Doe:   { key3: 0x00 }

CF: Person_Email
    smi...@foo.com: { key1: 0x00  key2: 0x00  key3: 0x00 }

If my input is something similar to lastName == 'Smith' && email == smi...@foo.com, I would read all columns from key Smith in CF Person_LastName, and all columns from key smi...@foo.com in CF Person_Email. The intersection of the two sets is key1 and key2, and I would like Cassandra to return only those rows.

Thanks,
Todd

On Tue, 2010-06-15 at 23:38 -0500, Jonathan Ellis wrote:

No chance that 749 can be backported to 0.6, sorry.

On Tue, Jun 15, 2010 at 10:35 PM, Todd Nine t...@spidertracks.co.nz wrote:

Let's try that again. This is the intended issue.

https://issues.apache.org/jira/browse/CASSANDRA-749

thanks,
Todd

On Tue, 2010-06-15 at 20:02 -0500, Jonathan Ellis wrote:

What issue were you trying to link? :)

On Tue, Jun 15, 2010 at 6:56 PM, Todd Nine t...@spidertracks.co.nz wrote:

Hi all,

I'm implementing a Datanucleus plugin for Cassandra. I'm finished with the basic functionality, and everything seems to work pretty well. Now my issue is performing secondary indexing on fields within my data. I have outlined some of the issues I'm facing in this post.

http://www.datanucleus.org/servlet/forum/viewthread_thread,6087_lastpage,yes#32610

Essentially, for each operand the user specifies, I will need to make a trip to Cassandra, load the key columns, then perform an intersection with the result from my previous read. Eventually, at the end of all the intersections, I will have a list of keys I will then load. This obviously requires several trips to Cassandra, where from my understanding of secondary indexing, I would only need to make one trip for multiple operands over a column family.

I've read over this issue.

http://issues.apache.org/jira/browse/CASSANDRA-32610

And it seems to solve a lot of my woes. Is it possible/recommended to patch the current code base of 0.6.2 to perform this functionality?

Thanks,
Todd
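For the client-side fallback Todd describes in this thread (one read per index CF, then intersecting the column names in the plugin), a minimal sketch might look like this. It assumes the column names of each index row have already been fetched and decoded to strings; `IndexIntersectionSketch` and `intersect` are hypothetical names, and no Cassandra client calls are shown.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not part of Cassandra or Datanucleus. */
final class IndexIntersectionSketch {

    /**
     * Intersects the column names read from each index row. Each inner
     * collection holds the primary-table row keys stored as column names
     * under one index key (for example, the row "Smith" in Person_LastName).
     */
    static Set<String> intersect(Collection<? extends Collection<String>> indexRows) {
        Iterator<? extends Collection<String>> rows = indexRows.iterator();
        if (!rows.hasNext()) {
            return new TreeSet<>();
        }
        // Start from the first row's keys and narrow with each further row.
        Set<String> result = new TreeSet<>(rows.next());
        while (rows.hasNext() && !result.isEmpty()) {
            result.retainAll(new HashSet<>(rows.next()));
        }
        return result;
    }
}
```

Applied to the example data in the thread, the Person_LastName row "Smith" (key1, key2) and the Person_Email row (key1, key2, key3) intersect to key1 and key2, which are the rows that would then be loaded from Person. Starting from the smallest index row first would keep the working set small.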