Wish list [from "users survey" thread]

Jonathan Ellis Mon, 23 Nov 2009 11:46:04 -0800

Since the users survey thread is getting long [thanks, everyone!], I
thought these deserved a separate thread.


The next release is currently tagged in JIRA as 0.9.  All the issues
linked here except CASSANDRA-342 are tagged for 0.9, but that is more
of a guess than a promise: this being open source, a lot could change
depending on who steps up to help out between now and then.

In the order I saw them in the survey thread:

1. Increment/decrement: "atomic" is a dirty word in a system
emphasizing availability, but incr/decr can be provided in an
"eventually consistent" manner with vector clocks.  There are other
possible approaches, but this is probably the best fit for us.  We'd
want to let each ColumnFamily use either traditional (for Cassandra)
long timestamps or vector clocks, but not a mix of the two.  The bad
news is that this is a very substantial change and will probably not
be in 0.9 unless someone steps up to do the work.  (This would also
cover "flexible conflict resolution," which came up as well.)

2. Load balancing: 0.5 will have basic load balancing, and we'll keep
improving it from there.

3. Decommission: also in 0.5.

4. Map/reduce support: I'd also like to see Cassandra usable as an
input source for Hadoop.  Jeff Hodges did some work here, but it never
really got beyond proof-of-concept
(https://issues.apache.org/jira/browse/CASSANDRA-342).  I'd like to see
this in 0.9, but it requires knowing both Hadoop and Cassandra well,
which raises the bar.  I can help with the Cassandra side if someone
who knows Hadoop wants to take this on.

5. ColumnFamily / Keyspace definitions without a restart: another
long-standing nice-to-have
(https://issues.apache.org/jira/browse/CASSANDRA-44).  Volunteers
welcome.

6. Cluster management / monitoring: We've exposed everything through
JMX, which is slightly cumbersome, but it's a standard and it's already
there.  OpenNMS, Nagios, and Munin all offer at least some level of JMX
support, so I think the right approach here is to contribute Cassandra
awareness to those projects.

7. Column versioning à la Bigtable: I do not think this is a good fit
for Cassandra, but I am open to suggestions as to how to accomplish it
cleanly.

8. Speed: 0.5 has some pretty significant improvements here (2x write
throughput in Rackspace's tests), and we're definitely going to keep
improving this.

9. Design documentation: also agreed.  Chris has started on this
(http://wiki.apache.org/cassandra/ArchitectureSSTable) and I will try
to at least sketch out some more this week.

10. Inserting multiple rows at once: Tim Huske is working on this now
for https://issues.apache.org/jira/browse/CASSANDRA-336.

11. remove_slice_range / remove_key_range: Gary Dusbabek is working on
this for https://issues.apache.org/jira/browse/CASSANDRA-293.

12. "a way to group columns together across keys": Do you mean
something like get_range_slice in trunk?  If so, this will be in 0.5
(for order-preserving partitioners only).

13. Secondary indexing and transactions: I think this is outside the
scope of Cassandra proper, but a layer implementing these on top of
Cassandra the way Google did for App Engine on top of Bigtable (see
http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx)
would be feasible.

14. Caching: I think a "query cache" is definitely worth trying.
Volunteers welcome.

15. Bulk delete: I'm working on this for
https://issues.apache.org/jira/browse/CASSANDRA-531.

16. Simple query front-end: There are a couple you can contribute to,
namely http://github.com/driftx/chiton (GTK) and a web-based one in
contrib/cassandra_browser [trunk only].
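On item 1, for anyone unfamiliar with vector clocks: here is a minimal
Java sketch of the data structure itself, assuming string replica ids.
The class and method names are hypothetical; this is not Cassandra code
and not a proposed API.  It shows two replicas that each increment
independently ending up with clocks that compare as CONCURRENT, which
is the case a single long timestamp would resolve by letting one write
silently win.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Cassandra code: one counter per replica, so
// concurrent updates can be detected rather than silently overwritten.
public final class VectorClock {
    public enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Long> counters = new HashMap<>();

    /** Record one local update (e.g. an increment) made on the given replica. */
    public void tick(String replicaId) {
        counters.merge(replicaId, 1L, Long::sum);
    }

    public VectorClock copy() {
        VectorClock c = new VectorClock();
        c.counters.putAll(counters);
        return c;
    }

    /** Pointwise maximum: a clock that has seen everything either input saw. */
    public static VectorClock merge(VectorClock a, VectorClock b) {
        VectorClock out = a.copy();
        b.counters.forEach((id, n) -> out.counters.merge(id, n, Math::max));
        return out;
    }

    /** CONCURRENT means neither clock has seen all of the other's updates. */
    public static Order compare(VectorClock a, VectorClock b) {
        Set<String> ids = new HashSet<>(a.counters.keySet());
        ids.addAll(b.counters.keySet());
        boolean aAhead = false, bAhead = false;
        for (String id : ids) {
            long x = a.counters.getOrDefault(id, 0L);
            long y = b.counters.getOrDefault(id, 0L);
            if (x > y) aAhead = true;
            if (y > x) bAhead = true;
        }
        if (aAhead && bAhead) return Order.CONCURRENT;
        if (aAhead) return Order.AFTER;
        if (bAhead) return Order.BEFORE;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock();
        base.tick("A");                  // an update every replica has seen
        VectorClock onA = base.copy();
        onA.tick("A");                   // replica A increments
        VectorClock onB = base.copy();
        onB.tick("B");                   // replica B increments concurrently
        System.out.println(compare(base, onA));            // BEFORE
        System.out.println(compare(onA, onB));             // CONCURRENT
        System.out.println(compare(merge(onA, onB), onB)); // AFTER
    }
}
```

Turning this into an incr/decr counter still needs a rule for
reconciling the values attached to concurrent clocks; the sketch only
shows how such conflicts are detected.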
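On item 6: what a JMX-aware monitoring plugin has to do is generic:
connect to the JVM's JMX agent and read attributes of named MBeans.
A minimal sketch using only the standard javax.management API.  The
host, port, and any Cassandra MBean names are placeholders (assumptions
about how a node is configured), and the demo reads the local JVM's own
Runtime MBean so it runs without a Cassandra node.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of a monitoring probe: read attributes from named MBeans.
// Host, port, and MBean names used against a real node are placeholders.
public final class JmxPeek {
    /** Read a single MBean attribute over any JMX connection (local or remote). */
    public static Object readAttribute(MBeanServerConnection conn, String mbean, String attribute)
            throws Exception {
        return conn.getAttribute(new ObjectName(mbean), attribute);
    }

    /** Connect to a remote agent using the standard RMI service URL form (assumed setup). */
    public static Object readRemote(String host, int port, String mbean, String attribute)
            throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            return readAttribute(connector.getMBeanServerConnection(), mbean, attribute);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo against this JVM's own platform MBean server, so no node is needed.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println("Uptime (ms): " + readAttribute(local, "java.lang:type=Runtime", "Uptime"));
        // List the MBean domains registered here; a node would expose its own as well.
        for (String domain : local.getDomains()) {
            System.out.println(domain);
        }
    }
}
```

Pointed at a node, readRemote(...) would be called with that node's
host, JMX port, and one of its MBean names, which you can discover by
browsing the domains with a generic JMX console first.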
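On item 12: the "order-preserving partitioners only" restriction follows
from rows being laid out around the ring in token order.  A minimal
sketch under that assumption: with an order-preserving partitioner the
token is the key, so a key range is one contiguous walk; with a hashing
partitioner, neighbouring keys land far apart.  MD5 is used here only as
an illustrative hash, and the class and method names are hypothetical,
not Cassandra's partitioner code.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Illustration only: rows sit on the ring sorted by token.
public final class RingOrder {
    /** Order-preserving: token == key, so ring order matches key order. */
    static TreeMap<String, String> orderPreservingRing(List<String> keys) {
        TreeMap<String, String> ring = new TreeMap<>();
        for (String k : keys) ring.put(k, k);
        return ring;
    }

    /** Hash-based: token = MD5(key) as a non-negative integer (illustrative hash). */
    static TreeMap<BigInteger, String> hashedRing(List<String> keys) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (String k : keys) {
            // digest(byte[]) hashes the input and resets the digest for the next key.
            ring.put(new BigInteger(1, md5.digest(k.getBytes(StandardCharsets.UTF_8))), k);
        }
        return ring;
    }

    /** Keys in [start, end): one contiguous walk along an order-preserving ring. */
    static List<String> rangeScan(TreeMap<String, String> ring, String start, String end) {
        return new ArrayList<>(ring.subMap(start, true, end, false).values());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date", "elderberry");
        System.out.println(rangeScan(orderPreservingRing(keys), "banana", "date")); // [banana, cherry]
        // On the hashed ring the same keys appear in hash order, so "banana".."date"
        // is not a contiguous stretch; answering the range would mean visiting everything.
        System.out.println(new ArrayList<>(hashedRing(keys).values()));
    }
}
```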

As always, feel free to ask on cassandra-dev or IRC if you'd like to
work on any of these and want some pointers as to where to start. :)

-Jonathan