Cassandra shards data automatically, so you just need to point your client at your cluster and Cassandra does the rest. You should read up on the different partitioners before you go live in production, though, because it's not easy to switch once you've made that decision.
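To make the partitioner idea a bit more concrete, here is a rough sketch of how a hash-based partitioner can map row keys onto nodes. This is an illustration only, not Cassandra's actual code: the `Ring` class, the node names, and the evenly spaced tokens are all made up for the example, and the MD5-based `token_for` function only mimics the general approach of a hash-style partitioner.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 127  # size of the token space assumed for this sketch


def token_for(key: str) -> int:
    """Hash a row key to a position (token) on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    """Hypothetical token ring: each node owns one token."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        n = len(self.nodes)
        # Assumption: tokens are spaced evenly; a real cluster may not be.
        self.tokens = [i * RING_SIZE // n for i in range(n)]

    def node_for(self, key: str) -> str:
        """Return the node whose token is the first one >= the key's token."""
        i = bisect.bisect_left(self.tokens, token_for(key))
        # Keys past the last token wrap around to the first node.
        return self.nodes[i % len(self.nodes)]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"])
    counts = Counter(ring.node_for(f"user{i}") for i in range(1000))
    print(counts)  # roughly even spread across the four nodes
```

The point is that the application never chooses a shard: the key's hash decides which node owns the row. It also hints at why switching partitioners later is painful, since a different key-to-token mapping would put every row in a different place.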
http://wiki.apache.org/cassandra/StorageConfiguration#Partitioner

Ray Slakinski

On 2010-01-28, at 7:29 PM, Suhail Doshi wrote:

> Another piece I am interested in is how cassandra distributes the data
> automatically. In MySQL you need to shard and you'd pick the shard to
> request info from--how does that translate in cassandra?
>
> On Thu, Jan 28, 2010 at 7:23 PM, Suhail Doshi <suh...@mixpanel.com> wrote:
>
>> We've started to use Cassandra in production and just have one node right
>> now. Here's one of our ColumnFamilys:
>>
>> 16G Jan 28 22:28 SomeIndex-5467-Index.db
>> 196M Jan 28 22:32 SomeIndex-5487-Index.db
>>
>> The first bottle neck you encounter is reads--writes are extremely fast even
>> with one node.
>>
>> My question is, is the size of the *-Index.db files the amount of RAM you
>> need available for Cassandra to do reads fast?
>>
>> What are some configuration options you would need to tweak besides the
>> JVM's max memory size being larger. Is there any default configurations
>> commonly missed?
>>
>> Next, if you provision more nodes will Cassandra distribute the data in
>> memory so I don't need a single 16 GB node? Is there anything I need to
>> build in my application logic to make this work correctly. Ideally, if I had
>> a 16 GB index, I'd want it spread across 4 4GB nodes. Can any client connect
>> to any one node request info and it will get the info back from a node that
>> has that part of the index in memory?
>>
>> What's the best way to do efficient reads?
>>
>> Suhail