We've started using Cassandra in production and have just one node right now. Here are the index files for one of our ColumnFamilies:
    16G  Jan 28 22:28 SomeIndex-5467-Index.db
    196M Jan 28 22:32 SomeIndex-5487-Index.db

The first bottleneck you encounter is reads; writes are extremely fast even with one node.

My question is: is the size of the *-Index.db files the amount of RAM Cassandra needs available to serve reads quickly? Besides raising the JVM's maximum heap size, which configuration options would I need to tweak? Are there any default settings that are commonly overlooked?

Next, if I provision more nodes, will Cassandra distribute the data in memory so that I don't need a single 16 GB node? Is there anything I need to build into my application logic for this to work correctly? Ideally, with a 16 GB index, I'd want it spread across four 4 GB nodes. Can a client connect to any one node, request data, and get it back from whichever node holds that part of the index in memory?

What's the best way to do efficient reads?

Suhail