Ok, here are the common Cassandra misconceptions, and their sources, gleaned from experience and talking to various people.
Not listed in any particular order.

1. A key is global, and data in different column families must be related.
   - BigTable paper
   - key precedence in Thrift API

2. Table is like a row-oriented table
   - the name
   - somewhat fixed by changing to keyspace

3. Keyspace is not like a database (in SQL/CouchDB/MongoDB)
   - because it's not called that

4. Columns are literally columnar
   - the name
   - column sets are stored per key, not per column family (unlike relational DBs)
   - column name as a piece of data is unusual (esp. in relational DBs)

5. Columns are versioned
   - BigTable paper

6. Super columns are magical
   - Name has no precedent anywhere
   - Super columns do not have timestamps, unlike columns
   - Other MVAs are not fully recursive; they just have values

7. Difference between column family, column, and super column is not clear
   - Everything has "column" in the name
   - "super", "family", and "" are not well understood

8. Cassandra uses Paxos
   - BigTable paper

9. Cassandra uses client-side conflict resolution
   - Dynamo paper

A lot of things to get wrong, right off the bat. Maybe this makes it clear why the BigTable references were not helpful to us? For a new user, they provide as many wrong assumptions as correct ones.

Evan

-- 
Evan Weaver
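
To make a few of the points above concrete (1, 4, 5, 6 and 9), here is a minimal Python sketch of the data model as nested dictionaries. It is only an illustration under stated assumptions, not Cassandra code: the column family names ("Users", "Tweets"), the helper functions, and the sample values are all hypothetical, and tie-breaking for equal timestamps is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str        # the column name is itself a piece of data
    value: str
    timestamp: int   # supplied by the writer


def write(columns, name, value, timestamp):
    """Keep only the newest value per column name (no version history)."""
    current = columns.get(name)
    # Simplification: a write with an equal timestamp is ignored here.
    if current is None or timestamp > current.timestamp:
        columns[name] = Column(name, value, timestamp)


# keyspace -> column family -> row key -> column name -> Column
# (hypothetical column family names)
keyspace = {"Users": {}, "Tweets": {}}


def insert(column_family, key, name, value, timestamp):
    write(keyspace[column_family].setdefault(key, {}), name, value, timestamp)


# Super column family: row key -> super column name -> column name -> Column.
# The super column is just a named container; it carries no timestamp.
super_family = {}


def insert_super(key, super_name, name, value, timestamp):
    row = super_family.setdefault(key, {})
    write(row.setdefault(super_name, {}), name, value, timestamp)


# The newest timestamp wins and older values are dropped, so columns are
# not versioned and the reader never has to reconcile conflicts itself.
insert("Users", "evan", "email", "old@example.com", 1)
insert("Users", "evan", "email", "new@example.com", 2)
insert("Users", "evan", "email", "stale@example.com", 1)

# The same key in another column family is an unrelated row.
insert("Tweets", "evan", "body", "hello", 1)

insert_super("evan", "address", "city", "San Francisco", 1)

print(keyspace["Users"]["evan"]["email"].value)
print(sorted(keyspace["Tweets"]["evan"]))
```

Each row holds its own set of columns, so two rows in the same column family need not share any column names, and nothing ties the "evan" row in one column family to the "evan" row in another.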
