Cassandra data model misconceptions, and their sources

Evan Weaver Mon, 17 Aug 2009 11:11:14 -0700

Ok, here are the common Cassandra misconceptions, and their sources,
gleaned from experience and talking to various people.


Not listed in any particular order.

1. A key is global, and data in different column families must be related.
  - BigTable paper
  - key precedence in Thrift API

2. Table is like a row-oriented table
  - the name
  - somewhat fixed by changing to keyspace

3. Keyspace is not like a database (in SQL/CouchDB/MongoDB)
  - because it's not called that

4. Columns are literally columnar
  - the name
  - column sets are stored per key, not per column family (unlike
    relational DBs)
  - column name as a piece of data is unusual (esp. in relational DBs)

5. Columns are versioned
  - BigTable paper

6. Super columns are magical
  - Name has no precedent anywhere
  - Super columns do not have timestamps, unlike columns
  - Other MVAs are not fully recursive; just have values

7. Difference between column family, column, and super column is not clear
  - Everything has "column" in the name
  - "super", "family", and "" are not well-understood

8. Cassandra uses Paxos
  - BigTable paper

9. Cassandra uses client-side conflict resolution
  - Dynamo paper
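
To make items 1, 4, 5, and 6 a bit more concrete, here is a rough Python
sketch of the model as I read the points above. The names (Column,
SuperColumn, insert) are made up for illustration and are not the Thrift
API, and the overwrite rule is a simplification, not Cassandra's actual
code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    name: str        # the column name is itself a piece of data
    value: str
    timestamp: int   # one value and one timestamp: no version history


@dataclass
class SuperColumn:
    name: str
    # A super column only groups columns; it has no timestamp of its own.
    columns: Dict[str, Column] = field(default_factory=dict)


def insert(keyspace: dict, column_family: str, key: str, column: Column) -> None:
    """Store a column under keyspace[column_family][key][column.name].

    Assumption for illustration: a write with a newer timestamp replaces
    the stored column, and an older write is dropped.
    """
    row = keyspace.setdefault(column_family, {}).setdefault(key, {})
    current = row.get(column.name)
    if current is None or column.timestamp > current.timestamp:
        row[column.name] = column


ks: dict = {}
insert(ks, "Users", "evan", Column("name", "Evan", 1))
insert(ks, "Users", "evan", Column("name", "Evan W", 2))   # replaces, no history kept
insert(ks, "Users", "evan", Column("name", "stale", 0))    # older write is ignored
insert(ks, "Users", "jon", Column("email", "jon@example.com", 1))
insert(ks, "Statuses", "evan", Column("text", "hello", 1))
```

Note that "evan" in Users and "evan" in Statuses are separate rows that
just happen to share a key, and that the two rows in Users carry
different column sets.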

A lot of things to get wrong, right off the bat.

Maybe this makes it clear why the BigTable references were not helpful
to us? For a new user, they provide as many wrong assumptions as
correct ones.

Evan

-- 
Evan Weaver
