Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "CassandraLimitations" page has been changed by StuHood. http://wiki.apache.org/cassandra/CassandraLimitations?action=diff&rev1=9&rev2=10 -------------------------------------------------- == Artifacts of the current code base == * The byte[] size of a value can't be more than 2^31-1. * Cassandra's compaction code currently deserializes an entire row (per columnfamily) at a time. So all the data from a given columnfamily/key pair must fit in memory. Fixing this is relatively easy since columns are stored in-order on disk so there is really no reason you have to deserialize row-at-a-time except that that is easier with the current encapsulation of functionality. This will be fixed in https://issues.apache.org/jira/browse/CASSANDRA-16 + * A related limitation is that an entire row cannot be larger than 2^31-1 bytes, since the length of rows is serialized to disk using an integer. * Cassandra has two levels of indexes: key and column. But in super columnfamilies there is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes _all_ the subcolumns in that supercolumn. So you want to avoid a data model that requires large numbers of subcolumns. https://issues.apache.org/jira/browse/CASSANDRA-598 is open to remove this limitation. * <<Anchor(streaming)>>Cassandra's public API is based on Thrift, which offers no streaming abilities -- any value written or fetched has to fit in memory. This is inherent to Thrift's design; I don't see it changing. So adding large object support to Cassandra would need a special API that manually split the large objects up into pieces. Jonathan Ellis sketched out one approach in http://issues.apache.org/jira/browse/CASSANDRA-265. As a workaround in the meantime, you can manually split files into chunks of whatever size you are comfortable with -- at least one person is using 64MB -- and making a file correspond to a row, with the chunks as column values. * Thrift will crash Cassandra if you send random or malicious data to it. This makes exposing the Cassandra port directly to the outside internet a Bad Idea. See http://issues.apache.org/jira/browse/CASSANDRA-475 and http://issues.apache.org/jira/browse/THRIFT-601 for details.