[Cassandra Wiki] Update of "CassandraLimitations" by RobertColi

Apache Wiki Tue, 25 May 2010 15:41:02 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "CassandraLimitations" page has been changed by RobertColi.
The comment on this change is: docs generally shouldn't be in the first 
person.. replaced an instance of "I" with a third person phrasing...
http://wiki.apache.org/cassandra/CassandraLimitations?action=diff&rev1=10&rev2=11

--------------------------------------------------

   * Cassandra's compaction code currently deserializes an entire row (per 
columnfamily) at a time.  So all the data from a given columnfamily/key pair 
must fit in memory.  Fixing this is relatively easy since columns are stored 
in-order on disk so there is really no reason you have to deserialize 
row-at-a-time except that that is easier with the current encapsulation of 
functionality.  This will be fixed in 
https://issues.apache.org/jira/browse/CASSANDRA-16
     * A related limitation is that an entire row cannot be larger than 2^31-1 
bytes, since the length of rows is serialized to disk using an integer.
   * Cassandra has two levels of indexes: key and column.  But in super 
columnfamilies there is a third level of subcolumns; these are not indexed, and 
any request for a subcolumn deserializes _all_ the subcolumns in that 
supercolumn.  So you want to avoid a data model that requires large numbers of 
subcolumns.  https://issues.apache.org/jira/browse/CASSANDRA-598 is open to 
remove this limitation.
-  * <<Anchor(streaming)>>Cassandra's public API is based on Thrift, which 
offers no streaming abilities -- any value written or fetched has to fit in 
memory.  This is inherent to Thrift's design; I don't see it changing.  So 
adding large object support to Cassandra would need a special API that manually 
split the large objects up into pieces.  Jonathan Ellis sketched out one 
approach in http://issues.apache.org/jira/browse/CASSANDRA-265.  As a 
workaround in the meantime, you can manually split files into chunks of 
whatever size you are comfortable with -- at least one person is using 64MB -- 
and making a file correspond to a row, with the chunks as column values.
+  * <<Anchor(streaming)>>Cassandra's public API is based on Thrift, which 
offers no streaming abilities -- any value written or fetched has to fit in 
memory.  This is inherent to Thrift's design and is therefore unlikely to 
change.  So adding large object support to Cassandra would need a special API 
that manually split the large objects up into pieces. A potential approach is 
described in http://issues.apache.org/jira/browse/CASSANDRA-265.  As a 
workaround in the meantime, you can manually split files into chunks of 
whatever size you are comfortable with -- at least one person is using 64MB -- 
and making a file correspond to a row, with the chunks as column values.
   * Thrift will crash Cassandra if you send random or malicious data to it.  
This makes exposing the Cassandra port directly to the outside internet a Bad 
Idea.  See http://issues.apache.org/jira/browse/CASSANDRA-475 and 
http://issues.apache.org/jira/browse/THRIFT-601 for details.
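
The chunking workaround described in the streaming item can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for a Cassandra row, no client library is called, and the column naming scheme ("chunk-00000000", ...) and the function names are hypothetical. The size check compares only the sum of the chunk values against 2^31-1, so it ignores per-column overhead in the real on-disk row.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple

CHUNK_SIZE = 64 * 1024 * 1024   # 64MB, the chunk size mentioned in the text
MAX_ROW_BYTES = 2**31 - 1       # row length is serialized using an integer


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (column_name, column_value) pairs, reading one chunk at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Zero-padded names (hypothetical scheme) sort in chunk order.
        yield f"chunk-{index:08d}", chunk
        index += 1


def file_to_row(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Build a dict standing in for one row: one file, chunks as column values."""
    row: Dict[str, bytes] = {}
    total = 0
    for name, value in iter_chunks(stream, chunk_size):
        total += len(value)
        # Assumption: only value bytes are counted; real rows carry extra overhead.
        if total > MAX_ROW_BYTES:
            raise ValueError("file too large to store as a single row")
        row[name] = value
    return row


def row_to_file(row: Dict[str, bytes]) -> bytes:
    """Reassemble the original bytes by concatenating columns in name order."""
    return b"".join(row[name] for name in sorted(row))


if __name__ == "__main__":
    payload = b"abcdefghij"
    demo_row = file_to_row(io.BytesIO(payload), chunk_size=4)
    assert row_to_file(demo_row) == payload
```

In a real deployment each (column_name, column_value) pair would be written with a separate insert call, so that no single Thrift request has to carry more than one chunk.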
  
  == Obsolete Limitations ==
