Hi,

My understanding is that wide-row support (i.e. many columns/CQL rows/cells per partition key) has gotten much better in the past few years. Even though the theoretical limit of 2 billion cells has long been far above the practical limit, it seems like Cassandra is now able to handle wide rows better (e.g. incremental compactions so Cassandra doesn't OOM).
So I'm wondering:

- With more recent improvements (say, up to 2.2 or maybe 3.0), is the practical limit still much lower than 2 billion? Do we have any idea what limits us in this regard? (Maybe repair is still another bottleneck?)
- Is the 2 billion limit an SSTable limitation? https://issues.apache.org/jira/browse/CASSANDRA-7447 seems to indicate that it might be. Is there any future work we think will increase this limit?

A couple of caveats: I am aware that even if such a large partition is possible, it may not usually be practical, because it works against Cassandra's primary feature of sharding data across multiple nodes and parallelizing access. However, some analytics/batch-processing use cases could benefit from the guarantee that a certain set of data lives together on one node. It can also make certain data modeling situations a bit easier, where currently we just need to model around the limitation.

Also, 2 billion rows of small columns only adds up to data in the tens of gigabytes, and with the larger nodes in use these days, one node could practically hold much larger partitions.

And lastly, there are cases where 99.999% of partition keys are going to be pretty small, but a few potential outliers could be very large; it would be great for Cassandra to handle these even if it is suboptimal, helping us all avoid having to model around such exceptions.

Well, this turned into something of an essay... thanks for reading, and I'm glad to receive input on this.
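For context on the "model around the limitation" point above, the usual workaround is bucketing: adding a synthetic bucket component to the partition key so no single partition grows unbounded. Below is a minimal sketch of the arithmetic, in Python. All the concrete numbers (rows per key, bytes per row, target partition size) and the example schema mentioned in a comment are illustrative assumptions of mine, not anything from this thread:

```python
# Sketch: bounding partition size by adding a synthetic "bucket" to the
# partition key, a common workaround for potentially huge partitions.
# Every number here is an illustrative assumption, not a measured value.

ROWS_PER_KEY = 2_000_000_000     # worst-case outlier: ~2B rows under one key
BYTES_PER_ROW = 30               # rough guess for a few small columns
TARGET_PARTITION_BYTES = 100 * 1024 * 1024  # aim for ~100 MB partitions

def buckets_needed(rows: int, bytes_per_row: int, target_bytes: int) -> int:
    """How many buckets to split a hot key into so each partition stays
    under the target size (ceiling division, at least one bucket)."""
    total = rows * bytes_per_row
    return max(1, -(-total // target_bytes))

def bucket_for(row_id: int, n_buckets: int) -> int:
    """Deterministically assign a row to a bucket. The bucket id would
    become part of the partition key in the table definition, e.g.
    PRIMARY KEY ((sensor_id, bucket), event_time) -- hypothetical schema."""
    return row_id % n_buckets

n = buckets_needed(ROWS_PER_KEY, BYTES_PER_ROW, TARGET_PARTITION_BYTES)
print(n)                      # buckets needed for the outlier key
print(bucket_for(12345, n))   # which bucket a given row lands in
```

Note that 2 billion rows at ~30 bytes each is about 60 GB, which matches the "tens of gigabytes" figure above; the annoyance is that the 99.999% of small keys don't need any of this, yet every query now has to know the bucket.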