Hi,

My understanding is that wide-row support (i.e. many columns/CQL rows/cells per partition key) has gotten much better in the past few years. Even though the theoretical limit of 2 billion cells has long been far above the practical limit, it seems like Cassandra is now able to handle wide rows better (e.g. incremental compactions so Cassandra doesn't OOM).
So I'm wondering:

- With more recent improvements (say, up to 2.2 or maybe 3.0), is the practical limit still much lower than 2 billion? Do we have any idea what limits us in this regard? (Maybe repair is still another bottleneck?)
- Is the 2 billion limit an SSTable limitation? https://issues.apache.org/jira/browse/CASSANDRA-7447 seems to indicate that it might be. Is there any future work we think will increase this limit?

A couple of caveats: I am aware that even if such a large partition is possible, it may not usually be practical, because it works against Cassandra's primary feature of sharding data across multiple nodes and parallelizing access. However, some analytics/batch-processing use cases could benefit from the guarantee that a certain set of data lives together on one node. It can also make certain data modeling situations a bit easier, where currently we just need to model around the limitation.

Also, 2 billion rows of small columns only adds up to data in the tens of gigabytes, and with the larger nodes in use these days, one node could practically hold much larger partitions.

And lastly, there are cases where 99.999% of partition keys are going to be pretty small, but a few potential outliers could be very large; it would be great for Cassandra to handle these even if it is suboptimal, helping us all avoid having to model around such exceptions.

Well, this turned into something of an essay... thanks for reading, and I'm glad to receive input on this.
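For context on the "model around the limitation" point above, the usual workaround is bucketing: adding a synthetic bucket component to the partition key so no single partition grows unbounded. Below is a minimal sketch of the arithmetic, in Python. All the concrete numbers (rows per key, bytes per row, target partition size) and the example schema mentioned in a comment are illustrative assumptions of mine, not anything from this thread:

```python
# Sketch: bounding partition size by adding a synthetic "bucket" to the
# partition key, a common workaround for potentially huge partitions.
# Every number here is an illustrative assumption, not a measured value.

ROWS_PER_KEY = 2_000_000_000     # worst-case outlier: ~2B rows under one key
BYTES_PER_ROW = 30               # rough guess for a few small columns
TARGET_PARTITION_BYTES = 100 * 1024 * 1024  # aim for ~100 MB partitions

def buckets_needed(rows: int, bytes_per_row: int, target_bytes: int) -> int:
    """How many buckets to split a hot key into so each partition stays
    under the target size (ceiling division, at least one bucket)."""
    total = rows * bytes_per_row
    return max(1, -(-total // target_bytes))

def bucket_for(row_id: int, n_buckets: int) -> int:
    """Deterministically assign a row to a bucket. The bucket id would
    become part of the partition key in the table definition, e.g.
    PRIMARY KEY ((sensor_id, bucket), event_time) -- hypothetical schema."""
    return row_id % n_buckets

n = buckets_needed(ROWS_PER_KEY, BYTES_PER_ROW, TARGET_PARTITION_BYTES)
print(n)                      # buckets needed for the outlier key
print(bucket_for(12345, n))   # which bucket a given row lands in
```

Note that 2 billion rows at ~30 bytes each is about 60 GB, which matches the "tens of gigabytes" figure above; the annoyance is that the 99.999% of small keys don't need any of this, yet every query now has to know the bucket.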