Re: how many rows can one partition key hold?
As a general, rough guideline, I would suggest keeping a partition to thousands or tens of thousands of rows, probably not more than 100K rows per partition, and keeping its physical size to tens or hundreds of kilobytes, or maybe a few megabytes, with ten megabytes as the maximum. So a partition with millions of rows or tens of megabytes is probably going to stress Cassandra too heavily.

Of course, all of this depends on your actual data model, your actual data values, and your actual access patterns. For example, if each row holds a very large blob, then fewer rows per partition would be warranted. If each row is very tiny, maybe even using compact storage, then a much larger row count is feasible. If you are doing heavy updates, then tombstone generation and compaction become issues, which argues for a smaller partition size. And if you tend to run repair a lot, you want to keep your row count per partition down, but not too tiny.

Moderation is the message: avoid the extremes, both too large and too small. This is not a matter of absolute limits per se, but of common sense and operational efficiency.

Unfortunately, the performance and capacity of Cassandra are highly non-linear, as are the values of most data, so size and performance, particularly Java heap pressure, are hard to predict exactly, and you need a proof of concept even to arrive at rough recommendations for your specific app.

-- Jack Krupansky

On Fri, Feb 27, 2015 at 1:44 AM, wateray wate...@163.com wrote:

Hi all,

My team is using Cassandra as our database, and we have a question. As we know, rows with the same partition key are stored on the same node. But how many rows can one partition key hold? What does it depend on: the node's volume, the partition's data size, or the number of rows in the partition? When one partition's data is extremely large, will writes and reads slow down? Can anyone point me to some existing use cases?

Thanks!
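To make the rough guideline above concrete, here is a minimal Python sketch that estimates a partition's size from a row count and an average row size, then checks it against the rule-of-thumb ceilings mentioned in the reply (about 100K rows and about ten megabytes). The function name `estimate_partition` and both constants are hypothetical names chosen for illustration; the thresholds are one poster's guideline, not hard Cassandra limits, and the estimate ignores storage overhead and compression.

```python
# Rule-of-thumb ceilings taken from the guideline in this thread.
# They are NOT hard limits enforced by Cassandra.
MAX_ROWS = 100_000               # "probably not more than 100K rows per partition"
MAX_BYTES = 10 * 1024 * 1024     # "ten megabytes" as the suggested maximum


def estimate_partition(rows: int, avg_row_bytes: int) -> dict:
    """Estimate raw partition size (rows * average row size).

    Assumption: storage overhead, tombstones and compression are ignored,
    so treat the result as an order-of-magnitude figure only.
    """
    size = rows * avg_row_bytes
    return {
        "rows": rows,
        "bytes": size,
        "too_many_rows": rows > MAX_ROWS,
        "too_large": size > MAX_BYTES,
    }


if __name__ == "__main__":
    # Many small rows: within both rule-of-thumb ceilings.
    print(estimate_partition(50_000, 100))
    # Few rows, but each one is a large blob: the byte ceiling is exceeded.
    print(estimate_partition(20_000, 1_000_000))
```

The second call illustrates the point about blobs: the row count looks moderate, but the physical size is far beyond the suggested range. A proof of concept with real data is still needed before trusting any such estimate.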
Re: how many rows can one partition key hold?
You might want to read this: http://wiki.apache.org/cassandra/CassandraLimitations

jason
Re: how many rows can one partition key hold?
Also, note that repairs will be slower for larger partitions and, AFAIK, also require slightly more memory. To avoid accumulating many tombstones, it could also be worth bucketing your partitions by time.

Cheers,
Jens

--
Jens Rantil
Backend engineer, Tink AB
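One possible way to bucket partitions by time, sketched in Python: the partition key is built from an entity identifier plus a day bucket derived from the event timestamp, so each partition holds at most one day of data for that entity. The names `day_bucket`, `partition_key`, and `sensor_id` are hypothetical, and the one-day granularity is an assumption; the right bucket width depends on your write rate and read patterns.

```python
from datetime import datetime, timezone


def day_bucket(ts: datetime) -> str:
    """Return the UTC calendar day of a timezone-aware timestamp, e.g. '2015-02-27'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Compose a (hypothetical) composite partition key: entity id plus day bucket.

    All rows for one sensor on one UTC day share a partition; the next day
    starts a new partition, which bounds partition growth over time.
    """
    return (sensor_id, day_bucket(ts))


if __name__ == "__main__":
    event_time = datetime(2015, 2, 27, 1, 44, tzinfo=timezone.utc)
    print(partition_key("sensor-1", event_time))
```

The trade-off is that a read spanning several days has to query several partitions, so the bucket should not be made smaller than your typical query window requires.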
Re: how many rows can one partition key hold?
"When one partition's data is extreme large, the write/read will slow?"

This is actually a good question. If a partition has nearly 2 billion rows, will writes or reads get too slow? My understanding is that they shouldn't: data is indexed inside a partition, and a read or write does a binary search, so the operation should take O(log n) time.

However, my practical experience tells me it can be a problem, depending on how many reads you do and how you do them. If your binary search takes 2 more steps, that is nothing for a single read, but across 1 billion reads it could be considerably slower. Also, this search could be done on disk, since that depends a lot on how your cache is configured.

Having a small amount of data per partition could be a Cassandra anti-pattern though, mainly if your reads then go across many partitions.

I think there is no correct answer here; it depends on your data and on your application, IMHO.

-Marcelo
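The log(n) argument can be put into numbers with a short Python sketch. It counts worst-case comparisons for an idealized binary search over n sorted entries and compares a 100K-row partition with one of about 2 billion rows. This is only a model of the reasoning in the message: `search_steps` is a hypothetical helper, and it says nothing about Cassandra's actual index layout, caching, or disk access.

```python
def search_steps(n: int) -> int:
    """Worst-case comparisons for a binary search over n sorted entries.

    For n >= 1 this is floor(log2(n)) + 1, which equals n.bit_length().
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n.bit_length()


if __name__ == "__main__":
    moderate = search_steps(100_000)        # a partition within the guideline
    huge = search_steps(2_000_000_000)      # a partition near 2 billion rows
    extra_per_read = huge - moderate
    reads = 1_000_000_000
    print(f"steps at 100K rows: {moderate}")
    print(f"steps at 2 billion rows: {huge}")
    print(f"extra steps across {reads} reads: {extra_per_read * reads}")
```

Per read the difference is a handful of steps, which supports the point that the search itself scales well; the cost only becomes visible when multiplied by a very large number of reads, or when each step turns into a disk access instead of a cache hit.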