Re: calculating sizes on disk
I finally got the math right for the partition index after tracing through SSTableWriter.IndexWriter.append(DecoratedKey key, RowIndexEntry indexEntry). I should also note that I am working off of the source for 1.2.9. Here is the breakdown of what gets written to disk in the append() call (my keys are 4 bytes, while column names and values are both 8 bytes):

// key
key length - 2 bytes
key - 4 bytes

// index entry
index entry position - 8 bytes
index entry size - 4 bytes

// when the index entry contains a columns index, the following entries
// are written ceil(total_row_size / column_index_size_in_kb) times
local deletion time - 4 bytes
marked for delete at - 8 bytes
columns index entry first name length - 2 bytes
columns index entry first name - 8 bytes
columns index entry last name length - 2 bytes
columns index entry last name - 8 bytes
columns index entry offset - 8 bytes
columns index entry width - 8 bytes

I also went through the serialization code for bloom filters, but I do not understand the math. Even with my slightly improved understanding, I am still uncertain about how effective any sizing analysis will be, since the numbers of rows and columns will vary per SSTable.

On Fri, Dec 6, 2013 at 3:53 PM, John Sanda john.sa...@gmail.com wrote:
> I have done that, but it only gets me so far because the cluster and the
> app that manages it are run by third parties. Ideally, I would like to
> provide my end users with a formula or heuristic for establishing some
> sort of baseline that at least gives them a general idea for planning. [...]

--
- John
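The per-key arithmetic in that breakdown can be sketched in a few lines. This is my own illustration, not code from Cassandra: the function and parameter names are invented, it assumes the 1.2.9 breakdown above is accurate, and the Bloom filter helper uses the standard textbook sizing formula rather than whatever 1.2.9 actually serializes:

```python
import math

def partition_index_bytes(key_len=4, name_len=8, row_size=0,
                          column_index_kb=64):
    """Rough per-key Index.db footprint, following the breakdown above."""
    # key length (2) + key, plus index entry position (8) and size (4)
    fixed = 2 + key_len + 8 + 4
    # number of columns-index entries written for a large row
    blocks = math.ceil(row_size / (column_index_kb * 1024))
    # per entry: deletion info (4 + 8), first/last name (2 + name_len each),
    # offset (8) and width (8)
    per_block = 4 + 8 + 2 * (2 + name_len) + 8 + 8
    return fixed + blocks * per_block

def bloom_filter_bits(num_keys, fp_chance=0.01):
    """Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits.
    An approximation only; not traced from the 1.2.9 serializer."""
    return math.ceil(-num_keys * math.log(fp_chance) / math.log(2) ** 2)

# a 4-byte key whose row is small enough to skip the columns index
print(partition_index_bytes(row_size=0))            # 18
# a ~320 KB row gets ceil(320 / 64) = 5 columns-index entries
print(partition_index_bytes(row_size=320 * 1024))   # 18 + 5 * 48 = 258
```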
Re: calculating sizes on disk
Nice work John. If you learn any more, please share.

S

On Sat, Dec 7, 2013 at 11:50 AM, John Sanda john.sa...@gmail.com wrote:
> I finally got the math right for the partition index after tracing through
> SSTableWriter.IndexWriter.append(DecoratedKey key, RowIndexEntry
> indexEntry). I should also note that I am working off of the source for
> 1.2.9. [...]
Re: calculating sizes on disk
I have found in (limited) practice that it's fairly hard to estimate, due to compression and compaction behaviour. I think measuring and extrapolating (with an understanding of the data structures) is the most effective approach.

Tim

Sent from my phone

On 6 Dec 2013 20:54, John Sanda john.sa...@gmail.com wrote:
> I have done that, but it only gets me so far because the cluster and the
> app that manages it are run by third parties. Ideally, I would like to
> provide my end users with a formula or heuristic for establishing some
> sort of baseline that at least gives them a general idea for planning. [...]
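The measure-and-extrapolate approach Tim describes can be sketched as a simple least-squares fit of on-disk size against row count. The sample numbers below are invented for illustration; real measurements would come from loading test data and checking the data directory:

```python
# Hypothetical measurements: (rows written, bytes on disk for the table)
samples = [(100_000, 12_500_000), (200_000, 24_900_000), (400_000, 49_800_000)]

# least-squares slope and intercept, no external libraries needed
n = len(samples)
sx = sum(x for x, _ in samples)
sy = sum(y for _, y in samples)
sxx = sum(x * x for x, _ in samples)
sxy = sum(x * y for x, y in samples)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # ~bytes per row
intercept = (sy - slope * sx) / n                   # fixed overhead

# extrapolate to a planned 10M rows
estimate = slope * 10_000_000 + intercept
print(f"~{estimate / 1e9:.1f} GB")                  # ~1.2 GB
```

This obviously ignores compression ratios and compaction overhead changing with scale, which is exactly why the fit should come from data shaped like production's.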
Re: calculating sizes on disk
I should have also mentioned that I have tried using the calculations from the storage sizing post. My lack of success may be due to the post basing things on Cassandra 0.8, as well as a lack of understanding on my part of how to do some of the calculations.

On Fri, Dec 6, 2013 at 3:08 PM, John Sanda john.sa...@gmail.com wrote:
> I am trying to do some disk capacity planning. I have been referring to
> the DataStax docs[1] and this older blog post[2]. [...]

--
- John
Re: calculating sizes on disk
Not sure what your end setup will be, but I would probably just spin up a cluster, fill it with typical data, and measure the size on disk.

Sent from iPhone

On 7 Dec 2013, at 6:08 am, John Sanda john.sa...@gmail.com wrote:
> I am trying to do some disk capacity planning. I have been referring to
> the DataStax docs[1] and this older blog post[2]. I have a column family
> with the following:
>
> row key - 4 bytes
> column name - 8 bytes
> column value - 8 bytes
> max number of non-deleted columns per row - 20160
>
> Is there an effective way to calculate the sizes (or at least a decent
> approximation) of the bloom filters and partition indexes on disk?
>
> [1] Calculating user data size
> [2] Cassandra Storage Sizing
>
> --
> - John
Re: calculating sizes on disk
I have done that, but it only gets me so far because the cluster and the app that manages it are run by third parties. Ideally, I would like to provide my end users with a formula or heuristic for establishing some sort of baseline that at least gives them a general idea for planning. Generating data, as you have suggested and as I have done, is helpful, but it is hard for users to extrapolate from that.

On Fri, Dec 6, 2013 at 3:47 PM, Jacob Rhoden jacob.rho...@me.com wrote:
> Not sure what your end setup will be, but I would probably just spin up a
> cluster, fill it with typical data, and measure the size on disk. [...]

--
- John