Re: calculating sizes on disk

2013-12-07 Thread John Sanda
I finally got the math right for the partition index after tracing through
SSTableWriter.IndexWriter.append(DecoratedKey key, RowIndexEntry
indexEntry). I should also note that I am working from the 1.2.9 source.
Here is the breakdown of what gets written to disk in the append() call
(my keys are 4 bytes, while column names and values are both 8 bytes). A
rough worked example follows the list.

// key
key length - 2 bytes
key - 4 bytes

// index entry
index entry position - 8 bytes
index entry size - 4 bytes

// when the index entry contains a columns index, the following fields are
// written ceil(total_row_size / (column_index_size_in_kb * 1024)) times
local deletion time - 4 bytes
marked for delete at - 8 bytes
columns index entry first name length - 2 bytes
columns index entry first name - 8 bytes
columns index entry last name length - 2 bytes
columns index entry last name - 8 bytes
columns index entry offset - 8 bytes
columns index entry width - 8 bytes
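
To turn the listing above into numbers, here is a rough Java sketch that just
adds up those field sizes for a single row. It is a back-of-envelope estimate,
not a reimplementation of the serializer: the 64 KB block size is the
column_index_size_in_kb default I am assuming, the IndexSizeEstimate class and
its parameters are my own names, and the row data size fed into it uses an
assumed ~15 bytes of per-column overhead just to have something to divide by.

// Back-of-envelope estimate of the partition index (Index.db) bytes written
// for one row, following the field sizes listed above.
public class IndexSizeEstimate {

    static long indexBytesPerRow(int keySize, int nameSize, long rowDataSize,
                                 int columnIndexSizeInKb) {
        long bytes = 0;
        bytes += 2 + keySize;   // key length (short) + key
        bytes += 8 + 4;         // index entry position (long) + index entry size (int)

        // number of columns-index blocks for the row
        long blockSize = columnIndexSizeInKb * 1024L;
        long blocks = (rowDataSize + blockSize - 1) / blockSize;   // ceil

        // per-block fields, exactly as listed above
        long perBlock = 4 + 8            // local deletion time + marked for delete at
                      + (2 + nameSize)   // first column name length + name
                      + (2 + nameSize)   // last column name length + name
                      + 8 + 8;           // offset + width
        return bytes + blocks * perBlock;
    }

    public static void main(String[] args) {
        // 4-byte keys, 8-byte column names/values, 20160 columns per row;
        // the 15-byte per-column overhead is an assumption, not traced code.
        long assumedRowDataSize = 20160L * (8 + 8 + 15);
        System.out.println(indexBytesPerRow(4, 8, assumedRowDataSize, 64)
                + " bytes of index per row");
    }
}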

I also went through the serialization code for bloom filters, but I do not
understand the math. Even with my slightly improved understanding, I am
still uncertain about how effective any sizing analysis will be since the
numbers of rows and columns will vary per SSTable.
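
For the bloom filter, I have not decoded the 1.2.9 serialization either, but
the textbook formula for an optimal bloom filter, m = -n * ln(p) / (ln 2)^2
bits for n keys at false-positive probability p, should at least give a lower
bound. The sketch below uses that formula with a bloom_filter_fp_chance of
0.01, which is my assumption for the size-tiered default; Cassandra rounds up
to whole buckets and hash counts per key, so expect the real filter to be
somewhat larger.

// Standard (non-Cassandra-specific) bloom filter size estimate:
// m = -n * ln(p) / (ln 2)^2 bits, converted to bytes.
public class BloomFilterEstimate {

    static long bloomFilterBytes(long numKeys, double falsePositiveChance) {
        double bitsPerKey = -Math.log(falsePositiveChance) / (Math.log(2) * Math.log(2));
        long totalBits = (long) Math.ceil(numKeys * bitsPerKey);
        return (totalBits + 7) / 8;   // round up to whole bytes
    }

    public static void main(String[] args) {
        // e.g. one million row keys in an SSTable at the assumed 0.01 fp chance
        System.out.println(bloomFilterBytes(1_000_000L, 0.01) + " bytes");   // roughly 1.2 MB
    }
}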


On Fri, Dec 6, 2013 at 3:53 PM, John Sanda john.sa...@gmail.com wrote:

 I have done that, but it only gets me so far because the cluster and the app
 that manages it are run by third parties. Ideally, I would like to provide my
 end users with a formula or heuristic for establishing some sort of baseline
 that at least gives them a general idea for planning. Generating data as you
 have suggested, and as I have done, is helpful, but it is hard for users to
 extrapolate from that.


 On Fri, Dec 6, 2013 at 3:47 PM, Jacob Rhoden jacob.rho...@me.com wrote:

 Not sure what your end setup will be, but I would probably just spin up a
 cluster and fill it with typical data and measure the size on disk.

 __
 Sent from iPhone

 On 7 Dec 2013, at 6:08 am, John Sanda john.sa...@gmail.com wrote:

 I am trying to do some disk capacity planning. I have been referring to the
 DataStax docs[1] and this older blog post[2]. I have a column family with
 the following:

 row key - 4 bytes
 column name - 8 bytes
 column value - 8 bytes
 max number of non-deleted columns per row - 20160

 Is there an effective way to calculate the sizes (or at least a decent
 approximation) of the bloom filters and partition indexes on disk?

 [1] Calculating user data size:
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/architecture/../../cassandra/architecture/architecturePlanningUserData_t.html
 [2] Cassandra Storage Sizing: http://btoddb-cass-storage.blogspot.com/

 --

 - John




 --

 - John




-- 

- John


Re: calculating sizes on disk

2013-12-07 Thread Steven Siebert
Nice work, John. If you learn any more, please share.

S


On Sat, Dec 7, 2013 at 11:50 AM, John Sanda john.sa...@gmail.com wrote:

 I finally got the math right for the partition index after tracing through
 SSTableWriter.IndexWriter.append(DecoratedKey key, RowIndexEntry
 indexEntry). I should also note that I am working from the 1.2.9 source.
 Here is the breakdown of what gets written to disk in the append() call
 (my keys are 4 bytes, while column names and values are both 8 bytes).

 // key
 key length - 2 bytes
 key - 4 bytes

 // index entry
 index entry position - 8 bytes
 index entry size - 4 bytes

 // when the index entry contains a columns index, the following fields are
 // written ceil(total_row_size / (column_index_size_in_kb * 1024)) times
 local deletion time - 4 bytes
 marked for delete at - 8 bytes
 columns index entry first name length - 2 bytes
 columns index entry first name - 8 bytes
 columns index entry last name length - 2 bytes
 columns index entry last name - 8 bytes
 columns index entry offset - 8 bytes
 columns index entry width - 8 bytes

 I also went through the serialization code for bloom filters, but I do not
 understand the math. Even with my slightly improved understanding, I am
 still uncertain about how effective any sizing analysis will be since the
 numbers of rows and columns will vary per SSTable.


 On Fri, Dec 6, 2013 at 3:53 PM, John Sanda john.sa...@gmail.com wrote:

 I have done that, but it only gets me so far because the cluster and the app
 that manages it are run by third parties. Ideally, I would like to provide my
 end users with a formula or heuristic for establishing some sort of baseline
 that at least gives them a general idea for planning. Generating data as you
 have suggested, and as I have done, is helpful, but it is hard for users to
 extrapolate from that.


 On Fri, Dec 6, 2013 at 3:47 PM, Jacob Rhoden jacob.rho...@me.com wrote:

 Not sure what your end setup will be, but I would probably just spin up
 a cluster and fill it with typical data and measure the size on disk.

 __
 Sent from iPhone

 On 7 Dec 2013, at 6:08 am, John Sanda john.sa...@gmail.com wrote:

 I am trying to do some disk capacity planning. I have been referring to the
 DataStax docs[1] and this older blog post[2]. I have a column family with
 the following:

 row key - 4 bytes
 column name - 8 bytes
 column value - 8 bytes
 max number of non-deleted columns per row - 20160

 Is there an effective way to calculate the sizes (or at least a decent
 approximation) of the bloom filters and partition indexes on disk?

 [1] Calculating user data size:
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/architecture/../../cassandra/architecture/architecturePlanningUserData_t.html
 [2] Cassandra Storage Sizing: http://btoddb-cass-storage.blogspot.com/

 --

 - John




 --

 - John




 --

 - John



Re: calculating sizes on disk

2013-12-07 Thread Tim Wintle
I have found in (limited) practice that it's fairly hard to estimate
due to compression and compaction behaviour. I think measuring and
extrapolating (with an understanding of the data structures) is the most
effective approach.

Tim

Sent from my phone
On 6 Dec 2013 20:54, John Sanda john.sa...@gmail.com wrote:

 I have done that, but it only gets me so far because the cluster and the app
 that manages it are run by third parties. Ideally, I would like to provide my
 end users with a formula or heuristic for establishing some sort of baseline
 that at least gives them a general idea for planning. Generating data as you
 have suggested, and as I have done, is helpful, but it is hard for users to
 extrapolate from that.


 On Fri, Dec 6, 2013 at 3:47 PM, Jacob Rhoden jacob.rho...@me.com wrote:

 Not sure what your end setup will be, but I would probably just spin up a
 cluster and fill it with typical data and measure the size on disk.

 __
 Sent from iPhone

 On 7 Dec 2013, at 6:08 am, John Sanda john.sa...@gmail.com wrote:

 I am trying to do some disk capacity planning. I have been referring to the
 DataStax docs[1] and this older blog post[2]. I have a column family with
 the following:

 row key - 4 bytes
 column name - 8 bytes
 column value - 8 bytes
 max number of non-deleted columns per row - 20160

 Is there an effective way to calculate the sizes (or at least a decent
 approximation) of the bloom filters and partition indexes on disk?

 [1] Calculating user data size:
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/architecture/../../cassandra/architecture/architecturePlanningUserData_t.html
 [2] Cassandra Storage Sizing: http://btoddb-cass-storage.blogspot.com/

 --

 - John




 --

 - John



Re: calculating sizes on disk

2013-12-06 Thread John Sanda
I should have also mentioned that I have tried using the calculations from
the storage sizing post. My lack of success may be due to the post being
based on Cassandra 0.8, as well as my own uncertainty about how to do some
of the calculations.
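
For reference, this is the kind of back-of-envelope I have been attempting for
the data file itself, adapted from the sizing docs rather than traced through
the 1.2.9 code: roughly 15 bytes of overhead per regular column (name length,
flags, timestamp, value length) and roughly 23 bytes per row on top of the key.
Both overhead figures are assumptions on my part, as is the RowSizeEstimate
class below; corrections welcome.

// Rough per-row data file estimate for this column family. The ~15 byte
// per-column and ~23 byte per-row overheads are assumed from the sizing
// docs, not verified against the 1.2.9 serializers.
public class RowSizeEstimate {

    static long rowDataBytes(int keySize, int nameSize, int valueSize, long columnCount) {
        long perColumn = nameSize + valueSize + 15;      // assumed per-column overhead
        return keySize + 23 + columnCount * perColumn;   // assumed per-row overhead
    }

    public static void main(String[] args) {
        // 4-byte key, 8-byte column names and values, 20160 columns per row
        long bytes = rowDataBytes(4, 8, 8, 20160);
        System.out.printf("~%d bytes (~%.0f KB) of data per row%n", bytes, bytes / 1024.0);
    }
}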


On Fri, Dec 6, 2013 at 3:08 PM, John Sanda john.sa...@gmail.com wrote:

 I am trying to do some disk capacity planning. I have been referring to the
 DataStax docs[1] and this older blog post[2]. I have a column family with
 the following:

 row key - 4 bytes
 column name - 8 bytes
 column value - 8 bytes
 max number of non-deleted columns per row - 20160

 Is there an effective way to calculate the sizes (or at least a decent
 approximation) of the bloom filters and partition indexes on disk?

 [1] Calculating user data size:
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/architecture/../../cassandra/architecture/architecturePlanningUserData_t.html
 [2] Cassandra Storage Sizing: http://btoddb-cass-storage.blogspot.com/

 --

 - John




-- 

- John


Re: calculating sizes on disk

2013-12-06 Thread Jacob Rhoden
Not sure what your end setup will be, but I would probably just spin up a
cluster and fill it with typical data and measure the size on disk.

__
Sent from iPhone

 On 7 Dec 2013, at 6:08 am, John Sanda john.sa...@gmail.com wrote:
 
 I am trying to do some disk capacity planning. I have been referring to the
 DataStax docs[1] and this older blog post[2]. I have a column family with the
 following:
 
 row key - 4 bytes
 column name - 8 bytes
 column value - 8 bytes
 max number of non-deleted columns per row - 20160
 
 Is there an effective way to calculate the sizes (or at least a decent 
 approximation) of the bloom filters and partition indexes on disk?
 
 [1] Calculating user data size
 [2] Cassandra Storage Sizing
 
 -- 
 
 - John


Re: calculating sizes on disk

2013-12-06 Thread John Sanda
I have done that, but it only gets me so far because the cluster and the app
that manages it are run by third parties. Ideally, I would like to provide my
end users with a formula or heuristic for establishing some sort of baseline
that at least gives them a general idea for planning. Generating data as you
have suggested, and as I have done, is helpful, but it is hard for users to
extrapolate from that.


On Fri, Dec 6, 2013 at 3:47 PM, Jacob Rhoden jacob.rho...@me.com wrote:

 Not sure what your end setup will be, but I would probably just spin up a
 cluster and fill it with typical data and measure the size on disk.

 __
 Sent from iPhone

 On 7 Dec 2013, at 6:08 am, John Sanda john.sa...@gmail.com wrote:

 I am trying to do some disk capacity planning. I have been referring to the
 DataStax docs[1] and this older blog post[2]. I have a column family with
 the following:

 row key - 4 bytes
 column name - 8 bytes
 column value - 8 bytes
 max number of non-deleted columns per row - 20160

 Is there an effective way to calculate the sizes (or at least a decent
 approximation) of the bloom filters and partition indexes on disk?

 [1] Calculating user data size:
 http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=index#cassandra/architecture/../../cassandra/architecture/architecturePlanningUserData_t.html
 [2] Cassandra Storage Sizing: http://btoddb-cass-storage.blogspot.com/

 --

 - John




-- 

- John