Re: Cassandra cluster HW spec (commit log directory vs data file directory)
No. We built a pluggable cache provider for memcache.

On Sun, Oct 30, 2011 at 7:31 PM, Mohit Anchlia wrote:
> On Sun, Oct 30, 2011 at 6:53 PM, Chris Goffinet wrote:
> > Max heap is 12GB. We use the rest for cache. We run memcache on each
> > node and allocate the remaining to that.
>
> Is this using off heap cache of Cassandra?
Re: Cassandra cluster HW spec (commit log directory vs data file directory)
On Sun, Oct 30, 2011 at 6:53 PM, Chris Goffinet wrote:
> > 1. What are you doing with so much memory :) ? How much of it do you
> > allocate for heap ?
>
> Max heap is 12GB. We use the rest for cache. We run memcache on each node
> and allocate the remaining to that.

Is this using off heap cache of Cassandra?
Re: Cassandra cluster HW spec (commit log directory vs data file directory)
On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean wrote:
> Hey Chris,
>
> Thanks for sharing all the info. I have a few questions:
>
> 1. What are you doing with so much memory :) ? How much of it do you
> allocate for heap ?

Max heap is 12GB. We use the rest for cache. We run memcache on each node
and allocate the remaining to that.

> 2. What's your network speed ? Do you use trunks ? Do you have a
> dedicated VLAN for gossip/store traffic ?

No dedicated VLAN for gossip. We run at 2Gb/s. We have bonded NICs.
Very slow writes in Cassandra
Hello Cassandra users,

I'm a newbie to NoSQL and Cassandra in particular. At the moment I'm doing some benchmarking with Cassandra and experiencing very slow write throughput. Cassandra is said to perform hundreds of thousands of inserts per second, but I'm not observing this:

1) When I send 100 thousand inserts simultaneously via 8 CQL clients, throughput is ~14470 inserts per second.
2) When I do the same via 8 Thrift clients, throughput is ~16300 inserts per second.

I think Cassandra performance can be improved, but I don't know what to tune. Please take a look at the test conditions below and advise something. Thank you.

Test conditions:

1. Cassandra cluster is deployed on three machines; each machine has 8 cores (Intel(R) Xeon(R) CPU E5420 @ 2.50GHz), 16GB RAM, and 1000Mb/s network.

2. The data sample is:

  set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] = '1.0';
  set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['security'] = 'AA1';
  set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['price'] = '47.1';
  set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['volume'] = '300.0';
  set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['se'] = '1';
  set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] = '2.0';
  set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['security'] = 'AA1';
  set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['price'] = '44.89';
  set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['volume'] = '310.0';
  set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['se'] = '1';
  set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] = '3.0';
  set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['security'] = 'AA2';
  set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['price'] = '0.35';

3. Commit log is written to the local hard drive; the data is written on Lustre.

4. Keyspace description:

  Keyspace: MD:
    Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
    Durable Writes: true
      Options: [datacenter1:1]
    Column Families:
      ColumnFamily: MM
        Key Validation Class: org.apache.cassandra.db.marshal.BytesType
        Default column value validator: org.apache.cassandra.db.marshal.BytesType
        Columns sorted by: org.apache.cassandra.db.marshal.BytesType
        Row cache size / save period in seconds: 0.0/0
        Key cache size / save period in seconds: 20.0/14400
        Memtable thresholds: 2.3247/1440/496 (millions of ops/minutes/MB)
        GC grace seconds: 864000
        Compaction min/max thresholds: 4/32
        Read repair chance: 1.0
        Replicate on write: true
        Built indexes: []

Thanks in advance.
Evgeny.
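A back-of-the-envelope model (not a measurement of this cluster) suggests the numbers above are consistent with synchronous clients being round-trip-bound rather than Cassandra being write-bound. The ~0.55 ms per-op latency below is inferred from the reported throughput, and the batching figure assumes the server keeps up, which it won't indefinitely:

```java
// Back-of-the-envelope model: with synchronous clients, aggregate
// throughput is bounded by clients / per-op round-trip time, not by
// Cassandra's internal write path.
public class WriteThroughputModel {
    // Aggregate inserts/sec for N synchronous clients with a given
    // per-operation round-trip latency (seconds).
    static double syncThroughput(int clients, double rttSeconds) {
        return clients / rttSeconds;
    }

    // If each round trip carries `batchSize` mutations (e.g. a Thrift
    // batch_mutate call), the same latency yields batchSize times the
    // single-op throughput, until the server becomes the bottleneck.
    static double batchedThroughput(int clients, double rttSeconds, int batchSize) {
        return clients * batchSize / rttSeconds;
    }

    public static void main(String[] args) {
        // 8 clients at ~14470 inserts/s implies ~0.55 ms per insert.
        double rtt = 8 / 14470.0;
        System.out.printf("implied per-op RTT: %.3f ms%n", rtt * 1000);
        System.out.printf("single ops: %.0f inserts/s%n", syncThroughput(8, rtt));
        System.out.printf("batches of 100: %.0f inserts/s (upper bound)%n",
                batchedThroughput(8, rtt, 100));
    }
}
```

This is why batching mutations (or adding more concurrent clients) is usually the first tuning step for a write benchmark.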
Re: What does a cluster throttled by the network look like ?
You are answering your own question here. If you are running at 80% of network bandwidth, you are saturating your network. AFAIK most distributed databases run on gigabit, not 100Mb. I recommend you upgrade your switch (and NICs if necessary); gigabit is insanely cheap now. At the extremes, some distributed databases have dedicated gigabit for internode traffic vs. external traffic.

On Oct 30, 2011 3:15 PM, "Philippe" wrote:
> What I do see is that each server in the cluster has about 60-80Mb/s
> traffic in & out and I'm running on 100Mb ethernet.
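The saturation argument can be sketched numerically. The 80% practical ceiling is the rule of thumb quoted in the thread, not a measured constant of this cluster:

```java
// Rough check of whether observed NIC traffic saturates the link.
public class LinkSaturation {
    static final double PRACTICAL_FRACTION = 0.8; // usable share of line rate (rule of thumb)

    // True if observed traffic is at or beyond the practical ceiling.
    static boolean saturated(double observedMbps, double lineRateMbps) {
        return observedMbps >= PRACTICAL_FRACTION * lineRateMbps;
    }

    public static void main(String[] args) {
        // 60-80 Mb/s on 100Mb ethernet -> at or past the ~80 Mb/s ceiling.
        System.out.println("100Mb link at 80Mb/s saturated? " + saturated(80, 100));
        // The same traffic on gigabit leaves roughly 10x headroom.
        System.out.println("1000Mb link at 80Mb/s saturated? " + saturated(80, 1000));
    }
}
```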
Re: Cassandra cluster HW spec (commit log directory vs data file directory)
On 30.10.2011 23:34, Sorin Julean wrote:
> Hey Chris,
> Thanks for sharing all the info. I have a few questions:
> 1. What are you doing with so much memory :) ?

Cassandra eats memory like there is no tomorrow on large databases. It keeps some structures in memory whose size depends on the database size.

> 2. What's your network speed ?

100 Mbit is a failure.

> 3. Do you have a dedicated VLAN for gossip/store traffic ?

We share Hadoop + Cassandra on one VLAN due to low budget. It is best to have them separated; Hadoop is very heavy on the network.
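As a rough illustration of how those in-memory structures scale with row count: bloom filters and index samples grow with the number of rows. The per-row constants below are ballpark assumptions for the 0.8-era defaults, not exact Cassandra internals:

```java
// Illustrative estimate of per-row heap overhead that grows with data
// size. All constants are assumptions for the sake of the arithmetic.
public class MemoryFootprint {
    static final double BLOOM_BYTES_PER_ROW = 2.0; // ~15 bits/key at the default fp rate (assumed)
    static final int INDEX_INTERVAL = 128;         // one index sample per 128 rows (default)
    static final double INDEX_SAMPLE_BYTES = 32.0; // key + offset per sample, rough guess

    static double estimatedHeapBytes(long rows) {
        return rows * BLOOM_BYTES_PER_ROW
                + (rows / (double) INDEX_INTERVAL) * INDEX_SAMPLE_BYTES;
    }

    public static void main(String[] args) {
        long rows = 1_000_000_000L;
        System.out.printf("~%.1f GB of heap for 1 billion rows%n",
                estimatedHeapBytes(rows) / (1024.0 * 1024 * 1024));
    }
}
```

The point is only that this overhead is proportional to row count, which is why heap demand climbs with database size regardless of request rate.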
Re: Cassandra cluster HW spec (commit log directory vs data file directory)
Hey Chris,

Thanks for sharing all the info. I have a few questions:

1. What are you doing with so much memory :) ? How much of it do you allocate for heap ?
2. What's your network speed ? Do you use trunks ? Do you have a dedicated VLAN for gossip/store traffic ?

Cheers,
Sorin

On Sun, Oct 30, 2011 at 5:00 AM, Chris Goffinet wrote:
> RE: RAID0 Recommendation
>
> Cassandra supports multiple data file directories. Because we do
> compactions, it's just much easier to deal with one data file directory
> that is striped across all disks as one volume (RAID0). There are other
> ways to accomplish this, though. At Twitter we use software RAID (RAID0 &
> RAID10).
>
> We own the physical hardware and have found that even with hardware RAID,
> software RAID in Linux is actually faster. The reason is:
>
> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
>
> We have found that using far-copies is much faster than near-copies. We
> set the I/O scheduler to noop at the moment. We might move back to CFQ
> with more tuning in the future.
>
> We use RAID10 for cases where we need better disk performance because we
> are hitting the disk often, sacrificing storage. We initially thought
> RAID0 should be faster than RAID10 until we found out about the near vs.
> far layouts.
>
> RE: Hardware
>
> This is going to depend on how good your automated infrastructure is, but
> we chose the path of finding the cheapest servers we could get from
> Dell/HP/etc.: 8/12 cores, 72GB memory per node, 2TB/3TB, 2.5".
>
> We are in the process of making changes to our servers; I'll report back
> when we have more details to share.
>
> I wouldn't recommend 75 CFs. It could work but just seems too complex.
>
> Another recommendation for clusters: always go big. You will be thankful
> for this in the future. Even if you can do this on 3-6 nodes, go much
> larger for future expansion. If you own your hardware and racks, I
> recommend making sure to size out the rack diversity and number of nodes
> per rack. Also take into account the replication factor when doing this:
> RF=3 should mean a minimum of 3 racks, and the number of nodes per rack
> should be divisible by the replication factor. This has worked out pretty
> well for us. Our biggest problems today are adding 100s of nodes to
> existing clusters at once. I'm not sure how many other companies are
> having this problem, but it's certainly on our radar to improve, if you
> get to that point :)
>
> On Tue, Oct 25, 2011 at 5:23 AM, Alexandru Sicoe wrote:
>
> > Hi everyone,
> >
> > I am currently in the process of writing a hardware proposal for a
> > Cassandra cluster for storing a lot of monitoring time series data. My
> > workload is write-intensive and my data set is extremely varied in the
> > types of variables and the insertion rates for these variables (I will
> > have to handle on the order of 2 million variables coming in, each at
> > very different rates - the majority of them will come at very low
> > rates, but there are many that will come at higher, constant rates and
> > a few coming in with huge spikes in rates). These variables correspond
> > to all basic C++ types and arrays of these types. The highest insertion
> > rates are received for basic types, out of which U32 variables seem to
> > be the most prevalent (e.g. I recorded 2 million U32 vars inserted in 8
> > mins of operation, while 600,000 doubles and 170,000 strings were
> > inserted during the same time. Note this measurement was only for a
> > subset of the total data currently taken in).
> >
> > At the moment I am partitioning the data in Cassandra into 75 CFs (each
> > CF corresponds to a logical partitioning of the set of variables
> > mentioned before - but this partitioning is not related to the amount
> > of data or the rates... it is somewhat random). These 75 CFs account
> > for ~1 million of the variables I need to store. I have a 3-node
> > Cassandra 0.8.5 cluster (each node has 4 real cores and 4 GB RAM, with
> > the commit log directory and data file directory split between two RAID
> > arrays with HDDs). I can handle the load in this configuration, but the
> > average CPU usage of the Cassandra nodes is slightly above 50%. As I
> > will need to add 12 more CFs (corresponding to another ~1 million
> > variables) plus potentially other data later, it is clear that I need
> > better hardware (also for the retrieval part).
> >
> > I am looking at Dell servers (PowerEdge etc.)
> >
> > Questions:
> >
> > 1. Is anyone using Dell HW for their Cassandra clusters? How do they
> > behave? Anybody care to share their configurations or tips for buying,
> > what to avoid, etc.?
> >
> > 2. Obviously I am going to keep to the advice on
> > http://wiki.apache.org/cassandra/CassandraHardware and split the
> > commitlog and data on separate disks. I was going to use SSD for the
> > commitlog, but then did some more research and found out that it
> > doesn't make sense to use SSDs for sequential appends because they
> > won't have a performance advantage over rotational media.
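Chris's rack-sizing rule above (a minimum of RF racks, and nodes per rack divisible by RF) can be sketched as a quick check; this is just his rule of thumb in code, not an official Cassandra constraint:

```java
// Sketch of the rack-sizing rule from the thread: at least RF racks,
// and the number of nodes per rack divisible by the replication factor,
// so replicas spread evenly across racks.
public class RackLayout {
    static boolean validLayout(int racks, int nodesPerRack, int replicationFactor) {
        return racks >= replicationFactor
                && nodesPerRack % replicationFactor == 0;
    }

    public static void main(String[] args) {
        System.out.println("3 racks x 6 nodes, RF=3: " + validLayout(3, 6, 3)); // ok
        System.out.println("2 racks x 6 nodes, RF=3: " + validLayout(2, 6, 3)); // fewer racks than RF
        System.out.println("3 racks x 4 nodes, RF=3: " + validLayout(3, 4, 3)); // 4 not divisible by 3
    }
}
```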
What does a cluster throttled by the network look like ?
Dear all,

I'm working with a 12-node, RF=3 cluster on low-end hardware (Core i5 with 16GB of RAM & SATA disks). I'm using a BOP and each node has a load between 50GB and 100GB (yes, I apparently did not set my tokens right... I'll fix that later). I'm hitting the cluster with a little over 100 concurrent threads during reads & writes on counter columns & plain columns.

When I peek at my various servers & look at my munin graphs, it looks like everything is great: iostat -dmx shows peaks of 5% utilization at worst, and vmstat shows peaks of 30% CPU utilization, no swapping, and plenty of caching. Munin shows me that the JVM's heap is prancing around 50% of its allocated amount (and I have no row caching for now).

So my thinking has been that the cluster is under-utilized, but when I increase the number of threads hitting the cluster, I do not see throughput increase, which has been puzzling. I do see an occasional TimedOutException now that I've made my batches smaller (dozens of mutations rather than hundreds) and my slices smaller too.

What I do see is that each server in the cluster has about 60-80Mb/s traffic in & out, and I'm running on 100Mb ethernet. I seem to remember that you can only get to a little over 80% of the available ethernet bandwidth in real life. Could my performance be limited by the network adapters?

Thanks
Re: Cassandra cluster HW spec (commit log directory vs data file directory)
Hi Chris,

Thanks for your post. I can see you guys handle extremely large amounts of data compared to my system. Yes, I will own the racks and the machines, but the problem is I am limited by actual physical space in our data center (believe it or not) and also by budget. It would be hard for me to justify acquisition of more than 3-4 machines; that's why I will need to find a system that empties Cassandra and transfers the data to another mass storage system.

Thanks for the RAID10 suggestion... I'll look into that! I've seen everybody warns me about the number of CFs, so I'll listen to you guys and reduce the number.

Yeah, it would be nice to hear about your HW evolution. I will report back as well once I finish my proposal!

Cheers,
Alex
ByteBuffer as an initial serializer to read columns with mixed datatypes ?
I have a mix of byte[] and Integer column names/values within a CF's rows. Should ByteBuffer be my initial choice of serializer when making the read query for the mixed datatypes, and should I then retrieve the byte[] or Integer from the ByteBuffer using the ByteBuffer API's getInt() method? Is this a preferable way to read columns with Integer/byte[] names: initially as ByteBuffers, later converting them to Integer or byte[]?
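A minimal sketch of that approach, assuming the integer columns are encoded as big-endian 4-byte values (ByteBuffer's default order); the buffers here are made-up values, not reads from a cluster:

```java
import java.nio.ByteBuffer;

// Read a column first as raw bytes, then interpret it either as an int
// or as an opaque byte[] depending on what it is known to hold.
public class MixedColumnRead {
    // Interpret the buffer's first 4 bytes as a big-endian int.
    static int asInt(ByteBuffer buf) {
        // duplicate() so the caller's position/limit are untouched
        return buf.duplicate().getInt();
    }

    // Copy the remaining bytes out as a byte[].
    static byte[] asBytes(ByteBuffer buf) {
        ByteBuffer d = buf.duplicate();
        byte[] out = new byte[d.remaining()];
        d.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer intCol = ByteBuffer.wrap(new byte[] {0, 0, 0, 42});
        System.out.println("as int: " + asInt(intCol));
        System.out.println("as bytes: " + asBytes(intCol).length + " bytes");
    }
}
```

Working with duplicate() is the safe pattern here, because getInt() and get() advance the buffer's position, and client libraries may hand you the same ByteBuffer more than once.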
Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE
If your summary data is frequently accessed, you will probably be best off storing the two sets of data separately (either in separate column families or with different key prefixes). This will give you the greatest cache locality for your summary data, which you say is popular. If your summary data is very well cached, then it won't matter that it might require two disk seeks to get summary + details, because your summary data is usually in cache anyhow.

If you want a more specific recommendation than that, we'd need to see answers to the following questions:

(a) How big is the summary data (total, per row)? (average, max)
(b) How big is the detail data (total, per row)? (average, max)
(c) What is the read/write traffic to the summary data? ...detail data?

A side note about caches... IMO, you're better off getting the cache behavior you want through physical ordering than through more explicit caching. This is because most modern databases (Cassandra included) go through the OS buffer cache already, and there is some amount of duplication of data involved in trying to application-cache data. If your application cache hitrate is very high (90%+) this can work out, but if it's lower (50%) it can sometimes have poor effects on the cache efficiency of both the application cache and the OS buffer cache (because of data being duplicated in both caches).
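The hitrate argument can be put in numbers with a simple expected-seeks model; the seek counts per miss are illustrative assumptions, not measurements of any particular layout:

```java
// Expected disk seeks per summary read under the two situations
// discussed: a well-cached separate summary vs. a poorly cached layout
// where a miss costs extra seeks. Illustrative model only.
public class CacheSeekModel {
    // Expected seeks per read, given a cache hitrate (0..1) and the
    // number of seeks a cache miss costs.
    static double expectedSeeks(double hitrate, int seeksPerMiss) {
        return (1.0 - hitrate) * seeksPerMiss;
    }

    public static void main(String[] args) {
        // Separate, well-cached summary: 90% hits, 1 seek on a miss.
        System.out.printf("separate, 90%% hits: %.2f seeks/read%n",
                expectedSeeks(0.90, 1));
        // Poorly cached layout: 50% hits, say 2 seeks to assemble
        // summary + detail on a miss.
        System.out.printf("combined, 50%% hits: %.2f seeks/read%n",
                expectedSeeks(0.50, 2));
    }
}
```

Even with the extra seek on a miss, the well-cached layout comes out roughly an order of magnitude ahead, which is the point of keeping the popular summary data physically together.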