Re: Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-30 Thread Chris Goffinet
No. We built a pluggable cache provider for memcache.

On Sun, Oct 30, 2011 at 7:31 PM, Mohit Anchlia wrote:

> On Sun, Oct 30, 2011 at 6:53 PM, Chris Goffinet wrote:
> >
> >
> > On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean wrote:
> >>
> >> Hey Chris,
> >>
> >>  Thanks for sharing all the info.
> >>  I have a few questions:
> >>  1. What are you doing with so much memory :)? How much of it do you
> >> allocate for heap?
> >
> > Max heap is 12GB. We use the rest for cache. We run memcache on each node
> > and allocate the remaining to that.
>
> Is this using the off-heap cache of Cassandra?

Re: Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-30 Thread Mohit Anchlia
On Sun, Oct 30, 2011 at 6:53 PM, Chris Goffinet wrote:
>
>
> On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean wrote:
>>
>> Hey Chris,
>>
>>  Thanks for sharing all the info.
>>  I have a few questions:
>>  1. What are you doing with so much memory :)? How much of it do you
>> allocate for heap?
>
> Max heap is 12GB. We use the rest for cache. We run memcache on each node
> and allocate the remaining to that.

Is this using the off-heap cache of Cassandra?


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-30 Thread Chris Goffinet
On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean wrote:

> Hey Chris,
>
>  Thanks for sharing all the info.
>  I have a few questions:
>  1. What are you doing with so much memory :)? How much of it do you
> allocate for heap?
>

Max heap is 12GB. We use the rest for cache. We run memcache on each node
and allocate the remaining to that.


>  2. What is your network speed? Do you use trunks? Do you have a dedicated
> VLAN for gossip/store traffic?
>
No dedicated VLAN for gossip. We run at 2Gb/s. We have bonded NICs.


Very slow writes in Cassandra

2011-10-30 Thread Evgeny
Hello Cassandra users,

I'm a newbie in NoSQL and Cassandra in particular. At the moment I'm doing some
benchmarking with Cassandra and experiencing very slow write throughput.

Cassandra is said to be able to perform hundreds of thousands of inserts per
second, but I'm not observing this: 1) when I send 100 thousand inserts
simultaneously via 8 CQL clients, throughput is ~14,470 inserts per second;
2) when I do the same via 8 Thrift clients, throughput is ~16,300 inserts
per second.

I think Cassandra performance can be improved, but I don't know what to tune.
Please take a look at the test conditions below and advise.
Thank you.

Test conditions:

   1. The Cassandra cluster is deployed on three machines; each machine has an
   8-core Intel(R) Xeon(R) CPU E5420 @ 2.50GHz, 16GB of RAM, and a
   1000Mb/s network.

   2. The data sample is

set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] = '1.0';
set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['security'] = 'AA1';
set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['price'] = '47.1';
set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['volume'] = '300.0';
set MM[utf8('1:exc_source_algo:2010010500.00:ENTER:0')]['se'] = '1';
set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] = '2.0';
set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['security'] = 'AA1';
set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['price'] = '44.89';
set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['volume'] = '310.0';
set MM[utf8('2:exc_source_algo:2010010500.00:ENTER:0')]['se'] = '1';
set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['order_id'] = '3.0';
set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['security'] = 'AA2';
set MM[utf8('3:exc_source_algo:2010010500.00:ENTER:0')]['price'] = '0.35';

  3. Commit log is written on the local hard drive, the data is written on 
  Lustre.
   
   4. Keyspace description:

Keyspace: MD:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:1]
  Column Families:
    ColumnFamily: MM
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 2.3247/1440/496 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: []
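
For reference, a minimal sketch of this kind of insert loop (not the code
actually used for the numbers above; this reconstruction assumes the Hector
Java client rather than the CQL/Thrift clients that were used, and the cluster
name, host and batch size are placeholders):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class InsertBench {
    public static void main(String[] args) {
        // Placeholder cluster name and host; point this at one of the three nodes.
        Cluster cluster = HFactory.getOrCreateCluster("bench", "node1:9160");
        Keyspace ks = HFactory.createKeyspace("MD", cluster);
        Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());

        int rows = 100000;
        long start = System.currentTimeMillis();
        for (int i = 1; i <= rows; i++) {
            String key = i + ":exc_source_algo:2010010500.00:ENTER:0";
            // One row = five columns, mirroring the data sample above.
            mutator.addInsertion(key, "MM", HFactory.createStringColumn("order_id", i + ".0"));
            mutator.addInsertion(key, "MM", HFactory.createStringColumn("security", "AA1"));
            mutator.addInsertion(key, "MM", HFactory.createStringColumn("price", "47.1"));
            mutator.addInsertion(key, "MM", HFactory.createStringColumn("volume", "300.0"));
            mutator.addInsertion(key, "MM", HFactory.createStringColumn("se", "1"));
            if (i % 500 == 0) {
                mutator.execute();  // ship mutations in batches instead of one RPC per column
            }
        }
        mutator.execute();
        double secs = (System.currentTimeMillis() - start) / 1000.0;
        System.out.printf("%.0f rows/s%n", rows / secs);
    }
}

Batching mutations per request and running several such clients in parallel is
usually what separates the quoted hundreds-of-thousands-per-second figures from
one-row-per-RPC loops.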

Thanks in advance.

Evgeny.



Re: What does a cluster throttled by the network look like ?

2011-10-30 Thread David Jeske
You are answering your own question here. If you are running at 80% of
network bandwidth, you are saturating your network.

AFAIK most distributed databases are running on gigabit, not 100Mb. I
recommend you upgrade your switch (and NICs if necessary). Gigabit is
insanely cheap now. At the extreme, some distributed databases have a
dedicated gigabit network for internode traffic vs external traffic.

On Oct 30, 2011 3:15 PM, "Philippe"  wrote:
> What I do see is that each server in the cluster has about 60-80Mb/s
> traffic in & out and I'm running on 100Mb ethernet.
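
Rough arithmetic behind that conclusion (the ~80% figure is the commonly
cited practical ceiling for Ethernet, not a measurement from this cluster):

public class LinkBudget {
    public static void main(String[] args) {
        double lineRateMbps = 100.0;      // 100Mb Ethernet
        double practicalCeiling = 0.80;   // rule-of-thumb usable fraction of line rate
        double usableMbps = lineRateMbps * practicalCeiling;
        System.out.printf("usable:   ~%.0f Mb/s (~%.1f MB/s)%n", usableMbps, usableMbps / 8.0);

        // The reported 60-80 Mb/s in and out per node is already at that ceiling,
        // which is why adding more client threads does not raise throughput.
        double observedMbps = 70.0;       // midpoint of the reported range
        System.out.printf("observed: ~%.0f%% of line rate%n", 100.0 * observedMbps / lineRateMbps);
    }
}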


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-30 Thread Radim Kolar

On 30.10.2011 23:34, Sorin Julean wrote:

> Hey Chris,
>
>  Thanks for sharing all the info.
>  I have a few questions:
>  1. What are you doing with so much memory :) ?
Cassandra eats memory like there is no tomorrow on large databases. It
keeps some structures in memory which depend on database size.


>  2. What is your network speed ?
100 Mbit is a failure.

> 3. Do you have a dedicated VLAN for gossip/store traffic ?
We share Hadoop + Cassandra on one VLAN due to low budget. It is best to
have them separated. Hadoop is very heavy on the network.


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-30 Thread Sorin Julean
Hey Chris,

 Thanks for sharing all the info.
 I have a few questions:
 1. What are you doing with so much memory :)? How much of it do you
allocate for heap?
 2. What is your network speed? Do you use trunks? Do you have a dedicated
VLAN for gossip/store traffic?

Cheers,
Sorin


On Sun, Oct 30, 2011 at 5:00 AM, Chris Goffinet wrote:

> RE: RAID0 Recommendation
>
> Cassandra supports multiple data file directories. Because we do
> compactions, it's just much easier to deal with one data file directory
> that is striped across all disks as 1 volume (RAID0). There are other ways
> to accomplish this though. At Twitter we use software RAID (RAID0 & RAID10).
>
> We own the physical hardware and have found that even with hardware RAID,
> software RAID in Linux is actually faster. The reason is:
>
> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
>
> We have found that using far-copies is much faster than near-copies. We
> set the I/O scheduler to noop at the moment. We might move back to CFQ with
> more tuning in the future.
>
> We use RAID10 for cases where we hit the disk often and need better disk
> performance, sacrificing storage. We initially thought RAID0 should be
> faster than RAID10 until we found out about the near vs far layouts.
>
> RE: Hardware
>
> This is going to depend on how well-automated your infrastructure is, but
> we chose the path of finding the cheapest servers we could get from
> Dell/HP/etc.: 8/12 cores, 72GB memory per node, 2TB/3TB, 2.5".
>
> We are in the process of making changes to our servers; I'll report back
> when we have more details to share.
>
> I wouldn't recommend 75 CFs. It could work but just seems too complex.
>
> Another recommendation for clusters: always go big. You will be thankful
> in the future for this. Even if you can do this on 3-6 nodes, go much
> larger for future expansion. If you own your hardware and racks, I
> recommend making sure to size out the rack diversity and # of nodes per
> rack. Also take into account the replication factor when doing this. With
> RF=3 there should be a minimum of 3 racks, and the # of nodes per rack
> should be divisible by the replication factor. This has worked out pretty
> well for us. Our biggest problems today are adding 100s of nodes to
> existing clusters at once. I'm not sure how many other companies are
> having this problem, but it's certainly on our radar to improve, if you
> get to that point :)
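
As a concrete illustration of that sizing rule (this example is not from the
original mail; the class and the layouts it checks are made up), a quick
sanity check for RF=3 might look like:

public class RackSizingCheck {
    // Rule of thumb quoted above: at least RF racks, and the node count
    // per rack divisible by the replication factor.
    static boolean layoutOk(int racks, int nodesPerRack, int rf) {
        return racks >= rf && nodesPerRack % rf == 0;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(layoutOk(3, 6, rf));   // true:  3 racks x 6 nodes = 18 nodes
        System.out.println(layoutOk(4, 9, rf));   // true:  4 racks x 9 nodes = 36 nodes
        System.out.println(layoutOk(3, 10, rf));  // false: 10 nodes/rack not divisible by 3
        System.out.println(layoutOk(2, 9, rf));   // false: fewer racks than RF
    }
}
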
>
>
> On Tue, Oct 25, 2011 at 5:23 AM, Alexandru Sicoe wrote:
>
>> Hi everyone,
>>
>> I am currently in the process of writing a hardware proposal for a
>> Cassandra cluster for storing a lot of monitoring time series data. My
>> workload is write intensive and my data set is extremely varied in types of
>> variables and insertion rate for these variables (I will have to handle on
>> the order of 2 million variables coming in, each at very different rates -
>> the majority of them will come at very low rates, but there are many that
>> will come at higher, constant rates and a few coming in with huge spikes in
>> rate). These variables correspond to all basic C++ types and arrays of
>> these types. The highest insertion rates are received for basic types, of
>> which U32 variables seem to be the most prevalent (e.g. I recorded that 2
>> million U32 vars were inserted in 8 mins of operation while 600,000 doubles
>> and 170,000 strings were inserted during the same time; note this
>> measurement covered only a subset of the total data currently taken in).
>>
>> At the moment I am partitioning the data in Cassandra into 75 CFs (each CF
>> corresponds to a logical partitioning of the set of variables mentioned
>> before - but this partitioning is not related to the amount of data or
>> rates...it is somewhat random). These 75 CFs account for ~1 million of the
>> variables I need to store. I have a 3-node Cassandra 0.8.5 cluster (each
>> node has 4 real cores and 4 GB RAM, with the commit log directory and data
>> file directory split between two RAID arrays with HDDs). I can handle the
>> load in this configuration, but the average CPU usage of the Cassandra
>> nodes is slightly above 50%. As I will need to add 12 more CFs
>> (corresponding to another ~1 million variables) plus potentially other data
>> later, it is clear that I need better hardware (also for the retrieval
>> part).
>>
>> I am looking at Dell servers (PowerEdge etc.)
>>
>> Questions:
>>
>> 1. Is anyone using Dell HW for their Cassandra clusters? How do they
>> behave? Anybody care to share their configurations, or tips on what to buy
>> and what to avoid?
>>
>> 2. Obviously I am going to keep to the advice on
>> http://wiki.apache.org/cassandra/CassandraHardware and split the
>> commitlog and data on separate disks. I was going to use an SSD for the
>> commitlog but then did some more research and found out that it doesn't
>> make sense to use SSDs for sequential appends because they won't have a
>> performance advantage over rotational media.

What does a cluster throttled by the network look like ?

2011-10-30 Thread Philippe
Dear all,
I'm working with a 12-node, RF=3 cluster on low-end hardware (Core i5 with
16GB of RAM & SATA disks).
I'm using a BOP (ByteOrderedPartitioner) and each node has a load between
50GB and 100GB (yes, I apparently did not set my tokens right... I'll fix
that later).

I'm hitting the cluster with a little over 100 concurrent threads during
reads & writes to counter columns & plain columns. When I peek at my
various servers & look at my Munin graphs, it looks like everything is
great: iostat -dmx shows peaks of 5% utilization at worst and vmstat shows
peaks of 30% CPU utilization, no swapping and plenty of caching. Munin
shows me that the JVM's heap is hovering around 50% of its allocated amount
(and I have no row caching for now).

So my thinking has been that the cluster is under-utilized, but when I
increase the number of threads hitting the cluster, I do not see the
throughput increase, which has been puzzling. I do see an occasional
TimedOutException now that I've made my batches smaller (dozens of
mutations rather than hundreds) and my slices smaller too.

What I do see is that each server in the cluster has about 60-80Mb/s of
traffic in & out, and I'm running on 100Mb Ethernet. I seem to remember that
you can only get to a little over 80% of the available Ethernet bandwidth
in real life.
Could my performance be limited by the network adapters?

Thanks


Re: Cassandra cluster HW spec (commit log directory vs data file directory)

2011-10-30 Thread Alexandru Dan Sicoe
Hi Chris,
 Thanks for your post. I can see you guys handle extremely large amounts of
data compared to my system. Yes, I will own the racks and the machines, but
the problem is I am limited by actual physical space in our data center
(believe it or not) and also by the budget. It would be hard for me to
justify the acquisition of more than 3-4 machines; that's why I will need to
find a system that empties Cassandra and transfers the data to another mass
storage system. Thanks for the RAID10 suggestion...I'll look into that!
I've seen everybody warn me about the number of CFs, so I'll listen to you
guys and reduce the number.
 Yeah, it would be nice to hear about your HW evolution. I will report
back as well once I finish my proposal!

Cheers,
Alex


ByteBuffer as an initial serializer to read columns with mixed datatypes ?

2011-10-30 Thread Ertio Lew
I have a mix of byte[] & Integer column names/values within a CF's rows. So
should ByteBuffer be my initial choice for the serializer when making the
read query to the database for the mixed datatypes, and should I then
retrieve the byte[] or Integer from the ByteBuffer using the ByteBuffer
API's getInt() method?

Is this a preferable way to read columns with Integer/byte[] names:
initially as ByteBuffer(s) & later converting them to Integer or byte[]?
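
A minimal, JDK-only sketch of that second step (pulling an Integer or byte[]
back out of the ByteBuffer); it is not tied to any particular client library,
and which branch applies has to come from knowing how the column was written,
since the raw bytes alone don't say:

import java.nio.ByteBuffer;

public class MixedColumnDecode {
    // Interpret the buffer as a 4-byte big-endian int (the layout getInt() expects).
    static int asInt(ByteBuffer bb) {
        return bb.duplicate().getInt();  // duplicate() leaves the caller's position untouched
    }

    // Interpret the same buffer as a raw byte[] (e.g. for BytesType columns).
    static byte[] asBytes(ByteBuffer bb) {
        ByteBuffer copy = bb.duplicate();
        byte[] out = new byte[copy.remaining()];
        copy.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer intColumn = ByteBuffer.allocate(4);
        intColumn.putInt(42);
        intColumn.flip();
        System.out.println(asInt(intColumn));             // 42

        ByteBuffer bytesColumn = ByteBuffer.wrap(new byte[] {1, 2, 3});
        System.out.println(asBytes(bytesColumn).length);  // 3
    }
}

This only works if every writer agrees on the encoding (ints always stored as
exactly 4 big-endian bytes); otherwise getInt() will either throw or silently
misread a byte[] value.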


Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE

2011-10-30 Thread David Jeske
If your summary data is frequently accessed, you will probably be best off
storing the two sets of data separately (either in separate column families
or with different key-prefixes). This will give you the greatest
cache-locality for your summary data, which you say is popular. If your
summary data is very well cached, then it won't matter that it might
require two disk seeks to get summary+details, because your summary data is
usually in cache anyhow.

If you want a more specific recommendation than that, we'd need to see
answers to the following questions:

(a) how big is the summary data (total, per row)? (average, max)
(b) how big is the detail data (total, per row)? (average, max)
(c) what is the read/write traffic to the summary data? ...to the detail data?

A side note about caches... IMO, you're better off getting the cache
behavior you want through physical ordering than through more explicit
caching. This is because most modern databases (Cassandra included) go
through the OS buffer cache already, and there is some amount of data
duplication involved in trying to cache data at the application level. If
your application cache hit rate is very high (90%+) this can work out, but
if it's lower (50%) it can sometimes hurt the cache efficiency of both the
application cache and the OS buffer cache (because data ends up duplicated
in both caches).
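
A back-of-envelope sketch of the seek argument above (the hit rates and the
assumption that detail rows are effectively uncached are illustrative, not
measurements):

public class SeekEstimate {
    // With summary and detail stored separately, a combined summary+detail read
    // costs one seek for the (assumed uncached) detail row, plus one extra seek
    // only when the summary row misses its cache.
    static double splitLayoutSeeks(double summaryHitRate) {
        return (1.0 - summaryHitRate) + 1.0;
    }

    public static void main(String[] args) {
        // Co-locating summary with detail would cost ~1.0 seek per such read.
        System.out.println(splitLayoutSeeks(0.90));  // ~1.1 seeks: split layout barely worse
        System.out.println(splitLayoutSeeks(0.50));  // ~1.5 seeks: split layout starts to hurt
    }
}

Meanwhile summary-only reads keep the full locality benefit of the split
layout, which is the trade-off being described.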