Re: commodity server spec

2011-09-06 Thread China Stoffen
In general, more smaller machines are better than fewer big ones. Probably
go for what's cost-effective.

A cost-effective solution would be a few fat servers, because that also saves
hosting cost.



The exception to that would be if you truly care only about writes and
have *very* few reads, none of which are latency-critical (so you're
okay with waiting for several disk seeks on reads, and the number of
reads is low enough that serving them from platters will work). In such
cases it might make sense to have fewer Big Fat Machines with lots of
memory and a lot of disk space. But... even so, I would not recommend
huge 48 TB nodes... unless you really know what you're doing.

I want writes to be as fast as possible, but reads don't need to complete in
milliseconds.
If you don't recommend 48 TB, then what is the maximum disk space I can go
with?






- Original Message -
From: Peter Schuller peter.schul...@infidyne.com
To: user@cassandra.apache.org; China Stoffen chinastof...@yahoo.com
Cc: 
Sent: Saturday, September 3, 2011 1:08 PM
Subject: Re: commodity server spec

 Is there any recommendation about commodity server hardware specs if a 100TB
 database size is expected and it's a heavily write-oriented application?
 Should I go with high-powered CPUs (12 cores), 48TB of HDD, and 640GB of RAM,
 for a total of 3 servers of this spec? Or are many smaller commodity servers
 recommended?

In general, more smaller machines are better than fewer big ones. Probably
go for what's cost-effective.

In your case, 100 TB is *quite* big. I would definitely recommend
against doing anything like your 3 server setup. You'll probably want
100-1000 small servers.

The exception to that would be if you truly care only about writes and
have *very* few reads, none of which are latency-critical (so you're
okay with waiting for several disk seeks on reads, and the number of
reads is low enough that serving them from platters will work). In such
cases it might make sense to have fewer Big Fat Machines with lots of
memory and a lot of disk space. But... even so, I would not recommend
huge 48 TB nodes... unless you really know what you're doing.

In reality, more information about your use-case would be required to
offer terribly useful advice.

-- 
/ Peter Schuller (@scode on twitter)


Re: commodity server spec

2011-09-06 Thread Bill

MongoDB, last time I looked, does not scale horizontally.

I've seen reasonable behaviour putting Cassandra database tables onto 
remote filers, but you absolutely have to test against the SAN 
configuration and carefully manage things like concurrent reader/writer 
settings, the fs and Cassandra caches, etc. Using a NAS/SAN is generally 
not recommended for this class of system.


The commitlogs work best on attached (dedicated) disk.
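
For reference, a minimal sketch of how that separation looks in
cassandra.yaml; the mount points here are hypothetical examples, and the
concurrent reader/writer settings mentioned above live in the same file:

# cassandra.yaml -- illustrative sketch only; paths are hypothetical.
# Commitlog on its own dedicated attached disk, so its sequential
# writes are not disturbed by data-file reads and compaction.
commitlog_directory: /mnt/commitlog

# Data files elsewhere (or on the SAN volume you are testing).
data_file_directories:
    - /mnt/data

# The concurrent reader/writer knobs; tune and re-test against your
# own storage rather than trusting any example values.
concurrent_reads: 32
concurrent_writes: 32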

Bill

On 04/09/11 14:08, China Stoffen wrote:

Then what will be the sweet spot for Cassandra? I am more interested in
Cassandra because my application is write-heavy.

So far, what I have understood is that Cassandra will not work well on SANs
either?

P.S.
MongoDB is also a NoSQL database designed for horizontal scaling, so how is
it a good fit for the same hardware for which Cassandra is not a good
candidate?


- Original Message -
From: Bill b...@dehora.net
To: user@cassandra.apache.org
Cc:
Sent: Sunday, September 4, 2011 4:34 AM
Subject: Re: commodity server spec

[100% agree with Chris]

China, the machines you're describing sound nice for
MongoDB/Postgres/MySQL, but probably not the sweet spot for Cassandra.

Obviously (well, depending on near-term load) you don't want to get
burned on excess footprint. But a realistic, don't-lose-data,
be-fairly-available deployment is going to span at least 2 racks/power
supplies and have data replicated offsite (at least as passive for DR).
So I would consider 6-9 relatively weaker servers rather than 3
scale-up joints. You'll save some capex, and the opex overhead is
probably worth it traded off against the operational risk. 3 is an
awkward number to operate for anything that needs to be available
(although many people seem to start with that, I am guessing because
triplication is traditionally understood under failure) as it
immediately puts 50% extra load on the remaining 2 when one node goes
away. One will go away, even transiently, when it is upgraded, crashes,
or gets into a funk due to compaction or garbage collection, and load
will then be shunted onto the other 2 - remember Cassandra has no
backoff/throttling in place. I'd allow for something breaking at some
point (DBs, even the mature ones, fail from time to time), and 2
doesn't give you much room to maneuver in production.

Bill


On 03/09/11 23:05, Chris Goffinet wrote:
  It will also depend on how long you can handle recovery time. So imagine
  this case:

  3 nodes w/ RF of 3
  Each node has 30TB of space used (you never want to fill up an entire
  node). If one node fails and you must recover, that will take over 3.6
  days in just transferring the data alone. That's with a sustained
  800 megabit/s (100 MB/s). In the real world it's going to fluctuate, so
  add some padding. Also, since you will be saturating one of the other
  nodes, your network latency performance will suffer, and you only have 1
  machine to handle the remaining traffic while you're recovering. And if
  you want to expand the cluster in the future (more nodes), the amount of
  data to transfer is going to be very large, most likely taking days to
  add machines. From my experience it's much better to have a larger
  cluster set up upfront for future growth than to get by with 6-12 nodes
  at the start. You will feel less pain, and it is easier to manage node
  failures (bad disks, memory, etc.).

  3 nodes with RF of 1 wouldn't make sense.
 
 
  On Sat, Sep 3, 2011 at 4:05 AM, China Stoffen chinastof...@yahoo.com
  wrote:
 
  Many small servers would drive up the hosting cost way too high, so we
  want to avoid this solution if we can.
 
  - Original Message -
  From: Radim Kolar h...@sendmail.cz
  To: user@cassandra.apache.org
  Cc:
  Sent: Saturday, September 3, 2011 9:37 AM
  Subject: Re: commodity server spec
 
  Many smaller servers are way better.
 
 





Re: commodity server spec

2011-09-04 Thread China Stoffen
Then what will be the sweet spot for Cassandra? I am more interested in
Cassandra because my application is write-heavy.

So far, what I have understood is that Cassandra will not work well on SANs
either?

P.S.
MongoDB is also a NoSQL database designed for horizontal scaling, so how is
it a good fit for the same hardware for which Cassandra is not a good
candidate?




- Original Message -
From: Bill b...@dehora.net
To: user@cassandra.apache.org
Cc: 
Sent: Sunday, September 4, 2011 4:34 AM
Subject: Re: commodity server spec

[100% agree with Chris]

China, the machines you're describing sound nice for MongoDB/Postgres/MySQL,
but probably not the sweet spot for Cassandra.

Obviously (well, depending on near-term load) you don't want to get burned on
excess footprint. But a realistic, don't-lose-data, be-fairly-available
deployment is going to span at least 2 racks/power supplies and have data
replicated offsite (at least as passive for DR). So I would consider 6-9
relatively weaker servers rather than 3 scale-up joints. You'll save some
capex, and the opex overhead is probably worth it traded off against the
operational risk. 3 is an awkward number to operate for anything that needs
to be available (although many people seem to start with that, I am guessing
because triplication is traditionally understood under failure) as it
immediately puts 50% extra load on the remaining 2 when one node goes away.
One will go away, even transiently, when it is upgraded, crashes, or gets
into a funk due to compaction or garbage collection, and load will then be
shunted onto the other 2 - remember Cassandra has no backoff/throttling in
place. I'd allow for something breaking at some point (DBs, even the mature
ones, fail from time to time), and 2 doesn't give you much room to maneuver
in production.

Bill


On 03/09/11 23:05, Chris Goffinet wrote:
 It will also depend on how long you can handle recovery time. So imagine
 this case:

 3 nodes w/ RF of 3
 Each node has 30TB of space used (you never want to fill up an entire
 node). If one node fails and you must recover, that will take over 3.6
 days in just transferring the data alone. That's with a sustained
 800 megabit/s (100 MB/s). In the real world it's going to fluctuate, so
 add some padding. Also, since you will be saturating one of the other
 nodes, your network latency performance will suffer, and you only have 1
 machine to handle the remaining traffic while you're recovering. And if
 you want to expand the cluster in the future (more nodes), the amount of
 data to transfer is going to be very large, most likely taking days to
 add machines. From my experience it's much better to have a larger
 cluster set up upfront for future growth than to get by with 6-12 nodes
 at the start. You will feel less pain, and it is easier to manage node
 failures (bad disks, memory, etc.).

 3 nodes with RF of 1 wouldn't make sense.
 
 
 On Sat, Sep 3, 2011 at 4:05 AM, China Stoffen chinastof...@yahoo.com wrote:
 
     Many small servers would drive up the hosting cost way too high, so we
     want to avoid this solution if we can.
 
     - Original Message -
     From: Radim Kolar h...@sendmail.cz
     To: user@cassandra.apache.org
     Cc:
     Sent: Saturday, September 3, 2011 9:37 AM
     Subject: Re: commodity server spec
 
     Many smaller servers are way better.
 
 

Re: commodity server spec

2011-09-03 Thread Peter Schuller
 Is there any recommendation about commodity server hardware specs if a 100TB
 database size is expected and it's a heavily write-oriented application?
 Should I go with high-powered CPUs (12 cores), 48TB of HDD, and 640GB of RAM,
 for a total of 3 servers of this spec? Or are many smaller commodity servers
 recommended?

In general, more smaller machines are better than fewer big ones. Probably
go for what's cost-effective.

In your case, 100 TB is *quite* big. I would definitely recommend
against doing anything like your 3 server setup. You'll probably want
100-1000 small servers.
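
As a rough illustration of where a number in that range comes from, here is
a back-of-envelope sketch; the per-node capacities are illustrative
assumptions, not figures from this thread:

# Back-of-envelope Cassandra cluster sizing -- a sketch, not a
# recommendation. Assumes RF 3 and a hypothetical 0.5-2 TB of
# replicated data per node so compaction/recovery stay manageable.
raw_tb = 100                    # expected database size
rf = 3                          # replication factor
total_tb = raw_tb * rf          # data actually stored: 300 TB

for per_node_tb in (0.5, 1.0, 2.0):
    print(f"{per_node_tb} TB/node -> ~{total_tb / per_node_tb:.0f} nodes")
# 0.5 TB/node -> ~600 nodes, 1 TB/node -> ~300, 2 TB/node -> ~150:
# comfortably inside the 100-1000 range above.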

The exception to that would be if you truly care only about writes and
have *very* few reads, none of which are latency-critical (so you're
okay with waiting for several disk seeks on reads, and the number of
reads is low enough that serving them from platters will work). In such
cases it might make sense to have fewer Big Fat Machines with lots of
memory and a lot of disk space. But... even so, I would not recommend
huge 48 TB nodes... unless you really know what you're doing.
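
A crude way to sanity-check the "serving reads from platters" condition,
assuming a ballpark ~100 random seeks/s per 7200rpm spindle and a couple of
seeks per uncached read; all numbers are illustrative assumptions:

# Sketch: can the platters keep up with the read rate?
spindles_per_node = 12          # hypothetical disk count per fat node
seeks_per_sec_per_disk = 100    # rough 7200rpm random-seek ballpark
seeks_per_read = 2              # e.g. index seek + data seek, uncached

reads_per_sec = spindles_per_node * seeks_per_sec_per_disk / seeks_per_read
print(f"~{reads_per_sec:.0f} uncached reads/s per node")  # ~600
# If the per-node read rate stays well below this, the fat-node
# exception above may apply; otherwise it will not.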

In reality, more information about your use-case would be required to
offer terribly useful advice.

-- 
/ Peter Schuller (@scode on twitter)


Re: commodity server spec

2011-09-03 Thread China Stoffen
Many small servers would drive up the hosting cost way too high, so we want
to avoid this solution if we can.



- Original Message -
From: Radim Kolar h...@sendmail.cz
To: user@cassandra.apache.org
Cc: 
Sent: Saturday, September 3, 2011 9:37 AM
Subject: Re: commodity server spec

Many smaller servers are way better.


Re: commodity server spec

2011-09-03 Thread Chris Goffinet
It will also depend on how long you can handle recovery time. So imagine
this case:

3 nodes w/ RF of 3
Each node has 30TB of space used (you never want to fill up an entire node).
If one node fails and you must recover, that will take over 3.6 days in
just transferring the data alone. That's with a sustained 800 megabit/s
(100 MB/s). In the real world it's going to fluctuate, so add some padding.
Also, since you will be saturating one of the other nodes, your network
latency performance will suffer, and you only have 1 machine to handle the
remaining traffic while you're recovering. And if you want to expand the
cluster in the future (more nodes), the amount of data to transfer is going
to be very large, most likely taking days to add machines. From my
experience it's much better to have a larger cluster set up upfront for
future growth than to get by with 6-12 nodes at the start. You will feel
less pain, and it is easier to manage node failures (bad disks, memory,
etc.).

3 nodes with RF of 1 wouldn't make sense.
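
The 3.6-day figure falls straight out of binary units at the assumed
sustained 100 MB/s; a quick sketch of the arithmetic:

# Recovery-time arithmetic behind the "over 3.6 days" figure above.
node_data_mib = 30 * 1024 * 1024   # 30 TiB of node data, in MiB
stream_rate_mib_s = 100            # assumed sustained ~100 MB/s

days = node_data_mib / stream_rate_mib_s / 86400
print(f"{days:.2f} days")          # ~3.64 days, before real-world padding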


On Sat, Sep 3, 2011 at 4:05 AM, China Stoffen chinastof...@yahoo.com wrote:

 Many small servers would drive up the hosting cost way too high, so we want
 to avoid this solution if we can.

 - Original Message -
 From: Radim Kolar h...@sendmail.cz
 To: user@cassandra.apache.org
 Cc:
 Sent: Saturday, September 3, 2011 9:37 AM
 Subject: Re: commodity server spec

 Many smaller servers are way better.



Re: commodity server spec

2011-09-03 Thread Bill

[100% agree with Chris]

China, the machines you're describing sound nice for 
MongoDB/Postgres/MySQL, but probably not the sweet spot for Cassandra.


Obviously (well, depending on near-term load) you don't want to get
burned on excess footprint. But a realistic, don't-lose-data,
be-fairly-available deployment is going to span at least 2 racks/power
supplies and have data replicated offsite (at least as passive for DR).
So I would consider 6-9 relatively weaker servers rather than 3
scale-up joints. You'll save some capex, and the opex overhead is
probably worth it traded off against the operational risk. 3 is an
awkward number to operate for anything that needs to be available
(although many people seem to start with that, I am guessing because
triplication is traditionally understood under failure) as it
immediately puts 50% extra load on the remaining 2 when one node goes
away. One will go away, even transiently, when it is upgraded, crashes,
or gets into a funk due to compaction or garbage collection, and load
will then be shunted onto the other 2 - remember Cassandra has no
backoff/throttling in place. I'd allow for something breaking at some
point (DBs, even the mature ones, fail from time to time), and 2
doesn't give you much room to maneuver in production.
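
To make the 50% figure concrete: with N uniformly loaded nodes, losing one
shunts its share across the remaining N-1. A quick sketch, assuming nothing
beyond a uniform load distribution:

# Extra load on each survivor when one of N uniformly loaded nodes fails.
for n in (3, 6, 9):
    extra_pct = (n / (n - 1) - 1) * 100
    print(f"{n} nodes: +{extra_pct:.1f}% load per survivor")
# 3 nodes: +50.0%, 6 nodes: +20.0%, 9 nodes: +12.5% -- one reason
# 6-9 weaker servers ride out a failure more gracefully than 3 fat ones.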


Bill


On 03/09/11 23:05, Chris Goffinet wrote:

It will also depend on how long you can handle recovery time. So imagine
this case:

3 nodes w/ RF of 3
Each node has 30TB of space used (you never want to fill up an entire node).
If one node fails and you must recover, that will take over 3.6 days in
just transferring the data alone. That's with a sustained 800 megabit/s
(100 MB/s). In the real world it's going to fluctuate, so add some padding.
Also, since you will be saturating one of the other nodes, your network
latency performance will suffer, and you only have 1 machine to handle the
remaining traffic while you're recovering. And if you want to expand the
cluster in the future (more nodes), the amount of data to transfer is going
to be very large, most likely taking days to add machines. From my
experience it's much better to have a larger cluster set up upfront for
future growth than to get by with 6-12 nodes at the start. You will feel
less pain, and it is easier to manage node failures (bad disks, memory,
etc.).

3 nodes with RF of 1 wouldn't make sense.


On Sat, Sep 3, 2011 at 4:05 AM, China Stoffen chinastof...@yahoo.com wrote:

Many small servers would drive up the hosting cost way too high, so we want
to avoid this solution if we can.

- Original Message -
From: Radim Kolar h...@sendmail.cz
To: user@cassandra.apache.org
Cc:
Sent: Saturday, September 3, 2011 9:37 AM
Subject: Re: commodity server spec

Many smaller servers are way better.






commodity server spec

2011-09-02 Thread China Stoffen
Hi,
Is there any recommendation about commodity server hardware specs if a
100TB database size is expected and it's a heavily write-oriented
application? Should I go with high-powered CPUs (12 cores), 48TB of HDD,
and 640GB of RAM, for a total of 3 servers of this spec? Or are many
smaller commodity servers recommended?

Thanks.
China

Re: commodity server spec

2011-09-02 Thread Radim Kolar

Many smaller servers are way better.