Cassandra Demo/Tutorial Applications

2010-03-12 Thread Krishna Sankar
I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
for my OSCON tutorial. I have a couple of questions and would appreciate
insights:

A)  CASSANDRA-873 suggests Lucandra as one demo application
B)  Are there other ideas that will bring out the various aspects of
Cassandra?
C)  What would be the goal of the demo apps? A tutorial to help folks learn
the ins and outs of Cassandra? Showcasing capabilities? I think CASSANDRA-873
belongs to the latter; Twissandra most probably belongs to the former.
D)  Hadoop on Cassandra might be a good demo/tutorial
E)  How would one structure the infrastructure for the demos/tutorials? What
assumptions can we make in creating them? As AMIs to be run in EC2? Also
to be run on 2-3 local machines for folks who can spare some? Or as
multiple processes, all on one machine? What is an optimum configuration
for learning and demos? We need to make it simple (to reflect the domain)
but not simpler.
F)  I am looking for ideas from developers and users, hence the
cross-posting. I hope the Apache mailer is smart enough to dedup; I will
find out soon ...

Cheers
k/




Re: Cassandra hardware - balancing CPU/memory/iops/disk space

2010-03-06 Thread Krishna Sankar
Eric,
Couple of thoughts:
1. Hardware
   Definitely dual quad core.
   12 x 4 GB DIMMs. This is the sweet spot for memory. I have many
     machines with this config and some with the 12 x 2 GB config.
   I haven't found the need for SATA and the higher price.
   Make sure you get good NICs.
   Are you using any virtualization layer? I assume these are bare
     metal with Ubuntu or RedHat.
2. Scaling
   Naturally you should look at horizontal scaling rather than vertical.
   An estimate of the application characteristics and data
     properties would be helpful to get a first estimate.
   I think eventually you will end up with multiple boxes anyway,
     so my philosophy has been to buy multiple optimal boxes.
   We are working on scaling characteristics (memory, network and
     storage); unfortunately it is way too early to make any inferences.
HTH.
Cheers
k/
On 3/5/10, Rosenberry, Eric eric.rosenbe...@iovation.com wrote:

 I am looking for advice from others that are further along in deploying
 Cassandra in production environments than we are.  I want to know what you are
 finding your bottlenecks to be.  I would feel silly purchasing dual processor
 quad core 2.93 GHz Nehalem machines with 192 gigs of RAM just to find out that
 the two local SATA disks kept all that CPU and RAM from being useful (clearly
 that example would be dumb).
  
 I need to spec out hardware for an "optimal" Cassandra node (though our
 read/write characteristics are not yet fully defined so let's go with an
 "average" configuration).
  
 My main concern is finding the right balance of:
 · Available CPU
 · RAM amount
 · RAM speed (think Nehalem architecture where memory comes in a few
   speeds, though I doubt this is much of a concern as it is mainly dictated by
   which processor you buy and how many slots you populate)
 · Total iops available (i.e. number of disks)
 · Total disk space available (depending on the ratio of iops/space
   deciding on SAS vs. SATA and various rotational speeds)
  
 My current thinking is 1U boxes with four 3.5 inch disks since that seems to
 be a readily available config.  One big question is should I go with a single
 processor Nehalem system to go with those four disks, or would two CPUs be
 useful, and also, how much RAM is appropriate to match?  I am making the
 assumption that Cassandra nodes are going to be disk bound as they must do a
 random read to answer any given query (i.e. indexes in RAM, but all data lives
 on disk?).
  
 The other big decision is what type of hard disks others are finding to
 provide the optimal ratio of iops to available space?  SAS or SATA?  And what
 rotational speed?
  
 Let me throw out here an actual hardware config and feel free to tell me the
 error of my ways:
 · A SuperMicro SuperServer 6016T-NTRF configured as follows:
 
 o  2.26 GHz E5520 dual processor quad core hyperthreaded Nehalem architecture
 (this proc provides a lot of bang for the buck, faster procs get more
 expensive quickly)
 
 o  Qty 12, 4 gig 1066 MHz DIMMs for a total of 48 gigs RAM (the 4 gig DIMMs
 seem to be the price sweet spot)
 
 o  Dual on-board 1 gigabit NICs (perhaps one for client connections and the
 other for cluster communication?)
 
 o  Dual power supplies (I don't want to lose half my cluster due to a failure
 on one power leg)
 
 o  4x 1TB SATA disks (this is a complete SWAG)
 
 o  No RAID controller (all just single individual disks presented to the OS).
 Though is there any downside to using a RAID controller with RAID 0 (perhaps
 one single disk for the log for sequential I/O, and 3x disks in a stripe for
 the random I/O)?
 
 o  The on-board IPMI based OOB controller (so we can kick the boxes remotely
 if need be)
 
 · http://www.supermicro.com/products/system/1U/6016/SYS-6016T-NTRF.cfm
 
  
 I can't help but think the above config has way too much RAM and CPU and not
 enough iops capacity.  My understanding is that Cassandra does not cache much
 in RAM, though?
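
 The RAM-versus-iops concern above can be put into rough numbers. The sketch
 below is a back-of-envelope estimate only, not a benchmark: it uses the usual
 approximation that one random read on a spinning disk costs an average seek
 plus half a revolution. The 7200 rpm speed and the 8.5 ms average seek time
 are assumptions (the message gives neither), and the split of one log disk
 plus three data disks is the layout floated in the config list, not a
 recommendation.

```python
# Back-of-envelope sizing for the proposed node.
# ASSUMPTIONS (not from the thread): 7200 rpm disks, 8.5 ms average seek.

def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate random-read IOPS of a single spinning disk."""
    # On average the platter has to turn half a revolution to reach the data.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

DIMM_COUNT = 12      # from the proposed config
DIMM_GB = 4          # from the proposed config
DISK_COUNT = 4       # from the proposed config
LOG_DISKS = 1        # layout floated in the message: one disk for the log
DISK_RPM = 7200      # assumption
AVG_SEEK_MS = 8.5    # assumption

total_ram_gb = DIMM_COUNT * DIMM_GB
per_disk_iops = random_iops(DISK_RPM, AVG_SEEK_MS)
data_disks = DISK_COUNT - LOG_DISKS
data_iops = data_disks * per_disk_iops

print(f"RAM: {total_ram_gb} GB")
print(f"Random-read IOPS per disk: {per_disk_iops:.0f}")
print(f"Random-read IOPS across {data_disks} data disks: {data_iops:.0f}")
```

 Under these assumptions each disk delivers on the order of 80 random reads
 per second, so the whole node serves only a few hundred uncached reads per
 second next to 48 gigs of RAM and 8 cores. Changing DISK_RPM and AVG_SEEK_MS
 to the figures for a 10k or 15k SAS drive shows how much that ratio moves.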
  
 Any thoughts are appreciated.  Thanks.
  
 -Eric
 ___
 Eric Rosenberry
 Sr. Infrastructure Architect | Chief Bit Plumber
  
  
 iovation
 111 SW Fifth Avenue
 Suite 3200
 Portland, OR 97204
 www.iovation.com
  
 



Re: [VOTE] Graduation

2010-01-25 Thread Krishna Sankar
+1 from me. 
On Jan 25, 2010, Eric Evans wrote:

 
 There was some additional discussion[1] concerning Cassandra's
 graduation on the incubator list, and as a result we've altered the
 initial resolution to expand the size of the PMC by three to include our
 active mentors (new draft attached).
 
 I propose a vote for Cassandra's graduation to a top-level project.
 
 We'll leave this open for 72 hours, and assuming it passes, we can then
 take it to a vote with the Incubator PMC.
 
 +1 from me!
 
 
 [1] http://thread.gmane.org/gmane.comp.apache.incubator.general/24427
 
 -- 
 Eric Evans
 eev...@rackspace.com
 cassandra-resolution.txt