understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
I have a very basic question whose answer I have been unable to find
in the online documentation on Cassandra.

It seems like every node in a Cassandra cluster contains all the data
ever stored in the cluster (i.e., all nodes are identical).  I don't
understand how you can scale this on commodity servers with only
internal hard disks.  In other words, if I want to store 5 TB of
data, does each node need a hard disk capacity of 5 TB?

With HBase, memcached, and other NoSQL solutions it is clearer how
data is split up in the cluster and replicated for fault tolerance.
Again, please excuse the rather basic question.


Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory
There are two numbers to look at: N, the number of hosts in the ring
(cluster), and R, the number of replicas for each data item. R is configurable
per keyspace.
Typically for large clusters N > R. For very small clusters it makes sense
for R to be close to N, in which case Cassandra is useful so that the database
doesn't have a single point of failure, but not so much because of the
size of the data. But for large clusters it rarely makes sense to have N=R;
usually N > R.
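
To put rough numbers on the 5 TB example (a back-of-the-envelope sketch; the node count and replication factor below are hypothetical, chosen only for illustration):

    # Back-of-the-envelope: per-node storage is roughly total_data * R / N,
    # assuming evenly balanced token ranges. R and N here are made up.
    total_data_tb = 5.0   # the 5 TB from the original question
    R = 3                 # replicas per data item
    N = 10                # nodes in the ring

    per_node_tb = total_data_tb * R / N
    print(per_node_tb)    # 1.5 -- each node holds ~1.5 TB, not 5 TB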



-- 
/Ran


unsubscribe

2010-12-09 Thread Massimo Carro
Massimo Carro

www.liquida.it - www.liquida.com


Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
Thanks Ran.  This helps a little, but unfortunately it's still a bit
fuzzy for me.  So is it not true that each node contains all the data
in the cluster? I haven't come across any information on how clustered
data is coordinated in Cassandra.  How does my query get directed to
the right node?




Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory

 So is it not true that each node contains all the data in the cluster?

No, not in the general case; in fact it's rarely the case. Usually R < N. In
my case I have N=6 and R=2.
You configure R per keyspace under ReplicationFactor (v0.6.*)
or replication_factor (v0.7.*).
http://wiki.apache.org/cassandra/StorageConfiguration





-- 
/Ran


Re: understanding the cassandra storage scaling

2010-12-09 Thread Sylvain Lebresne
 This helps a little but unfortunately it's still a bit fuzzy for me.  So is it
 not true that each node contains all the data in the cluster?

Not at all. Basically each node is responsible for only a part of the data (a
range, really). But for each piece of data you can choose how many nodes it
lives on; this is the Replication Factor.

For instance, if you choose to have RF=1, then each piece of data will be on
exactly one node (this is usually a bad idea since it offers very weak
durability guarantees but nevertheless, it can be done).

If you choose RF=3, each piece of data is on 3 nodes (independently of the
number of nodes your cluster has). You can have all data on all nodes, but for
that you'll have to choose RF = #{nodes in the cluster}. But this is a very
degenerate case.

 how does my query get directed to the right node?

Each node in the cluster knows the ranges of data the other nodes hold. I
suggest you watch the first video linked on this page:
  http://wiki.apache.org/cassandra/ArticlesAndPresentations
It explains this and more.
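
To make the ring idea concrete, here is a toy sketch of key-to-replica mapping, in the spirit of Cassandra's default random partitioning and simple clockwise replica placement (the tiny token space, tokens, and node names are all made up for illustration):

    import hashlib
    from bisect import bisect_left

    # Toy ring: each node owns the arc of token space ending at its token.
    tokens = [0, 40, 80, 120]            # hypothetical, evenly spaced
    nodes = ['node1', 'node2', 'node3', 'node4']
    RING = 160                           # toy token space (Cassandra's is 2**127)

    def replicas(key, rf):
        # Hash the key onto the ring, find the first node whose token is >= it
        # (wrapping around), then take the next rf-1 nodes clockwise.
        t = int(hashlib.md5(key.encode()).hexdigest(), 16) % RING
        i = bisect_left(tokens, t) % len(nodes)
        return [nodes[(i + k) % len(nodes)] for k in range(rf)]

    print(replicas('some-row-key', 3))   # 3 replicas, chosen by ring position

Because every node knows the full token map, any node can compute this and route your query to the right replicas.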

--
Sylvain


Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
awesome!  Thank you guys for the really quick answers and the links to
the presentations.




N to N relationships

2010-12-09 Thread Sébastien Druon
Hello,

For a specific case, we are thinking about representing an N-to-N
relationship with an NxN matrix in Cassandra.
The relations will exist only between a subset of elements, so the matrix will
mostly contain empty elements.

We have a set of questions concerning this:
- what is the best way to represent this matrix? what would have the best
performance in reading? in writing?
  . a super column family with n column families, with n columns each
  . a column family with n columns and n lines

In the second case, we would need to extract 2 kinds of information:
- all the relations for a line: this should pose no specific problem;
- all the relations for a column: in that case we would need an index for
the columns, right? and then get all the lines where the value of the column
in question is not null... is that the correct way to do it?
When using indexes, say we want to add another element N+1. What impact,
in terms of time, would it have on the indexing job?

Thanks a lot for the answers,

Best regards,

Sébastien Druon


Re: N to N relationships

2010-12-09 Thread David Boxenhorn
How about a regular CF where keys are n...@n ?

Then, getting a matrix row would be the same cost as getting a matrix column
(N gets), and it would be very easy to add element N+1.
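
As a toy sketch of the idea, with a plain dict standing in for the CF (the key format above is partially elided in the archive, so the "row@col" format here is a hypothetical stand-in):

    # Toy model: one CF row per matrix cell, keyed "row@col" (hypothetical format).
    cf = {}

    def put(row, col, value):
        cf['%s@%s' % (row, col)] = value

    def get_row(row, n):
        # One get per column: N gets for a full matrix row...
        return {c: cf.get('%s@%s' % (row, c)) for c in range(n)}

    def get_col(col, n):
        # ...and, symmetrically, N gets for a full column.
        return {r: cf.get('%s@%s' % (r, col)) for r in range(n)}

    put(1, 2, 'related')
    print(get_row(1, 4))  # {0: None, 1: None, 2: 'related', 3: None}

Adding element N+1 costs nothing up front: new cells simply use the larger indices.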





Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
It seems to me that secondary indexes (new in 0.7) change everything when it
comes to data modeling.

- OOP becomes obsolete
- primary indexes become obsolete if you ever want to do a range query
(which you probably will...), better to assign a random row id

Taken together, it's likely that very little will remain of your old
database schema...

Am I right?


Re: Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
- OPP becomes obsolete (OOP is not obsolete!)
- primary indexes become obsolete if you ever want to do a range query
(which you probably will...), better to assign a random row id

Taken together, it's likely that very little will remain of your old
database schema...

Am I right?


Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
Hi!

I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum
for writes. But when I shut down one of them, UnavailableExceptions are thrown.
Why is that? Isn't it the point of quorum and a fault-tolerant DB that it
continues with the remaining 2 nodes and redistributes the data to the broken
one as soon as it's up again?

What may I be doing wrong?

thx
tcn

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Thibaut Britz
Hi,

The UnavailableExceptions will be thrown because a quorum of size 2
needs at least 2 nodes to be alive (as does a quorum of size 3).

The data won't be automatically redistributed to other nodes.

Thibaut




Re: unsubscribe

2010-12-09 Thread Eric Evans
On Thu, 2010-12-09 at 11:42 +0100, Massimo Carro wrote:
 Massimo Carro
 
 www.liquida.it - www.liquida.com


http://wiki.apache.org/cassandra/FAQ#unsubscribe

-- 
Eric Evans
eev...@rackspace.com



Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Daniel Lundin
Quorum is really only useful when RF > 2, since for a quorum to
succeed, RF/2+1 replicas must be available.

This means for RF = 2, consistency levels QUORUM and ALL yield the same result.
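
Spelling out that arithmetic (a quick illustrative loop):

    # A quorum is floor(RF/2) + 1 replicas; this tabulates the consequence.
    for rf in range(1, 6):
        quorum = rf // 2 + 1
        tolerated = rf - quorum   # replica failures QUORUM can survive
        print(rf, quorum, tolerated)
    # RF=2 -> quorum=2, tolerates 0 replicas down (same as ALL)
    # RF=3 -> quorum=2, tolerates 1 replica down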

/d



Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig

On Dec 9, 2010, at 16:50, Daniel Lundin wrote:

 Quorum is really only useful when RF > 2, since for a quorum to
 succeed, RF/2+1 replicas must be available.

2/2+1==2 and I killed 1 of 3, so... don't get it.




RE: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Viktor Jevdokimov
With 3 nodes and RF=2 you have 3 replica sets: N1+N2, N2+N3 and N3+N1.
Killing N1 leaves only one fully alive set, N2+N3; for the other 2/3 of the
ranges a quorum (which at RF=2 means all replicas) can no longer be reached,
so requests hitting N1+N2 and N3+N1 fail.
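
The same bookkeeping as a small sketch (illustrative, reusing the N1..N3 labels above):

    nodes = ['N1', 'N2', 'N3']
    rf = 2
    # With RF=2, each range lives on a node and its clockwise neighbour.
    replica_sets = [[nodes[i], nodes[(i + 1) % 3]] for i in range(3)]
    down = {'N1'}

    for rs in replica_sets:
        alive = sum(1 for n in rs if n not in down)
        # quorum at RF=2 is 2, i.e. both replicas must be up
        print(rs, 'quorum possible:', alive >= rf // 2 + 1)
    # Only ['N2', 'N3'] still satisfies QUORUM; the other 2/3 of ranges fail.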






Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have
2 replicas. And since quorum is also 2 with that replication factor,
you cannot lose
a node; otherwise some queries will end up as UnavailableException.

Again, this is not related to the total number of nodes. Even with 200
nodes, if
you use RF=2, you will have some queries that fail (although much fewer than
what you are probably seeing).





Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
In other words, if you want to use QUORUM, you need to set RF=3.

(I know because I had exactly the same problem.)

 



Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig

On Dec 9, 2010, at 17:39, David Boxenhorn wrote:

 In other words, if you want to use QUORUM, you need to set RF=3. 
 
 (I know because I had exactly the same problem.) 

I naively assumed that if I kill either node that holds N1 (i.e. node 1 or 3),
N1 will still remain on another node. Only if both fail do I actually lose data.
But apparently this is not how it works...

 



Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
If that is what you want, use CL=ONE





Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Nick Bailey
On Thu, Dec 9, 2010 at 10:43 AM, Timo Nentwig timo.nent...@toptarif.de wrote:


 On Dec 9, 2010, at 17:39, David Boxenhorn wrote:

  In other words, if you want to use QUORUM, you need to set RF=3.
 
  (I know because I had exactly the same problem.)

 I naively assume that if I kill either node that holds N1 (i.e. node 1 or
 3), N1 will still remain on another node. Only if both fail, I actually lose
 data. But apparently this is not how it works...


 No, this is correct. Killing one node with a replication factor of 2 will
not cause you to lose data. You are requiring a consistency level higher
than what is available. Change your app to use CL.ONE and all data will be
available even with one machine unavailable.







Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
 I naively assumed that if I kill either node that holds N1 (i.e. node 1 or 3),
 N1 will still remain on another node. Only if both fail do I actually lose
 data. But apparently this is not how it works...

Sure, the data that N1 holds is also on another node and you won't
lose it by only losing N1.
But when you do a quorum query, you are saying to Cassandra "Please,
please, would you fail my request
if you can't get a response from 2 nodes." So if only 1 node holding
the data is up at the moment of the
query, then Cassandra, which is very polite software, does what you
asked and fails.
If you want Cassandra to send you an answer with only one node up, use
CL=ONE (as said by David).







Cassandra and disk space

2010-12-09 Thread Mark
I recently ran into a problem during a repair operation where my nodes
completely ran out of space and my whole cluster was... well,
clusterfucked.

I want to make sure I know how to prevent this problem in the future.

Should I make sure that at all times every node is under 50% of its disk
space? Are there any normal day-to-day operations that would cause
any one node to double in size that I should be aware of? If one or more
nodes surpass the 50% mark, what should I plan to do?


Thanks for any advice


Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig

On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote:

 I naively assumed that if I kill either node that holds N1 (i.e. node 1 or
 3), N1 will still remain on another node. Only if both fail do I actually lose
 data. But apparently this is not how it works...
 
 Sure, the data that N1 holds is also on another node and you won't
 lose it by only losing N1.
 But when you do a quorum query, you are saying to Cassandra "Please,
 please, would you fail my request
 if you can't get a response from 2 nodes." So if only 1 node holding
 the data is up at the moment of the
 query, then Cassandra, which is very polite software, does what you
 asked and fails.

And my application would fall back to ONE. Quorum writes will also fail, so I
would also use ONE so that the app stays up. What would I have to do to make
the data redistribute when the broken node is up again? Simply call nodetool
repair on it?




Re: N to N relationships

2010-12-09 Thread Sébastien Druon
Thanks a lot for the answer

What about the indexing when adding a new element? Is it incremental?

Thanks again




Re: N to N relationships

2010-12-09 Thread David Boxenhorn
What do you mean by indexing?




Re: Secondary indexes change everything?

2010-12-09 Thread Tyler Hobbs
OPP is not yet obsolete.

The included secondary indexes still aren't good at finding keys for ranges
of indexed values, such as name > 'b' and name < 'c'.  This is something
that an OPP index would be good at.  Of course, you can do something similar
with one or more rows, so it's not that big of an advantage for OPP.

If you can make primary indexes useful, you might as well -- no reason to
throw that away.

The main thing that the secondary index support does is relieve you from
having to write all of the indexing code and CFs by hand.
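
A toy illustration of the difference, with plain Python structures standing in for the two index layouts (a conceptual sketch only, not Cassandra's implementation):

    from bisect import bisect_left, bisect_right

    rows = {'k1': 'alice', 'k2': 'bob', 'k3': 'beth', 'k4': 'carol'}

    # Hash-style index (the random-partitioner case): great for equality lookups...
    eq_index = {}
    for key, name in rows.items():
        eq_index.setdefault(name, []).append(key)
    print(eq_index.get('bob'))          # ['k2']

    # ...but a range over the indexed *values* wants an ordered layout,
    # which is what an OPP-style index gives you:
    ordered = sorted((name, key) for key, name in rows.items())
    names = [name for name, _ in ordered]
    lo, hi = bisect_right(names, 'b'), bisect_left(names, 'c')
    print([key for _, key in ordered[lo:hi]])  # ['k3', 'k2']: 'b' < name < 'c'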

- Tyler





Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Tyler Hobbs
If you switch your writes to CL ONE when a failure occurs, you might as well
use ONE for all writes.  ONE and QUORUM behave the same when all nodes are
working correctly.

- Tyler





Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
 And my application would fall back to ONE. Quorum writes will also fail, so I
 would also use ONE so that the app stays up. What would I have to do to make
 the data redistribute when the broken node is up again? Simply call nodetool
 repair on it?

There are 3 mechanisms for that:
  - hinted handoff: basically, when the node is back up, the other
nodes will send it what it missed.
  - read repair: whenever you read a piece of data and an inconsistency is
detected (because one node is not up to date), it gets repaired.
  - calling nodetool repair

The first two are automatic; you have nothing to do.
Nodetool repair is usually run only periodically (say once a week) to
repair cold data that wasn't dealt with by
the first two mechanisms.

--
Sylvain








Re: Cassandra and disk space

2010-12-09 Thread Peter Schuller
 I recently ran into a problem during a repair operation where my nodes
 completely ran out of space and my whole cluster was... well, clusterfucked.

 I want to make sure how to prevent this problem in the future.

Depending on which version you're on, you may be seeing this:

   https://issues.apache.org/jira/browse/CASSANDRA-1674

But regardless, disk space variation is a fact of life with
Cassandra. Off the top of my head I'm not ready to say what the
expectations are with respect to repair under all circumstances.
Anyone?

 Should I make sure that at all times every node is under 50% of its disk
 space? Are there any normal day-to-day operations that would cause any
 one node to double in size that I should be aware of? If one or more nodes
 surpass the 50% mark, what should I plan to do?

Major compactions can potentially double the amount of disk used if you
have a single large column family that contributes almost all of the disk
space. For such clusters, regular background compaction can indeed
cause a doubling when the compaction happens to be a major one (i.e.,
happens to include all sstables).
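
As rough arithmetic (an illustrative sketch of that worst case; the sizes are hypothetical):

    # Worst case: one CF holds nearly all the data, and a major compaction may
    # rewrite every sstable in it, temporarily needing free space roughly equal
    # to that CF's live size.
    live_gb = {'big_cf': 450, 'misc_cf': 50}
    peak_gb = sum(live_gb.values()) + max(live_gb.values())
    print(peak_gb)  # 950 GB peak for 500 GB of live data: hence the ~50% rule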

-- 
/ Peter Schuller


Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
If you are on 0.6, repair is particularly dangerous with respect to disk
space usage.  If your replica is sufficiently out of sync, you can triple
your disk usage pretty easily.  This has been improved in 0.7, so repairs
should use about half as much disk space, on average.

In general, yes, keep your nodes under 50% disk usage at all times.  Any of:
compaction, cleanup, snapshotting, repair, or bootstrapping (the latter two
are improved in 0.7) can double your disk usage temporarily.

You should plan to add more disk space or add nodes when you get close to
this limit.  Once you go over 50%, it's more difficult to add nodes, at
least in 0.6.

- Tyler


 Thanks for any advice



Re: N to N relationships

2010-12-09 Thread Sébastien Druon
I mean if I have secondary indexes. Apparently they are calculated in the
background...







Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev

Are there any plans to improve this in the future?

For big data clusters this could be very expensive. Based on your 
comment, I will need 200TB of storage for 100TB of data to keep 
Cassandra running.


--
Rustam.





Stuck with adding nodes

2010-12-09 Thread Daniel Doubleday
Hi good people.

I underestimated load during peak times and now I'm stuck with our production
cluster.
Right now it's 3 nodes, RF 3, so everything is everywhere. We have ~300GB data
load, ~10MB/sec incoming traffic and ~50 (peak) reads/sec to the cluster.

The problem derives from our quorum reads / writes: At peak hours one of the
machines (that's random) will fall behind because it's a little slower than the
others, and then shortly after that it will drop most read requests. So right
now the only way to survive is to take one machine down, making every read /
write an ALL operation. It's necessary to take one machine down because
otherwise users will wait for timeouts from that overwhelmed machine when the
client lib chooses it. Since we are a real-time-oriented service, that's a killer.

So now we tried to add 2 more nodes. The problem is that anticompaction takes
too long, meaning it is not done when peak hour arrives and the machine that
would stream the data to the new node must be taken down. We tried to block
ports 7000 and 9160 to that machine because we hoped that would stop traffic
and let the machine finish anticompaction. But that did not work because we
could not cut the already existing connections to the other nodes.

Currently I am copying all data files (that's all existing data) from one node
to the new nodes in the hope that I can then manually assign them their new
token range (nodetool move) and do a cleanup.

Obviously I will try this tomorrow (it's been a long day) on a test system but 
any advice would be highly appreciated.

Sighs and thanks.
Daniel

smeet.com
Berlin

Re: Stuck with adding nodes

2010-12-09 Thread Peter Schuller
 Currently I am copying all data files (that's all existing data) from one node
 to the new nodes in the hope that I can then manually assign them their new
 token range (nodetool move) and do a cleanup.

Unless I'm misunderstanding you I believe you should be setting the
initial token. nodetool move would be for a node already in the ring.
And keep in mind that a nodetool move is currently a
decommission+bootstrap - so if you're teetering on the edge of
overload you will want to keep that in mind when moving a node to
avoid ending up in a worse situation as another node temporarily
receives more load than usual as a result of increased ring ownership.
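
For evenly spaced initial tokens with the RandomPartitioner, the usual back-of-the-envelope calculation is (a sketch; 2**127 is the size of the MD5 token space):

    def initial_tokens(node_count):
        # Evenly spaced tokens over the RandomPartitioner's 0..2**127 space.
        return [i * 2**127 // node_count for i in range(node_count)]

    for token in initial_tokens(5):
        print(token)  # one value per node, for InitialToken / initial_token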

 Obviously I will try this tomorrow (it's been a long day) on a test system 
 but any advice would be highly appreciated.

One possibility if you have additional hardware to spare temporarily,
is to add more nodes than you actually need and then, once you are
significantly over capacity, you have the flexibility to move nodes
around to an optimum position and then decommission those machines
that were only borrowed. I.e., initial bootstrap of nodes takes a
shorter amount of time because you're giving them less token space per
new node. And once all are in the ring, you're free to move things
around and then free up the hardware.

(Another option may be to implement throttling of the anti-compaction
so that it runs very slowly during peak hours, but that requires
patching cassandra or else firewall/packet-filtering fu, and is
probably more risky than it's worth.)

-- 
/ Peter Schuller


Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
That depends on your scenario.  In the worst case of one big CF, there's not
much that can be easily done for the disk usage of compaction and cleanup
(which is essentially compaction).

If, instead, you have several column families and no single CF makes up the
majority of your data, you can push your disk usage a bit higher.

A fundamental idea behind Cassandra's architecture is that disk space is
cheap (which, indeed, it is).  If you are particularly sensitive to this,
Cassandra might not be the best solution to your problem.  Also keep in mind
that Cassandra performs well with average disks, so you don't need to spend
a lot there.  Additionally, most people find that the replication protects
their data enough to allow them to use RAID 0 instead of 1, 10, 5, or 6.

- Tyler






Re: Cassandra and disk space

2010-12-09 Thread Scott Dworkis
i recently finished a practice expansion of 4 nodes to 5 nodes, a series 
of nodetool move, nodetool cleanup and jmx gc steps.  i found that in 
some of the steps, disk usage actually grew to 2.5x the base data size on 
one of the nodes.  i'm using 0.6.4.


-scott






Re: N to N relationships

2010-12-09 Thread Aaron Morton
Am assuming you have one matrix and you know the dimensions. Also, as you say,
the most important queries are to get an entire column or an entire row.

I would consider using a standard CF for the Columns and one for the Rows. The
key for each would be the col / row number, each cassandra column name would be
the id of the other dimension, and the value whatever you want.

- when storing the data, update both the Column and Row CF
- reading a whole row/col would simply be reading from the appropriate CF
- reading an intersection is a get_slice to either the col or row CF, using the
column_names field to identify the other dimension

You would not need secondary indexes to serve these queries.

Hope that helps.

Aaron

On 10 Dec, 2010, at 07:02 AM, Sébastien Druon sdr...@spotuse.com wrote:

 I mean if I have secondary indexes. Apparently they are calculated in the
 background...
Re: Secondary indexes change everything?

2010-12-09 Thread Jonathan Ellis
On Thu, Dec 9, 2010 at 12:16 PM, David Boxenhorn da...@lookin2.com wrote:
 What do you mean by, "The included secondary indexes still aren't good at
 finding keys for ranges of indexed values, such as name > 'b' and name <
 'c'"?

 Do you mean that secondary indexes don't support range queries at all?

http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Running multiple instances on a single server --micrandra ??

2010-12-09 Thread Ryan King
Overall, I don't think this is a crazy idea, though I think I'd prefer
cassandra to manage this setup.

The problem you will run into is that because the storage port is
assumed to be the same across the cluster you'll only be able to do
this if you can assign multiple IPs to each server (one for each
process) (I know this because I proposed something similar last year
:)).

-ryan
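
A sketch of what that wiring could look like: one alias IP and one data disk
per instance, with the same storage_port everywhere. The cassandra.yaml keys
are the 0.7 ones; the paths, addresses, and use of PyYAML are illustrative.
Note Jonathan's caveat below: with one disk per instance, the commitlog ends
up sharing a spindle with the data files.

import yaml  # PyYAML, for illustration only

base = {'cluster_name': 'micrandra', 'storage_port': 7000}
for i in range(6):
    conf = dict(base)
    conf['listen_address'] = '10.0.0.%d' % (10 + i)            # one alias IP per process
    conf['rpc_address'] = conf['listen_address']
    conf['data_file_directories'] = ['/mnt/disk%d/data' % i]   # one disk per instance
    conf['commitlog_directory'] = '/mnt/disk%d/commitlog' % i  # shares the data disk
    with open('cassandra-instance%d.yaml' % i, 'w') as f:
        yaml.dump(conf, f, default_flow_style=False)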

On Tue, Dec 7, 2010 at 10:00 PM, Jonathan Ellis jbel...@gmail.com wrote:
 The major downside is you're going to want to let each instance have
 its own dedicated commitlog spindle too, unless you just don't have
 many updates.

 On Tue, Dec 7, 2010 at 8:25 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 I am quite ready to be stoned for this thread but I have been thinking
 about this for a while and I just wanted to bounce these ideas off some
 gurus.

 Cassandra does allow multiple data directories, but as far as I can
 tell no one runs in this configuration. This is something that is very
 different between the hbase architecture and the Cassandra
 architecture. HBase borrows the concept of JBOD configurations from
 Hadoop. HBase has many smallish (~256 MB) regions managed
 with ZooKeeper. Cassandra has a few (1 per node) large, node-sized
 token ranges managed by gossip consensus.

 Let's say a node has 6 300 GB disks. You have the options of RAID5,
 RAID6, RAID10, or RAID0. The problem I have found with these
 configurations is that major compactions (or even large minor ones) can
 take a long time. Even if your disk is not heavily utilized, this is a
 lot of data to move through. Thus node joins take a long time. Node
 moves take a long time.

 The idea behind micrandra is, for a 6-disk system, to run 6 instances of
 Cassandra, one per disk. Use the RackAwareSnitch to make sure no
 replicas live on the same node.

 The downsides
 1) we would have to manage 6x the instances of cassandra
 2) we would have some overhead for each JVM.

 The upsides?
 1) A disk/instance failure only degrades the overall performance by
 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit when
 down a disk)
 2) Moves and joins have less work to do
 3) Can scale up a single node by adding a single disk to an existing
 system (assuming the RAM and CPU load is light)
 4) OPP would be easier to balance out hot spots (maybe not on this
 one, I'm not an OPP user)

 What does everyone think? Does it ever make sense to run this way?




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Obscured question about data size in a Column Family

2010-12-09 Thread Joshua Partogi
Hi there,

Quoting information in the wiki about Cassandra limitations (
http://wiki.apache.org/cassandra/CassandraLimitations):
... So all the data from a given columnfamily/key pair had to fit in memory,
or 2GB ...

Does this mean
1. A ColumnFamily can only be 2GB of data
2. A Column (key/pair) can only be 2GB of data

Thanks for the explanation.

-- 
http://twitter.com/jpartogi http://twitter.com/scrum8


Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev


That depends on your scenario.  In the worst case of one big CF, 
there's not much that can be easily done for the disk usage of 
compaction and cleanup (which is essentially compaction).


If, instead, you have several column families and no single CF makes 
up the majority of your data, you can push your disk usage a bit higher.




Is there any formula to calculate this? Let's say I have 500GB in a single 
CF, so I need at least 500GB of free space for compaction. If I 
partition this CF and split it into 10 proportional CFs of 50GB each, does 
it mean that I will need only 50GB of free space?


Also, is there a recommended maximum data size per node?

Thanks.

A fundamental idea behind Cassandra's architecture is that disk space 
is cheap (which, indeed, it is).  If you are particularly sensitive to 
this, Cassandra might not be the best solution to your problem.  Also 
keep in mind that Cassandra performs well with average disks, so you 
don't need to spend a lot there.  Additionally, most people find that 
the replication protects their data enough to allow them to use RAID 0 
instead of 1, 10, 5, or 6.


- Tyler

On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev rus...@code.az wrote:


Are there any plans to improve this in the future?

For big data clusters this could be very expensive. Based on your
comment, I will need 200TB of storage for 100TB of data to keep
Cassandra running.

--
Rustam.

On 09/12/2010 17:56, Tyler Hobbs wrote:

If you are on 0.6, repair is particularly dangerous with respect
to disk space usage.  If your replica is sufficiently out of
sync, you can triple your disk usage pretty easily.  This has
been improved in 0.7, so repairs should use about half as much
disk space, on average.

In general, yes, keep your nodes under 50% disk usage at all
times.  Any of: compaction, cleanup, snapshotting, repair, or
bootstrapping (the latter two are improved in 0.7) can double
your disk usage temporarily.

You should plan to add more disk space or add nodes when you get
close to this limit.  Once you go over 50%, it's more difficult
to add nodes, at least in 0.6.

- Tyler

On Thu, Dec 9, 2010 at 11:19 AM, Mark static.void@gmail.com wrote:

I recently ran into a problem during a repair operation where
my nodes completely ran out of space and my whole cluster
was... well, clusterfucked.

I want to make sure how to prevent this problem in the future.

Should I make sure that at all times every node is under 50%
of its disk space? Are there any normal day-to-day operations
that would cause any one node to double in size that I
should be aware of? If one or more nodes surpass the 50%
mark, what should I plan to do?

Thanks for any advice






Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
Yes, that's correct, but I wouldn't push it too far.  You'll become much
more sensitive to disk usage changes; in particular, rebalancing your
cluster will be particularly difficult, and repair will also become dangerous.
Disk performance also tends to drop when a disk nears capacity.

There's no recommended maximum size -- it all depends on your access rates.
Anywhere from 10 GB to 1TB is typical.

- Tyler
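
Put as a rule of thumb: the worst-case headroom you need is roughly the size
of your largest single CF, since a major compaction may rewrite all of it at
once. A toy illustration, using the numbers from the question:

# Worst-case compaction headroom ~ size of the largest single CF.
def headroom_gb(cf_sizes_gb):
    return max(cf_sizes_gb)

print(headroom_gb([500]))      # one 500 GB CF -> ~500 GB free needed
print(headroom_gb([50] * 10))  # ten 50 GB CFs -> ~50 GB free needed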

On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev rus...@code.az wrote:


 That depends on your scenario.  In the worst case of one big CF, there's
 not much that can be easily done for the disk usage of compaction and
 cleanup (which is essentially compaction).

 If, instead, you have several column families and no single CF makes up the
 majority of your data, you can push your disk usage a bit higher.


 Is there any formula to calculate this? Let's say I have 500GB in a single
 CF, so I need at least 500GB of free space for compaction. If I partition
 this CF and split it into 10 proportional CFs of 50GB each, does it mean that I
 will need only 50GB of free space?

 Also, is there a recommended maximum data size per node?

 Thanks.


 A fundamental idea behind Cassandra's architecture is that disk space is
 cheap (which, indeed, it is).  If you are particularly sensitive to this,
 Cassandra might not be the best solution to your problem.  Also keep in mind
 that Cassandra performs well with average disks, so you don't need to spend
 a lot there.  Additionally, most people find that the replication protects
 their data enough to allow them to use RAID 0 instead of 1, 10, 5, or 6.

 - Tyler

 On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev rus...@code.az wrote:

  Are there any plans to improve this in the future?

 For big data clusters this could be very expensive. Based on your comment,
 I will need 200TB of storage for 100TB of data to keep Cassandra running.

 --
  Rustam.

 On 09/12/2010 17:56, Tyler Hobbs wrote:

 If you are on 0.6, repair is particularly dangerous with respect to disk
 space usage.  If your replica is sufficiently out of sync, you can triple
 your disk usage pretty easily.  This has been improved in 0.7, so repairs
 should use about half as much disk space, on average.

 In general, yes, keep your nodes under 50% disk usage at all times.  Any
 of: compaction, cleanup, snapshotting, repair, or bootstrapping (the latter
 two are improved in 0.7) can double your disk usage temporarily.

 You should plan to add more disk space or add nodes when you get close to
 this limit.  Once you go over 50%, it's more difficult to add nodes, at
 least in 0.6.

 - Tyler

 On Thu, Dec 9, 2010 at 11:19 AM, Mark static.void@gmail.com wrote:

 I recently ran into a problem during a repair operation where my nodes
 completely ran out of space and my whole cluster was... well, clusterfucked.

 I want to make sure how to prevent this problem in the future.

 Should I make sure that at all times every node is under 50% of its disk
 space? Are there any normal day-to-day operations that would cause any
 one node to double in size that I should be aware of? If one or more nodes
 surpass the 50% mark, what should I plan to do?

 Thanks for any advice






Re: Cassandra and disk space

2010-12-09 Thread Nick Bailey
Additionally, cleanup will fail to run when the disk is more than 50% full.
Another reason to stay below 50%.

On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs ty...@riptano.com wrote:

 Yes, that's correct, but I wouldn't push it too far.  You'll become much
 more sensitive to disk usage changes; in particular, rebalancing your
 cluster will be particularly difficult, and repair will also become dangerous.
 Disk performance also tends to drop when a disk nears capacity.

 There's no recommended maximum size -- it all depends on your access
 rates.  Anywhere from 10 GB to 1TB is typical.

 - Tyler


 On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev rus...@code.az wrote:


 That depends on your scenario.  In the worst case of one big CF, there's
 not much that can be easily done for the disk usage of compaction and
 cleanup (which is essentially compaction).

 If, instead, you have several column families and no single CF makes up
 the majority of your data, you can push your disk usage a bit higher.


 Is there any formula to calculate this? Let's say I have 500GB in a single
 CF, so I need at least 500GB of free space for compaction. If I partition
 this CF and split it into 10 proportional CFs of 50GB each, does it mean that I
 will need only 50GB of free space?

 Also, is there a recommended maximum data size per node?

 Thanks.


 A fundamental idea behind Cassandra's architecture is that disk space is
 cheap (which, indeed, it is).  If you are particularly sensitive to this,
 Cassandra might not be the best solution to your problem.  Also keep in mind
 that Cassandra performs well with average disks, so you don't need to spend
 a lot there.  Additionally, most people find that the replication protects
 their data enough to allow them to use RAID 0 instead of 1, 10, 5, or 6.

 - Tyler

 On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev rus...@code.az wrote:

  Are there any plans to improve this in the future?

 For big data clusters this could be very expensive. Based on your
 comment, I will need 200TB of storage for 100TB of data to keep Cassandra
 running.

 --
  Rustam.

 On 09/12/2010 17:56, Tyler Hobbs wrote:

 If you are on 0.6, repair is particularly dangerous with respect to disk
 space usage.  If your replica is sufficiently out of sync, you can triple
 your disk usage pretty easily.  This has been improved in 0.7, so repairs
 should use about half as much disk space, on average.

 In general, yes, keep your nodes under 50% disk usage at all times.  Any
 of: compaction, cleanup, snapshotting, repair, or bootstrapping (the latter
 two are improved in 0.7) can double your disk usage temporarily.

 You should plan to add more disk space or add nodes when you get close to
 this limit.  Once you go over 50%, it's more difficult to add nodes, at
 least in 0.6.

 - Tyler

 On Thu, Dec 9, 2010 at 11:19 AM, Mark static.void@gmail.com wrote:

 I recently ran into a problem during a repair operation where my nodes
 completely ran out of space and my whole cluster was... well, 
 clusterfucked.

 I want to make sure how to prevent this problem in the future.

 Should I make sure that at all times every node is under 50% of its disk
 space? Are there any normal day-to-day operations that would cause any
 one node to double in size that I should be aware of? If one or more nodes
 surpass the 50% mark, what should I plan to do?

 Thanks for any advice







Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev

Thanks Tyler, this is really useful.

Also, I noticed that you can specify multiple data file directories 
located on different disks. Let's say I have a machine with 4 x 500GB 
drives; what would be the difference between the following 2 setups:


  1. each drive mounted separately and has data file dirs on it (so 4x
 data file dirs)
  2. disks are in RAID0 and mounted as one drive with one data folder on it

In other words, does splitting the data folder into smaller ones bring any 
performance or stability advantages?



On 10/12/2010 00:03, Tyler Hobbs wrote:
Yes, that's correct, but I wouldn't push it too far.  You'll become 
much more sensitive to disk usage changes; in particular, rebalancing 
your cluster will be particularly difficult, and repair will also become 
dangerous.  Disk performance also tends to drop when a disk nears 
capacity.


There's no recommended maximum size -- it all depends on your access 
rates.  Anywhere from 10 GB to 1TB is typical.


- Tyler

On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev rus...@code.az wrote:




That depends on your scenario.  In the worst case of one big CF,
there's not much that can be easily done for the disk usage of
compaction and cleanup (which is essentially compaction).

If, instead, you have several column families and no single CF
makes up the majority of your data, you can push your disk usage
a bit higher.



Is there any formula to calculate this? Let's say I have 500GB in
a single CF, so I need at least 500GB of free space for compaction.
If I partition this CF and split it into 10 proportional CFs of
50GB each, does it mean that I will need only 50GB of free space?

Also, is there a recommended maximum data size per node?

Thanks.



A fundamental idea behind Cassandra's architecture is that disk
space is cheap (which, indeed, it is).  If you are particularly
sensitive to this, Cassandra might not be the best solution to
your problem.  Also keep in mind that Cassandra performs well
with average disks, so you don't need to spend a lot there. 
Additionally, most people find that the replication protects

their data enough to allow them to use RAID 0 instead of 1, 10,
5, or 6.

- Tyler

On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev rus...@code.az wrote:

Are there any plans to improve this in the future?

For big data clusters this could be very expensive. Based on
your comment, I will need 200TB of storage for 100TB of data
to keep Cassandra running.

--
Rustam.

On 09/12/2010 17:56, Tyler Hobbs wrote:

If you are on 0.6, repair is particularly dangerous with
respect to disk space usage.  If your replica is
sufficiently out of sync, you can triple your disk usage
pretty easily.  This has been improved in 0.7, so repairs
should use about half as much disk space, on average.

In general, yes, keep your nodes under 50% disk usage at all
times.  Any of: compaction, cleanup, snapshotting, repair,
or bootstrapping (the latter two are improved in 0.7) can
double your disk usage temporarily.

You should plan to add more disk space or add nodes when you
get close to this limit.  Once you go over 50%, it's more
difficult to add nodes, at least in 0.6.

- Tyler

On Thu, Dec 9, 2010 at 11:19 AM, Mark static.void@gmail.com wrote:

I recently ran into a problem during a repair operation
where my nodes completely ran out of space and my whole
cluster was... well, clusterfucked.

I want to make sure how to prevent this problem in the
future.

Should I make sure that at all times every node is under
50% of its disk space? Are there any normal day-to-day
operations that would cause any one node to double
in size that I should be aware of? If one or more nodes
surpass the 50% mark, what should I plan to do?

Thanks for any advice








Re: Cassandra and disk space

2010-12-09 Thread Robert Coli
On Thu, Dec 9, 2010 at 4:20 PM, Rustam Aliyev rus...@code.az wrote:
 Thanks Tyler, this is really useful.
 [ RAID0 vs JBOD question ]
 In other words, does splitting data folder into smaller ones bring any
 performance or stability advantages?

This is getting to be a FAQ, so here's my stock answer:

There are non-zero production deployments which have experienced fail
from multiple data directories in cassandra.

There are zero production deployments which have experienced win from
multiple data directories in cassandra.

YMMV, of course!

=Rob
PS - Maybe we should remove the multiple data directory stuff, so
people don't keep getting tempted to use it?


Re: Cassandra and disk space

2010-12-09 Thread Brandon Williams
On Thu, Dec 9, 2010 at 6:20 PM, Rustam Aliyev rus...@code.az wrote:

 Also, I noticed that you can specify multiple data file directories located
 on different disks. Let's say I have a machine with 4 x 500GB drives; what
 would be the difference between the following 2 setups:

1. each drive mounted separately and has data file dirs on it (so 4x
data file dirs)
2. disks are in RAID0 and mounted as one drive with one data folder on
it

 In other words, does splitting the data folder into smaller ones bring any
 performance or stability advantages?


It brings disadvantages.  Your largest CF will be limited to the size of
your smallest drive, and you won't be using them in parallel when
compacting.  RAID0 is the better option.

-Brandon


Re: [OT] shout out for riptano training

2010-12-09 Thread Sal Fuentes
I second that as well. I actually found the training to be fun (love the new
stuff in 0.7.0) and quite interesting. Now I'm looking forward to the next
Cassandra Summit. Thank you Riptano.

On Thu, Dec 9, 2010 at 2:48 PM, Dave Viner davevi...@gmail.com wrote:

 Just wanted to give a shout-out to Jonathan Ellis  the Riptano team for
 the awesome training they provided yesterday in Santa Monica.  It was
 awesome, and I'd highly recommend it for anyone who is using or seriously
 considering using Cassandra.

 Just. freakin awesome.

 Dave Viner




-- 
Salvador Fuentes Jr.


Re: N to N relationships

2010-12-09 Thread Nick Bailey
I would also recommend two column families. Storing the key as NxN would
require you to hit multiple machines to query for an entire row or column
with RandomPartitioner. Even with OPP you would need to pick rows or columns
to order by, and the other would require hitting multiple machines.  Two
column families avoid this and avoid any problems with choosing OPP.

On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton aa...@thelastpickle.comwrote:

 Am assuming you have one matrix and you know the dimensions. Also as you
 say the most important queries are to get an entire column or an entire row.

 I would consider using a standard CF for the Columns and one for the Rows.
  The key for each would be the col / row number, each cassandra column name
 would be the id of the other dimension and the value whatever you want.

 - when storing the data update both the Column and Row CF
 - reading a whole row/col would be simply reading from the appropriate CF.
 - reading an intersection is a get_slice to either col or row CF using the
 column_names field to identify the other dimension.

 You would not need secondary indexes to serve these queries.

 Hope that helps.
 Aaron

 On 10 Dec, 2010, at 07:02 AM, Sébastien Druon sdr...@spotuse.com wrote:

 I mean if I have secondary indexes. Apparently they are calculated in the
 background...

 On 9 December 2010 18:33, David Boxenhorn da...@lookin2.com wrote:

 What do you mean by indexing?


 On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon sdr...@spotuse.comwrote:

 Thanks a lot for the answer

 What about the indexing when adding a new element? Is it incremental?

 Thanks again



 On 9 December 2010 14:38, David Boxenhorn da...@lookin2.com wrote:

 How about a regular CF where keys are n...@n ?

 Then, getting a matrix row would be the same cost as getting a matrix
 column (N gets), and it would be very easy to add element N+1.



 On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon sdr...@spotuse.comwrote:

 Hello,

 For a specific case, we are thinking about representing a N to N
 relationship with a NxN Matrix in Cassandra.
 The relations will be only between a subset of elements, so the Matrix
 will mostly contain empty elements.

 We have a set of questions concerning this:
 - what is the best way to represent this matrix? what would have the
 best performance in reading? in writing?
   . a super column family with n column families, with n columns each
   . a column family with n columns and n lines

 In the second case, we would need to extract 2 kinds of information:
 - all the relations for a line: this should be no specific problem;
 - all the relations for a column: in that case we would need an index
 for the columns, right? and then get all the lines where the value of the
 column in question is not null... is it the correct way to do?
 When using indexes, say we want to add another element N+1. What impact
 in terms of time would it have on the indexation job?

 Thanks a lot for the answers,

 Best regards,

 Sébastien Druon








[RELEASE] 0.7.0 rc2

2010-12-09 Thread Eric Evans

I'd have thought all that turkey and stuffing would have done more
damage to momentum, but judging by the number of bug-fixes in the last
couple of weeks, that isn't the case.

As usual, I'd be remiss if I didn't point out that this is not yet a
stable release.  It's getting pretty close, but we're not ready to stick
a fork in it yet.  Be sure to test it thoroughly before upgrading
something important.

Please be sure to read through the changes[1] and release notes[2].
Report any problems you find[3], and if you have any questions, don't
hesitate to ask.

Thanks!

[1]: http://goo.gl/ZMQEe (CHANGES.txt)
[2]: http://goo.gl/R35HH (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: http://people.apache.org/~eevans/cassandra_0.7.0~rc2_all.deb

-- 
Eric Evans
eev...@rackspace.com




Re: Obscured question about data size in a Column Family

2010-12-09 Thread Jonathan Ellis
In <= 0.6 (but not 0.7) a row could not be larger than 2GB.

2GB is still the largest possible column value.

On Thu, Dec 9, 2010 at 5:38 PM, Joshua Partogi joshua.j...@gmail.com wrote:
 Hi there,

 Quoting information in the wiki about Cassandra limitations
 (http://wiki.apache.org/cassandra/CassandraLimitations):
 ... So all the data from a given columnfamily/key pair had to fit in memory,
 or 2GB ...

 Does this mean
 1. A ColumnFamily can only be 2GB of data
 2. A Column (key/pair) can only be 2GB of data

 Thanks for the explanation.

 --
 http://twitter.com/jpartogi




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: NullPointerException in Beta3 and rc1

2010-12-09 Thread Wenjun Che
describe_schema_versions()  returns a Map<String, List<String>> with one
entry.  The key is a UUID and the List<String> has one element, which is the
IP of my machine.

I think this has something to do with the 'truncate' command in the CLI; I can
reproduce it by:

1. create a CF with column1 as a secondary index
2. add some rows
3. truncate the CF
4. add some rows and do a query where column1=someValue and
column2=someValue.  It does not happen if the query just has
column1=someValue

thanks
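
For anyone wanting to run the same check, a sketch against the raw Thrift
API (the module path for the generated bindings varies by build; pycassa
wraps the same call):

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra  # thrift-generated bindings; path may differ

socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TFramedTransport(socket)  # 0.7 defaults to framed
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

versions = client.describe_schema_versions()     # {schema uuid: [node IPs]}
if len(versions) > 1:
    print('schema disagreement:', versions)
transport.close()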

On Wed, Dec 8, 2010 at 3:17 PM, Aaron Morton aa...@thelastpickle.comwrote:

 Jonathan suggested your cluster has multiple schemas, caused by
 https://issues.apache.org/jira/browse/CASSANDRA-1824

 Can you run this API command describe_schema_versions() , it's not listed
 on the wiki yet but it will tell you how many schema versions are out there.
 pycassa supports it.

 Aaron


 On 09 Dec, 2010, at 08:19 AM, Aaron Morton aa...@thelastpickle.com wrote:

 Please send this to the list rather than me personally.

 Aaron

 Begin forwarded message:

 *From: *Wenjun Che wen...@openf.in
 *Date: *08 December 2010 4:35:10 PM
 *To: *aa...@thelastpickle.com
 *Subject: Re: NullPointerException in Beta3 and rc1*

 I created the CF on beta3 with:
 create column family RecipientChat with gc_grace=5 and comparator =
 'AsciiType' and
 column_metadata=[{column_name:recipient,validation_class:BytesType,index_type:0}]

 After I added about 5000 rows, I got the error when querying the CF with
 recipient='somevalue' and anotherColumn='anotherValue'.

 I tried truncating the CF and it was still getting the same error.

 The last thing I tried is upgrading to rc1 and saw the same error.

 Thanks






Re: Cassandra and disk space

2010-12-09 Thread Bill de hÓra
This is true, but for larger installations I end up needing more
servers to hold the disks, and more racks to hold the servers, to the point
where the overall cost per GB climbs (granted, the cost per IOP is
probably still good).  AIUI, a chunk of that 50% is replicated data, such
that the truly available space in the cluster is lower than 50% for
capacity planning?  If so, some workloads where it's just data
pouring in with very few updates would have me thinking I'd want a
tiered model, archiving cold data onto a filer/HDFS.

Bill 
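
As a back-of-the-envelope version of that capacity math -- the 50% headroom
comes from the advice upthread, and the replication factor here is an assumed
example, not something from this thread:

raw_tb = 600.0   # total raw disk across the cluster
headroom = 0.5   # keep nodes under ~50% per the advice above
rf = 3           # replication factor (assumed example)

usable_tb = raw_tb * headroom / rf
print(usable_tb)  # -> 100.0 TB of pre-replication data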

On Thu, 2010-12-09 at 13:26 -0600, Tyler Hobbs wrote:

 That depends on your scenario.  In the worst case of one big CF,
 there's not much that can be easily done for the disk usage of
 compaction and cleanup (which is essentially compaction).
 
 If, instead, you have several column families and no single CF makes
 up the majority of your data, you can push your disk usage a bit
 higher.
 
 A fundamental idea behind Cassandra's architecture is that disk space
 is cheap (which, indeed, it is).  If you are particularly sensitive to
 this, Cassandra might not be the best solution to your problem.  Also
 keep in mind that Cassandra performs well with average disks, so you
 don't need to spend a lot there.  Additionally, most people find that
 the replication protects their data enough to allow them to use RAID 0
 instead of 1, 10, 5, or 6.
 
 - Tyler
 
 
 On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev rus...@code.az wrote:
 
 Are there any plans to improve this in the future?
 
 For big data clusters this could be very expensive. Based on
 your comment, I will need 200TB of storage for 100TB of data
 to keep Cassandra running.
 
 --
 Rustam.
 
 
 
 On 09/12/2010 17:56, Tyler Hobbs wrote: 
 
  If you are on 0.6, repair is particularly dangerous with
  respect to disk space usage.  If your replica is
  sufficiently out of sync, you can triple your disk usage
  pretty easily.  This has been improved in 0.7, so repairs
  should use about half as much disk space, on average.
  
  In general, yes, keep your nodes under 50% disk usage at all
  times.  Any of: compaction, cleanup, snapshotting, repair,
  or bootstrapping (the latter two are improved in 0.7) can
  double your disk usage temporarily.
  
  You should plan to add more disk space or add nodes when you
  get close to this limit.  Once you go over 50%, it's more
  difficult to add nodes, at least in 0.6.
  
  - Tyler
  
  
  On Thu, Dec 9, 2010 at 11:19 AM, Mark
  static.void@gmail.com wrote:
  
  I recently ran into a problem during a repair
  operation where my nodes completely ran out of space
  and my whole cluster was... well, clusterfucked.
  
  I want to make sure how to prevent this problem in
  the future.
  
  Should I make sure that at all times every node is
  under 50% of its disk space? Are there any normal
  day-to-day operations that would cause any one
  node to double in size that I should be aware of? If
  one or more nodes surpass the 50% mark, what
  should I plan to do?
  
  Thanks for any advice
  
  
 
 




Re: NullPointerException in Beta3 and rc1

2010-12-09 Thread Jonathan Ellis
Can you still reproduce this with rc2, after starting with an empty
data and commitlog directory?

There used to be a bug w/ truncate + 2ary indexes but that should be fixed now.

On Thu, Dec 9, 2010 at 8:53 PM, Wenjun Che wen...@openf.in wrote:
 describe_schema_versions()  returns a Map<String, List<String>> with one
 entry.  The key is a UUID and the List<String> has one element, which is the
 IP of my machine.

 I think this has something to do with the 'truncate' command in the CLI; I can
 reproduce it by:

 1. create a CF with column1 as a secondary index
 2. add some rows
 3. truncate the CF
 4. add some rows and do a query where column1=someValue and
 column2=someValue.  It does not happen if the query just has
 column1=someValue

 thanks

 On Wed, Dec 8, 2010 at 3:17 PM, Aaron Morton aa...@thelastpickle.com
 wrote:

 Jonathan suggested your cluster has multiple schemas, caused
 by https://issues.apache.org/jira/browse/CASSANDRA-1824
 Can you run this API command describe_schema_versions() , it's not listed
 on the wiki yet but it will tell you how many schema versions are out there.
 pycassa supports it.
 Aaron

 On 09 Dec, 2010, at 08:19 AM, Aaron Morton aa...@thelastpickle.com wrote:

 Please send this to the list rather than me personally.
 Aaron
 Begin forwarded message:

 From: Wenjun Che wen...@openf.in
 Date: 08 December 2010 4:35:10 PM
 To: aa...@thelastpickle.com
 Subject: Re: NullPointerException in Beta3 and rc1

 I created the CF on beta3 with:
 create column family RecipientChat with gc_grace=5 and comparator =
 'AsciiType' and
 column_metadata=[{column_name:recipient,validation_class:BytesType,index_type:0}]

 After I added about 5000 rows, I got the error when querying the CF with
 recipient='somevalue' and anotherColumn='anotherValue'.

 I tried truncating the CF and it was still getting the same error.

 The last thing I tried is upgrading to rc1 and saw the same error.

 Thanks








-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Running multiple instances on a single server --micrandra ??

2010-12-09 Thread Bill de hÓra


On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote:

 The idea behind micrandra is, for a 6-disk system, to run 6 instances of
 Cassandra, one per disk. Use the RackAwareSnitch to make sure no
 replicas live on the same node.
 
 The downsides
 1) we would have to manage 6x the instances of cassandra
 2) we would have some overhead for each JVM.
 
 The upsides?
 1) A disk/instance failure only degrades the overall performance by
 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit when
 down a disk)
 2) Moves and joins have less work to do
 3) Can scale up a single node by adding a single disk to an existing
 system (assuming the RAM and CPU load is light)
 4) OPP would be easier to balance out hot spots (maybe not on this
 one, I'm not an OPP user)
 
 What does everyone think? Does it ever make sense to run this way?


It might for read-heavy loads.

When I looked at this, it was pointed out to me that it's simpler to run
fewer, bigger, coarser nodes and take the entire node/server out when
something goes wrong. Basically give each Cassandra a server.

I wonder if it would be better to rethink compaction if that's what's
driving the idea. It seems to be what is biting everyone, along with GC.

Bill