understanding the cassandra storage scaling
I have a very basic question which I have been unable to find in online documentation on Cassandra. It seems like every node in a Cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how you can scale this on commodity servers with merely internal hard disks. In other words, if I want to store 5 TB of data, does each node need a hard disk capacity of 5 TB? With HBase, memcached and other NoSQL solutions it is clearer how data is split up in the cluster and replicated for fault tolerance. Again, please excuse the rather basic question.
Re: understanding the cassandra storage scaling
There are two numbers to look at: N, the number of hosts in the ring (cluster), and R, the number of replicas for each data item. R is configurable per column family. Typically for large clusters N > R. For very small clusters it makes sense for R to be close to N, in which case Cassandra is useful so the database doesn't have a single point of failure, but not so much because of the size of the data. But for large clusters it rarely makes sense to have N=R; usually N > R. -- /Ran
unsubscribe
Massimo Carro www.liquida.it - www.liquida.com
Re: understanding the cassandra storage scaling
Thanks Ran. This helps a little, but unfortunately it's still a bit fuzzy to me. So is it not true that each node contains all the data in the cluster? I haven't come across any information on how clustered data is coordinated in Cassandra. How does my query get directed to the right node?
Re: understanding the cassandra storage scaling
> So is it not true that each node contains all the data in the cluster?

No, not in the general case; in fact it is rarely the case. Usually R < N. In my case I have N=6 and R=2. You configure R per CF under ReplicationFactor (v0.6.*) or replication_factor (v0.7.*). http://wiki.apache.org/cassandra/StorageConfiguration -- /Ran
Re: understanding the cassandra storage scaling
> This helps a little but unfortunately it's still a bit fuzzy to me. So is it not true that each node contains all the data in the cluster?

Not at all. Basically each node is responsible for only a part of the data (a range, really). But for each piece of data you can choose how many nodes it lives on; this is the Replication Factor. For instance, if you choose RF=1, each piece of data will be on exactly one node (this is usually a bad idea since it offers very weak durability guarantees, but nevertheless it can be done). If you choose RF=3, each piece of data is on 3 nodes (independently of the number of nodes your cluster has). You can have all data on all nodes, but for that you'll have to choose RF=#{nodes in the cluster}. But this is a very degenerate case.

> how does my query get directed to the right node?

Each node in the cluster knows the ranges of data each other node holds. I suggest you watch the first video linked on this page: http://wiki.apache.org/cassandra/ArticlesAndPresentations It explains this and more. -- Sylvain
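A minimal sketch of the mechanism Sylvain describes (plain Python, not Cassandra code; it assumes RandomPartitioner's 0..2**127 token space and SimpleStrategy-style placement): each key hashes to a token, and the RF nodes that follow that token clockwise around the ring hold the replicas.

```python
import hashlib
from bisect import bisect_right

RING_SIZE = 2 ** 127  # RandomPartitioner token space (illustrative assumption)

def key_token(key: str) -> int:
    # RandomPartitioner hashes row keys with MD5 onto the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

def replicas(key: str, node_tokens: dict, rf: int) -> list:
    """The rf nodes responsible for key: the first rf nodes found
    walking clockwise from the key's token (SimpleStrategy-style)."""
    ring = sorted(node_tokens.items(), key=lambda kv: kv[1])
    tokens = [t for _, t in ring]
    start = bisect_right(tokens, key_token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][0] for i in range(rf)]

# Ran's setup: N=6 nodes, R=2 -- every key lives on exactly 2 of the 6 nodes.
nodes = {"node%d" % i: i * RING_SIZE // 6 for i in range(6)}
print(replicas("user:42", nodes, rf=2))
```

Any node can act as coordinator because every node knows this ring layout, which is how a query gets directed to the right nodes.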
Re: understanding the cassandra storage scaling
Awesome! Thank you guys for the really quick answers and the links to the presentations.
N to N relationships
Hello,

For a specific case, we are thinking about representing an N-to-N relationship with an NxN matrix in Cassandra. The relations will exist only between a subset of elements, so the matrix will mostly contain empty elements. We have a set of questions concerning this:

- What is the best way to represent this matrix? What would have the best performance in reading? In writing?
  . a super column family with n column families, with n columns each
  . a column family with n columns and n lines

In the second case, we would need to extract 2 kinds of information:
- all the relations for a line: this should pose no specific problem;
- all the relations for a column: in that case we would need an index for the columns, right? And then get all the lines where the value of the column in question is not null... Is that the correct way to do it?

When using indexes, say we want to add another element N+1. What impact in terms of time would it have on the indexation job?

Thanks a lot for the answers, Best regards, Sébastien Druon
Re: N to N relationships
How about a regular CF where keys are n...@n ? Then, getting a matrix row would be the same cost as getting a matrix column (N gets), and it would be very easy to add element N+1.
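A dict-based sketch of David's layout (not a real Cassandra client; reading the archive-masked key "n...@n" as a row@column composite key is an assumption):

```python
# One CF row per matrix cell, keyed "row@col"; a dict stands in for the CF.
matrix = {}

def cell_key(r: int, c: int) -> str:
    return "%d@%d" % (r, c)

def put(r: int, c: int, value: str) -> None:
    matrix[cell_key(r, c)] = value

def get_row(r: int, n: int) -> dict:
    # A whole matrix row costs N point lookups ("N gets")...
    return {c: matrix[cell_key(r, c)] for c in range(n) if cell_key(r, c) in matrix}

def get_col(c: int, n: int) -> dict:
    # ...and a whole column costs the same N gets: rows and columns are
    # symmetric, and adding element N+1 requires no reindexing at all.
    return {r: matrix[cell_key(r, c)] for r in range(n) if cell_key(r, c) in matrix}

put(0, 1, "related")
print(get_row(0, n=3), get_col(1, n=3))
```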
Secondary indexes change everything?
It seems to me that secondary indexes (new in 0.7) change everything when it comes to data modeling.

- OOP becomes obsolete
- primary indexes become obsolete if you ever want to do a range query (which you probably will...), better to assign a random row id

Taken together, it's likely that very little will remain of your old database schema... Am I right?
Re: Secondary indexes change everything?
- OPP becomes obsolete (OOP is not obsolete!)
- primary indexes become obsolete if you ever want to do a range query (which you probably will...), better to assign a random row id

Taken together, it's likely that very little will remain of your old database schema... Am I right?
Quorum: killing 1 out of 3 server kills the cluster (?)
Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them, UnavailableExceptions are thrown. Why is that? Isn't that the point of quorum and a fault-tolerant DB, that it continues with the remaining 2 nodes and redistributes the data to the broken one as soon as it's up again? What may I be doing wrong? thx tcn
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
Hi, the UnavailableExceptions will be thrown because a quorum of size 2 needs at least 2 nodes to be alive (as does a quorum of size 3). The data won't be automatically redistributed to other nodes. Thibaut
Re: unsubscribe
On Thu, 2010-12-09 at 11:42 +0100, Massimo Carro wrote: Massimo Carro www.liquida.it - www.liquida.com http://wiki.apache.org/cassandra/FAQ#unsubscribe -- Eric Evans eev...@rackspace.com
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available. This means for RF = 2, consistency levels QUORUM and ALL yield the same result. /d
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
> Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available.

2/2+1==2 and I killed 1 of 3, so... don't get it.
RE: Quorum: killing 1 out of 3 server kills the cluster (?)
With 3 nodes and RF=2 you have 3 replica sets: N1+N2, N2+N3 and N3+N1. Killing N1 leaves only one fully alive set, N2+N3. For the other two sets (N1+N2 and N3+N1) only one of the two replicas is up, and a quorum of 2 out of 2 is actually ALL, so quorum requests against those sets fail.
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have 2 replicas. And since quorum is also 2 with that replication factor, you cannot lose a node, otherwise some queries will end up as UnavailableException. Again, this is not related to the total number of nodes. Even with 200 nodes, if you use RF=2, you will have some queries that fail (although much less than what you are probably seeing).
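The arithmetic behind this, as a quick plain-Python sketch:

```python
def quorum(rf: int) -> int:
    # Quorum is derived from the replication factor, not the cluster size.
    return rf // 2 + 1

def quorum_succeeds(rf: int, replicas_up: int) -> bool:
    return replicas_up >= quorum(rf)

# RF=2: quorum is 2, so losing either replica fails QUORUM (same as ALL)...
print(quorum(2), quorum_succeeds(rf=2, replicas_up=1))  # 2 False
# ...while RF=3 tolerates one dead replica, whatever the node count.
print(quorum(3), quorum_succeeds(rf=3, replicas_up=2))  # 2 True
```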
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
In other words, if you want to use QUORUM, you need to set RF=3. (I know because I had exactly the same problem.)
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Dec 9, 2010, at 17:39, David Boxenhorn wrote:
> In other words, if you want to use QUORUM, you need to set RF=3.

I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail do I actually lose data. But apparently this is not how it works...
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
If that is what you want, use CL=ONE.
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Thu, Dec 9, 2010 at 10:43 AM, Timo Nentwig wrote:
> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail do I actually lose data. But apparently this is not how it works...

No, this is correct: killing one node with a replication factor of 2 will not cause you to lose data. You are requiring a consistency level higher than what is available. Change your app to use CL.ONE and all data will be available even with one machine unavailable.
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail, I actually lose data. But apparently this is not how it works...

Sure, the data that N1 holds is also on another node and you won't lose it by only losing N1. But when you do a quorum query, you are saying to Cassandra: "Please, please, fail my request if you can't get a response from 2 nodes." So if only 1 node holding the data is up at the moment of the query, then Cassandra, which is very polite software, does what you asked and fails. If you want Cassandra to send you an answer with only one node up, use CL=ONE (as said by David).
Cassandra and disk space
I recently ran into a problem during a repair operation where my nodes completely ran out of space and my whole cluster was... well, clusterfucked. I want to make sure I know how to prevent this problem in the future. Should I make sure that at all times every node is under 50% of its disk space? Are there any normal day-to-day operations that would cause any one node to double in size that I should be aware of? If one or more nodes surpass the 50% mark, what should I plan to do? Thanks for any advice
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote:
> Sure, the data that N1 holds is also on another node and you won't lose it by only losing N1. But when you do a quorum query, you are saying to Cassandra: fail my request if you can't get a response from 2 nodes.

And my application would fall back to ONE. Quorum writes will also fail, so I would also use ONE so that the app stays up. What would I have to do to make the data redistribute when the broken node is up again? Simply call nodetool repair on it?
Re: N to N relationships
Thanks a lot for the answer. What about the indexing when adding a new element? Is it incremental? Thanks again
Re: N to N relationships
What do you mean by indexing?
Re: Secondary indexes change everything?
OPP is not yet obsolete. The included secondary indexes still aren't good at finding keys for ranges of indexed values, such as name > 'b' and name < 'c'. This is something that an OPP index would be good at. Of course, you can do something similar with one or more rows, so it's not that big of an advantage for OPP. If you can make primary indexes useful, you might as well -- no reason to throw that away. The main thing that the secondary index support does is relieve you from having to write all of the indexing code and CFs by hand. - Tyler
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
If you switch your writes to CL ONE when a failure occurs, you might as well use ONE for all writes. ONE and QUORUM behave the same when all nodes are working correctly. - Tyler
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
> What would I have to do to make the data redistribute when the broken node is up again? Simply call nodetool repair on it?

There are 3 mechanisms for that:
- hinted handoff: basically, when the node is back up, the other nodes will send it what it missed.
- read repair: whenever you read data and an inconsistency is detected (because one node is not up to date), it gets repaired.
- calling nodetool repair

The first two are automatic; you have nothing to do. Nodetool repair is usually run only periodically (say, once a week) to repair cold data that wasn't dealt with by the first two mechanisms. -- Sylvain
Re: Cassandra and disk space
> I recently ran into a problem during a repair operation where my nodes completely ran out of space and my whole cluster was... well, clusterfucked. I want to make sure I know how to prevent this problem in the future.

Depending on which version you're on, you may be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-1674

But regardless, disk space variations are a fact of life with Cassandra. Off the top of my head I'm not ready to say what the expectations are with respect to repair under all circumstances. Anyone?

> Should I make sure that at all times every node is under 50% of its disk space? Are there any normal day-to-day operations that would cause any one node to double in size that I should be aware of? If one or more nodes surpass the 50% mark, what should I plan to do?

Major compactions can potentially double the amount of disk used, if you have a single large column family that contributes almost all disk space. For such clusters, regular background compaction can indeed cause a doubling when the compaction happens to be a major one (i.e., happens to include all sstables).

-- / Peter Schuller
Re: Cassandra and disk space
If you are on 0.6, repair is particularly dangerous with respect to disk space usage. If your replica is sufficiently out of sync, you can triple your disk usage pretty easily. This has been improved in 0.7, so repairs should use about half as much disk space, on average.

In general, yes, keep your nodes under 50% disk usage at all times. Any of: compaction, cleanup, snapshotting, repair, or bootstrapping (the latter two are improved in 0.7) can double your disk usage temporarily. You should plan to add more disk space or add nodes when you get close to this limit. Once you go over 50%, it's more difficult to add nodes, at least in 0.6. - Tyler
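Put as back-of-the-envelope arithmetic (a rough sketch of the 50% rule above, not an exact Cassandra formula):

```python
def headroom_ok(live_data_gb: float, disk_gb: float, worst_case_factor: float = 2.0) -> bool:
    # Compaction, cleanup, snapshotting, repair, or bootstrap may temporarily
    # need up to worst_case_factor x the live data size on disk.
    return live_data_gb * worst_case_factor <= disk_gb

print(headroom_ok(live_data_gb=200, disk_gb=500))                         # True: under 50%
print(headroom_ok(live_data_gb=300, disk_gb=500))                         # False: past the 50% mark
print(headroom_ok(live_data_gb=200, disk_gb=500, worst_case_factor=3.0))  # False: 0.6 repair worst case
```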
Re: N to N relationships
I mean if I have secondary indexes. Apparently they are calculated in the background...
Re: Cassandra and disk space
Are there any plans to improve this in the future? For big data clusters this could be very expensive. Based on your comment, I would need 200TB of storage for 100TB of data to keep Cassandra running. -- Rustam.
Stuck with adding nodes
Hi good people.

I underestimated load during peak times and now I'm stuck with our production cluster. Right now it's 3 nodes, RF 3, so everything is everywhere. We have ~300GB data load, ~10MB/sec incoming traffic and ~50 (peak) reads/sec to the cluster. The problem derives from our quorum reads/writes: at peak hours one of the machines (that's random) will fall behind because it's a little slower than the others, and shortly after that it will drop most read requests. So right now the only way to survive is to take one machine down, making every read/write an ALL operation. It's necessary to take one machine down because otherwise users will wait for timeouts from that overwhelmed machine when the client lib chooses it. Since we are a real-time oriented thing, that's a killer.

So now we tried to add 2 more nodes. Problem is that anticompaction takes too long, meaning it is not done when peak hour arrives and the machine that would stream the data to the new node must be taken down. We tried to block ports 7000 and 9160 to that machine because we hoped that would stop traffic and let the machine finish anticompaction. But that did not work because we could not cut the already existing connections to the other nodes.

Currently I am copying all data files (that's all existing data) from one node to the new nodes in the hope that I can then manually assign them their new token range (nodetool move) and do cleanup. Obviously I will try this tomorrow (it's been a long day) on a test system, but any advice would be highly appreciated. Sighs and thanks. Daniel smeet.com Berlin
Re: Stuck with adding nodes
> Currently I am copying all data files (that's all existing data) from one node to the new nodes in the hope that I can then manually assign them their new token range (nodetool move) and do cleanup.

Unless I'm misunderstanding you, I believe you should be setting the initial token. nodetool move would be for a node already in the ring. And keep in mind that a nodetool move is currently a decommission+bootstrap - so if you're teetering on the edge of overload you will want to keep that in mind when moving a node, to avoid ending up in a worse situation as another node temporarily receives more load than usual as a result of increased ring ownership.

> Obviously I will try this tomorrow (it's been a long day) on a test system but any advice would be highly appreciated.

One possibility, if you have additional hardware to spare temporarily, is to add more nodes than you actually need. Then, once you are significantly over capacity, you have the flexibility to move nodes around to an optimum position and then decommission those machines that were only borrowed. I.e., initial bootstrap of nodes takes a shorter amount of time because you're giving them less token space per new node. And once all are in the ring, you're free to move things around and then free up the hardware.

(Another option may be to implement throttling of the anti-compaction so that it runs very slowly during peak hours, but that requires patching cassandra or else firewall/packet filtering fu, and is probably likely to be more risky than it's worth.)

-- / Peter Schuller
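For the initial-token route, the usual balanced-ring recipe for RandomPartitioner is token_i = i * 2**127 / N. A sketch (the spacing formula is the standard community recipe, not something stated in this thread):

```python
def balanced_tokens(n: int) -> list:
    # Evenly spaced tokens over RandomPartitioner's 0..2**127 space.
    return [i * 2 ** 127 // n for i in range(n)]

# Growing a 3-node ring to 5: bootstrap the new nodes at tokens the old
# layout doesn't use, then move the existing nodes to the remaining
# positions (and run cleanup) once the ring is stable.
old = balanced_tokens(3)
new = balanced_tokens(5)
print([t for t in new if t not in old])
```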
Re: Cassandra and disk space
That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you can push your disk usage a bit higher.

A fundamental idea behind Cassandra's architecture is that disk space is cheap (which, indeed, it is). If you are particularly sensitive to this, Cassandra might not be the best solution to your problem. Also keep in mind that Cassandra performs well with average disks, so you don't need to spend a lot there. Additionally, most people find that the replication protects their data enough to allow them to use RAID 0 instead of 1, 10, 5, or 6. - Tyler
Re: Cassandra and disk space
I recently finished a practice expansion of 4 nodes to 5 nodes, a series of nodetool move, nodetool cleanup and JMX GC steps. I found that in some of the steps, disk usage actually grew to 2.5x the base data size on one of the nodes. I'm using 0.6.4. -scott
Re: N to N relationships
Am assuming you have one matrix and you know the dimensions. Also, as you say, the most important queries are to get an entire column or an entire row. I would consider using a standard CF for the columns and one for the rows. The key for each would be the col / row number, each Cassandra column name would be the id of the other dimension, and the value whatever you want.

- when storing the data, update both the Column and Row CF
- reading a whole row/col is simply a read from the appropriate CF
- reading an intersection is a get_slice to either the col or row CF, using the column_names field to identify the other dimension

You would not need secondary indexes to serve these queries. Hope that helps. Aaron
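A dict-based sketch of Aaron's two-CF layout (not a real Cassandra client), trading a second write for whole-row and whole-column reads that each cost a single lookup instead of N gets:

```python
rows = {}     # stands in for the Rows CF: row id -> {col id: value}
columns = {}  # stands in for the Columns CF: col id -> {row id: value}

def put(r: int, c: int, value: str) -> None:
    # Every write is denormalized into both CFs.
    rows.setdefault(r, {})[c] = value
    columns.setdefault(c, {})[r] = value

def intersection(r: int, c: int):
    # Equivalent of a get_slice on either CF naming one column.
    return rows.get(r, {}).get(c)

put(0, 1, "related")
print(rows.get(0), columns.get(1), intersection(0, 1))
```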
Re: Secondary indexes change everything?
On Thu, Dec 9, 2010 at 12:16 PM, David Boxenhorn da...@lookin2.com wrote:
> What do you mean by, "The included secondary indexes still aren't good at finding keys for ranges of indexed values, such as name > 'b' and name < 'c'"? Do you mean that secondary indexes don't support range queries at all?

http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Running multiple instances on a single server --micrandra ??
Overall, I don't think this is a crazy idea, though I think I'd prefer Cassandra to manage this setup. The problem you will run into is that because the storage port is assumed to be the same across the cluster, you'll only be able to do this if you can assign multiple IPs to each server (one for each process). (I know this because I proposed something similar last year :)) -ryan

On Tue, Dec 7, 2010 at 10:00 PM, Jonathan Ellis jbel...@gmail.com wrote: The major downside is you're going to want to let each instance have its own dedicated commitlog spindle too, unless you just don't have many updates.

On Tue, Dec 7, 2010 at 8:25 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I am quite ready to be stoned for this thread, but I have been thinking about this for a while and I just wanted to bounce these ideas off some gurus. Cassandra does allow multiple data directories, but as far as I can tell no one runs in this configuration. This is something that is very different between the HBase architecture and the Cassandra architecture. HBase borrows the concept of JBOD configurations from Hadoop. HBase has many small-ish (~256 MB) regions managed with ZooKeeper. Cassandra has a few (1 per node) large, node-sized token ranges managed by gossip consensus.

Let's say a node has six 300 GB disks. You have the options of RAID5, RAID6, RAID10, or RAID0. The problem I have found with these configurations is that major compactions (or even large minor ones) can take a long time. Even if your disk is not heavily utilized, this is a lot of data to move through. Thus node joins take a long time. Node moves take a long time. The idea behind micrandra is, for a 6-disk system, run 6 instances of Cassandra, one per disk. Use the RackAwareSnitch to make sure no replicas live on the same node.

The downsides:
1) we would have to manage 6x the instances of cassandra
2) we would have some overhead for each JVM

The upsides?
1) a disk/instance failure only degrades the overall performance by 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit when down a disk)
2) moves and joins have less work to do
3) can scale up a single node by adding a single disk to an existing system (assuming the ram and cpu load is light)
4) OPP would be easier to balance out hot spots (maybe not relevant if not using OPP)

What does everyone think? Does it ever make sense to run this way?

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Obscured question about data size in a Column Family
Hi there,

Quoting information in the wiki about Cassandra limitations (http://wiki.apache.org/cassandra/CassandraLimitations): "... So all the data from a given columnfamily/key pair had to fit in memory, or 2GB ..."

Does this mean:
1. a ColumnFamily can only be 2GB of data, or
2. a Column (key/pair) can only be 2GB of data?

Thanks for the explanation. -- http://twitter.com/jpartogi http://twitter.com/scrum8
Re: Cassandra and disk space
> That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you can push your disk usage a bit higher.

Is there any formula to calculate this? Let's say I have 500GB in a single CF, so I need at least 500GB of free space for compaction. If I partition this CF and split it into 10 proportional CFs of 50GB each, does that mean I will need only 50GB of free space? Also, is there a recommended maximum of data size per node? Thanks.
Re: Cassandra and disk space
Yes, that's correct, but I wouldn't push it too far. You'll become much more sensitive to disk usage changes; in particular, rebalancing your cluster will be particularly difficult, and repair will also become dangerous. Disk performance also tends to drop when a disk nears capacity. There's no recommended maximum size -- it all depends on your access rates. Anywhere from 10 GB to 1 TB is typical. - Tyler
On Thu, Dec 9, 2010 at 5:52 PM, Rustam Aliyev rus...@code.az wrote: [ earlier thread quoted above ]
Re: Cassandra and disk space
Additionally, cleanup will fail to run when the disk is more than 50% full. Another reason to stay below 50%.
On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs ty...@riptano.com wrote: [ earlier thread quoted above ]
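Since cleanup refuses to run past the 50% mark, it is worth alerting well before it. A minimal monitoring sketch in Python; the mount point and threshold here are assumptions for illustration:

    import shutil

    DATA_MOUNT = "/var/lib/cassandra"  # hypothetical data directory mount
    THRESHOLD = 0.50                   # the thread's stay-under-50% guidance

    usage = shutil.disk_usage(DATA_MOUNT)
    fraction = usage.used / usage.total
    if fraction >= THRESHOLD:
        print("WARNING: %s is %.0f%% full; compaction, cleanup, or repair "
              "may no longer fit" % (DATA_MOUNT, fraction * 100))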
Re: Cassandra and disk space
Thanks Tyler, this is really useful. Also, I noticed that you can specify multiple data file directories located on different disks. Let's say I have a machine with 4 x 500GB drives; what would be the difference between the following two setups:
1. each drive mounted separately, each with a data file dir on it (so 4x data file dirs)
2. disks in RAID0, mounted as one drive with one data folder on it
In other words, does splitting the data folder into smaller ones bring any performance or stability advantages?
On 10/12/2010 00:03, Tyler Hobbs wrote: [ earlier thread quoted above ]
Re: Cassandra and disk space
On Thu, Dec 9, 2010 at 4:20 PM, Rustam Aliyev rus...@code.az wrote: Thanks Tyler, this is really useful. [ RAID0 vs JBOD question ] In other words, does splitting the data folder into smaller ones bring any performance or stability advantages? This is getting to be a FAQ, so here's my stock answer: There are non-zero production deployments which have experienced fail from multiple data directories in cassandra. There are zero production deployments which have experienced win from multiple data directories in cassandra. YMMV, of course! =Rob PS - Maybe we should remove the multiple data directory stuff, so people don't keep getting tempted to use it?
Re: Cassandra and disk space
On Thu, Dec 9, 2010 at 6:20 PM, Rustam Aliyev rus...@code.az wrote: [ RAID0 vs JBOD question ] It brings disadvantages. Your largest CF will be limited to the size of your smallest drive, and you won't be using them in parallel when compacting. RAID0 is the better option. -Brandon
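Brandon's first point is easy to see with the 4 x 500GB box from the question: with one data directory per disk, a compacted CF has to fit on a single drive, while RAID0 pools them. A small sketch of that constraint (my own illustration of the point above):

    drives_gb = [500, 500, 500, 500]

    # One data dir per drive: a CF's compacted SSTables must fit on one drive.
    max_cf_jbod_gb = min(drives_gb)
    # RAID0: one pooled volume.
    max_cf_raid0_gb = sum(drives_gb)

    print("largest practical CF with per-drive data dirs: ~%d GB" % max_cf_jbod_gb)
    print("largest practical CF with RAID0:               ~%d GB" % max_cf_raid0_gb)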
Re: [OT] shout out for riptano training
I second that as well. I actually found the training to be fun (love the new stuff in 0.7.0) and quite interesting. Now I'm looking forward to the next Cassandra Summit. Thank you, Riptano. On Thu, Dec 9, 2010 at 2:48 PM, Dave Viner davevi...@gmail.com wrote: Just wanted to give a shout-out to Jonathan Ellis and the Riptano team for the awesome training they provided yesterday in Santa Monica. It was awesome, and I'd highly recommend it for anyone who is using or seriously considering using Cassandra. Just freakin' awesome. Dave Viner -- Salvador Fuentes Jr.
Re: N to N relationships
I would also recommend two column families. Storing the key as NxN would require you to hit multiple machines to query for an entire row or column with RandomPartitioner. Even with OPP you would need to pick rows or columns to order by, and the other would require hitting multiple machines. Two column families avoids this and avoids any problems with choosing OPP.
On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton aa...@thelastpickle.com wrote: I'm assuming you have one matrix and you know the dimensions. Also, as you say, the most important queries are to get an entire column or an entire row. I would consider using a standard CF for the Columns and one for the Rows. The key for each would be the col/row number, each cassandra column name would be the id of the other dimension, and the value whatever you want.
- when storing the data, update both the Column and Row CF
- reading a whole row/col is simply a read from the appropriate CF
- reading an intersection is a get_slice to either the col or row CF, using the column_names field to identify the other dimension
You would not need secondary indexes to serve these queries. Hope that helps. Aaron
On 10 Dec, 2010, at 07:02 AM, Sébastien Druon sdr...@spotuse.com wrote: I mean if I have secondary indexes. Apparently they are calculated in the background... On 9 December 2010 18:33, David Boxenhorn da...@lookin2.com wrote: What do you mean by indexing? On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon sdr...@spotuse.com wrote: Thanks a lot for the answer. What about the indexing when adding a new element? Is it incremental? Thanks again On 9 December 2010 14:38, David Boxenhorn da...@lookin2.com wrote: How about a regular CF where keys are n...@n ? Then, getting a matrix row would be the same cost as getting a matrix column (N gets), and it would be very easy to add element N+1. On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon sdr...@spotuse.com wrote: Hello, For a specific case, we are thinking about representing an N-to-N relationship with an NxN matrix in Cassandra. The relations will be only between a subset of elements, so the matrix will mostly contain empty elements. We have a set of questions concerning this: what is the best way to represent this matrix? What would have the best performance in reading? In writing?
. a super column family with n column families, with n columns each
. a column family with n columns and n lines
In the second case, we would need to extract 2 kinds of information:
- all the relations for a line: this should be no specific problem;
- all the relations for a column: in that case we would need an index for the columns, right? And then get all the lines where the value of the column in question is not null... is that the correct way to do it?
When using indexes, say we want to add another element N+1. What impact in terms of time would it have on the indexation job? Thanks a lot for the answers, Best regards, Sébastien Druon
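A sketch of the two-column-family pattern the answers converge on, written against pycassa (which comes up elsewhere on this list); the keyspace and CF names are invented, and the exact API calls are assumptions that may vary by pycassa version:

    import pycassa

    # Hypothetical keyspace with two standard CFs: 'MatrixRows' keyed by
    # row number and 'MatrixCols' keyed by column number.
    pool = pycassa.ConnectionPool('MatrixKS')
    rows = pycassa.ColumnFamily(pool, 'MatrixRows')
    cols = pycassa.ColumnFamily(pool, 'MatrixCols')

    def put(i, j, value):
        # Store each relation under both orientations.
        rows.insert(str(i), {str(j): value})
        cols.insert(str(j), {str(i): value})

    def whole_row(i):
        return rows.get(str(i))   # one read, no secondary index needed

    def whole_col(j):
        return cols.get(str(j))

    def intersection(i, j):
        # get_slice-style read: name the one column we want.
        return rows.get(str(i), columns=[str(j)])[str(j)]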
[RELEASE] 0.7.0 rc2
I'd have thought all that turkey and stuffing would have done more damage to momentum, but judging by the number of bug-fixes in the last couple of weeks, that isn't the case. As usual, I'd be remiss if I didn't point out that this is not yet a stable release. It's getting pretty close, but we're not ready to stick a fork in it yet. Be sure to test it thoroughly before upgrading something important. Please be sure to read through the changes[1] and release notes[2]. Report any problems you find[3], and if you have any questions, don't hesitate to ask. Thanks! [1]: http://goo.gl/ZMQEe (CHANGES.txt) [2]: http://goo.gl/R35HH (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: http://people.apache.org/~eevans/cassandra_0.7.0~rc2_all.deb -- Eric Evans eev...@rackspace.com
Re: Obscured question about data size in a Column Family
In <= 0.6 (but not 0.7), a row could not be larger than 2GB. 2GB is still the largest possible column value. On Thu, Dec 9, 2010 at 5:38 PM, Joshua Partogi joshua.j...@gmail.com wrote: [ question quoted above ] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
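For context, the 2GB figure matches a signed 32-bit length: the wiki page quoted above attributes the limit to values having to fit in memory as a single Thrift-framed byte array, and 2^31 bytes is exactly 2GB. The arithmetic below is mine; the reasoning is hedged from the wiki, not from this thread:

    # 2**31 bytes -- the cap a signed 32-bit length imposes -- is 2.0 GB.
    print(2 ** 31 / (1024 ** 3), "GB")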
Re: NullPointerException in Beta3 and rc1
describe_schema_versions() returns a Map<String, List<String>> with one entry. The key is a UUID and the List<String> has one element, which is the IP of my machine. I think this has something to do with the 'truncate' command in the CLI. I can reproduce it by:
1. create a CF with column1 as a secondary index
2. add some rows
3. truncate the CF
4. add some rows and do a query where column1=someValue and column2=someValue.
It does not happen if the query just has column1=someValue. thanks
On Wed, Dec 8, 2010 at 3:17 PM, Aaron Morton aa...@thelastpickle.com wrote: Jonathan suggested your cluster has multiple schemas, caused by https://issues.apache.org/jira/browse/CASSANDRA-1824 Can you run the API command describe_schema_versions()? It's not listed on the wiki yet, but it will tell you how many schema versions are out there. pycassa supports it. Aaron
On 09 Dec, 2010, at 08:19 AM, Aaron Morton aa...@thelastpickle.com wrote: Please send this to the list rather than me personally. Aaron
Begin forwarded message: From: Wenjun Che wen...@openf.in Date: 08 December 2010 4:35:10 PM To: aa...@thelastpickle.com Subject: Re: NullPointerException in Beta3 and rc1
I created the CF on beta3 with: create column family RecipientChat with gc_grace=5 and comparator = 'AsciiType' and column_metadata=[{column_name:recipient,validation_class:BytesType,index_type:0}] After I added about 5000 rows, I got the error when querying the CF with recipient='somevalue' and anotherColumn='anotherValue'. I tried truncating the CF and was still getting the same error. The last thing I tried was upgrading to rc1, and I saw the same error. Thanks
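Aaron's describe_schema_versions() check is scriptable from pycassa, which he notes supports the call; treat the import path and method name below as assumptions based on later pycassa releases:

    from pycassa.system_manager import SystemManager

    sm = SystemManager('localhost:9160')
    versions = sm.describe_schema_versions()  # {schema_version: [node IPs]}
    if len(versions) > 1:
        print("schema disagreement across nodes:", versions)
    else:
        print("all live nodes agree on one schema version")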
Re: Cassandra and disk space
This is true, but for larger installations I end up needing more servers to hold the disks, and more racks to hold the servers, to the point where the overall cost per GB climbs (granted, the cost per IOP is probably still good). AIUI, a chunk of that 50% is replicated data, such that the truly available space in the cluster is lower than 50% for capacity-planning purposes? If so, for some workloads where it's just data pouring in with very few updates, that would have me thinking I'd want a tiered model, archiving cold data onto a filer/HDFS. Bill
On Thu, 2010-12-09 at 13:26 -0600, Tyler Hobbs wrote: [ earlier thread quoted above ]
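Bill's reading matches the usual arithmetic: replicas occupy disk like any other data, so raw capacity must cover logical data times the replication factor, and then the under-50% headroom on top of that. A sketch of that calculation (RF=3 is an assumed example, not a number from the thread):

    def raw_storage_needed_tb(logical_tb, replication_factor, max_fill=0.5):
        # Replicas count toward per-node usage, and the guidance above
        # says to keep every node under ~50% full.
        return logical_tb * replication_factor / max_fill

    print(raw_storage_needed_tb(100, 1))  # 200 TB: the figure quoted above
    print(raw_storage_needed_tb(100, 3))  # 600 TB once RF=3 replicas count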
Re: NullPointerException in Beta3 and rc1
Can you still reproduce this with rc2, after starting with an empty data and commitlog directory? There used to be a bug w/ truncate + 2ary indexes, but that should be fixed now.
On Thu, Dec 9, 2010 at 8:53 PM, Wenjun Che wen...@openf.in wrote: [ earlier thread quoted above ] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Running multiple instances on a single server --micrandra ??
On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote: [ micrandra proposal quoted above ] Does it ever make sense to run this way?
It might for read-heavy loads. When I looked at this, it was pointed out to me that it's simpler to run fewer, bigger, coarser nodes and take the entire node/server out when something goes wrong. Basically, give each Cassandra a server. I wonder if it would be better to rethink compaction, if that's what's driving the idea. It seems to be what is biting everyone, along with GC. Bill