Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-10 Thread Roland Gude
I'm not sure about that Hector version, but there was a Hector bug where
Hector did not stop using a dead node as proxy and did not load-balance
requests properly. If you enable trace logs for Hector, you can see which
nodes it uses for requests. If there is a newer 0.6 Hector, you should give
it a try.
Furthermore, I suggest bringing down one node and requesting data with the
CLI. If that works, it is probably the Hector bug.


Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread Vram Kouramajian
We have 5 Cassandra nodes with the following configuration:

Cassandra Version: 0.6.11
Number of Nodes: 5
Replication Factor: 3
Client: Hector 0.6.0-14
Write Consistency Level: Quorum
Read Consistency Level: Quorum
Ring Topology:

Address         Status  Load     Owns    Range                                    Ring
                                         132756707369141912386052673276321963528
192.168.89.153  Up      4.15 GB  33.87%  20237398133070283622632741498697119875   |<--|
192.168.89.155  Up      5.17 GB  18.29%  51358066040236348437506517944084891398   |   ^
192.168.89.154  Up      7.41 GB  33.97%  109158969152851862753910401160326064203  v   |
192.168.89.152  Up      5.07 GB  6.34%   119944993359936402983569623214763193674  |   ^
192.168.89.151  Up      4.22 GB  7.53%   132756707369141912386052673276321963528  |-->|

We believe that our setup should survive the crash of one of the
Cassandra nodes. But we had a few crashes, and the system stopped
functioning until we brought the Cassandra nodes back up.

Any clues?

Vram
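
The expectation in the post above can be checked with a little quorum
arithmetic. The following is only a minimal Python sketch: it assumes
replicas live on the node owning a range plus the next RF-1 nodes around
the ring (simple rack-unaware placement, which the thread does not
confirm), and the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer for a QUORUM read or write."""
    return replication_factor // 2 + 1


def replicas_for_range(nodes, owner_index, replication_factor):
    """Assumed placement: the owner plus the next RF-1 nodes clockwise."""
    return [nodes[(owner_index + k) % len(nodes)]
            for k in range(replication_factor)]


def quorum_survives(nodes, replication_factor, down):
    """True if every range still has a quorum of live replicas."""
    needed = quorum(replication_factor)
    for i in range(len(nodes)):
        alive = [n for n in replicas_for_range(nodes, i, replication_factor)
                 if n not in down]
        if len(alive) < needed:
            return False
    return True


# Nodes in token order, as listed in the ring output of the post.
ring = ["192.168.89.153", "192.168.89.155", "192.168.89.154",
        "192.168.89.152", "192.168.89.151"]

one_down = quorum_survives(ring, 3, {"192.168.89.153"})
two_adjacent_down = quorum_survives(ring, 3,
                                    {"192.168.89.153", "192.168.89.155"})
```

With RF 3 a quorum is 2, so under these assumptions any single node can be
down and every range still has two live replicas. That suggests the outage
came from somewhere other than the cluster's replication settings.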


Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread Joe Stump
Did the Cassandra cluster go down or did you start getting failures from the 
client when it routed queries to the downed node? The key in the client is to 
keep working around the ring if the initial node is down.

--Joe
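
To illustrate what "keep working around the ring" means on the client
side, here is a rough Python sketch. It is not Hector's API: `send` is a
hypothetical callable standing in for whatever issues a request to one
host, and the sketch assumes a dead node shows up as a ConnectionError.

```python
def request_with_failover(hosts, send):
    """Try each host in turn and return the first successful response.

    `send` is a hypothetical callable taking a host and returning a
    response, raising ConnectionError when that host is unreachable.
    """
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except ConnectionError as exc:
            # Remember the failure and move on to the next node.
            last_error = exc
    raise ConnectionError("all hosts failed") from last_error


def fake_send(host):
    """Stand-in transport: pretends 192.168.89.153 is down."""
    if host == "192.168.89.153":
        raise ConnectionError(host + " is down")
    return "ok from " + host


hosts = ["192.168.89.153", "192.168.89.155", "192.168.89.154"]
result = request_with_failover(hosts, fake_send)
```

A client that keeps sending every request to the first host, instead of
moving on like this, would look "down" as soon as that one node crashed.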


Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread Vram Kouramajian
The Hector clients are used as part of our Jetty servers, and the
Jetty servers stop responding when one of the Cassandra nodes goes down.

Vram


Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread Ed Anuff
Sounds like the problem might be on the Hector side. There are lots of
Hector users on this list, but it's usually not a bad idea to ask on
hector-us...@googlegroups.com (cc'd).

"The Jetty servers stop responding" is a bit vague; somewhere in
your logs there is an error message that should shed some light on where
things are going awry. If you can find the exception that's being
thrown in Hector and post that, it'd make it much easier to help you
out.

Ed


Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread aaron morton
btw, the nodes are a tad out of balance. Was that deliberate?

http://wiki.apache.org/cassandra/Operations#Token_selection
http://wiki.apache.org/cassandra/Operations#Load_balancing


Aaron
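
To make the imbalance concrete, here is a small Python sketch. It assumes
the RandomPartitioner token space of 0 to 2**127 and uses the usual
i * 2**127 / N rule for evenly spaced tokens; the helper names are
illustrative only.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Fraction of the ring owned by each token.

    A token owns the range from the previous token (exclusive) up to
    itself (inclusive), wrapping around the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for idx, token in enumerate(ordered):
        previous = ordered[idx - 1]  # index -1 wraps to the last token
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = span / RING_SIZE
    return shares


# Tokens from the ring output posted at the start of the thread.
current = [
    20237398133070283622632741498697119875,   # 192.168.89.153
    51358066040236348437506517944084891398,   # 192.168.89.155
    109158969152851862753910401160326064203,  # 192.168.89.154
    119944993359936402983569623214763193674,  # 192.168.89.152
    132756707369141912386052673276321963528,  # 192.168.89.151
]

current_shares = ownership(current)
even_shares = ownership(balanced_tokens(5))
```

The computed shares for the current tokens match the Owns column in the
ring output (roughly 34%, 18%, 34%, 6% and 8%), whereas evenly spaced
tokens would give each of the five nodes 20%.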


Re: Site Not Surviving a Single Cassandra Node Crash

2011-04-09 Thread Patricio Echagüe
What consistency level are you using?

And as Ed said, if you can provide the stack trace, that would help too.
