Re: Cassandra stucks

2012-05-14 Thread aaron morton
We've not had any reported issues with connection handling, so I would look for 
other possible causes first. Out of interest though, what OS are you using, and 
what is the exact JVM version?

The javax.naming.CommunicationException 
(http://docs.oracle.com/javase/6/docs/api/javax/naming/CommunicationException.html)
 is raised when the client cannot communicate with the naming or directory 
service, which for nodetool is the RMI registry on the target node. Where are 
you running nodetool from?

Do you have any switches or firewalls between the servers? Could they be 
closing connections?
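
As a quick way to rule out a blocked port, here is a minimal sketch that only tests whether a TCP connection to the JMX port can be opened. The function name is made up for illustration, and the default of 7199 assumes the stock JMX port from cassandra-env.sh; adjust it if yours differs. Note that the trace in the original post ends in "Read timed out" inside TCPChannel.createConnection, which suggests the connection itself was opened and the reply to the handshake never arrived, so a successful connect here does not prove the JVM is responsive.

```python
import socket


def jmx_port_reachable(host, port=7199, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This checks reachability only (firewall, listener present); it does not
    perform the RMI/JMX handshake that nodetool needs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (the address is one of the peers named in this thread):
#   jmx_port_reachable("172.15.2.163")
```

Running it from the same machine you run nodetool from tells you whether anything in between is dropping the connection before Cassandra ever sees it.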

Can you match up the logs between machines? E.g. what was 172.15.2.163 doing 
when the node below thought it was dead?
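
One way to line the logs up is to pull the Gossiper state changes out of each node's system.log and sort them into a single timeline. The sketch below assumes the log line format quoted in the original post; the function names and the node labels you pass in are hypothetical, and clocks on the machines are assumed to be roughly in sync.

```python
import re
from datetime import datetime

# Matches lines such as:
#   INFO [GossipStage:1] 2012-05-10 23:18:27,579 Gossiper.java (line 804)
#   InetAddress /172.15.2.161 is now UP
# \s+ is used between tokens so that wrapped lines still match.
LINE_RE = re.compile(
    r"INFO\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"Gossiper\.java\s+\(line\s+\d+\)\s+"
    r"InetAddress\s+/(?P<peer>[\d.]+)\s+is\s+now\s+(?P<state>UP|dead)"
)


def gossip_events(node, log_text):
    """Yield (timestamp, reporting node, peer address, state) per state change."""
    for m in LINE_RE.finditer(log_text):
        ts_text = " ".join(m.group("ts").split())  # normalise wrapped whitespace
        ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S,%f")
        yield ts, node, m.group("peer"), m.group("state")


def merged_timeline(logs_by_node):
    """Merge the events from several nodes' logs into one time-ordered list."""
    events = []
    for node, text in logs_by_node.items():
        events.extend(gossip_events(node, text))
    return sorted(events)
```

Feeding it a dict of {node label: log text} for all five nodes shows, for any moment one node marked a peer dead, what the other nodes were reporting at the same time.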

What Cassandra configuration changes have you made to the nodes, and is there 
anything unusual about your networking setup?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/05/2012, at 4:11 AM, Pavel Polushkin wrote:

> Hello,
> Actually there are no problems with JMX; it works fine when the nodes are in 
> the UP state. But after a while the cluster goes into a bad state. For now it 
> seems to be a bug in Cassandra's connection handling.
> Pavel.

RE: Cassandra stucks

2012-05-11 Thread Pavel Polushkin
Hello,

Actually there are no problems with JMX; it works fine when the nodes are in
the UP state. But after a while the cluster goes into a bad state. For now it
seems to be a bug in Cassandra's connection handling.

Pavel.

From: Madalina Matei [mailto:madalinaima...@gmail.com] 
Sent: Friday, May 11, 2012 20:03
To: user@cassandra.apache.org
Subject: Re: Cassandra stucks

Check your JMX port in cassandra-env.sh and see if that's open. 

Also if you have enabled 

 JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="

and you are using an IP address for -Djava.rmi.server.hostname, make sure
that it is the correct IP.

Re: Cassandra stucks

2012-05-11 Thread Madalina Matei
Check your JMX port in cassandra-env.sh and see if that's open. 

Also if you have enabled 

 JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="

and you are using an IP address for -Djava.rmi.server.hostname, make sure that 
it is the correct IP.
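
To see at a glance what a node is actually configured with, a small sketch like the following can read both values out of the text of cassandra-env.sh. It assumes the file sets the port with a line of the form JMX_PORT="7199" and, if enabled, the hostname via the JVM_OPTS line shown above; the function name is made up for illustration.

```python
import re


def jmx_settings(env_text):
    """Return (jmx_port, rmi_hostname) found in cassandra-env.sh text.

    Commented-out lines are ignored. Either value is None when it is not set.
    """
    port = None
    hostname = None
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = re.match(r'JMX_PORT="?(\d+)"?', line)
        if m:
            port = int(m.group(1))
        m = re.search(r"-Djava\.rmi\.server\.hostname=([^\s\"']+)", line)
        if m:
            hostname = m.group(1)
    return port, hostname
```

If the hostname that comes back is an IP address, compare it with the address the node really listens on; a stale or mistyped value there would make remote JMX clients connect to the wrong place.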


On 11 May 2012, at 16:42, Pavel Polushkin wrote:

> No, we are using dedicated physical hardware. Currently we have 5 nodes.

RE: Cassandra stucks

2012-05-11 Thread Pavel Polushkin
No, we are using dedicated physical hardware. Currently we have 5 nodes.

From: Madalina Matei [mailto:madalinaima...@gmail.com] 
Sent: Friday, May 11, 2012 19:40
To: user@cassandra.apache.org
Subject: Re: Cassandra stucks

Are you using EC2 ?

 

Re: Cassandra stucks

2012-05-11 Thread Madalina Matei
Are you using EC2 ?

On 11 May 2012, at 16:13, Pavel Polushkin wrote:

> We use version 1.0.8.

RE: Cassandra stucks

2012-05-11 Thread Pavel Polushkin
We use version 1.0.8.

From: David Leimbach [mailto:leim...@gmail.com] 
Sent: Friday, May 11, 2012 18:48
To: user@cassandra.apache.org
Subject: Re: Cassandra stucks

What's the version number of Cassandra?

On Fri, May 11, 2012 at 7:38 AM, Pavel Polushkin wrote:

Hello,

We ran into a strange problem while performance-testing a Cassandra
cluster. After some time all nodes went into the down state for several days.
Now all nodes are back in the up state except one, which is still down.

Nodetool on the down node throws an exception:

Error connection to remote JMX agent!

java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
    java.net.SocketTimeoutException: Read timed out]
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:340)
    at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
    at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:144)
    at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:114)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:623)
Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
    java.net.SocketTimeoutException: Read timed out]
    at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
    at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)
    at javax.naming.InitialContext.lookup(InitialContext.java:392)
    at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1888)
    at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1858)
    at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)
    ... 4 more
Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
    java.net.SocketTimeoutException: Read timed out
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
    at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
    at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
    at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)
    ... 9 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readByte(DataInputStream.java:248)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)
    ... 13 more

 

The system log of the down node contains an endless list of entries like these:

INFO [GossipStage:1] 2012-05-10 23:18:27,579 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
INFO [GossipStage:1] 2012-05-10 23:18:27,580 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP
INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.161 is now dead.
INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.165 is now dead.
INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.162 is now dead.
INFO [GossipTasks:1] 2012-05-10 23:18:29,291 Gossiper.java (line 818) InetAddress /172.15.2.163 is now dead.
INFO [GossipStage:1] 2012-05-10 23:18:29,291 Gossiper.java (line 804) InetAddress /172.15.2.161 is now UP
INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.162 is now UP
INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.163 is now UP
INFO [GossipStage:1] 2012-05-10 23:18:29,292 Gossiper.java (line 804) InetAddress /172.15.2.165 is now UP

 

Suspiciously, this node has several TCP connections with other nodes on
port 7000 in the CLOSE_WAIT state:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp   869073      0 rcwocas:afs3-fileserver rcwocas03.enkata.:34274 CLOSE_WAIT
tcp   463429      0 rcwocas:afs3-fileserver rcwocas02.enkata.:39654 CLOSE_WAIT
tcp   873838      0 rcwocas:afs3-fileserver rcwocas01.enkata.:49486 CLOSE_WAIT
tcp   860245      0 rcwocas:afs3-fileserver rcwocas05.enkata.:
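
For reference when reading the table above: afs3-fileserver is simply the /etc/services name for port 7000, Cassandra's default inter-node storage port. CLOSE_WAIT means the remote side has closed the connection while the local process has not yet closed its socket, and a large Recv-Q means data has arrived that the process has not read. A minimal sketch for pulling such rows out of saved netstat output follows; the function name is hypothetical and it assumes the six-column layout shown above with each row on one line.

```python
def close_wait_connections(netstat_text):
    """Return (recv_q, local_address, foreign_address) for each CLOSE_WAIT row."""
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if (
            len(fields) >= 6
            and fields[0].startswith("tcp")
            and fields[5] == "CLOSE_WAIT"
        ):
            rows.append((int(fields[1]), fields[3], fields[4]))
    return rows
```

Sampling this every few seconds and watching whether the Recv-Q values keep growing would show whether the process has stopped reading from those sockets altogether.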

Re: Cassandra stucks

2012-05-11 Thread David Leimbach
What's the version number of Cassandra?
