I'm seeing an issue similar to:

http://issues.apache.org/jira/browse/CASSANDRA-169

Here is when I see it.  I'm running Cassandra on 5 nodes using the
OrderPreservingPartitioner, and have populated Cassandra with 78
records, and I can use get_key_range via Thrift just fine.  Then, if I
manually kill one of the nodes (if I kill off node #5), the node (node
#1) which I've been using to call get_key_range will time out and
return the error:

 Thrift: Internal error processing get_key_range

And the Cassandra output shows the same trace as in 169:

ERROR - Encountered IOException on connection: java.nio.channels.SocketChannel[closed]
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.cassandra.net.TcpConnection.connect(TcpConnection.java:349)
        at org.apache.cassandra.net.SelectorManager.doProcess(SelectorManager.java:131)
        at org.apache.cassandra.net.SelectorManager.run(SelectorManager.java:98)
WARN - Closing down connection java.nio.channels.SocketChannel[closed]
ERROR - Internal error processing get_key_range
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out.
        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:573)
        at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
        at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:853)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:606)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:675)
Caused by: java.util.concurrent.TimeoutException: Operation timed out.
        at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
        at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:569)
        ... 7 more



If it was giving an error just one time, I could just rely on catching
the error and trying again.  But a get_key_range call to that node I
was already making get_key_range queries against (node #1) never works
again (it is still up and it responds fine to multiget Thrift calls),
sometimes not even after I restart the down node (node #5).  I end up
having to restart node #1 in addition to node #5.  The behavior for
the other 3 nodes varies - some of them are also unable to respond to
get_key_range calls, but some of them do respond to get_key_range
calls.
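
A minimal sketch of the catch-and-retry workaround mentioned above,
extended to fall back to other nodes when one keeps failing.  The
helper name call_with_failover, the node names, and the `call`
argument are all hypothetical: `call` stands in for whatever function
wraps the Thrift get_key_range request against a given node, and is
not part of any Cassandra or Thrift API.

```python
import time


def call_with_failover(nodes, call, retries_per_node=2, delay=0.5):
    """Try call(node) on each node in turn and return the first result.

    `call` is a hypothetical wrapper around the Thrift get_key_range
    request; any exception it raises is treated as a failed attempt.
    """
    last_error = None
    for node in nodes:
        for _ in range(retries_per_node):
            try:
                return call(node)
            except Exception as exc:  # e.g. a Thrift internal error or timeout
                last_error = exc
                time.sleep(delay)  # brief pause before the next attempt
    # Every node failed on every attempt; surface the last error seen.
    raise RuntimeError("get_key_range failed on all nodes") from last_error
```

This only hides a transient failure; it would not help in the case
described here, where a node stops answering get_key_range entirely
until it is restarted.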

My question is, what path should I go down in terms of reproducing
this problem?  I'm using Aug 27 trunk code - should I update my
Cassandra install prior to gathering more information for this issue,
and if so, to which version (0.4 or trunk)?  If there is anyone who is
familiar with this issue, could you let me know what I might be doing
wrong, or what the next info-gathering step should be for me?

Thank you,

Simon Smith
Arcode Corporation
