[ https://issues.apache.org/jira/browse/CASSANDRA-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13823735#comment-13823735 ]

Sylvain Lebresne commented on CASSANDRA-6352:
---------------------------------------------

You're almost surely running into CASSANDRA-6299. It will be fixed in 2.0.3 
(and is currently fixed on the cassandra-2.0 branch).

> Cluster does not respond to new SELECT query after a timeout
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-6352
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6352
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Windows7, C* v2.0.xx, 4-node cluster, JVM 1.7.0_45-b18 
> Xmx16GB, Datastax Java Driver 1.0.4 and 2.0.0-beta2
>            Reporter: Ngoc Minh Vo
>         Attachments: ErrorStack.txt
>
>
> Hello,
> We have encountered the following issue three times. Here is a description:
> - data is imported via the Datastax Java driver (DJD) v2.0.0-b2 with
> BatchStatement (i.e. a batch of PreparedStatements). The performance is quite
> impressive.
> - if we query the cluster via cqlsh (C* 2.0.x) and DJD v1.0.4, everything
> goes well.
> - but when we use DJD v2.0.0-b2, we get an exception:
> {quote}
> com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
> during read query at consistency ONE (1 responses were required but only 0 
> replica responded)
> {quote}
> - afterward, no Select query works anymore:
> -- all queries via cqlsh failed with rpc_timeout
> -- all queries via DJD v1.0.4 failed with the same exception as with v2.0.0-b2
> -- these queries had worked perfectly before the first select with DJD v2.0.0
> - nodetool status shows all nodes still Up and Normal
> - nodetool flush still works on all nodes
> Only a reboot of all nodes resolved the issue.
> Unfortunately, we don't have any useful information in the log files:
> Node1: the handshaking at 11:28:48 is strange because we didn't reboot any 
> node
> {quote}
>  INFO [MemoryMeter:1] 2013-11-15 11:27:11,724 Memtable.java (line 444) 
> CFS(Keyspace='hector', ColumnFamily='pdl_caching') liveRatio is 
> 5.06951175012658 (just-counted was 4.902669365509605).  calculation took 
> 140ms for 57108 columns
>  INFO [HANDSHAKE-/10.30.226.166] 2013-11-15 11:28:48,550 
> OutboundTcpConnection.java (line 386) Handshaking version with /10.30.226.166
>  INFO [RMI TCP Connection(4)-10.30.224.229] 2013-11-15 11:32:29,256 
> ColumnFamilyStore.java (line 734) Enqueuing flush of 
> Memtable-sstable_activity@2142066849(0/0 serialized/live bytes, 24 ops)
>  INFO [FlushWriter:76] 2013-11-15 11:32:29,257 Memtable.java (line 328) 
> Writing Memtable-sstable_activity@2142066849(0/0 serialized/live bytes, 24 
> ops)
> {quote}
> Node2: there is a hinted-handoff at 11:30:02...
> {quote}
>  INFO [MemoryMeter:1] 2013-11-15 11:25:32,897 Memtable.java (line 444) 
> CFS(Keyspace='hector', ColumnFamily='pdl_identity') liveRatio is 
> 6.046071792095967 (just-counted was 5.493829833297251).  calculation took 3ms 
> for 608 columns
>  INFO [HintedHandoff:1] 2013-11-15 11:30:02,656 HintedHandOffManager.java 
> (line 322) Started hinted handoff for host: 
> 2ce9f0a8-795c-4733-9d52-06057fcc690d with IP: /10.30.227.8
>  INFO [HintedHandoff:1] 2013-11-15 11:30:12,663 HintedHandOffManager.java 
> (line 449) Timed out replaying hints to /10.30.227.8; aborting (0 delivered)
>  INFO [RMI TCP Connection(6)-10.30.224.229] 2013-11-15 11:35:20,096 
> ColumnFamilyStore.java (line 734) Enqueuing flush of 
> Memtable-hints@581765413(1028/10280 serialized/live bytes, 2 ops)
> {quote}
> It seems that the first Select query with DJD v2.0.0-b2 left the cluster in a
> "pending"/"abnormal" state, and it no longer responds to further queries.
> I know that without logs it will be hard to reproduce.
> Thanks and regards,
> Minh



--
This message was sent by Atlassian JIRA
(v6.1#6144)