[ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989100#comment-12989100 ]

Aaron Morton edited comment on CASSANDRA-2081 at 2/1/11 8:54 AM:
-----------------------------------------------------------------

Had a read through the code and I think I have an idea what my problem was; not 
sure if it applies to the previous issue, and not sure if it's a real bug. 

o.a.c.service.ReadCallback.response() will only signal the 
o.a.c.utils.SimpleCondition once the data response has been received. If the 
signal is not set within rpc_timeout, ReadCallback.get() raises a 
j.u.c.TimeoutException, which propagates out of StorageProxy, is caught in 
CassandraServer, and is turned into an o.a.c.thrift.TimedOutException. 
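
Roughly, the path described above looks like the following simplified sketch. 
This is not the actual ReadCallback source, just an approximation of the 
behaviour: a j.u.c.CountDownLatch stands in for SimpleCondition, and the field 
and method names are simplified.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only -- not the real o.a.c.service.ReadCallback.
class ReadCallbackSketch {
    private final CountDownLatch condition = new CountDownLatch(1); // stand-in for SimpleCondition
    private final AtomicInteger received = new AtomicInteger(0);
    private volatile boolean dataPresent;   // has the full data response arrived?
    private final int blockFor;             // e.g. 2 for QUORUM with RF=3
    private final long rpcTimeoutMillis;    // rpc_timeout_in_ms from cassandra.yaml

    ReadCallbackSketch(int blockFor, long rpcTimeoutMillis) {
        this.blockFor = blockFor;
        this.rpcTimeoutMillis = rpcTimeoutMillis;
    }

    // Invoked once per replica response (data or digest).
    void response(boolean isDataResponse) {
        if (isDataResponse)
            dataPresent = true;
        // Signal only when enough replicas have answered AND the data response is in.
        if (received.incrementAndGet() >= blockFor && dataPresent)
            condition.countDown();
    }

    // Invoked from StorageProxy; blocks until signalled or rpc_timeout elapses.
    void get() throws TimeoutException, InterruptedException {
        if (!condition.await(rpcTimeoutMillis, TimeUnit.MILLISECONDS))
            throw new TimeoutException();   // surfaces to the client as TimedOutException
    }
}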

So if the node that is asked for the data fails to respond, the entire request 
will time out even if enough other replicas have responded to satisfy the 
consistency level. I think I've seen this discussed before as being by design, 
and the client should just retry in response to the timeout. Is that correct? 
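
For completeness, here is a hedged sketch of what "just retry" could look like 
on the client side. readAtQuorum() is a hypothetical placeholder for whatever 
read call the application actually makes (via Hector or raw Thrift); it is not 
a real API.

import org.apache.cassandra.thrift.TimedOutException;

// Hypothetical client-side retry loop around a QUORUM read.
class RetryingReader {
    static byte[] readWithRetry(int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return readAtQuorum();          // hypothetical QUORUM read
            } catch (TimedOutException e) {
                if (attempt >= maxAttempts)
                    throw e;                    // give up after maxAttempts
                Thread.sleep(100L * attempt);   // simple backoff before retrying
            }
        }
    }

    // Placeholder for the application's actual read path.
    static byte[] readAtQuorum() throws TimedOutException {
        throw new UnsupportedOperationException("stub");
    }
}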

> Consistency QUORUM does not work anymore (hector:Could not fullfill request 
> on this host)
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2081
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2081
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: linux, hector + cassandra
>            Reporter: Thibaut
>            Priority: Blocker
>             Fix For: 0.7.1
>
>
> I'm using apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25.
> Using consistency level QUORUM won't work anymore (tested it on read). 
> Consistency level ONE still works though.
> I have tried this with one dead node in my cluster.
> If I restart cassandra with an older svn revision 
> (apache-cassandra-2011-01-28_20-06-01.jar), I can access the cluster with 
> consistency level QUORUM again, while still using 
> apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25 in my application.
> 11/01/31 19:54:38 ERROR connection.CassandraHostRetryService: Downed 
> intr1n18(192.168.0.18):9160 host still appears to be down: Unable to open 
> transport to intr1n18(192.168.0.18):9160 , java.net.NoRouteToHostException: 
> No route to host
> 11/01/31 19:54:38 INFO connection.CassandraHostRetryService: Downed Host 
> retry status false with host: intr1n18(192.168.0.18):9160
> 11/01/31 19:54:45 ERROR connection.HConnectionManager: Could not fullfill 
> request on this host CassandraClient<intr1n11:9160-483>
> intr1n11 is marked as up, however, and I can also access the node through the 
> cassandra cli.
> 192.168.0.1     Up     Normal  8.02 GB         5.00%   0cc
> 192.168.0.2     Up     Normal  7.96 GB         5.00%   199
> 192.168.0.3     Up     Normal  8.24 GB         5.00%   266
> 192.168.0.4     Up     Normal  4.94 GB         5.00%   333
> 192.168.0.5     Up     Normal  5.02 GB         5.00%   400
> 192.168.0.6     Up     Normal  5 GB            5.00%   4cc
> 192.168.0.7     Up     Normal  5.1 GB          5.00%   599
> 192.168.0.8     Up     Normal  5.07 GB         5.00%   666
> 192.168.0.9     Up     Normal  4.78 GB         5.00%   733
> 192.168.0.10    Up     Normal  4.34 GB         5.00%   7ff
> 192.168.0.11    Up     Normal  5.01 GB         5.00%   8cc
> 192.168.0.12    Up     Normal  5.31 GB         5.00%   999
> 192.168.0.13    Up     Normal  5.56 GB         5.00%   a66
> 192.168.0.14    Up     Normal  5.82 GB         5.00%   b33
> 192.168.0.15    Up     Normal  5.57 GB         5.00%   c00
> 192.168.0.16    Up     Normal  5.03 GB         5.00%   ccc
> 192.168.0.17    Up     Normal  4.77 GB         5.00%   d99
> 192.168.0.18    Down   Normal  ?               5.00%   e66
> 192.168.0.19    Up     Normal  4.78 GB         5.00%   f33
> 192.168.0.20    Up     Normal  4.83 GB         5.00%   ffffffffffffffff
