[ https://issues.apache.org/jira/browse/CASSANDRA-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022248#comment-13022248 ]
Jonathan Ellis commented on CASSANDRA-2514:
-------------------------------------------

That's the point, hintedEndpoints is *usually* but not always a subset of writeEndpoints. Here is the code from getHintedEndpoints:

{code}
// assign dead endpoints to be hinted to the closest live one, or to the local node
// (since it is trivially the closest) if none are alive. This way, the cost of doing
// a hint is only adding the hint header, rather than doing a full extra write, if any
// destination nodes are alive.
//
// we do a 2nd pass on targets instead of using temporary storage,
// to optimize for the common case (everything was alive).
InetAddress localAddress = FBUtilities.getLocalAddress();
for (InetAddress ep : targets)
{
    if (map.containsKey(ep))
        continue;

    if (!StorageProxy.shouldHint(ep))
    {
        if (logger.isDebugEnabled())
            logger.debug("not hinting " + ep + " which has been down " + Gossiper.instance.getEndpointDowntime(ep) + "ms");
        continue;
    }

    InetAddress destination = map.isEmpty()
                              ? localAddress
                              : snitch.getSortedListByProximity(localAddress, map.keySet()).get(0);
    map.put(destination, ep);
}
{code}

> batch_mutate operations with CL=LOCAL_QUORUM throw TimeOutException when
> there aren't sufficient live nodes
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2514
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2514
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: 1. Cassandra 0.7.4 running on RHEL 5.5
> 2. 2 DC setup
> 3. RF = 4 (DC1 = 2, DC2 = 2)
> 4. CL = LOCAL_QUORUM
>            Reporter: Narendra Sharma
>            Priority: Minor
>             Fix For: 0.7.5
>
>         Attachments: 2514-v2.txt, CASSANDRA-2514.patch
>
>
> We have a 2 DC setup with RF = 4. There are 2 nodes in each DC.
> Following is the keyspace definition:
> <snip>
> keyspaces:
>     - name: KeyspaceMetadata
>       replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>       strategy_options:
>         DC1 : 2
>         DC2 : 2
>       replication_factor: 4
> </snip>
> I shut down all nodes except one and waited for the live node to recognize that the other nodes are dead. Following is the nodetool ring output on the live node:
> Address         Status State   Load      Owns    Token
>                                                  169579575332184635438912517119426957796
> 10.17.221.19    Down   Normal  ?         29.20%  49117425183422571410176530597442406739
> 10.17.221.17    Up     Normal  81.64 KB  4.41%   56615248844645582918169246064691229930
> 10.16.80.54     Down   Normal  ?         21.13%  92563519227261352488017033924602789201
> 10.17.221.18    Down   Normal  ?         45.27%  169579575332184635438912517119426957796
> I expect UnavailableException when I send a batch_mutate request to the node that is up. However, it returned TimeOutException:
> TimedOutException()
>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16493)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
> Following is the cassandra-topology.properties:
> # Cassandra Node IP=Data Center:Rack
> 10.17.221.17=DC1:RAC1
> 10.17.221.19=DC1:RAC2
> 10.17.221.18=DC2:RAC1
> 10.16.80.54=DC2:RAC2

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
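To illustrate the routing logic Jonathan quotes, here is a minimal self-contained sketch of the second pass of getHintedEndpoints. It is an assumption-laden simplification, not the real Cassandra code: plain Strings stand in for InetAddress, `liveTargets` stands in for the keys of `map` from the first pass, the snitch's proximity sort is elided, and the mapping direction is inverted (dead endpoint to hint destination) for readability. It shows why hinted endpoints are not always a subset of writeEndpoints: when every target is down, the hint destination falls back to the local coordinator, which may not be a replica at all.

```java
import java.util.*;

public class HintRoutingSketch {
    // Hypothetical simplification of the 2nd pass in getHintedEndpoints:
    // assign each dead target to the closest live endpoint, or to the
    // local node when no targets are alive.
    static Map<String, String> assignHints(String localAddress,
                                           List<String> targets,
                                           Set<String> liveTargets) {
        Map<String, String> hintDestinations = new HashMap<>();
        for (String ep : targets) {
            if (liveTargets.contains(ep))
                continue; // alive: a normal write goes there, no hint needed
            // closest live node, or the local node if everything is down
            // (the real code sorts liveTargets by proximity via the snitch)
            String destination = liveTargets.isEmpty()
                                 ? localAddress
                                 : liveTargets.iterator().next();
            hintDestinations.put(ep, destination);
        }
        return hintDestinations;
    }

    public static void main(String[] args) {
        // Reporter's scenario: three of four replicas down, coordinator up.
        // All hints land on the coordinator 10.17.221.17, which is not one
        // of the dead write endpoints -- hence not a subset of writeEndpoints.
        List<String> deadReplicas =
                List.of("10.17.221.19", "10.16.80.54", "10.17.221.18");
        Map<String, String> hints =
                assignHints("10.17.221.17", deadReplicas, Set.of());
        System.out.println(hints.get("10.17.221.19")); // 10.17.221.17
    }
}
```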