[jira] Issue Comment Edited: (CASSANDRA-1831) NetworkTopologyStrategy allows mismatched RF resulting in obscure failures

Mck SembWever (JIRA) Sat, 22 Jan 2011 08:51:08 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985152#action_12985152 ]


Mck SembWever edited comment on CASSANDRA-1831 at 1/22/11 11:49 AM:
--------------------------------------------------------------------

I hit this issue. It wasn't immediately obvious to me what configuration was
required to get NetworkTopologyStrategy working. The best documentation I can
find is in http://svn.apache.org/repos/asf/cassandra/trunk/conf/cassandra.yaml.
A wiki page on NetworkTopologyStrategy would help a lot. (Or someone could
just tell me to try OldNetworkTopologyStrategy first; why wasn't it called
SimpleNetworkTopologyStrategy instead?)

(Note: http://wiki.apache.org/cassandra/Operations#Network_topology still
refers to the old RackAwareStrategy.)
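
For context, the issue quoted below attributes the failure to NTS deriving its
total RF by summing the replica counts configured per datacenter. A minimal
Java sketch of that arithmetic, assuming the strategy options are modeled as a
plain Map from datacenter name to replica count; the class, method and DC names
(DC1, DC2) are hypothetical and not Cassandra's actual API:

```java
import java.util.Map;

// Hypothetical model (not Cassandra's real class): a per-datacenter placement
// strategy whose total replication factor is the sum of the per-DC replica counts.
public class PerDcReplicationSketch {

    // Adds up the replica count configured for every datacenter.
    public static int totalReplicationFactor(Map<String, Integer> replicasPerDc) {
        int total = 0;
        for (int replicas : replicasPerDc.values()) {
            total += replicas;
        }
        return total;
    }

    public static void main(String[] args) {
        // Keyspace declared with replication_factor = 2 but no strategy options:
        // the per-DC sum, which is the RF the strategy works with, is 0.
        System.out.println(totalReplicationFactor(Map.of()));

        // Options naming two DCs with one replica each add up to 2.
        System.out.println(totalReplicationFactor(Map.of("DC1", 1, "DC2", 1)));
    }
}
```

With no options the sum is 0 regardless of the declared replication_factor,
which is why each datacenter that should hold replicas needs its own entry in
the strategy options.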

> NetworkTopologyStrategy allows mismatched RF resulting in obscure failures
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1831
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1831
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0 rc 1
>            Reporter: Peter Schuller
>
> On today's 0.7 branch:
>
> Creating a keyspace like this (not how to do it in production, but that's not
> the point):
>
>    create keyspace MyKeySpace with replication_factor = 2 and
>        placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy';
>
> This is accepted by Cassandra in spite of there being no strategy options.
> Describing the keyspace will then give output similar to:
>
>    Keyspace: MyKeySpace:
>     Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>    null
>
> Attempts to write and read give, respectively, the errors included at the
> bottom of this comment.
>
> What happens is that NTS's getReplicationFactor() returns the sum of the RFs
> configured for each DC. But lacking any replica placement options for DCs,
> the sum will always be 0. As a result, NTS.calculateNaturalEndpoints() yields
> 0 endpoints, triggering the assertion failures apparent in the stack traces.
>
> This was caused by misconfiguration during testing but should be handled
> better. What are people's thoughts on the set of changes that would
> constitute a proper fix?
>
> Is there a reason for NTS to ever conclude that RF is different from that of
> the CF def? If not, I would say that one fix is to make NTS bail early if the
> RF calculated by adding up the DC placements does not match the configured RF
> for the column family. (I'll submit a patch if people agree.)
>
> Beyond that, what else, if anything, should be done? Should the creation fail
> due to the RF being inconsistent with the strategy options? Is it correct that
> the code assumes naturalEndPoints will never return fewer nodes than RF? It
> seems natural to me that the natural endpoint count should always match RF,
> unless the total number of nodes in the cluster is lacking. But this gets
> complicated with NTS, since the requirement is suddenly that you have enough
> nodes in each DC. This probably relates to previous discussions on whether or
> not to allow an RF that is higher than the number of nodes in a cluster.
>
> In this case, we failed hard because we got exactly 0 endpoints and triggered
> assertions. In other cases we might have gotten, say, 1, in which case we
> might have been able to read and write successfully as if we had a lower RF,
> even though the column family RF was set to 2. This seems dangerous.
>
> ERROR [pool-1-thread-2] 2010-12-07 11:18:40,638 Cassandra.java (line 3044) Internal error processing batch_mutate
> java.lang.AssertionError: invalid response count 1 for replication factor 0
>        at org.apache.cassandra.service.WriteResponseHandler.determineBlockFor(WriteResponseHandler.java:98)
>        at org.apache.cassandra.service.WriteResponseHandler.<init>(WriteResponseHandler.java:48)
>        at org.apache.cassandra.service.WriteResponseHandler.create(WriteResponseHandler.java:61)
>        at org.apache.cassandra.locator.AbstractReplicationStrategy.getWriteResponseHandler(AbstractReplicationStrategy.java:125)
>        at org.apache.cassandra.locator.NetworkTopologyStrategy.getWriteResponseHandler(NetworkTopologyStrategy.java:166)
>        at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:114)
>        at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:446)
>        at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:419)
>        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3036)
>        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
>
> ERROR [pool-1-thread-3] 2010-12-07 11:18:50,474 Cassandra.java (line 2876) Internal error processing get_range_slices
> java.lang.AssertionError
>        at org.apache.cassandra.service.RangeSliceResponseResolver.<init>(RangeSliceResponseResolver.java:53)
>        at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:450)
>        at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:507)
>        at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868)
>        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
>  INFO [MigrationStage:1] 2010-12-07 11:24:09,220
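
The early check proposed in the issue (bail out when the per-DC placements do
not add up to the declared RF) could look roughly like the Java sketch below.
It assumes the strategy options and the number of nodes known in each DC are
available as plain maps; all class, method and DC names are hypothetical. Since
the issue leaves open whether too few nodes in a DC should be an error, the
sketch only reports such DCs rather than rejecting them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical validation helpers, not part of Cassandra's API.
public final class ReplicationOptionsCheck {

    // Fail fast when the per-DC replica counts do not add up to the declared RF.
    public static void requireMatchingRf(int declaredRf, Map<String, Integer> replicasPerDc) {
        int sum = 0;
        for (int replicas : replicasPerDc.values()) {
            sum += replicas;
        }
        if (sum != declaredRf) {
            throw new IllegalArgumentException("strategy options add up to RF " + sum
                    + " but the keyspace declares RF " + declaredRf);
        }
    }

    // List the DCs that have fewer known nodes than configured replicas.
    // Only reported, because whether this should be fatal is an open question.
    public static List<String> undersizedDatacenters(Map<String, Integer> replicasPerDc,
                                                     Map<String, Integer> nodesPerDc) {
        List<String> undersized = new ArrayList<>();
        for (Map.Entry<String, Integer> e : replicasPerDc.entrySet()) {
            int nodes = nodesPerDc.getOrDefault(e.getKey(), 0);
            if (nodes < e.getValue()) {
                undersized.add(e.getKey());
            }
        }
        return undersized;
    }

    public static void main(String[] args) {
        // The configuration from the issue: RF 2 with no strategy options is rejected.
        try {
            requireMatchingRf(2, Map.of());
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }

        // RF 2 split as DC1:1, DC2:1 passes; DC2 has no known nodes, so it is reported.
        Map<String, Integer> options = Map.of("DC1", 1, "DC2", 1);
        requireMatchingRf(2, options);
        System.out.println(undersizedDatacenters(options, Map.of("DC1", 3)));
    }
}
```

Running requireMatchingRf when the keyspace is defined, as the issue also asks
about, would surface the misconfiguration above as a clear error instead of an
AssertionError deep inside WriteResponseHandler.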

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
