[ https://issues.apache.org/jira/browse/CASSANDRA-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985152#action_12985152 ]
Mck SembWever edited comment on CASSANDRA-1831 at 1/22/11 11:49 AM:
--------------------------------------------------------------------

I hit this issue. It wasn't immediately obvious to me what configuration was required to get NetworkTopologyStrategy working.
The best docs I can find are in http://svn.apache.org/repos/asf/cassandra/trunk/conf/cassandra.yaml
A wiki page on NetworkTopologyStrategy would help a lot.
(Or someone just telling me to try OldNetworkTopologyStrategy first; why wasn't it called SimpleNetworkTopologyStrategy instead?)

(http://wiki.apache.org/cassandra/Operations#Network_topology still refers to the old RackAwareStrategy)

      was (Author: michaelsembwever):
    I hit this issue. It wasn't immediately obvious to me what configuration was required to get NetworkTopologyStrategy working.
The best docs I can find are in http://svn.apache.org/repos/asf/cassandra/trunk/conf/cassandra.yaml
A wiki page on NetworkTopologyStrategy would help a lot.

(http://wiki.apache.org/cassandra/Operations#Network_topology still refers to the old RackAwareStrategy)

> NetworkTopologyStrategy allows mismatched RF resulting in obscure failures
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1831
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1831
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0 rc 1
>            Reporter: Peter Schuller
>
> On today's 0.7 branch:
> Creating a keyspace like this (not how to do it in production, but that's not
> the point):
>    create keyspace MyKeySpace with replication_factor = 2 and
>    placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy';
> This is accepted by Cassandra in spite of there being no strategy options.
> Describing the keyspace will then give output similar to:
>    Keyspace: MyKeySpace:
>      Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>      null
> Attempts to write and read respectively give the errors included at the
> bottom of this comment.
> What happens is that the NTS's getReplicationFactor() returns the sum of RF
> for each DC. But lacking any replica placement options for DCs, the sum
> will always be 0. The result is that NTS.calculateNaturalEndpoints() yields 0
> endpoints, thus triggering the assertion failures apparent in the stack
> traces.
> This was caused by misconfiguration during testing but should be handled
> better. What are people's thoughts on the set of changes that would
> constitute a proper fix?
> Is there a reason for NTS to ever conclude that RF is different than that of
> the CF def? If not, I would say that one fix is to make the NTS bail early if
> the calculated RF adding up the DC placements does not match the configured
> RF for the column family. (I'll submit a patch if people agree.)
> Beyond that, what else, if anything, should be done? Should the creation fail
> due to the RF being inconsistent with strategy options? Is it correct that
> code assumes that naturalEndPoints will never return fewer nodes than RF? It
> seems natural to me that the natural endpoint count should always match RF,
> unless the total number of nodes in the cluster is lacking. But this gets
> complicated with NTS since the requirement is suddenly that you have enough
> in each DC. This probably relates to previous discussions on whether or not
> to allow an RF which is higher than the number of nodes in a cluster.
> In this case, we failed hard because we got exactly 0 endpoints and triggered
> assertions. In other cases we might have gotten, say, 1, in which case we may
> have successfully been able to read and write as if we had a lower RF even
> though the column family RF was set to 2. This seems dangerous.
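The "bail early" check proposed in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not Cassandra's actual code: the class `ReplicationFactorCheck`, its methods `sumDatacenterRf` and `validate`, the datacenter names `DC1`/`DC2`, and the use of `IllegalArgumentException` are all hypothetical. It shows why missing strategy options produce a sum of 0, and how comparing that sum against the configured RF would reject the keyspace before any read or write reaches an assertion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; these are not Cassandra's classes or API.
final class ReplicationFactorCheck {

    // Sum of the per-datacenter replica counts. This mirrors how the issue
    // describes NTS deriving its replication factor from strategy options.
    // With no options at all, the sum is 0.
    static int sumDatacenterRf(Map<String, Integer> strategyOptions) {
        int total = 0;
        for (int rf : strategyOptions.values()) {
            total += rf;
        }
        return total;
    }

    // Fail early when the per-DC sum does not match the configured RF,
    // instead of letting reads and writes hit assertion errors later.
    static void validate(int configuredRf, Map<String, Integer> strategyOptions) {
        int total = sumDatacenterRf(strategyOptions);
        if (total != configuredRf) {
            throw new IllegalArgumentException(
                "replication_factor " + configuredRf
                + " does not match the sum of per-DC replicas " + total);
        }
    }

    public static void main(String[] args) {
        // Consistent configuration: 1 replica in each of two datacenters, RF 2.
        Map<String, Integer> options = new LinkedHashMap<String, Integer>();
        options.put("DC1", 1);
        options.put("DC2", 1);
        validate(2, options);

        // The misconfiguration from the issue: RF 2 but no strategy options.
        try {
            validate(2, new LinkedHashMap<String, Integer>());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether such a check belongs at keyspace creation, in the strategy constructor, or both is exactly the open question raised in the description; the sketch takes no position on that.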
> ERROR [pool-1-thread-2] 2010-12-07 11:18:40,638 Cassandra.java (line 3044) Internal error processing batch_mutate
> java.lang.AssertionError: invalid response count 1 for replication factor 0
>         at org.apache.cassandra.service.WriteResponseHandler.determineBlockFor(WriteResponseHandler.java:98)
>         at org.apache.cassandra.service.WriteResponseHandler.<init>(WriteResponseHandler.java:48)
>         at org.apache.cassandra.service.WriteResponseHandler.create(WriteResponseHandler.java:61)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getWriteResponseHandler(AbstractReplicationStrategy.java:125)
>         at org.apache.cassandra.locator.NetworkTopologyStrategy.getWriteResponseHandler(NetworkTopologyStrategy.java:166)
>         at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:114)
>         at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:446)
>         at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:419)
>         at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3036)
>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ERROR [pool-1-thread-3] 2010-12-07 11:18:50,474 Cassandra.java (line 2876) Internal error processing get_range_slices
> java.lang.AssertionError
>         at org.apache.cassandra.service.RangeSliceResponseResolver.<init>(RangeSliceResponseResolver.java:53)
>         at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:450)
>         at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:507)
>         at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868)
>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> INFO [MigrationStage:1] 2010-12-07 11:24:09,220

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.