[ 
https://issues.apache.org/jira/browse/CASSANDRA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907671#action_12907671
 ] 

Gary Dusbabek commented on CASSANDRA-1467:
------------------------------------------

Reproduced in 0.6 (mostly).  The bootstrap still happens because we don't 
enforce the RF restriction that we do now.  When the fourth node is brought up 
and bootstrapped, it sees the the ring minus the coordinator node from the 
removetoken operation.

I've got a patch worked up for trunk.  I'll port it to 0.6 and attach shortly.

> replication factor exceeds number of endpoints, when attempting to join a new 
> node (but cluster has enough running nodes to fulfill RF)
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1467
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1467
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.5, 0.7 beta 1
>         Environment: I'm using 0.7 beta1. Built from: 
> http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.7.0/apache-cassandra-0.7.0-beta1-src.tar.gz
>   using dpkg-buildpackage from it.
>            Reporter: Josep M. Blanquer
>            Assignee: Gary Dusbabek
>             Fix For: 0.6.6, 0.7 beta 2
>
>
> What happens here the following:
> * given a healthy running cluster of 2 nodes (it used to be 3 but I manually 
> killed one down)
> * with a Keyspace having a ReplicationFactor of 2
> * (this cluster is operational, loaded with data and working well)
>  as soon as I want to bring up a new 3rd node:
> * the node is detected by the current cluster
> * but as soon as it tries to initiate bootstrap sequence it dies with:  
> replication factor (2) exceeds number of endpoints (1)
> I believe this is similar to #1343, but not quite the same, since the 
> keyspace and everything is already created, and I'm not attempting any 
> modification. This is purely bringing a new node up.
> One extra tidbit of information, in case it's important...maybe it is not: 
> I have only set 1 seed node configured (this is a test setup). And when the 
> new node starts coming up (before the crash) its nodetool ring only reports 
> the seed one.
> From the new node coming up (before it crashes):
> r...@node1-3:~# nodetool -h localhost -p 8080 ring
> Address         Status State   Load            Token                          
>              
> 10.250.106.111  Up     Normal  116.3 GB        0          < = this the only 
> configured seed node
> From one of the other running nodes (this is the real status of the cluster):
> r...@node1-2:~# nodetool -h localhost -p 8080 ring
> Address         Status State   Load            Token                          
>              
>                                        
> 113416112894748789872342756657008344878    
> 10.250.106.111  Up     Normal  116.3 GB        0                              
>              
> 10.215.195.81   Up     Normal  116.31 GB       
> 56713727820156410577229101238628035242      
> 10.246.65.221   Up     Joining 7.63 KB         
> 113416112894748789872342756657008344878     
> Here's the full boot trace of the node is trying to join:
> Java HotSpot(TM) Client VM warning: Can't detect initial thread stack 
> location - find_vma failed
> Create RMI registry on port 8081
> Get the platform's MBean server
> Initialize the environment map
> Create an RMI connector server
> Start the RMI connector server on port 8081
>  INFO 22:32:50,448 Loading settings from /etc/cassandra/cassandra.yaml
> DEBUG 22:32:50,537 Syncing log with a period of 10000
>  INFO 22:32:50,538 DiskAccessMode 'auto' determined to be standard, 
> indexAccessMode is standard
> DEBUG 22:32:50,548 setting auto_bootstrap to true
> DEBUG 22:32:50,702 Starting CFS Statistics
> DEBUG 22:32:50,710 key cache capacity for Statistics is 1
> DEBUG 22:32:50,711 Starting CFS Schema
> DEBUG 22:32:50,712 key cache capacity for Schema is 1
> DEBUG 22:32:50,712 Starting CFS Migrations
> DEBUG 22:32:50,713 key cache capacity for Migrations is 1
> DEBUG 22:32:50,713 Starting CFS LocationInfo
> DEBUG 22:32:50,713 key cache capacity for LocationInfo is 1
> DEBUG 22:32:50,714 Starting CFS HintsColumnFamily
> DEBUG 22:32:50,714 key cache capacity for HintsColumnFamily is 1
>  INFO 22:32:50,736 Couldn't detect any schema definitions in local storage.
>  INFO 22:32:50,737 Found table data in data directories. Consider using JMX 
> to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> DEBUG 22:32:50,738 opening keyspace system
>  INFO 22:32:50,757 Cassandra version: 
>  INFO 22:32:50,757 Thrift API version: 10.0.0
>  INFO 22:32:50,758 Saved Token not found. Using 
> 113416112894748789872342756657008344878
>  INFO 22:32:50,758 Saved ClusterName not found. Using SOMETHING 
>  INFO 22:32:50,763 Creating new commitlog segment 
> /mnt/cassandra/commitlog/CommitLog-1283553170763.log
> DEBUG 22:32:50,767 Estimating compactions for LocationInfo
> DEBUG 22:32:50,768 Estimating compactions for HintsColumnFamily
> DEBUG 22:32:50,768 Estimating compactions for Migrations
> DEBUG 22:32:50,768 Estimating compactions for Schema
> DEBUG 22:32:50,768 Estimating compactions for Statistics
> DEBUG 22:32:50,769 Checking to see if compaction of LocationInfo would be 
> useful
> DEBUG 22:32:50,769 Checking to see if compaction of HintsColumnFamily would 
> be useful
> DEBUG 22:32:50,769 Checking to see if compaction of Migrations would be useful
> DEBUG 22:32:50,769 Checking to see if compaction of Schema would be useful
> DEBUG 22:32:50,769 Checking to see if compaction of Statistics would be useful
>  INFO 22:32:50,779 switching in a fresh Memtable for LocationInfo at 
> CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', 
> position=276)
>  INFO 22:32:50,782 Enqueuing flush of memtable-locationi...@18721294(192 
> bytes, 4 operations)
>  INFO 22:32:50,783 Writing memtable-locationi...@18721294(192 bytes, 4 
> operations)
>  INFO 22:32:50,897 Completed flushing 
> /mnt/ebs/data/system/LocationInfo-e-1-Data.db
> DEBUG 22:32:50,898 Checking to see if compaction of LocationInfo would be 
> useful
> DEBUG 22:32:50,898 Discarding 0
> DEBUG 22:32:50,899 discard completed log segments for 
> CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', 
> position=276), column family 0.
> DEBUG 22:32:50,899 Marking replay position 276 on commit log 
> CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
>  INFO 22:32:50,908 Starting up server gossip
>  INFO 22:32:50,928 Joining: getting load information
>  INFO 22:32:50,928 Sleeping 90000 ms to wait for load information...
> DEBUG 22:32:50,940 attempting to connect to node1-1.domain.com/10.250.106.111
> DEBUG 22:32:51,044 attempting to connect to node1-1.domain.com/10.250.106.111
> DEBUG 22:32:51,045 attempting to connect to /10.215.195.81
>  INFO 22:32:51,051 Node /10.215.195.81 is now part of the cluster
> DEBUG 22:32:51,051 Resetting pool for /10.215.195.81
> DEBUG 22:32:51,052 Token 113416112894748789872342756657008344877 removed 
> manually (endpoint was unknown)
>  INFO 22:32:51,052 Node /10.250.106.111 is now part of the cluster
> DEBUG 22:32:51,053 Resetting pool for /10.250.106.111
> DEBUG 22:32:51,053 Node /10.250.106.111 state normal, token 0
> DEBUG 22:32:51,053 clearing cached endpoints
> DEBUG 22:32:51,171 Applying AddKeyspace from /10.250.106.111
> DEBUG 22:32:51,188 Applying migration 77e97d6c-b625-11df-8596-318df8b646e8
>  INFO 22:32:51,189 switching in a fresh Memtable for Migrations at 
> CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', 
> position=7417)
>  INFO 22:32:51,189 Enqueuing flush of memtable-migrati...@18093512(4938 
> bytes, 1 operations)
>  INFO 22:32:51,189 Writing memtable-migrati...@18093512(4938 bytes, 1 
> operations)
>  INFO 22:32:51,194 switching in a fresh Memtable for Schema at 
> CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', 
> position=7417)
>  INFO 22:32:51,194 Enqueuing flush of memtable-sch...@27402470(1784 bytes, 3 
> operations)
>  INFO 22:32:51,274 Completed flushing 
> /mnt/ebs/data/system/Migrations-e-1-Data.db
> DEBUG 22:32:51,274 Checking to see if compaction of Migrations would be useful
> DEBUG 22:32:51,275 Discarding 2
> DEBUG 22:32:51,275 discard completed log segments for 
> CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', 
> position=7417), column family 2
> .
> DEBUG 22:32:51,275 Marking replay position 7417 on commit log 
> CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
>  INFO 22:32:51,275 Writing memtable-sch...@27402470(1784 bytes, 3 operations)
>  INFO 22:32:51,388 Completed flushing /mnt/ebs/data/system/Schema-e-1-Data.db
> DEBUG 22:32:51,389 Checking to see if compaction of Schema would be useful
> DEBUG 22:32:51,389 Discarding 3
> DEBUG 22:32:51,389 discard completed log segments for 
> CommitLogContext(file='/mnt/cassandra/commitlog/CommitLog-1283553170763.log', 
> position=7417), column family 3
> .
> DEBUG 22:32:51,389 Marking replay position 7417 on commit log 
> CommitLogSegment(/mnt/cassandra/commitlog/CommitLog-1283553170763.log)
> DEBUG 22:32:51,406 Starting CFS MyColumnFamily
> DEBUG 22:32:51,406 key cache capacity for MyColumnFamily is 200000
>  INFO 22:32:51,407 Creating new commitlog segment 
> /mnt/cassandra/commitlog/CommitLog-1283553171407.log
> DEBUG 22:32:51,604 attempting to connect to node1-1.domain.com/10.250.106.111
>  INFO 22:32:51,622 InetAddress /10.215.195.81 is now UP
>  INFO 22:32:51,623 InetAddress /10.250.106.111 is now UP
>  INFO 22:32:51,623 Started hinted handoff for endpoint /10.215.195.81
>  INFO 22:32:51,631 Finished hinted handoff of 0 rows to endpoint 
> /10.215.195.81
>  INFO 22:32:51,632 Started hinted handoff for endpoint /10.250.106.111
>  INFO 22:32:51,632 Finished hinted handoff of 0 rows to endpoint 
> /10.250.106.111
> DEBUG 22:32:51,918 GC for ParNew: 14 ms, 20561696 reclaimed leaving 149257720 
> used; max is 902627328
> DEBUG 22:32:51,919 GC for ConcurrentMarkSweep: 73 ms, 4178480 reclaimed 
> leaving 8520856 used; max is 902627328
> DEBUG 22:32:52,924 Disseminating load info ...
> DEBUG 22:32:53,929 attempting to connect to /10.215.195.81
> DEBUG 22:32:54,925 GC for ConcurrentMarkSweep: 73 ms, 212034048 reclaimed 
> leaving 15214608 used; max is 902627328
> DEBUG 22:33:52,931 Disseminating load info ...
> DEBUG 22:34:20,929 ... got load info
>  INFO 22:34:20,929 Joining: getting bootstrap token
> DEBUG 22:34:20,931 token manually specified as 
> 113416112894748789872342756657008344878
>  INFO 22:34:20,932 Joining: sleeping 30000 ms for pending range setup
>  INFO 22:34:50,935 Bootstrapping
> DEBUG 22:34:50,935 Beginning bootstrap process
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
> Caused by: java.lang.IllegalStateException: replication factor (2) exceeds 
> number of endpoints (1)
>         at 
> org.apache.cassandra.locator.RackUnawareStrategy.calculateNaturalEndpoints(RackUnawareStrategy.java:57)
>         at 
> org.apache.cassandra.locator.AbstractReplicationStrategy.getRangeAddresses(AbstractReplicationStrategy.java:195)
>         at 
> org.apache.cassandra.dht.BootStrapper.getRangesWithSources(BootStrapper.java:155)
>         at 
> org.apache.cassandra.dht.BootStrapper.startBootstrap(BootStrapper.java:73)
>         at 
> org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:467)
>         at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:408)
>         at 
> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:134)
>         at 
> org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:57)
>         ... 5 more
> Cannot load daemon
> Service exit with a return value of 3

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to