[ https://issues.apache.org/jira/browse/CASSANDRA-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Dusbabek updated CASSANDRA-1221:
-------------------------------------

    Attachment: 0.6-conviction-fix.diff

Patch for 0.6. I couldn't get stress.py to work in my branch, but the same problem should be present. All tests pass with this patch.

> loadbalance operation never completes on a 3 node cluster
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-1221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1221
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: Gary Dusbabek
>            Assignee: Gary Dusbabek
>             Fix For: 0.7
>
>         Attachments: 0.6-conviction-fix.diff, 0001-Gossiper-and-FD-never-called-MS.convict-to-shut-down.patch, system1.log, system2.log, system3.log
>
>
> Arya Goudarzi reports:
> Please confirm whether this is an issue that should be reported or whether I am doing something wrong. I could not find anything relevant on JIRA:
> Playing with the 0.7 nightly (today's build), I set up a 3 node cluster this way:
>  - Added one node;
>  - Loaded default schema with RF 1 from YAML using JMX;
>  - Loaded 2M keys using py_stress;
>  - Bootstrapped a second node;
>  - Cleaned up the first node;
>  - Bootstrapped a third node;
>  - Cleaned up the second node;
> I got the following ring:
> Address       Status  Load       Range                                      Ring
>                                  154293670372423273273390365393543806425
> 10.50.26.132  Up      518.63 MB  69164917636305877859094619660693892452     |<--|
> 10.50.26.134  Up      234.8 MB   111685517405103688771527967027648896391    |   |
> 10.50.26.133  Up      235.26 MB  154293670372423273273390365393543806425    |-->|
> Now I ran:
> nodetool --host 10.50.26.132 loadbalance
> It's been going for a while. I checked the streams:
> nodetool --host 10.50.26.134 streams
> Mode: Normal
> Not sending any streams.
> Streaming from: /10.50.26.132
>    Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-3-Data.db/[(0,22206096), (22206096,27271682)]
>    Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-4-Data.db/[(0,15180462), (15180462,18656982)]
>    Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-5-Data.db/[(0,353139829), (353139829,433883659)]
>    Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-6-Data.db/[(0,366336059), (366336059,450095320)]
> nodetool --host 10.50.26.132 streams
> Mode: Leaving: streaming data to other nodes
> Streaming to: /10.50.26.134
>    /var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]
> Not receiving any streams.
> These have been going for the past 2 hours.
> In the logs of the node at 10.50.26.134 I saw this:
> INFO [GOSSIP_STAGE:1] 2010-06-22 16:30:54,679 StorageService.java (line 603) Will not change my token ownership to /10.50.26.132
> So, as I understand it from the wikis, loadbalance is supposed to decommission the node by sending its tokens to other nodes and then bootstrap it again. It has been stuck in streaming for the past 2 hours and the size of the ring has not changed. The log on the first node shows it has been streaming for the past hours:
> INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 72) Beginning transfer process to /10.50.26.134 for ranges (154293670372423273273390365393543806425,69164917636305877859094619660693892452]
> INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 82) Flushing memtables for Keyspace1...
> INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,266 StreamOut.java (line 128) Stream context metadata [/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]] 1 sstables.
> INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.134 ...
> INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) Waiting for transfer to /10.50.26.134 to complete
> INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 359) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1277249454413.log', position=720)
> INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) Enqueuing flush of Memtable(LocationInfo)@1637794189
> INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,370 Memtable.java (line 149) Writing Memtable(LocationInfo)@1637794189
> INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,528 Memtable.java (line 163) Completed flushing /var/lib/cassandra/data/system/LocationInfo-d-9-Data.db
> INFO [MEMTABLE-POST-FLUSHER:1] 2010-06-22 17:36:53,529 ColumnFamilyStore.java (line 374) Discarding 1000
> Nothing more after this line.
> Am I doing something wrong?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.