[ https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673245#comment-13673245 ]
Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 4:08 PM:
-----------------------------------------------------------------

*[EDIT] I didn't see your latest posts before posting, but I hope the extra data can help*

You were right to say that I needed to run repair -pr on all three nodes: since the CF only contains one row (it's a test), I had presumably only covered the node in charge of that key. But I restarted my test and ran the repair on all three nodes, and it still didn't work; here's the output:

{code}
user@cassandra11:~$ nodetool repair -pr Test_Replication
[2013-06-03 13:54:53,948] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 13:54:53,985] Repair session 676c00f0-cc44-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 13:54:53,985] Repair command #1 finished
{code}

{code}
user@cassandra12:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:17,844] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:17,866] Repair session e9f38c50-cc62-11e2-af47-db8ca926a9c5 for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:33:17,866] Repair command #1 finished
{code}

{code}
user@cassandra13:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:29,689] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:29,712] Repair session f102f3a0-cc62-11e2-ae98-39da3e693be3 for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:33:29,712] Repair command #1 finished
{code}

The data is still not copied to the new datacenter, and I don't understand why the repair only covers those ranges (ranges of width 1?). It could be an unbalanced cluster as you suggested, but we distributed the tokens as advised (+1 on the nodes of the new datacenter), as you can see in the following nodetool status:
{code}
user@cassandra13:~$ nodetool status
Datacenter: dc1
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Owns   Host ID                               Token                                    Rack
UN  cassandra01  102 GB    33.3%  fa7672f5-77f0-4b41-b9d1-13bf63c39122  0                                        RC1
UN  cassandra02  88.73 GB  33.3%  c799df22-0873-4a99-a901-5ef5b00b7b1e  56713727820156410577229101238628035242   RC1
UN  cassandra03  50.86 GB  33.3%  5b9c6bc4-7ec7-417d-b92d-c5daa787201b  113427455640312821154458202477256070484  RC1
Datacenter: dc2
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Owns   Host ID                               Token                                    Rack
UN  cassandra11  51.21 GB  0.0%   7b610455-3fd2-48a3-9315-895a4609be42  1                                        RC2
UN  cassandra12  45.02 GB  0.0%   8553f2c0-851c-4af2-93ee-2854c96de45a  56713727820156410577229101238628035243   RC2
UN  cassandra13  36.8 GB   0.0%   7f537660-9128-4c13-872a-6e026104f30e  113427455640312821154458202477256070485  RC2
{code}

Furthermore, a full repair works, as you can see in this log:

{code}
user@cassandra11:~$ nodetool repair Test_Replication
[2013-06-03 17:44:07,570] Starting repair command #5, repairing 6 ranges for keyspace Test_Replication
[2013-06-03 17:44:07,903] Repair session 6d37b720-cc64-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 17:44:07,903] Repair session 6d3a0110-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035243,113427455640312821154458202477256070484] finished
[2013-06-03 17:44:07,903] Repair session 6d4d6200-cc64-11e2-bfd5-3d9212e452cc for range (1,56713727820156410577229101238628035242] finished
[2013-06-03 17:44:07,903] Repair session 6d581060-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:44:07,903] Repair session 6d5ea010-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:44:07,934] Repair session 6d604dc0-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070485,0] finished
[2013-06-03 17:44:07,934] Repair command #5 finished
{code}

I hope this information helps; please let me know if you think it's a configuration issue, in which case I'll take it to the mailing list.

> nodetool repair -pr on all nodes won't repair the full range when a Keyspace
> isn't in all DC's
> ----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5424
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5424
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.7
>            Reporter: Jeremiah Jordan
>            Assignee: Yuki Morishita
>            Priority: Critical
>             Fix For: 1.2.5
>
>         Attachments: 5424-1.1.txt, 5424-v2-1.2.txt, 5424-v3-1.2.txt
>
>
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's.
> Commands follow, but the TL;DR of it: range (127605887595351923798765477786913079296,0] doesn't get repaired between the .38 node and the .236 node until I run a repair, no -pr, on .38.
> It seems like primary range calculation doesn't take schema into account, but deciding whom to ask for merkle trees does.
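The diagnosis above (the primary range is derived purely from ring position, while neighbor selection respects the keyspace's replication settings) can be sketched in a few lines. This is an illustrative Python model, not Cassandra's actual implementation; the function name `primary_range` and the data layout are mine:

```python
T1 = 42535295865117307932921825928971026432
T2 = 127605887595351923798765477786913079296

# Ring from the report: one node in DC "Cassandra", two in DC "Analytics".
ring = {0: "Cassandra", T1: "Analytics", T2: "Analytics"}

def primary_range(tokens, token):
    """Primary range of the node at `token`: (previous token, token],
    wrapping around the ring. Nothing here consults which keyspaces the
    node actually replicates -- that is the reported bug."""
    tokens = sorted(tokens)
    i = tokens.index(token)
    return (tokens[i - 1], token)  # i == 0 wraps via tokens[-1]

tokens = sorted(ring)
for t in tokens:
    print(ring[t], primary_range(tokens, t))

# The Cassandra-DC node is assigned the wrapping range (T2, 0], but
# Keyspace1 has no replicas in that DC, so its repair -pr finds
# "No neighbors to repair with" and the range is silently skipped.
```

Under this model, the only node whose repair -pr covers (127605887595351923798765477786913079296,0] is the one node that holds no Keyspace1 data at all, which matches the "No neighbors to repair" log below.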
> {noformat}
> Address          DC          Rack   Status  State   Load       Owns    Token
>                                                                        127605887595351923798765477786913079296
> 10.72.111.225    Cassandra   rack1  Up      Normal  455.87 KB  25.00%  0
> 10.2.29.38       Analytics   rack1  Up      Normal  40.74 MB   25.00%  42535295865117307932921825928971026432
> 10.46.113.236    Analytics   rack1  Up      Normal  20.65 MB   50.00%  127605887595351923798765477786913079296
>
> create keyspace Keyspace1
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {Analytics : 2}
>   and durable_writes = true;
> -------
> # nodetool -h 10.2.29.38 repair -pr Keyspace1 Standard1
> [2013-04-03 15:46:58,000] Starting repair command #1, repairing 1 ranges for keyspace Keyspace1
> [2013-04-03 15:47:00,881] Repair session b79b4850-9c75-11e2-0000-8b5bf6ebea9e for range (0,42535295865117307932921825928971026432] finished
> [2013-04-03 15:47:00,881] Repair command #1 finished
> root@ip-10-2-29-38:/home/ubuntu# grep b79b4850-9c75-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
> INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,009 AntiEntropyService.java (line 676) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] new session: will sync a1/10.2.29.38, /10.46.113.236 on range (0,42535295865117307932921825928971026432] for Keyspace1.[Standard1]
> INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,015 AntiEntropyService.java (line 881) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] requesting merkle trees for Standard1 (to [/10.46.113.236, a1/10.2.29.38])
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,202 AntiEntropyService.java (line 211) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from /10.46.113.236
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,697 AntiEntropyService.java (line 211) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from a1/10.2.29.38
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,879 AntiEntropyService.java (line 1015) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Endpoints /10.46.113.236 and a1/10.2.29.38 are consistent for Standard1
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,880 AntiEntropyService.java (line 788) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Standard1 is fully synced
> INFO [AntiEntropySessions:1] 2013-04-03 15:47:00,880 AntiEntropyService.java (line 722) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] session completed successfully
> root@ip-10-46-113-236:/home/ubuntu# grep b79b4850-9c75-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
> INFO [AntiEntropyStage:1] 2013-04-03 15:46:59,944 AntiEntropyService.java (line 244) [repair #b79b4850-9c75-11e2-0000-8b5bf6ebea9e] Sending completed merkle tree to /10.2.29.38 for (Keyspace1,Standard1)
> root@ip-10-72-111-225:/home/ubuntu# grep b79b4850-9c75-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
> root@ip-10-72-111-225:/home/ubuntu#
> -------
> # nodetool -h 10.46.113.236 repair -pr Keyspace1 Standard1
> [2013-04-03 15:48:00,274] Starting repair command #1, repairing 1 ranges for keyspace Keyspace1
> [2013-04-03 15:48:02,032] Repair session dcb91540-9c75-11e2-0000-a839ee2ccbef for range (42535295865117307932921825928971026432,127605887595351923798765477786913079296] finished
> [2013-04-03 15:48:02,033] Repair command #1 finished
> root@ip-10-46-113-236:/home/ubuntu# grep dcb91540-9c75-11e2-0000-a839ee2ccbef /var/log/cassandra/system.log
> INFO [AntiEntropySessions:5] 2013-04-03 15:48:00,280 AntiEntropyService.java (line 676) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] new session: will sync a0/10.46.113.236, /10.2.29.38 on range (42535295865117307932921825928971026432,127605887595351923798765477786913079296] for Keyspace1.[Standard1]
> INFO [AntiEntropySessions:5] 2013-04-03 15:48:00,285 AntiEntropyService.java (line 881) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] requesting merkle trees for Standard1 (to [/10.2.29.38, a0/10.46.113.236])
> INFO [AntiEntropyStage:1] 2013-04-03 15:48:01,710 AntiEntropyService.java (line 211) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Received merkle tree for Standard1 from a0/10.46.113.236
> INFO [AntiEntropyStage:1] 2013-04-03 15:48:01,943 AntiEntropyService.java (line 211) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Received merkle tree for Standard1 from /10.2.29.38
> INFO [AntiEntropyStage:1] 2013-04-03 15:48:02,031 AntiEntropyService.java (line 1015) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Endpoints a0/10.46.113.236 and /10.2.29.38 are consistent for Standard1
> INFO [AntiEntropyStage:1] 2013-04-03 15:48:02,032 AntiEntropyService.java (line 788) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Standard1 is fully synced
> INFO [AntiEntropySessions:5] 2013-04-03 15:48:02,032 AntiEntropyService.java (line 722) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] session completed successfully
> root@ip-10-2-29-38:/home/ubuntu# grep dcb91540-9c75-11e2-0000-a839ee2ccbef /var/log/cassandra/system.log
> INFO [AntiEntropyStage:1] 2013-04-03 15:48:01,898 AntiEntropyService.java (line 244) [repair #dcb91540-9c75-11e2-0000-a839ee2ccbef] Sending completed merkle tree to /10.46.113.236 for (Keyspace1,Standard1)
> root@ip-10-72-111-225:/home/ubuntu# grep dcb91540-9c75-11e2-0000-a839ee2ccbef /var/log/cassandra/system.log
> root@ip-10-72-111-225:/home/ubuntu#
> -------
> # nodetool -h 10.72.111.225 repair -pr Keyspace1 Standard1
> [2013-04-03 15:48:30,417] Starting repair command #1, repairing 1 ranges for keyspace Keyspace1
> [2013-04-03 15:48:30,428] Repair session eeb12670-9c75-11e2-0000-316d6fba2dbf for range (127605887595351923798765477786913079296,0] finished
> [2013-04-03 15:48:30,428] Repair command #1 finished
> root@ip-10-72-111-225:/home/ubuntu# grep eeb12670-9c75-11e2-0000-316d6fba2dbf /var/log/cassandra/system.log
> INFO [AntiEntropySessions:1] 2013-04-03 15:48:30,427 AntiEntropyService.java (line 676) [repair #eeb12670-9c75-11e2-0000-316d6fba2dbf] new session: will sync /10.72.111.225 on range (127605887595351923798765477786913079296,0] for Keyspace1.[Standard1]
> INFO [AntiEntropySessions:1] 2013-04-03 15:48:30,428 AntiEntropyService.java (line 681) [repair #eeb12670-9c75-11e2-0000-316d6fba2dbf] No neighbors to repair with on range (127605887595351923798765477786913079296,0]: session completed
> root@ip-10-46-113-236:/home/ubuntu# grep eeb12670-9c75-11e2-0000-316d6fba2dbf /var/log/cassandra/system.log
> root@ip-10-46-113-236:/home/ubuntu#
> root@ip-10-2-29-38:/home/ubuntu# grep eeb12670-9c75-11e2-0000-316d6fba2dbf /var/log/cassandra/system.log
> root@ip-10-2-29-38:/home/ubuntu#
> ---
> root@ip-10-2-29-38:/home/ubuntu# nodetool -h 10.2.29.38 repair Keyspace1 Standard1
> [2013-04-03 16:13:28,674] Starting repair command #2, repairing 3 ranges for keyspace Keyspace1
> [2013-04-03 16:13:31,786] Repair session 6bb81c20-9c79-11e2-0000-8b5bf6ebea9e for range (42535295865117307932921825928971026432,127605887595351923798765477786913079296] finished
> [2013-04-03 16:13:31,786] Repair session 6cb05ed0-9c79-11e2-0000-8b5bf6ebea9e for range (0,42535295865117307932921825928971026432] finished
> [2013-04-03 16:13:31,806] Repair session 6d24a470-9c79-11e2-0000-8b5bf6ebea9e for range (127605887595351923798765477786913079296,0] finished
> [2013-04-03 16:13:31,807] Repair command #2 finished
> root@ip-10-2-29-38:/home/ubuntu# grep 6d24a470-9c79-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
> INFO [AntiEntropySessions:7] 2013-04-03 16:13:31,065 AntiEntropyService.java (line 676) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] new session: will sync a1/10.2.29.38, /10.46.113.236 on range (127605887595351923798765477786913079296,0] for Keyspace1.[Standard1]
> INFO [AntiEntropySessions:7] 2013-04-03 16:13:31,065 AntiEntropyService.java (line 881) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] requesting merkle trees for Standard1 (to [/10.46.113.236, a1/10.2.29.38])
> INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,751 AntiEntropyService.java (line 211) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from /10.46.113.236
> INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,785 AntiEntropyService.java (line 211) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Received merkle tree for Standard1 from a1/10.2.29.38
> INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,805 AntiEntropyService.java (line 1015) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Endpoints /10.46.113.236 and a1/10.2.29.38 are consistent for Standard1
> INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,806 AntiEntropyService.java (line 788) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Standard1 is fully synced
> INFO [AntiEntropySessions:7] 2013-04-03 16:13:31,806 AntiEntropyService.java (line 722) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] session completed successfully
> root@ip-10-46-113-236:/home/ubuntu# grep 6d24a470-9c79-11e2-0000-8b5bf6ebea9e /var/log/cassandra/system.log
> INFO [AntiEntropyStage:1] 2013-04-03 16:13:31,665 AntiEntropyService.java (line 244) [repair #6d24a470-9c79-11e2-0000-8b5bf6ebea9e] Sending completed merkle tree to /10.2.29.38 for (Keyspace1,Standard1)
> {noformat}
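On the earlier question about the width-1 ranges: with the advised +1 token offsets in the second datacenter, each dc2 node's primary range really is exactly one token wide, which a quick sketch makes obvious. This is plain illustrative Python (the variable names are mine), not Cassandra code:

```python
T2 = 56713727820156410577229101238628035242
T3 = 113427455640312821154458202477256070484

# Token layout from the nodetool status: dc1 nodes at 0, T2, T3 and
# dc2 nodes offset by +1, all interleaved on the same ring.
tokens = sorted([0, 1, T2, T2 + 1, T3, T3 + 1])

# Each node's primary range is (previous token, own token], wrapping.
ranges = {t: (tokens[i - 1], t) for i, t in enumerate(tokens)}

# The dc2 nodes' primary ranges span a single token, which is why each
# repair -pr in dc2 reports ranges like (0,1].
assert ranges[1] == (0, 1)            # cassandra11
assert ranges[T2 + 1] == (T2, T2 + 1)  # cassandra12
assert ranges[T3 + 1] == (T3, T3 + 1)  # cassandra13
```

So the tiny ranges are expected given the token scheme; the problem is that, per the logs above, the ranges owned by nodes outside the keyspace's datacenters are never repaired by anyone under -pr.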