[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
[ https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673327#comment-13673327 ]

Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 5:15 PM:
-----------------------------------------------------------------

I redid the same test (creating the keyspace with data, then changing its replication factor so it is replicated in DC2, then repairing), and it turns out that if you don't run a repair on DC2 before changing the replication factor, repair -pr works fine -_-. Anyway, your solution worked; thank you for your help, and sorry for polluting JIRA with my questions.

> nodetool repair -pr on all nodes won't repair the full range when a Keyspace
> isn't in all DC's
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5424
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5424
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.7
>            Reporter: Jeremiah Jordan
>            Assignee: Yuki Morishita
>            Priority: Critical
>             Fix For: 1.2.5
>
>         Attachments: 5424-1.1.txt, 5424-v2-1.2.txt, 5424-v3-1.2.txt
>
>
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace
> isn't in all DC's
> Commands follow, but the TL;DR of it: range
> (127605887595351923798765477786913079296,0] doesn't get repaired between the .38
> node and the .236 node until I run a repair, no -pr, on .38.
> It seems like the primary range calculation doesn't take schema into account, but
> deciding who to ask for merkle trees from does.
> {noformat}
> Address         DC          Rack   Status  State    Load       Owns     Token
>                                                                         127605887595351923798765477786913079296
> 10.72.111.225   Cassandra   rack1  Up      Normal   455.87 KB  25.00%   0
> 10.2.29.38      Analytics   rack1  Up      Normal   40.74 MB   25.00%   42535295865117307932921825928971026432
> 10.46.113.236   Analytics   rack1  Up      Normal   20.65 MB   50.00%   127605887595351923798765477786913079296
>
> create keyspace Keyspace1
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {Analytics : 2}
>   and durable_writes = true;
> ---
> # nodetool -h 10.2.29.38 repair -pr Keyspace1 Standard1
> [2013-04-03 15:46:58,000] Starting repair command #1, repairing 1 ranges for keyspace Keyspace1
> [2013-04-03 15:47:00,881] Repair session b79b4850-9c75-11e2--8b5bf6ebea9e for range (0,42535295865117307932921825928971026432] finished
> [2013-04-03 15:47:00,881] Repair command #1 finished
> root@ip-10-2-29-38:/home/ubuntu# grep b79b4850-9c75-11e2--8b5bf6ebea9e /var/log/cassandra/system.log
> INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,009 AntiEntropyService.java (line 676) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] new session: will sync a1/10.2.29.38, /10.46.113.236 on range (0,42535295865117307932921825928971026432] for Keyspace1.[Standard1]
> INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,015 AntiEntropyService.java (line 881) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] requesting merkle trees for Standard1 (to [/10.46.113.236, a1/10.2.29.38])
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,202 AntiEntropyService.java (line 211) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Received merkle tree for Standard1 from /10.46.113.236
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,697 AntiEntropyService.java (line 211) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Received merkle tree for Standard1 from a1/10.2.29.38
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,879 AntiEntropyService.java (line 1015) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Endpoints /10.46.113.236 and a1/10.2.29.38 are consistent for Standard1
> INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,880 AntiEntropyService.java (line 788) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Standard1 is fully synced
> INFO [AntiEntropySessions:1] 2013-04-03 15:47:00,880 AntiEntropyService.java (line 722) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] session completed successfully
> root@ip-10-46-113-236:/home/ubuntu# grep b79b4850
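The behavior reported above falls out of how the "primary range" is computed from the raw ring. A minimal illustrative sketch of that schema-unaware calculation follows; this is not Cassandra's actual code (class and variable names are made up), just the (predecessor token, own token] rule applied to the ring from the {noformat} output:

{code}
import java.math.BigInteger;
import java.util.*;

// Sketch (not Cassandra's code) of the schema-unaware primary range rule:
// each node's primary range is (predecessor's token, own token] on the raw
// ring, regardless of which keyspaces actually replicate to that node.
public class PrimaryRangeSketch {
    public static void main(String[] args) {
        // Ring from the report: token -> node
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        ring.put(BigInteger.ZERO, "10.72.111.225 (Cassandra DC)");
        ring.put(new BigInteger("42535295865117307932921825928971026432"), "10.2.29.38 (Analytics DC)");
        ring.put(new BigInteger("127605887595351923798765477786913079296"), "10.46.113.236 (Analytics DC)");

        BigInteger prev = ring.lastKey(); // the ring wraps around
        for (Map.Entry<BigInteger, String> e : ring.entrySet()) {
            System.out.printf("%s owns primary range (%s, %s]%n", e.getValue(), prev, e.getKey());
            prev = e.getKey();
        }
        // Keyspace1 lives only in the Analytics DC, so repair -pr on its two
        // nodes never touches (127605887595351923798765477786913079296, 0]:
        // that range is "primary" for 10.72.111.225, which holds no replica
        // of Keyspace1 at all.
    }
}
{code}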
[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
[ https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673245#comment-13673245 ]

Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 4:07 PM:
-----------------------------------------------------------------

[EDIT] I didn't see your latest posts before posting, but I hope the extra data can help.

You were right to say that I need to run repair -pr on all three nodes: I only have one row in the CF (it's a test), so I guess I had to run repair -pr on the node in charge of that key. But I restarted my test and ran the repair on all three nodes, and it didn't work either; here's the output:

{code}
user@cassandra11:~$ nodetool repair -pr Test_Replication
[2013-06-03 13:54:53,948] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 13:54:53,985] Repair session 676c00f0-cc44-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 13:54:53,985] Repair command #1 finished
{code}

{code}
user@cassandra12:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:17,844] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:17,866] Repair session e9f38c50-cc62-11e2-af47-db8ca926a9c5 for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:33:17,866] Repair command #1 finished
{code}

{code}
user@cassandra13:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:29,689] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:29,712] Repair session f102f3a0-cc62-11e2-ae98-39da3e693be3 for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:33:29,712] Repair command #1 finished
{code}

The data is still not copied to the new datacenter, and I don't understand why the repair covers those ranges (a range of 1??). It could be an unbalanced cluster, as you suggested, but we distributed the tokens as advised (+1 on the nodes of the new datacenter), as you can see in the following nodetool status:

{code}
user@cassandra13:~$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Owns   Host ID                               Token                                    Rack
UN  cassandra01  102 GB    33.3%  fa7672f5-77f0-4b41-b9d1-13bf63c39122  0                                        RC1
UN  cassandra02  88.73 GB  33.3%  c799df22-0873-4a99-a901-5ef5b00b7b1e  56713727820156410577229101238628035242  RC1
UN  cassandra03  50.86 GB  33.3%  5b9c6bc4-7ec7-417d-b92d-c5daa787201b  113427455640312821154458202477256070484  RC1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Owns   Host ID                               Token                                    Rack
UN  cassandra11  51.21 GB  0.0%   7b610455-3fd2-48a3-9315-895a4609be42  1                                        RC2
UN  cassandra12  45.02 GB  0.0%   8553f2c0-851c-4af2-93ee-2854c96de45a  56713727820156410577229101238628035243  RC2
UN  cassandra13  36.8 GB   0.0%   7f537660-9128-4c13-872a-6e026104f30e  113427455640312821154458202477256070485  RC2
{code}

Furthermore, the full repair works, as you can see in this log:

{code}
user@cassandra11:~$ nodetool repair Test_Replication
[2013-06-03 17:44:07,570] Starting repair command #5, repairing 6 ranges for keyspace Test_Replication
[2013-06-03 17:44:07,903] Repair session 6d37b720-cc64-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 17:44:07,903] Repair session 6d3a0110-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035243,113427455640312821154458202477256070484] finished
[2013-06-03 17:44:07,903] Repair session 6d4d6200-cc64-11e2-bfd5-3d9212e452cc for range (1,56713727820156410577229101238628035242] finished
[2013-06-03 17:44:07,903] Repair session 6d581060-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:44:07,903] Repair session 6d5ea010-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:44:07,934] Repair session 6d604dc0-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070485,0] finished
[2013-06-03 17:44:07,934] Repair command #5 finished
{code}

I hope this information helps; please let me know if you think it's a configuration issue, in which case I'll take it to the mailing list.
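The "(0,1]" sessions above are exactly what the "+1" token convention produces: each dc2 node sits one token after its dc1 counterpart, so its primary range on the raw ring contains a single token. A small sketch of that arithmetic, under that assumption (illustrative names, not Cassandra code):

{code}
import java.math.BigInteger;

// Sketch of why repair -pr on the dc2 nodes reports ranges of size 1:
// with the "+1" convention, each dc2 token immediately follows a dc1
// token, so the dc2 node's primary range (previousToken, ownToken]
// covers exactly one token.
public class RangeOfOneSketch {
    public static void main(String[] args) {
        BigInteger t1 = new BigInteger("56713727820156410577229101238628035242"); // cassandra02 (dc1)
        BigInteger t2 = t1.add(BigInteger.ONE);                                   // cassandra12 (dc2) = dc1 token + 1

        // Primary range of cassandra12 = (cassandra02's token, cassandra12's token]
        System.out.printf("cassandra12 primary range: (%s, %s]%n", t1, t2);
        // -> a half-open interval containing the single token t1 + 1, which is
        // why each dc2 repair -pr session covers a "range of 1" and the bulk of
        // the data (primary on the dc1 nodes) is never repaired from dc2.
    }
}
{code}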
[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
[ https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673203#comment-13673203 ]

Jonathan Ellis edited comment on CASSANDRA-5424 at 6/3/13 3:29 PM:
-------------------------------------------------------------------

I should have said: 2-DC setup, NTS, and replicas in both DCs. And more than one node in each DC. In any case, I do see the problem now. Working on a fix.
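For concreteness, a keyspace matching the conditions listed above (2-DC NTS setup with replicas in both DCs), written in the same cassandra-cli style as the snippet in the original report, might look like the following; the keyspace and DC names here are placeholders, not from this ticket:

{code}
create keyspace Repro5424
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC1 : 2, DC2 : 2}
  and durable_writes = true;
{code}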
[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
[ https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673060#comment-13673060 ]

Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 12:21 PM:
------------------------------------------------------------------

We just applied 1.2.5 on our cluster and the repair hang is fixed, but -pr is still not working as expected.

Our cluster has two datacenters, let's call them dc1 and dc2. We created a keyspace Test_Replication with replication factor {dc1: 3} (no entry for dc2) and ran a nodetool repair Test_Replication (which used to hang) on dc2; it exited saying there was nothing to do (which is OK). Then we changed the replication factor to {dc1: 3, dc2: 3} and started nodetool repair -pr Test_Replication on cassandra11@dc2, which output this:

{code}
user@cassandra11:~$ nodetool repair -pr Test_Replication
[2013-06-03 13:54:53,948] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 13:54:53,985] Repair session 676c00f0-cc44-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 13:54:53,985] Repair command #1 finished
{code}

But even after flushing the keyspace, there was no data on the server. We then ran a full repair:

{code}
user@cassandra11:~$ nodetool repair Test_Replication
[2013-06-03 14:01:56,679] Starting repair command #2, repairing 6 ranges for keyspace Test_Replication
[2013-06-03 14:01:57,260] Repair session 63632d70-cc45-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 14:01:57,260] Repair session 63650230-cc45-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035243,113427455640312821154458202477256070484] finished
[2013-06-03 14:01:57,260] Repair session 6385d0a0-cc45-11e2-bfd5-3d9212e452cc for range (1,56713727820156410577229101238628035242] finished
[2013-06-03 14:01:57,260] Repair session 639f7320-cc45-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 14:01:57,260] Repair session 63af51a0-cc45-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 14:01:57,295] Repair session 63b12660-cc45-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070485,0] finished
[2013-06-03 14:01:57,295] Repair command #2 finished
{code}

After that, we could find the data on dc2 as expected. So it seems -pr is still not working as expected, or maybe we're doing or understanding something wrong. (I wasn't sure whether to open a new ticket or comment on this one, so please let me know if I should move it.)
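For reference, the replication change described in that comment corresponds to something like the following CQL3 (1.2-era syntax); this is a reconstruction from the comment's description, not a command taken from the ticket:

{code}
-- Initial definition: replicas in dc1 only
CREATE KEYSPACE "Test_Replication"
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- Later: extend replication to the new datacenter, then run repair
ALTER KEYSPACE "Test_Replication"
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
{code}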
[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's
[ https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633274#comment-13633274 ]

Yuki Morishita edited comment on CASSANDRA-5424 at 4/16/13 8:01 PM:
--------------------------------------------------------------------

v3 attached.

- NTS now uses LinkedHashSet in calculateNaturalEndpoint to preserve insertion order while eliminating duplicates.
- I think it is unsafe to use cached endpoints through getNaturalEndpoints, since tokenMetadata cannot be consistent inside getPrimaryRangesForEndpoint, so I stick with the impl from v2.
- Fix sampleKeyRange. I think the problem is that the name tokenMetadata.getPrimaryRangeFor is confusing; we should probably rename it to just getRangeFor.
- Added a test for getPrimaryRangesForEndpoint to StorageServiceServerTest.
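To illustrate the first bullet: LinkedHashSet gives set semantics while keeping first-insertion order, which is what a replica list needs when walking the ring produces the same endpoint more than once. A standalone sketch of that property only (illustrative, not the actual NTS patch):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the LinkedHashSet property the v3 patch relies on: duplicates
// are eliminated, but the first-insertion order of endpoints is preserved,
// unlike HashSet (unordered) or a plain List (keeps duplicates).
public class EndpointOrderSketch {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress a = InetAddress.getByName("10.2.29.38");
        InetAddress b = InetAddress.getByName("10.46.113.236");

        Set<InetAddress> endpoints = new LinkedHashSet<>();
        endpoints.add(a); // first replica encountered while walking the ring
        endpoints.add(b);
        endpoints.add(a); // seen again on the wrap-around: silently ignored

        // Prints [/10.2.29.38, /10.46.113.236] - deduplicated, in ring order
        System.out.println(endpoints);
    }
}
{code}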