[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's

2013-06-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673327#comment-13673327
 ] 

Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 5:15 PM:
-

I redid the same test (creating the keyspace with data, then changing its 
replication factor so it's replicated in DC2, then repairing) and it turns out 
that if you don't run a repair on DC2 before changing the replication factor, 
the repair -pr works fine -_-.

Anyway, your solution worked. Thank you for your help, and sorry for polluting JIRA 
with my questions.


  was (Author: alprema):
I redid the same test (creating the keyspace with data, then changing its 
replication factor so it's replicated in DC2, then repairing) and it turns out 
that if you don't run a repair on DC2 before changing the replication factor, 
the repair -pr works fine -_-.

Anyway, your solution worked, thank you for your help and sorry I polluted JIRA 
with my questions.

  
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace 
> isn't in all DC's
> --
>
> Key: CASSANDRA-5424
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5424
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 1.1.7
>Reporter: Jeremiah Jordan
>Assignee: Yuki Morishita
>Priority: Critical
> Fix For: 1.2.5
>
> Attachments: 5424-1.1.txt, 5424-v2-1.2.txt, 5424-v3-1.2.txt
>
>
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace 
> isn't in all DC's
> Commands follow, but the TL;DR is that range 
> (127605887595351923798765477786913079296,0] doesn't get repaired between the .38 
> node and the .236 node until I run a repair, without -pr, on .38.
> It seems like primary range calculation doesn't take schema into account, but 
> deciding who to ask for merkle trees from does.
> {noformat}
> Address DC  RackStatus State   LoadOwns   
>  Token   
>   
>  127605887595351923798765477786913079296 
> 10.72.111.225   Cassandra   rack1   Up Normal  455.87 KB   25.00% 
>  0   
> 10.2.29.38  Analytics   rack1   Up Normal  40.74 MB25.00% 
>  42535295865117307932921825928971026432  
> 10.46.113.236   Analytics   rack1   Up Normal  20.65 MB50.00% 
>  127605887595351923798765477786913079296 
> create keyspace Keyspace1
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {Analytics : 2}
>   and durable_writes = true;
> ---
> # nodetool -h 10.2.29.38 repair -pr Keyspace1 Standard1
> [2013-04-03 15:46:58,000] Starting repair command #1, repairing 1 ranges for 
> keyspace Keyspace1
> [2013-04-03 15:47:00,881] Repair session b79b4850-9c75-11e2--8b5bf6ebea9e 
> for range (0,42535295865117307932921825928971026432] finished
> [2013-04-03 15:47:00,881] Repair command #1 finished
> root@ip-10-2-29-38:/home/ubuntu# grep b79b4850-9c75-11e2--8b5bf6ebea9e 
> /var/log/cassandra/system.log
>  INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,009 AntiEntropyService.java 
> (line 676) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] new session: will 
> sync a1/10.2.29.38, /10.46.113.236 on range 
> (0,42535295865117307932921825928971026432] for Keyspace1.[Standard1]
>  INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,015 AntiEntropyService.java 
> (line 881) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] requesting merkle 
> trees for Standard1 (to [/10.46.113.236, a1/10.2.29.38])
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,202 AntiEntropyService.java 
> (line 211) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Received merkle 
> tree for Standard1 from /10.46.113.236
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,697 AntiEntropyService.java 
> (line 211) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Received merkle 
> tree for Standard1 from a1/10.2.29.38
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,879 AntiEntropyService.java 
> (line 1015) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Endpoints 
> /10.46.113.236 and a1/10.2.29.38 are consistent for Standard1
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,880 AntiEntropyService.java 
> (line 788) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Standard1 is fully 
> synced
>  INFO [AntiEntropySessions:1] 2013-04-03 15:47:00,880 AntiEntropyService.java 
> (line 722) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] session completed 
> successfully
> root@ip-10-46-113-236:/home/ubuntu# grep b79b4850

[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's

2013-06-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673245#comment-13673245
 ] 

Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 4:07 PM:
-

[EDIT] I didn't see your latest posts before posting, but I hope the extra 
data can help.

You were right to say that I need to run the repair -pr on the three nodes, 
because I only have one row (it's a test) in the CF so I guess I had to run the 
repair -pr on the node in charge of this key.
But I restarted my test and did the repair on all three nodes, and it didn't 
work either; here's the output:
{code}
user@cassandra11:~$ nodetool repair -pr Test_Replication
[2013-06-03 13:54:53,948] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 13:54:53,985] Repair session 676c00f0-cc44-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 13:54:53,985] Repair command #1 finished
{code}

{code}
user@cassandra12:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:17,844] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:17,866] Repair session e9f38c50-cc62-11e2-af47-db8ca926a9c5 for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:33:17,866] Repair command #1 finished
{code}

{code}
user@cassandra13:~$ nodetool repair -pr Test_Replication
[2013-06-03 17:33:29,689] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 17:33:29,712] Repair session f102f3a0-cc62-11e2-ae98-39da3e693be3 for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:33:29,712] Repair command #1 finished
{code}

The data is still not copied to the new datacenter, and I don't understand why 
the repair covers those ranges (a range of a single token?). It could be a problem 
of an unbalanced cluster, as you suggested, but we distributed the tokens as advised 
(+1 on the nodes of the new datacenter), as you can see in the following 
nodetool status (a sketch of where these one-token ranges come from follows the output):

{code}
user@cassandra13:~$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Owns   Host ID                               Token                                     Rack
UN  cassandra01  102 GB     33.3%  fa7672f5-77f0-4b41-b9d1-13bf63c39122  0                                         RC1
UN  cassandra02  88.73 GB   33.3%  c799df22-0873-4a99-a901-5ef5b00b7b1e  56713727820156410577229101238628035242   RC1
UN  cassandra03  50.86 GB   33.3%  5b9c6bc4-7ec7-417d-b92d-c5daa787201b  113427455640312821154458202477256070484  RC1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Owns   Host ID                               Token                                     Rack
UN  cassandra11  51.21 GB   0.0%   7b610455-3fd2-48a3-9315-895a4609be42  1                                         RC2
UN  cassandra12  45.02 GB   0.0%   8553f2c0-851c-4af2-93ee-2854c96de45a  56713727820156410577229101238628035243   RC2
UN  cassandra13  36.8 GB    0.0%   7f537660-9128-4c13-872a-6e026104f30e  113427455640312821154458202477256070485  RC2
{code}
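
To illustrate where those one-token ranges come from, here is a minimal standalone sketch (illustrative only, not Cassandra source; the class name and variables are made up) of how a node's primary range follows from the sorted ring: each node is primary for (previous token, own token], so interleaving the dc2 tokens at +1 next to the dc1 tokens leaves the dc2 nodes with ranges such as (0,1].

{code}
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only (not Cassandra code): with -pr, each node repairs
// just its primary range (previous token, own token]. Placing the dc2 tokens
// at +1 therefore yields one-token primary ranges such as (0,1].
public class PrimaryRangeSketch
{
    public static void main(String[] args)
    {
        List<BigInteger> ring = new ArrayList<BigInteger>();
        ring.add(BigInteger.ZERO);                                            // cassandra01 (dc1)
        ring.add(BigInteger.ONE);                                             // cassandra11 (dc2)
        ring.add(new BigInteger("56713727820156410577229101238628035242"));   // cassandra02 (dc1)
        ring.add(new BigInteger("56713727820156410577229101238628035243"));    // cassandra12 (dc2)
        ring.add(new BigInteger("113427455640312821154458202477256070484"));  // cassandra03 (dc1)
        ring.add(new BigInteger("113427455640312821154458202477256070485"));  // cassandra13 (dc2)
        Collections.sort(ring);

        for (int i = 0; i < ring.size(); i++)
        {
            BigInteger left = ring.get((i + ring.size() - 1) % ring.size());  // predecessor, wrapping around the ring
            BigInteger right = ring.get(i);
            System.out.println("primary range of token " + right + " = (" + left + "," + right + "]");
        }
    }
}
{code}

Run as-is it just prints the six (previous, own] intervals; the point is that the primary range is derived from the raw token ring, independent of which keyspaces actually replicate to dc2, which matches the issue description above.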

Furthermore, the full repair works, as you can see in this log:

{code}
user@cassandra11:~$ nodetool repair  Test_Replication
[2013-06-03 17:44:07,570] Starting repair command #5, repairing 6 ranges for keyspace Test_Replication
[2013-06-03 17:44:07,903] Repair session 6d37b720-cc64-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 17:44:07,903] Repair session 6d3a0110-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035243,113427455640312821154458202477256070484] finished
[2013-06-03 17:44:07,903] Repair session 6d4d6200-cc64-11e2-bfd5-3d9212e452cc for range (1,56713727820156410577229101238628035242] finished
[2013-06-03 17:44:07,903] Repair session 6d581060-cc64-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 17:44:07,903] Repair session 6d5ea010-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 17:44:07,934] Repair session 6d604dc0-cc64-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070485,0] finished
[2013-06-03 17:44:07,934] Repair command #5 finished
{code}

I hope this information helps; please let me know if you think it's a 
configuration issue, in which case I'll take it to the mailing list.

  was (Author: alprema):
You were right to say that I need to run the repair -pr on the three nodes, 
because I only have one row (it's a test) in the CF so I guess I had to run the 
repair -pr on th

[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's

2013-06-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673203#comment-13673203
 ] 

Jonathan Ellis edited comment on CASSANDRA-5424 at 6/3/13 3:29 PM:
---

I should have said: a 2-DC setup, NTS, and replicas in both DCs, with more than 
one node in each DC.

In any case, I do see the problem now.  Working on a fix.

  was (Author: jbellis):
I should have said, 2-DC setup, NTS, and replicas in both DC.
  
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace 
> isn't in all DC's
> --
>
> Key: CASSANDRA-5424
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5424
> Project: Cassandra
>  Issue Type: Bug
>Affects Versions: 1.1.7
>Reporter: Jeremiah Jordan
>Assignee: Yuki Morishita
>Priority: Critical
> Fix For: 1.2.5
>
> Attachments: 5424-1.1.txt, 5424-v2-1.2.txt, 5424-v3-1.2.txt
>
>
> nodetool repair -pr on all nodes won't repair the full range when a Keyspace 
> isn't in all DC's
> Commands follow, but the TL;DR is that range 
> (127605887595351923798765477786913079296,0] doesn't get repaired between the .38 
> node and the .236 node until I run a repair, without -pr, on .38.
> It seems like primary range calculation doesn't take schema into account, but 
> deciding who to ask for merkle trees from does.
> {noformat}
> Address DC  RackStatus State   LoadOwns   
>  Token   
>   
>  127605887595351923798765477786913079296 
> 10.72.111.225   Cassandra   rack1   Up Normal  455.87 KB   25.00% 
>  0   
> 10.2.29.38  Analytics   rack1   Up Normal  40.74 MB25.00% 
>  42535295865117307932921825928971026432  
> 10.46.113.236   Analytics   rack1   Up Normal  20.65 MB50.00% 
>  127605887595351923798765477786913079296 
> create keyspace Keyspace1
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {Analytics : 2}
>   and durable_writes = true;
> ---
> # nodetool -h 10.2.29.38 repair -pr Keyspace1 Standard1
> [2013-04-03 15:46:58,000] Starting repair command #1, repairing 1 ranges for 
> keyspace Keyspace1
> [2013-04-03 15:47:00,881] Repair session b79b4850-9c75-11e2--8b5bf6ebea9e 
> for range (0,42535295865117307932921825928971026432] finished
> [2013-04-03 15:47:00,881] Repair command #1 finished
> root@ip-10-2-29-38:/home/ubuntu# grep b79b4850-9c75-11e2--8b5bf6ebea9e 
> /var/log/cassandra/system.log
>  INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,009 AntiEntropyService.java 
> (line 676) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] new session: will 
> sync a1/10.2.29.38, /10.46.113.236 on range 
> (0,42535295865117307932921825928971026432] for Keyspace1.[Standard1]
>  INFO [AntiEntropySessions:1] 2013-04-03 15:46:58,015 AntiEntropyService.java 
> (line 881) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] requesting merkle 
> trees for Standard1 (to [/10.46.113.236, a1/10.2.29.38])
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,202 AntiEntropyService.java 
> (line 211) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Received merkle 
> tree for Standard1 from /10.46.113.236
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,697 AntiEntropyService.java 
> (line 211) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Received merkle 
> tree for Standard1 from a1/10.2.29.38
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,879 AntiEntropyService.java 
> (line 1015) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Endpoints 
> /10.46.113.236 and a1/10.2.29.38 are consistent for Standard1
>  INFO [AntiEntropyStage:1] 2013-04-03 15:47:00,880 AntiEntropyService.java 
> (line 788) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Standard1 is fully 
> synced
>  INFO [AntiEntropySessions:1] 2013-04-03 15:47:00,880 AntiEntropyService.java 
> (line 722) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] session completed 
> successfully
> root@ip-10-46-113-236:/home/ubuntu# grep b79b4850-9c75-11e2--8b5bf6ebea9e 
> /var/log/cassandra/system.log
>  INFO [AntiEntropyStage:1] 2013-04-03 15:46:59,944 AntiEntropyService.java 
> (line 244) [repair #b79b4850-9c75-11e2--8b5bf6ebea9e] Sending completed 
> merkle tree to /10.2.29.38 for (Keyspace1,Standard1)
> root@ip-10-72-111-225:/home/ubuntu# grep b79b4850-9c75-11e2--8b5bf6ebea9e 
> /var/log/cassandra/system.log
> root@ip-10-72-111-225:/home/ubuntu# 
> ---
> # nodetool -h 10.46.113.236  repair -pr Keyspace1 Standard1
> [2013-04-03 

[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's

2013-06-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673060#comment-13673060
 ] 

Kévin LOVATO edited comment on CASSANDRA-5424 at 6/3/13 12:21 PM:
--

We just applied 1.2.5 on our cluster and the repair hanging is fixed, but -pr is 
still not working as expected.
Our cluster has two datacenters; let's call them dc1 and dc2. We created a keyspace 
Test_Replication with replication factor _{dc1: 3}_ (nothing for dc2) and ran a 
nodetool repair Test_Replication (which used to hang) on dc2; it exited saying there 
was nothing to do (which is OK).
Then we changed the replication factor to _{dc1: 3, dc2: 3}_ and started a 
nodetool repair -pr Test_Replication on cassandra11@dc2, which output this:
{code}
user@cassandra11:~$ nodetool repair -pr Test_Replication
[2013-06-03 13:54:53,948] Starting repair command #1, repairing 1 ranges for keyspace Test_Replication
[2013-06-03 13:54:53,985] Repair session 676c00f0-cc44-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 13:54:53,985] Repair command #1 finished
{code}
But even after flushing the Keyspace, there was no data on the server.
We then ran a full repair:
{code}
user@cassandra11:~$ nodetool repair  Test_Replication
[2013-06-03 14:01:56,679] Starting repair command #2, repairing 6 ranges for keyspace Test_Replication
[2013-06-03 14:01:57,260] Repair session 63632d70-cc45-11e2-bfd5-3d9212e452cc for range (0,1] finished
[2013-06-03 14:01:57,260] Repair session 63650230-cc45-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035243,113427455640312821154458202477256070484] finished
[2013-06-03 14:01:57,260] Repair session 6385d0a0-cc45-11e2-bfd5-3d9212e452cc for range (1,56713727820156410577229101238628035242] finished
[2013-06-03 14:01:57,260] Repair session 639f7320-cc45-11e2-bfd5-3d9212e452cc for range (56713727820156410577229101238628035242,56713727820156410577229101238628035243] finished
[2013-06-03 14:01:57,260] Repair session 63af51a0-cc45-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070484,113427455640312821154458202477256070485] finished
[2013-06-03 14:01:57,295] Repair session 63b12660-cc45-11e2-bfd5-3d9212e452cc for range (113427455640312821154458202477256070485,0] finished
[2013-06-03 14:01:57,295] Repair command #2 finished
{code}
After that, we could find the data on dc2 as expected.

So it seems that -pr is still not working as expected, or maybe we're 
doing/understanding something wrong.
(I was not sure whether I should open a new ticket or comment on this one, so please 
let me know if I should move it.)

  was (Author: alprema):
We just applied 1.2.5 on our cluster and the repair hanging is fixed, but 
the -pr is still not working as expected.
Our cluster has two datacenters, let's call them dc1 and dc2, we created a 
Keyspace Test_Replication with replication factor _\{ dc1: 3 \}_ (no info for 
dc2) and ran a nodetool repair Test_Replication (that used to hang) on dc2 and 
it exited saying there was nothing to do (which is OK).
Then we changed the replication factor to _\{ dc1: 3, dc2: 3 \}_ and started a 
nodetool repair -pr Test_Replication on cassandra11@dc2 which output this:
{code}
user@cassandra11:~$ nodetool repair -pr Test_Replication
[2013-06-03 13:54:53,948] Starting repair command #1, repairing 1 ranges for 
keyspace Test_Replication
[2013-06-03 13:54:53,985] Repair session 676c00f0-cc44-11e2-bfd5-3d9212e452cc 
for range (0,1] finished
[2013-06-03 13:54:53,985] Repair command #1 finished
{code}
But even after flushing the Keyspace, there was no data on the server.
We then ran a full repair:
{code}
user@cassandra11:~$ nodetool repair  Test_Replication
[2013-06-03 14:01:56,679] Starting repair command #2, repairing 6 ranges for 
keyspace Test_Replication
[2013-06-03 14:01:57,260] Repair session 63632d70-cc45-11e2-bfd5-3d9212e452cc 
for range (0,1] finished
[2013-06-03 14:01:57,260] Repair session 63650230-cc45-11e2-bfd5-3d9212e452cc 
for range 
(56713727820156410577229101238628035243,113427455640312821154458202477256070484]
 finished
[2013-06-03 14:01:57,260] Repair session 6385d0a0-cc45-11e2-bfd5-3d9212e452cc 
for range (1,56713727820156410577229101238628035242] finished
[2013-06-03 14:01:57,260] Repair session 639f7320-cc45-11e2-bfd5-3d9212e452cc 
for range 
(56713727820156410577229101238628035242,56713727820156410577229101238628035243] 
finished
[2013-06-03 14:01:57,260] Repair session 63af51a0-cc45-11e2-bfd5-3d9212e452cc 
for range 
(113427455640312821154458202477256070484,113427455640312821154458202477256070485]
 finished
[2013-06-03 14:01:57,295] Repair session 63b12660-cc45-11e2-bfd5-3d9212e452cc 
for range (113427455640312821154458202477256070485,0] finished
[2013-06-03 14:01:57,295] Repair command #2 finished
{code}
After which we could find the data on dc2 as expected.

So it seem

[jira] [Comment Edited] (CASSANDRA-5424) nodetool repair -pr on all nodes won't repair the full range when a Keyspace isn't in all DC's

2013-04-16 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633274#comment-13633274
 ] 

Yuki Morishita edited comment on CASSANDRA-5424 at 4/16/13 8:01 PM:


v3 attached.

- NTS now uses LinkedHashSet in calculateNaturalEndpoints to preserve insertion 
order while eliminating duplicates (a small illustration follows this list).

- I think it is unsafe to use cached endpoints through getNaturalEndpoints, 
since tokenMetadata cannot be consistent inside getPrimaryRangesForEndpoint, so 
I stuck with the implementation from v2.

- Fixed sampleKeyRange. I think the problem is that the name 
tokenMetadata.getPrimaryRangeFor is confusing; we should probably rename it to 
just getRangeFor.

- Added a test for getPrimaryRangesForEndpoint to StorageServiceServerTest.

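As a side note, here is a tiny self-contained illustration of the LinkedHashSet behaviour relied on above (illustrative only; the class below is not part of the patch, and the endpoint addresses are just the examples from this ticket):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: LinkedHashSet drops duplicate endpoints while keeping
// the order in which they were first inserted, which is the property the
// v3 patch relies on when building the replica list.
public class EndpointOrderSketch
{
    public static void main(String[] args) throws UnknownHostException
    {
        Set<InetAddress> replicas = new LinkedHashSet<InetAddress>();
        replicas.add(InetAddress.getByName("10.2.29.38"));
        replicas.add(InetAddress.getByName("10.46.113.236"));
        replicas.add(InetAddress.getByName("10.2.29.38"));  // duplicate, silently ignored

        System.out.println(replicas);  // prints the two distinct endpoints in insertion order
    }
}
{code}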

  was (Author: yukim):
v3 attached.

- NTS now uses LinkedHashSet in calculateNaturalEndpoint to preserve insertion 
order while eliminating duplicates.

- I think it is unsafe to use cached endpoints through getNaturalEndpoints 
since tokenMetadata cannot be consistent inside getPrimaryRangesForEndpoint, so 
I stick with impl from v2.

- fix sampleKeyRange. I think the problem is the nome of the method 
tokenMetadata.getPrimaryRangeFor is confusing. Probably we should rename that 
to just getRangeFor.

- Added test for getPrimaryRangesForEndpoint to StorageServiceServerTest.

  