[ https://issues.apache.org/jira/browse/CASSANDRA-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Watson updated CASSANDRA-5525:
-----------------------------------

Description:
Upgraded a 12-node cluster from 1.1.9 to 1.2.3, enabled 'num_tokens: 256', restarted, and ran upgradesstables and cleanup.
Then tried to join 2 additional nodes to the ring.
However, 1 of the new nodes ran out of disk space. This started causing 'Missing host ID' errors in the live cluster whenever nodes attempted to store hints for that node:
{noformat}
ERROR 10:12:02,408 Exception in thread Thread[MutationStage:190,5,main]
java.lang.AssertionError: Missing host ID
{noformat}
I killed the other new node to stop it from continuing to join. By then the live cluster was in a broken state, dropping mutation messages on 3 nodes. Restarting those nodes fixed it, except for 1 node that never stopped dropping them, so I had to decommission it (leaving the original cluster at 11 nodes).

Ring pre-join:
{noformat}
Load       Tokens  Owns (effective)  Host ID
147.55 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
124.99 GB  256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
136.63 GB  256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
141.78 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
137.74 GB  256     16.7%             6d726cbf-147d-426e-a735-e14928c95e45
135.9 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
165.96 GB  256     16.7%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
135.41 GB  256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
143.38 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
178.05 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
194.92 GB  256     25.0%             361d7e31-b155-4ce1-8890-451b3ddf46cf
150.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
{noformat}
Ring after decommissioning the bad node:
{noformat}
Load       Tokens  Owns (effective)  Host ID
80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
{noformat}
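A rough illustration of why the error appears, assuming that 1.2 stores each hint under the target node's host ID (a UUID looked up from the node's address) rather than under its address: a node whose host ID the coordinator does not know leaves nothing to key the hint on. The toy Python model below shows only that assumption; `HintStore`, `store_hint` and the IP addresses are hypothetical and are not Cassandra's code.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class HintStore:
    """Toy model (hypothetical): hints are grouped by the target's host ID, not its address."""

    # Hypothetical endpoint -> host ID map, standing in for the cluster's token metadata.
    host_ids: dict = field(default_factory=dict)
    # Stored hints, keyed by target host ID.
    hints: dict = field(default_factory=dict)

    def store_hint(self, endpoint: str, mutation: bytes) -> None:
        host_id = self.host_ids.get(endpoint)
        if host_id is None:
            # No host ID to key the hint on; mirrors the AssertionError in the MutationStage log.
            raise AssertionError("Missing host ID")
        self.hints.setdefault(host_id, []).append(mutation)


store = HintStore(host_ids={"10.0.0.1": uuid.uuid4()})
store.store_hint("10.0.0.1", b"row-mutation")  # known node: the hint is stored

try:
    # Hypothetical address of the joining node, which has no known host ID.
    store.store_hint("10.0.0.13", b"row-mutation")
except AssertionError as exc:
    print(exc)  # prints: Missing host ID
```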
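Comparing the two ring snapshots by eye is error-prone. Below is a small hypothetical Python helper (not a nodetool feature) that parses the rows as pasted above. It assumes every load is reported in GB. It prints the host ID that disappeared, the average load before and after, and each remaining node's change in load and in effective ownership (in percentage points).

```python
from statistics import mean

# Rows copied from the "Ring pre-join" table (header omitted).
PRE_JOIN = """
147.55 GB 256 16.7% 754f9f4c-4ba7-4495-97e7-1f5b6755cb27
124.99 GB 256 16.7% 93f4400a-09d9-4ca0-b6a6-9bcca2427450
136.63 GB 256 16.7% ff821e8e-b2ca-48a9-ac3f-8234b16329ce
141.78 GB 253 100.0% 339c474f-cf19-4ada-9a47-8b10912d5eb3
137.74 GB 256 16.7% 6d726cbf-147d-426e-a735-e14928c95e45
135.9 GB 256 16.7% e59a02b3-8b91-4abd-990e-b3cb2a494950
165.96 GB 256 16.7% 83ca527c-60c5-4ea0-89a8-de53b92b99c8
135.41 GB 256 16.7% c3ea4026-551b-4a14-a346-480e8c1fe283
143.38 GB 256 16.7% df7ba879-74ad-400b-b371-91b45dcbed37
178.05 GB 256 25.0% 78192d73-be0b-4d49-a129-9bec0770efed
194.92 GB 256 25.0% 361d7e31-b155-4ce1-8890-451b3ddf46cf
150.5 GB 256 16.7% 9889280a-1433-439e-bb84-6b7e7f44d761
"""

# Rows copied from the "Ring after decommissioning the bad node" table (header omitted).
AFTER_DECOMM = """
80.95 GB 256 16.7% 754f9f4c-4ba7-4495-97e7-1f5b6755cb27
87.15 GB 256 16.7% 93f4400a-09d9-4ca0-b6a6-9bcca2427450
98.16 GB 256 16.7% ff821e8e-b2ca-48a9-ac3f-8234b16329ce
142.6 GB 253 100.0% 339c474f-cf19-4ada-9a47-8b10912d5eb3
77.64 GB 256 16.7% e59a02b3-8b91-4abd-990e-b3cb2a494950
194.31 GB 256 25.0% 6d726cbf-147d-426e-a735-e14928c95e45
221.94 GB 256 33.3% 83ca527c-60c5-4ea0-89a8-de53b92b99c8
87.61 GB 256 16.7% c3ea4026-551b-4a14-a346-480e8c1fe283
101.02 GB 256 16.7% df7ba879-74ad-400b-b371-91b45dcbed37
172.44 GB 256 25.0% 78192d73-be0b-4d49-a129-9bec0770efed
108.5 GB 256 16.7% 9889280a-1433-439e-bb84-6b7e7f44d761
"""


def parse_ring(text: str) -> dict:
    """Map host ID -> (load in GB, token count, effective ownership in %)."""
    ring = {}
    for line in text.strip().splitlines():
        load, unit, tokens, owns, host_id = line.split()
        if unit != "GB":
            raise ValueError(f"unexpected load unit {unit!r} in row {line!r}")
        ring[host_id] = (float(load), int(tokens), float(owns.rstrip("%")))
    return ring


before = parse_ring(PRE_JOIN)
after = parse_ring(AFTER_DECOMM)

# Host IDs present before the join but absent afterwards.
removed = sorted(set(before) - set(after))
print("removed:", removed)

print(f"average load before: {mean(v[0] for v in before.values()):.2f} GB over {len(before)} nodes")
print(f"average load after:  {mean(v[0] for v in after.values()):.2f} GB over {len(after)} nodes")

# Per-host change in load and effective ownership, largest load growth first.
for host_id in sorted(after, key=lambda h: after[h][0] - before[h][0], reverse=True):
    d_load = after[host_id][0] - before[host_id][0]
    d_owns = after[host_id][2] - before[host_id][2]
    print(f"{host_id[:8]}  {d_load:+8.2f} GB  {d_owns:+5.1f} pp")
```

Against these two snapshots, the only host ID missing afterwards is 361d7e31-b155-4ce1-8890-451b3ddf46cf, the average load falls from about 149.40 GB to 124.76 GB, and 6d726cbf and 83ca527c each gain roughly 56 GB.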
> Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5525
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: John Watson
>         Attachments: Screen Shot 2013-04-25 at 12.35.24 PM.png