Re: Can I cancel a decommissioning procedure??

2019-06-05 Thread Alain RODRIGUEZ
Sure, you're welcome, glad to hear it worked! =)

Thanks for letting us know/reporting this back here, it might matter for
other people as well.

C*heers!
Alain


On Wed, Jun 5, 2019 at 07:45, William R wrote:

> Eventually after the reboot the decommission was cancelled. Thanks a lot
> for the info!
>
> Cheers
>
>
> Sent with ProtonMail  Secure Email.
>
> ‐‐‐ Original Message ‐‐‐
> On Tuesday, June 4, 2019 10:59 PM, Alain RODRIGUEZ 
> wrote:
>
> > the issue is that the rest of the nodes in the cluster marked it as DL
> > (DOWN/LEAVING), that's why I am kinda stressed.. Let's see once it is up!
>
> The last information other nodes had is that this node is leaving, and
> down, that's expected in this situation. When the node comes back online,
> it should come back UN and 'quickly' other nodes should ACK it.
>
> During decommission, the node itself is responsible for streaming its data
> over. Streams were stopped when the node went down, and Cassandra won't
> remove the node unless the data was streamed properly (or unless you force
> the node out). I don't think that there is a decommission 'resume', and
> even less that it would be enabled by default.
> Thus when the node comes back, the only possible outcome I see is a
> 'regular' start for that node, with the others acknowledging that the node
> is up and not leaving anymore.
>
> The only consequence I expect (other than the node missing the latest
> data) is that other nodes might hold some extra data due to the
> decommission attempts. If that matters (streaming ran for a long time, or
> the data has no TTL), you can consider running 'nodetool cleanup -j 2' on
> all the nodes other than the one that went down, to remove the extra data
> (and free space).
>
> > I did restart, still waiting to come up (normally takes ~ 30 minutes)
>
>
> 30 minutes to start the nodes sounds like a long time to me, but well,
> that's another topic.
>
> C*heers
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
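
The cleanup suggested in the quoted reply above ('nodetool cleanup -j 2' on every node except the one that went down) could be scripted roughly like this. It is only a sketch: the node addresses are made up, and passing '-h <host>' assumes remote JMX access is enabled in your cluster; otherwise you would run the plain command locally on each node instead.

```python
import shlex


def cleanup_commands(all_nodes, restarted_node, jobs=2):
    """Build one `nodetool cleanup` command per node, skipping the restarted one."""
    commands = []
    for host in all_nodes:
        if host == restarted_node:
            # The node that was restarted did not receive extra data.
            continue
        commands.append(["nodetool", "-h", host, "cleanup", "-j", str(jobs)])
    return commands


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    for cmd in cleanup_commands(nodes, restarted_node="10.0.0.2"):
        # Print rather than execute, so the commands can be reviewed first.
        print(shlex.join(cmd))
```

The script only prints the commands. Cleanup rewrites SSTables, so running it on one node at a time is probably the gentler option.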


Re: Can I cancel a decommissioning procedure??

2019-06-04 Thread William R
Eventually after the reboot the decommission was cancelled. Thanks a lot for 
the info!

Cheers


Re: Can I cancel a decommissioning procedure??

2019-06-04 Thread William R
Hi Alain,

Thank you for your comforting reply :)  I did restart and am still waiting for
it to come up (it normally takes ~30 minutes). The issue is that the rest of
the nodes in the cluster marked it as DL (DOWN/LEAVING), that's why I am kinda
stressed.. Let's see once it is up!


Re: Can I cancel a decommissioning procedure??

2019-06-04 Thread Alain RODRIGUEZ
Hello William,

> At the moment we keep the node down before we figure out a way to cancel that.

Off the top of my head, a restart of the node is the way to go to cancel a
decommission.
I think you did the right thing and your safety measure is also the fix
here :).

Did you try to bring it up again?

If it's really critical, you can probably test that quickly with ccm
(https://github.com/riptano/ccm), tlp-cluster
(https://github.com/thelastpickle/tlp-cluster) or simply with any existing
dev/test environment, if you have one available with some data.
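
When testing this on a ccm or dev cluster, one way to confirm that the restart really cancelled the decommission is to read the two-letter code at the start of each 'nodetool status' line (UN for up/normal, DL for down/leaving, UL for up/leaving). Here is a minimal sketch that parses such output. The sample text and addresses are made up, and the parsing assumes the usual column layout.

```python
import re

# Two-letter codes printed by `nodetool status`:
# first letter Up/Down, second letter Normal/Leaving/Joining/Moving.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output):
    """Map each node address to its two-letter state code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            code, address = match.groups()
            states[address] = code
    return states


def not_up_normal(status_output):
    """Return only the nodes whose state is anything other than UN."""
    return {addr: code
            for addr, code in node_states(status_output).items()
            if code != "UN"}


# Hypothetical, shortened sample of `nodetool status` output.
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns   Host ID  Rack
UN  10.0.0.1   1.2 GiB    256     33.3%  aaaa     rack1
DL  10.0.0.2   1.1 GiB    256     33.3%  bbbb     rack1
UN  10.0.0.3   1.3 GiB    256     33.4%  cccc     rack1
"""

if __name__ == "__main__":
    print(not_up_normal(SAMPLE))
```

Once the restarted node is back and the others have acknowledged it, the same check against fresh output should return an empty dict.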

Good luck with that, PEBKAC issues are the worst. You can do a lot of
damage, you could always have avoided it, and it makes you feel terrible.
It doesn't sound that bad in your case though, I've seen (and done) worse
¯\_(ツ)_/¯. It's hard to fight PEBKACs, we operators are unpredictable :).
Nonetheless, and to go back to something more serious, there are ways to
limit the number and possible scope of those, such as good practices,
testing and automation.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com





Can I cancel a decommissioning procedure??

2019-06-04 Thread William R
Hi,

There was an accidental decommissioning of a node and we really need to cancel
it.. is there any way? At the moment we are keeping the node down until we
figure out a way to cancel that.

Thanks