On 06/30/2013 07:26 PM, Barak Azulay wrote:
> 
> 
> ----- Original Message -----
>> From: "Dan Kenigsberg" <dan...@redhat.com>
>> To: "Eli Mesika" <emes...@redhat.com>
>> Cc: engine-devel@ovirt.org
>> Sent: Sunday, June 30, 2013 5:40:49 PM
>> Subject: Re: [Engine-devel] SSH Soft Fencing
>>
>> On Thu, Jun 27, 2013 at 08:48:39AM -0400, Eli Mesika wrote:
>>>
>>>
>>> ----- Original Message -----
>>>> From: "Martin Perina" <mper...@redhat.com>
>>>> To: engine-devel@ovirt.org
>>>> Cc: "Yair Zaslavsky" <yzasl...@redhat.com>, "Barak Azulay"
>>>> <bazu...@redhat.com>, "Eli Mesika" <emes...@redhat.com>
>>>> Sent: Thursday, June 27, 2013 1:51:06 PM
>>>> Subject: SSH Soft Fencing
>>>>
>>>> Hi,
>>>>
>>>> SSH Soft Fencing is a new feature in 3.3 that tries to restart VDSM
>>>> over an SSH connection on non-responsive hosts before real fencing.
>>>> More info can be found at
>>>>
>>>> http://www.ovirt.org/Automatic_Fencing#Automatic_Fencing_in_oVirt_3.3
>>>>
>>>> In the current SSH Soft Fencing implementation, the command that
>>>> restarts VDSM over SSH is part of the standard fencing implementation
>>>> in VdsNotRespondingTreatmentCommand. However, this command is executed
>>>> only if the host has a valid PM configuration. If the host doesn't have
>>>> a valid PM configuration, execution of the command is disabled and the
>>>> host state is changed to Non Responsive.
>>>>
>>>> So my questions are:
>>>>
>>>> 1) Should SSH Soft Fencing be executed on hosts without valid PM
>>>>    configuration?
>>>
>>> I think that the answer should be yes. The vdsm restart will solve most of
>>> the problems
>>
>> Would you enumerate the problems that would be solved by a vdsm restart
>> (on the list, but on the feature page, too)?
>> I am aware of two issues, both are vdsm bugs:
>> - If libvirtd crashes, vdsm is not restarted unless there are
>>   running VMs.
>> - Vdsm had several bugs in its soft prepareForShutdown process that got
>>   it stuck there while various background storage processes were running.
>>
>> I think that solving these two issues would be safer and cleaner than
>> introducing an `ssh host service vdsmd restart` flow.
>>
>> The first issue is only a matter of untangling some vdsm internal
>> ugliness: whenever a libvirtconnection is produced, it should be wrapped
>> so that it catches libvirt crashes, unlike now, where only VM-related
>> libvirtconnections undergo this treatment.
>>
>> The second issue can be avoided by vdsm resorting to kill-9-ing itself.
>> After all, this is what `service vdsmd restart` ends up doing after a
>> VERY short timeout (2-3 seconds, iirc).
>>
>> I suppose that there are other reasons for a remote restart, but in
>> general, I think that it's better to have Vdsm "do the right thing" than
>> to expect Engine to control that remotely.
> 
> 
> Theoretically you are absolutely right, but this is much more challenging
> when the platform you are using keeps changing and might introduce unfamiliar
> behaviors or bugs.
> You have enumerated several issues that we have encountered in the past and 
> were fixed by us or by different components.
> - libvirt related
> - prepareForShutdown
> - ... I even remember some from SuperVDSM
> 
> All of the above were eventually handled brutally by the engine: the host
> was entirely fenced and all running VMs were killed (and the services
> they provided went down).
> 
> This is about trying to handle an unexpected situation in a somewhat more
> delicate manner that, in most cases, avoids killing the VMs in a scenario
> where the host is going to be fenced anyway.
> 

+1
We cannot anticipate our own bugs ;)
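A minimal sketch of the decision order discussed in this thread (Martin's question 1 and Eli's answer), assuming hypothetical names: `Host`, `treat_non_responding_host`, `ssh_restart_vdsm`, `pm_fence` and the `soft_fence_without_pm` switch are illustrative only and are not the engine's VdsNotRespondingTreatmentCommand API. With the switch off it mimics the behavior Martin describes (nothing is tried without a valid PM configuration); with it on, the SSH restart is attempted first on every host and PM fencing stays a fallback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Host:
    # Hypothetical, minimal host model for illustration only.
    name: str
    has_valid_pm: bool


def treat_non_responding_host(
    host: Host,
    ssh_restart_vdsm: Callable[[Host], bool],
    pm_fence: Callable[[Host], bool],
    soft_fence_without_pm: bool = True,
) -> str:
    """Return the outcome of treating a non-responsive host.

    ssh_restart_vdsm: hypothetical callback that restarts vdsmd over SSH
        and returns True if the host responds again afterwards.
    pm_fence: hypothetical callback that restarts the host through its
        power-management agent and returns True on success.
    """
    # Behavior described for the current implementation: no valid PM
    # configuration means no fencing at all, not even the SSH restart.
    if not host.has_valid_pm and not soft_fence_without_pm:
        return "non-responsive"
    # Proposed: try the gentle SSH restart first; running VMs survive it.
    if ssh_restart_vdsm(host):
        return "soft-fenced"
    # Hard fencing is only possible when a valid PM agent is configured.
    if host.has_valid_pm and pm_fence(host):
        return "hard-fenced"
    # Nothing worked (or no PM agent): the host stays Non Responsive.
    return "non-responsive"


if __name__ == "__main__":
    no_pm = Host("host-a", has_valid_pm=False)
    print(treat_non_responding_host(no_pm, lambda h: True, lambda h: True))
    print(treat_non_responding_host(no_pm, lambda h: True, lambda h: True,
                                    soft_fence_without_pm=False))
```

Passing the SSH and PM actions as callbacks keeps the ordering logic separate from transport details, which is roughly what question 2 (a standalone, reusable restart command) would also need.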


> Now the question Martin raised is whether this functionality should also
> be applied when a host has no physical Power-Management device.
> 
> Hope this provides the info you referred to.
> 
> 
> Thanks
> Barak Azulay 
> 
> 
>>
>> Regards,
>> Dan.
>>> , so why not use it whether or not a PM agent is defined.
>>>
>>>>
>>>> 2) Should the VDSM restart using SSH command be reimplemented
>>>>    as a standalone command so it can also be used in other parts of
>>>>    the engine? If 1) is true, I think it will have to be done anyway.
>>>
>>> +1
>>>
>>>>
>>>>
>>>> Martin Perina
>>>>
>>> _______________________________________________
>>> Engine-devel mailing list
>>> Engine-devel@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/engine-devel
