Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2014-01-07 Thread Bob Haxo
Hi Andrew,

With the configuration fumble, err, test, that brought about this "of
chickens and eggs and VMs" request, the situation is that rebooting the
non-host server results in the restart of the VM running on the host
server.

From an earlier [Pacemaker] thread:


> From: Tom Fernandes 
> Subject: [Pacemaker] chicken-egg-problem with libvirtd and a VM within
> cluster
> Date: Thu, 11 Oct 2012 18:09:30 +0200 (09:09 PDT)
> ...
> I observed that when I stop and start corosync on one of the nodes,
> pacemaker 
> (when starting corosync again) wants to check the status of the vm
> before 
> starting libvirtd. This check fails as libvirtd needs to be running
> for this 
> check. After trying for 20s libvirtd starts. The vm gets restarted
> after those 
> 20s and then runs on one of the nodes. I am left with a
> monitoring-error to 
> cleanup and my vm has rebooted.


And the same issue, which I raised earlier:


> From: Bob Haxo 
> Subject: [Pacemaker] GFS2 with Pacemaker on RHEL6.3 restarts with
> reboot
> Date: Wed, 8 Aug 2012 19:14:31 -0700
> ...
> 
> Problem: When the non-VM-host is rebooted and Pacemaker then
> restarts, the gfs2 filesystem gets restarted on the VM host, which
> causes the stop and start of the VirtualDomain. The gfs2 filesystem
> also gets restarted even without the VirtualDomain resource included.


The "chicken and egg and VMs" configured cluster is no longer available.
Perhaps the output of "crm configure show" has been saved.

Regarding the "chicken and egg and VMs" question, I now avoid the
issue ... somehow, and have moved on to new issues.

Please see the thread: [Pacemaker] "stonith_admin -F node" results in a
pair of reboots. In particular, see the Tue, 7 Jan 2014 09:21:54 +0100
response from Fabio Di Nitto.

The information from Fabio was very helpful. I currently seem to have
arrived at a RHEL 6.5 HA virtual server solution: no "chicken and egg
and VMs" problem, no "fencing of both servers when only one was
explicitly fenced", no "clvmd startup timed out" resulting in "clvmd:pid
blocked for more than 120 seconds", but instead a working VM, working
live migration, and a correct response to a manual fence command.
Tomorrow I will add the results of today's work to that thread.
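
For reference, the kind of verification behind that summary might look
like the following. This is a sketch only: crm_mon, crm, and
stonith_admin are the tools discussed in this thread, "virt" is the VM
resource name from the configuration below, "mici-admin2" is the node
name that appears in the logs, and the exact invocations may have
differed.

    crm_mon -1                      # one-shot view of cluster and resource state
    crm resource migrate virt       # live-migrate the VM resource to the peer node
    stonith_admin -F mici-admin2    # manually fence one node; only that node
                                    # should reboot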

Regards,
Bob Haxo

On Wed, 2014-01-08 at 10:32 +1100, Andrew Beekhof wrote:

> On 20 Dec 2013, at 5:30 am, Bob Haxo  wrote:
> 
> > Hello,
> > 
> > Earlier emails related to this topic:
> > [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
> > [pacemaker] VirtualDomain problem after reboot of one node
> > 
> > 
> > My configuration:
> > 
> > RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
> > 
> > pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> > pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> > pacemaker-1.1.10-14.el6_5.1.x86_64
> > pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
> > 
> > Two node HA VM cluster using real shared drive, not drbd.
> > 
> > Resources (relevant to this discussion):
> > primitive p_fs_images ocf:heartbeat:Filesystem \
> > primitive p_libvirtd lsb:libvirtd \
> > primitive virt ocf:heartbeat:VirtualDomain \
> > 
> > services chkconfig on: cman, clvmd, pacemaker
> > services chkconfig off: corosync, gfs2, libvirtd
> > 
> > Observation:
> > 
> 
> > Rebooting the NON-host system results in the restart of the VM
> > merrily running on the host system.
> 
> I'm still bootstrapping after the break, but I'm not following this.
> Can you rephrase? 
> 
> 
> > 
> > Apparent cause:
> > 
> 
> > Upon startup, Pacemaker apparently checks the status of configured
> > resources. However, the status request for the virt
> > (ocf:heartbeat:VirtualDomain) resource fails with:
> 
> > 
> > Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: child_timeout_callback: virt_monitor_0 process (PID 4158) timed out
> > Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: operation_finished: virt_monitor_0:4158 - timed out after 20ms
> > Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
> > Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: no valid connection ]
> > Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]
> 
> Sounds like the agent should perhaps be returning OCF_NOT_RUNNING in this 
> case.
> 
> 
> > 
> > 
> > This failure then snowballs into an "orphan" situation in which the
> > running VM is restarted.
> > 
> > There was the suggestion of chkconfig on libvirtd (and presumably
> > deleting the resource) so that the /var/run/libvirt/libvirt-sock has
> > been created by service libvirtd. With libvirtd started by the system,
> > there is no un-needed reboot of the VM.
> > 
> > However, it may be that removing libvirtd from Pacemaker control
> > leaves the VM vdisk filesystem susceptible to corruption during a
> > reboot induced failover.

Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2014-01-07 Thread Andrew Beekhof

On 20 Dec 2013, at 5:30 am, Bob Haxo  wrote:

> Hello,
> 
> Earlier emails related to this topic:
> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
> [pacemaker] VirtualDomain problem after reboot of one node
> 
> 
> My configuration:
> 
> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
> 
> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> pacemaker-1.1.10-14.el6_5.1.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
> 
> Two node HA VM cluster using real shared drive, not drbd.
> 
> Resources (relevant to this discussion):
> primitive p_fs_images ocf:heartbeat:Filesystem \
> primitive p_libvirtd lsb:libvirtd \
> primitive virt ocf:heartbeat:VirtualDomain \
> 
> services chkconfig on: cman, clvmd, pacemaker
> services chkconfig off: corosync, gfs2, libvirtd
> 
> Observation:
> 
> Rebooting the NON-host system results in the restart of the VM merrily 
> running on the host system.

I'm still bootstrapping after the break, but I'm not following this.  Can you 
rephrase? 

> 
> Apparent cause:
> 
> Upon startup, Pacemaker apparently checks the status of configured resources. 
> However, the status request for the virt (ocf:heartbeat:VirtualDomain) 
> resource fails with:
> 
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: 
> child_timeout_callback:virt_monitor_0 process (PID 4158) timed out
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: operation_finished:  
>   virt_monitor_0:4158 - timed out after 20ms
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished:  
>   virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished:  
>   virt_monitor_0:4158:stderr [ error: no valid connection ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished:  
>   virt_monitor_0:4158:stderr [ error: Failed to connect socket to 
> '/var/run/libvirt/libvirt-sock': No such file or directory ]

Sounds like the agent should perhaps be returning OCF_NOT_RUNNING in this case.
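
A minimal sketch of that idea (illustrative only, not the shipped agent
code; DOMAIN_NAME stands in for however the agent derives the domain
name):

    # In the agent's monitor/status path, treat an unreachable libvirtd
    # during a probe as "domain not running here" rather than a hard
    # monitor error.
    if [ ! -S /var/run/libvirt/libvirt-sock ]; then
        return $OCF_NOT_RUNNING
    fi
    # Otherwise ask libvirt for the real state of the domain.
    if virsh domstate "$DOMAIN_NAME" 2>/dev/null | grep -q '^running'; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING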

> 
> 
> This failure then snowballs into an "orphan" situation in which the running 
> VM is restarted.
> 
> There was the suggestion of chkconfig on libvirtd (and presumably deleting 
> the resource) so that the /var/run/libvirt/libvirt-sock has been created by 
> service libvirtd. With libvirtd started by the system, there is no un-needed 
> reboot of the VM.
> 
> However, it may be that removing libvirtd from Pacemaker control leaves the 
> VM vdisk filesystem susceptible to corruption during a reboot induced 
> failover.
> 
> Question:
> 
> Is there an accepted Pacemaker configuration such that the un-needed restart 
> of the VM does not occur with the reboot of the non-host system?
> 
> Regards,
> Bob Haxo
> 
> 
> 
> 
> 
> 
> 


Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2013-12-19 Thread Bob Haxo
Hi Emmanuel,


> I don't see any reason to put libvirtd as a primitive in pacemaker


Yes ... well, maybe.  During my testing of failure scenarios (in
particular, reboot of the VM host), several times the VM filesystem
ended up corrupted and I needed to reinstall the VM.  At least a couple
of these failures occurred when I was testing with the system starting
libvirtd and not controlling libvirtd start/stop via a cloned resource.

And, those failures are the reason that I'm seeking the wisdom of
others.

Now that I understand the issues better, I will again test system
start of libvirtd, with more care.

Thanks,
Bob Haxo



On Thu, 2013-12-19 at 21:30 +0100, emmanuel segura wrote:
> remove libvirtd from pacemaker and "chkconfig libvirtd on" on every
> node; that way the cluster just manages the vm. Maybe I'm wrong, but I
> don't see any reason to put libvirtd as a primitive in pacemaker
> 
> 
> 
> 
> 2013/12/19 Bob Haxo 
> 
> Hi Emmanuel,
> 
> Thanks for the suggestions. It is pretty clear what the
> problem is; it's just not clear what the fix or the
> work-around is.
> 
> Search the Pacemaker email archive for the email of Andrew
> Beekhof, 12 Oct 2012, "Re: [Pacemaker] chicken-egg-problem
> with libvirtd and a VM within cluster", and the email to which
> he is responding (from Tom Fernandes).
> 
> The status/monitor function of VirtualDomain fails because
> the /var/run/libvirt/libvirt-sock has not been created.  This
> socket is created by the lsb:libvirtd, but that is not started
> (as a resource) until Pacemaker has heard back from
> heartbeat:VirtualDomain, which will never happen
> until /var/run/libvirt/libvirt-sock has been created ("service
> libvirtd start" during this wait period does enable Pacemaker
> to continue starting resources).  After the VirtualDomain
> monitor function timeout, Pacemaker deals with the failing
> logic loop, resulting in a re-start of the VM.
> 
> I am hoping that "Unfortunately we still don't have a good answer
> for you." is no longer the case, and that there is a fix or a
> community-accepted workaround for the issue.
> 
> 
> Regards,
> Bob Haxo
> 
> 
> 
> 
> 
> 
> 
> On Thu, 2013-12-19 at 19:48 +0100, emmanuel segura wrote: 
> 
> > Maybe the problem is this: the cluster tries to start the vm
> > while libvirtd isn't started yet
> > 
> > 
> > 
> > 2013/12/19 emmanuel segura 
> > 
> > if you don't set your vm to start at boot time, you
> > don't need to put libvirtd in the cluster; maybe the
> > problem isn't this, but why put os services in the
> > cluster, for example crond .. :)
> > 
> > 
> > 
> > 2013/12/19 Bob Haxo  
> > 
> > Hello,
> > 
> > Earlier emails related to this topic:
> > [pacemaker] chicken-egg-problem with
> > libvirtd and a VM within cluster
> > [pacemaker] VirtualDomain problem after
> > reboot of one node
> > 
> > 
> > My configuration:
> > 
> > RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
> > 
> > pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> > pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> > pacemaker-1.1.10-14.el6_5.1.x86_64
> > pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
> > 
> > Two node HA VM cluster using real shared
> > drive, not drbd.
> > 
> > Resources (relevant to this discussion):
> > primitive p_fs_images
> > ocf:heartbeat:Filesystem \
> > primitive p_libvirtd lsb:libvirtd \
> > primitive virt ocf:heartbeat:VirtualDomain \
> > 
> > services chkconfig on: cman, clvmd,
> > pacemaker
> > services chkconfig off: corosync, gfs2,
> > libvirtd
> > 
> > Observation:
> > 
> > Rebooting the NON-host system results in the
> > restart of the VM merrily running on the
> > host system.

Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2013-12-19 Thread emmanuel segura
Remove libvirtd from pacemaker and "chkconfig libvirtd on" on every node;
that way the cluster just manages the vm. Maybe I'm wrong, but I don't
see any reason to put libvirtd as a primitive in pacemaker.
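
Concretely, something like this on each node (a sketch; p_libvirtd is
the resource name from Bob's configuration, and the commands assume
RHEL 6 style service management plus crmsh):

    chkconfig libvirtd on             # let the OS start libvirtd at boot
    service libvirtd start            # and start it now
    crm resource stop p_libvirtd      # stop the cluster-managed copy
    crm configure delete p_libvirtd   # then drop the primitive from the CIB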


2013/12/19 Bob Haxo 

>  Hi Emmanuel,
>
> Thanks for the suggestions. It is pretty clear what the problem is; it's
> just not clear what the fix or the work-around is.
>
> Search the Pacemaker email archive for the email of Andrew Beekhof, 12 Oct
> 2012, "Re: [Pacemaker] chicken-egg-problem with libvirtd and a VM within
> cluster", and the email to which he is responding (from Tom Fernandes).
>
> The status/monitor function of VirtualDomain fails because the
> /var/run/libvirt/libvirt-sock has not been created.  This socket is
> created by the lsb:libvirtd, but that is not started (as a resource) until
> Pacemaker has heard back from heartbeat:VirtualDomain, which will never
> happen until /var/run/libvirt/libvirt-sock has been created ("service
> libvirtd start" during this wait period does enable Pacemaker to continue
> starting resources).  After the VirtualDomain monitor function timeout,
> Pacemaker deals with the failing logic loop, resulting in a re-start of the
> VM.
>
> I am hoping that "Unfortunately we still don't have a good answer for you."
> is no longer the case, and that there is a fix or a community-accepted
> workaround for the issue.
>
>
> Regards,
> Bob Haxo
>
>
>
>
>
>
> On Thu, 2013-12-19 at 19:48 +0100, emmanuel segura wrote:
>
> Maybe the problem is this: the cluster tries to start the vm and libvirtd
> isn't started yet
>
>
>
>  2013/12/19 emmanuel segura 
>
>  if you don't set your vm to start at boot time, you don't need to put
> libvirtd in the cluster; maybe the problem isn't this, but why put os
> services in the cluster, for example crond .. :)
>
>
>
>   2013/12/19 Bob Haxo 
>
>   Hello,
>
> Earlier emails related to this topic:
> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
> [pacemaker] VirtualDomain problem after reboot of one node
>
>
> My configuration:
>
> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
>
> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> pacemaker-1.1.10-14.el6_5.1.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
>
> Two node HA VM cluster using real shared drive, not drbd.
>
> Resources (relevant to this discussion):
> primitive p_fs_images ocf:heartbeat:Filesystem \
> primitive p_libvirtd lsb:libvirtd \
> primitive virt ocf:heartbeat:VirtualDomain \
>
> services chkconfig on: cman, clvmd, pacemaker
> services chkconfig off: corosync, gfs2, libvirtd
>
> Observation:
>
> Rebooting the NON-host system results in the restart of the VM merrily
> running on the host system.
>
> Apparent cause:
>
> Upon startup, Pacemaker apparently checks the status of configured
> resources. However, the status request for the virt
> (ocf:heartbeat:VirtualDomain) resource fails with:
>
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: child_timeout_callback: virt_monitor_0 process (PID 4158) timed out
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: operation_finished: virt_monitor_0:4158 - timed out after 20ms
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: no valid connection ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]
>
> This failure then snowballs into an "orphan" situation in which the
> running VM is restarted.
>
> There was the suggestion of chkconfig on libvirtd (and presumably deleting
> the resource) so that the /var/run/libvirt/libvirt-sock has been created by
> service libvirtd. With libvirtd started by the system, there is no
> un-needed reboot of the VM.
>
> However, it may be that removing libvirtd from Pacemaker control leaves
> the VM vdisk filesystem susceptible to corruption during a reboot induced
> failover.
>
> Question:
>
> Is there an accepted Pacemaker configuration such that the un-needed
> restart of the VM does not occur with the reboot of the non-host system?
>
> Regards,
> Bob Haxo
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> this is my life and I live it as long as God wills
>
>
>
>
> --
> this is my life and I live it as long as God wills
>

Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2013-12-19 Thread Bob Haxo
Hi Emmanuel,

Thanks for the suggestions. It is pretty clear what the problem is; it's
just not clear what the fix or the work-around is.

Search the Pacemaker email archive for the email of Andrew Beekhof, 12
Oct 2012, "Re: [Pacemaker] chicken-egg-problem with libvirtd and a VM
within cluster", and the email to which he is responding (from Tom
Fernandes).

The status/monitor function of VirtualDomain fails because
/var/run/libvirt/libvirt-sock has not been created.  This socket is
created by the lsb:libvirtd resource, but that resource is not started
until Pacemaker has heard back from heartbeat:VirtualDomain, which will
never happen until /var/run/libvirt/libvirt-sock has been created
(running "service libvirtd start" during this wait period does enable
Pacemaker to continue starting resources).  After the VirtualDomain
monitor function times out, Pacemaker deals with the failing logic loop,
resulting in a re-start of the VM.
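
For reference, the ordering one would naturally try looks roughly like
this (crmsh syntax; the clone and constraint names are illustrative).
It does not resolve the chicken-and-egg, because Pacemaker issues the
initial monitor_0 probe before any start ordering takes effect:

    clone cl_libvirtd p_libvirtd
    order o_libvirtd_before_virt inf: cl_libvirtd virt
    colocation c_virt_with_libvirtd inf: virt cl_libvirtd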

I am hoping that "Unfortunately we still don't have a good answer for you."
is no longer the case, and that there is a fix or a community-accepted
workaround for the issue.


Regards,
Bob Haxo





On Thu, 2013-12-19 at 19:48 +0100, emmanuel segura wrote:
> Maybe the problem is this: the cluster tries to start the vm and
> libvirtd isn't started yet
> 
> 
> 
> 
> 2013/12/19 emmanuel segura 
> 
> if you don't set your vm to start at boot time, you don't need
> to put libvirtd in the cluster; maybe the problem isn't this,
> but why put os services in the cluster, for example crond .. :)
> 
> 
> 
> 
> 2013/12/19 Bob Haxo 
> 
> Hello,
> 
> Earlier emails related to this topic:
> [pacemaker] chicken-egg-problem with libvirtd and a VM
> within cluster
> [pacemaker] VirtualDomain problem after reboot of one
> node
> 
> 
> My configuration:
> 
> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
> 
> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> pacemaker-1.1.10-14.el6_5.1.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
> 
> Two node HA VM cluster using real shared drive, not
> drbd.
> 
> Resources (relevant to this discussion):
> primitive p_fs_images ocf:heartbeat:Filesystem \
> primitive p_libvirtd lsb:libvirtd \
> primitive virt ocf:heartbeat:VirtualDomain \
> 
> services chkconfig on: cman, clvmd, pacemaker
> services chkconfig off: corosync, gfs2, libvirtd
> 
> Observation:
> 
> Rebooting the NON-host system results in the restart
> of the VM merrily running on the host system.
> 
> Apparent cause:
> 
> Upon startup, Pacemaker apparently checks the status
> of configured resources. However, the status request
> for the virt (ocf:heartbeat:VirtualDomain) resource
> fails with:
> 
> 
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: child_timeout_callback: virt_monitor_0 process (PID 4158) timed out
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: operation_finished: virt_monitor_0:4158 - timed out after 20ms
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: no valid connection ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: virt_monitor_0:4158:stderr [ error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory ]
> 
> 
> This failure then snowballs into an "orphan" situation
> in which the running VM is restarted.
> 
> There was the suggestion of chkconfig on libvirtd (and
> presumably deleting the resource) so that
> the /var/run/libvirt/libvirt-sock has been created by
> service libvirtd. With libvirtd started by the system,
> there is no un-needed reboot of the VM.
> 
> However, it may be that removing libvirtd from
> Pacemaker control leaves the VM vdisk filesystem
> > susceptible to corruption during a reboot
> > induced failover.

Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2013-12-19 Thread emmanuel segura
Maybe the problem is this: the cluster tries to start the vm and libvirtd
isn't started yet
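
A quick way to check that on the node that just rebooted (commands are
illustrative):

    service libvirtd status               # is libvirtd actually running?
    ls -l /var/run/libvirt/libvirt-sock   # the socket the VirtualDomain probe needs
    virsh list --all                      # fails the same way the probe does
                                          # while libvirtd is down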


2013/12/19 emmanuel segura 

> if you don't set your vm to start at boot time, you don't need to put
> libvirtd in the cluster; maybe the problem isn't this, but why put os
> services in the cluster, for example crond .. :)
>
>
> 2013/12/19 Bob Haxo 
>
>>  Hello,
>>
>> Earlier emails related to this topic:
>> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
>> [pacemaker] VirtualDomain problem after reboot of one node
>>
>>
>> My configuration:
>>
>> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
>>
>> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
>> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
>> pacemaker-1.1.10-14.el6_5.1.x86_64
>> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
>>
>> Two node HA VM cluster using real shared drive, not drbd.
>>
>> Resources (relevant to this discussion):
>> primitive p_fs_images ocf:heartbeat:Filesystem \
>> primitive p_libvirtd lsb:libvirtd \
>> primitive virt ocf:heartbeat:VirtualDomain \
>>
>> services chkconfig on: cman, clvmd, pacemaker
>> services chkconfig off: corosync, gfs2, libvirtd
>>
>> Observation:
>>
>> Rebooting the NON-host system results in the restart of the VM merrily
>> running on the host system.
>>
>> Apparent cause:
>>
>> Upon startup, Pacemaker apparently checks the status of configured
>> resources. However, the status request for the virt
>> (ocf:heartbeat:VirtualDomain) resource fails with:
>>
>> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: 
>> child_timeout_callback:virt_monitor_0 process (PID 4158) timed out
>> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: operation_finished: 
>>virt_monitor_0:4158 - timed out after 20ms
>> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: 
>>virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor 
>> ]
>> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: 
>>virt_monitor_0:4158:stderr [ error: no valid connection ]
>> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished: 
>>virt_monitor_0:4158:stderr [ error: Failed to connect socket to 
>> '/var/run/libvirt/libvirt-sock': No such file or directory ]
>>
>>
>> This failure then snowballs into an "orphan" situation in which the
>> running VM is restarted.
>>
>> There was the suggestion of chkconfig on libvirtd (and presumably
>> deleting the resource) so that the /var/run/libvirt/libvirt-sock has been
>> created by service libvirtd. With libvirtd started by the system, there is
>> no un-needed reboot of the VM.
>>
>> However, it may be that removing libvirtd from Pacemaker control leaves
>> the VM vdisk filesystem susceptible to corruption during a reboot induced
>> failover.
>>
>> Question:
>>
>> Is there an accepted Pacemaker configuration such that the un-needed
>> restart of the VM does not occur with the reboot of the non-host system?
>>
>> Regards,
>> Bob Haxo
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> this is my life and I live it as long as God wills
>



-- 
this is my life and I live it as long as God wills


Re: [Pacemaker] reboot of non-vm host results in VM restart -- of chickens and eggs and VMs

2013-12-19 Thread emmanuel segura
If you don't set your vm to start at boot time, you don't need to put
libvirtd in the cluster. Maybe the problem isn't this, but why put os
services in the cluster, for example crond .. :)


2013/12/19 Bob Haxo 

>  Hello,
>
> Earlier emails related to this topic:
> [pacemaker] chicken-egg-problem with libvirtd and a VM within cluster
> [pacemaker] VirtualDomain problem after reboot of one node
>
>
> My configuration:
>
> RHEL6.5/CMAN/gfs2/Pacemaker/crmsh
>
> pacemaker-libs-1.1.10-14.el6_5.1.x86_64
> pacemaker-cli-1.1.10-14.el6_5.1.x86_64
> pacemaker-1.1.10-14.el6_5.1.x86_64
> pacemaker-cluster-libs-1.1.10-14.el6_5.1.x86_64
>
> Two node HA VM cluster using real shared drive, not drbd.
>
> Resources (relevant to this discussion):
> primitive p_fs_images ocf:heartbeat:Filesystem \
> primitive p_libvirtd lsb:libvirtd \
> primitive virt ocf:heartbeat:VirtualDomain \
>
> services chkconfig on: cman, clvmd, pacemaker
> services chkconfig off: corosync, gfs2, libvirtd
>
> Observation:
>
> Rebooting the NON-host system results in the restart of the VM merrily
> running on the host system.
>
> Apparent cause:
>
> Upon startup, Pacemaker apparently checks the status of configured
> resources. However, the status request for the virt
> (ocf:heartbeat:VirtualDomain) resource fails with:
>
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: 
> child_timeout_callback:virt_monitor_0 process (PID 4158) timed out
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:  warning: operation_finished:  
>   virt_monitor_0:4158 - timed out after 20ms
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished:  
>   virt_monitor_0:4158:stderr [ error: Failed to reconnect to the hypervisor ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished:  
>   virt_monitor_0:4158:stderr [ error: no valid connection ]
> Dec 18 12:19:30 [4147] mici-admin2   lrmd:   notice: operation_finished:  
>   virt_monitor_0:4158:stderr [ error: Failed to connect socket to 
> '/var/run/libvirt/libvirt-sock': No such file or directory ]
>
>
> This failure then snowballs into an "orphan" situation in which the
> running VM is restarted.
>
> There was the suggestion of chkconfig on libvirtd (and presumably deleting
> the resource) so that the /var/run/libvirt/libvirt-sock has been created by
> service libvirtd. With libvirtd started by the system, there is no
> un-needed reboot of the VM.
>
> However, it may be that removing libvirtd from Pacemaker control leaves
> the VM vdisk filesystem susceptible to corruption during a reboot induced
> failover.
>
> Question:
>
> Is there an accepted Pacemaker configuration such that the un-needed
> restart of the VM does not occur with the reboot of the non-host system?
>
> Regards,
> Bob Haxo
>
>
>
>
>
>
>
>
>


-- 
this is my life and I live it as long as God wills