[Linux-HA] OpenVZ CTs on NFS troubles

2013-04-03 Thread Roman Haefeli
Hi all

We are using pacemaker and the RA heartbeat:ManageVE to manage a number
of OpenVZ CTs on a three-node cluster. The CTs are hosted on an NFS
mount shared by all three nodes, which is managed by the
heartbeat:Filesystem RA.

We get errors in certain situations, for instance when starting all
resources or when switching nodes between online and standby states. A
few examples:


error after setting node virtuetest2 to 'online':
-
Mar 25 18:05:08 virtuetest2 ManageVE(ploop8)[382818]: ERROR: Restoring 
container ... Unable to open /virtual/vz/dump/Dump.58: No such file or directory


It seems migrate_from_ve() of ManageVE is executed before the files on
the NFS mount are available. It looks like heartbeat:Filesystem reports
success too early when mounting NFS.


error after setting node virtuetest3 to 'online':
-
Mar 25 18:17:58 virtuetest3 pengine: [1627]: ERROR: native_create_actions: 
Resource ploop3 (ocf::ManageVE) is active on 2 nodes attempting recovery

This also happened during a migration of the CT. Apparently pacemaker
sees the CT running on two nodes, although the CT was already suspended
on the source node.

This causes pacemaker to restart the CT on the target node
unnecessarily.



error after setting node virtuetest1 to 'online' (starting all
resources):
--
Mar 28 11:49:51 virtuetest1 ManageVE(ploop6)[782233]: ERROR: Starting 
container... Adding delta dev=/dev/ploop44896 
img=/virtual/vz/private/56/root.hdd/root.hdd (rw) Error in reread_part 
(ploop.c:988): BLKRRPART /dev/ploop44896: Input/output error Error in 
ploop_mount_fs (ploop.c:1017): Can't mount file system dev=/dev/ploop44896p1 
target=/virtual/vz/root/56: No such device or address Failed to mount image: 
Error in ploop_mount_fs (ploop.c:1017): Can't mount file system 
dev=/dev/ploop44896p1 target=/virtual/vz/root/56: No such device or address [21

The error says there is no directory "/virtual/vz/root/56", although
it should exist if the mount succeeded. Is heartbeat:Filesystem
reporting success too early?

When stopping/restarting or checkpointing/restoring CTs manually, I am
not able to reproduce any of the above errors.
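
In case it is relevant: what I would expect to be needed here are
mandatory ordering constraints from the Filesystem resources to each
container, roughly like the sketch below (the constraint and clone names
are placeholders, not the ones from our config):

---
# sketch only: start a CT only after both NFS mounts are up
order o_nfs_before_ploop8 inf: cl_nfs ploop8
order o_dump_before_ploop8 inf: cl_nfs_dump ploop8
---

Even with such constraints, a mandatory order only guarantees that the
Filesystem start operation has returned success; if the agent returns
before the mount is actually usable, the CT start can still race it,
which would match what we are seeing.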

Any help appreciated.

Roman


pacemaker 1.1.7
resource-agents 3.9.5

This is my config:
--

node virtuetest1 \
attributes standby="off"
node virtuetest2 \
attributes standby="off"
node virtuetest3 \
attributes standby="off"
primitive mailalerts ocf:pacemaker:ClusterMon \
params extra_options="--mail-to root --mail-from c...@virtuetest2.zhdk.ch --mail-prefix virtuetest" pidfile="/var/run/crm/crm_mon.pid" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" \
op monitor interval="10s" timeout="20s" \
meta target-role="Stopped"
primitive p_lsb_vz lsb:vz \
op monitor interval="30" timeout="120s"
primitive p_nfs ocf:heartbeat:Filesystem \
params device="10.10.10.201:/vol/virtuetest/virtuetest"
directory="/virtual" fstype="nfs"
options="rsize=64512,wsize=64512,intr,noacl,nolock,ac,sync,tcp" \
op monitor interval="30" timeout="40" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive p_nfs_dump ocf:heartbeat:Filesystem \
params device="10.10.10.201:/vol/virtuedump/virtuedump"
directory="/virtual/vz/dump" fstype="nfs"
options="rsize=64512,wsize=64512,intr,noacl,nolock,ac,async,tcp" \
op monitor interval="30" timeout="40" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive p_sbd lsb:sbd \
op monitor interval="30" timeout="120"
primitive ploop1 ocf:heartbeat:ManageVE \
params veid="51" \
op monitor interval="30" timeout="120s" \
op start interval="0" timeout="300s" \
op stop interval="0" timeout="300s" \
op migrate_to interval="0" timeout="300s" \
op migrate_from interval="0" timeout="300s" \
meta target-role="Started" allow-migrate="true" is-managed="true"
primitive ploop2 ocf:heartbeat:ManageVE \
params veid="52" \
op monitor interval="30" timeout="120s" \
op start interval="0" timeout="300s" \
op stop interval="0" timeout="300s" \
op migrate_to interval="0" timeout="300s" \
op migrate_from interval="0" timeout="300s" \
meta target-role="Started" allow-migrate="true"
primitive ploop3 ocf:heartbeat:ManageVE \
params veid="53" \
op monitor interval="30" timeout="120s" \
op start interval="0" timeout="300s" \
op stop interval="0" timeout="300s" \
op migrate_to interval="0" timeout="300s" \
op migrate_from interval="0" timeout="300s" \
meta target-role="Started" allow-migrate="true"
primitive ploop4 ocf:heartbeat:ManageVE \
params veid="54" \

[Linux-HA] Problem with migration of OpenVZ CTs

2013-03-22 Thread Roman Haefeli
Hi,

I encountered a problem when performing a live migration of some OpenVZ
CTs. Although the migration didn't trigger any error messages in
'crm_mon' and initially went through without trouble, the resource was
restarted on the target node unnecessarily. From the logs it looks as
if, after the actual migration, pacemaker detected the resource as
running on both nodes. Why did it detect that? Could it be that it
checked too early on the source node? Might the RA ManageVE be
returning too early?

(For details see below)

Roman




The setup:
* Nodes are running Debian Squeeze with the current pve kernel
* Our CTs are running on an NFS share mounted on both nodes
* pacemaker 1.1.7

Action:
Migration of resource 'netpd' from vice1 to vice0

Log of the source node (vice1)
--
Mar 20 16:30:57 vice1 ManageVE[107511]: INFO: Setting up checkpoint... 
suspend... dump... kill... Container is unmounted Checkpointing completed 
succesfully
Mar 20 16:30:57 vice1 lrmd: [1523]: info: operation migrate_to[66] on netpd for 
client 1526: pid 107511 exited with return code 0
Mar 20 16:30:57 vice1 crmd: [1526]: info: process_lrm_event: LRM operation 
netpd_migrate_to_0 (call=66, rc=0, cib-update=172, confirmed=true) ok
[...]
Mar 20 16:30:57 vice1 pengine: [1525]: ERROR: native_create_actions: Resource 
netpd (ocf::ManageVE) is active on 2 nodes attempting recovery
Mar 20 16:30:57 vice1 pengine: [1525]: WARN: native_create_actions: See 
http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
[...]
Mar 20 16:30:57 vice1 pengine: [1525]: notice: LogActions: Restart 
netpd#011(Started vice0)


Log of the target node (vice0)
---
Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd stop[23] (pid 2191)
[...]
Mar 20 16:30:57 vice0 ManageVE[2191]: INFO: VE 3025 already stopped.
[...]
Mar 20 16:30:57 vice0 lrmd: [1543]: info: operation stop[23] on netpd for 
client 1546: pid 2191 exited with return code 0
[...]
Mar 20 16:30:57 vice0 crmd: [1546]: info: process_lrm_event: LRM operation 
netpd_stop_0 (call=23, rc=0, cib-update=28, confirmed=true) ok
[...]
Mar 20 16:30:57 vice0 lrmd: [1543]: info: rsc:netpd start[27] (pid 2275)
[...]
Mar 20 16:30:57 vice0 kernel: CT: 3025: started




___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] RA heartbeat/exportfs hangs sporadically

2013-03-15 Thread Roman Haefeli
On Mon, 2013-03-11 at 16:28 +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Mon, Mar 11, 2013 at 10:53:55AM +0100, Roman Haefeli wrote:
> > On Fri, 2013-03-08 at 14:15 +0100, Dejan Muhamedagic wrote:
> > > Hi,
> > > 
> > > On Fri, Mar 08, 2013 at 01:39:27PM +0100, Roman Haefeli wrote:
> > > > On Fri, 2013-03-08 at 13:28 +0100, Roman Haefeli wrote:
> > > > > On Fri, 2013-03-08 at 12:02 +0100, Lars Marowsky-Bree wrote:
> > > > > > On 2013-03-08T11:56:12, Roman Haefeli  wrote:
> > > > > > 
> > > > > > > Googling "TrackedProcTimeoutFunction exportfs" didn't reveal any
> > > > > > > results, which makes me think we are alone with this specific 
> > > > > > > problem.
> > > > > > > Is it the RA that hangs or the command 'exportfs' which is 
> > > > > > > executed by
> > > > > > > this RA? 
> > > 
> > > It is most probably the exportfs program. Unless you hit the
> > > "rmtab growing indefinitely" issue.
> > 
> > No, this is with a later version of the RA.
> > 
> > > > From the log:
> > > > Mar  8 03:10:54 vicestore1 lrmd: [1550]: WARN: p_exportfs_virtual:stop
> > > > process (PID 5528) timed out (try 2).  Killing with signal SIGKILL (9)
> > > 
> > > This means that the process didn't leave after being sent the
> > > TERM signal. I think that KILL takes place five seconds later.
> > > Was this with the "rmtab problem"?
> > 
> > I still don't fully understand. Is this lrmd trying to kill the RA or
> > the 'exportfs' process with the given PID?
> 
> The former. I thought I already answered that.

Yeah, sorry you did. Just for clarification: You say it's most likely
that the 'exportfs' process hangs and thus lrmd tries to kill the RA,
which will not exit until exportfs exits, is that correct?

> > > > What would be valuable for me to know: is lrmd trying to kill the
> > > > 'exportfs' process or the resource agent process here?
> > > 
> > > The resource agent instance.
> > > 
> > > > I mean, is 'exportfs' broken on said machine?
> > > 
> > > Name resolution taking long perhaps?
> > 
> > We use IP addresses everywhere, so I assume it's not related to name
> > resolution. 
> > 
> > What can I do about a broken 'exportfs'? It happens so seldom that I
> > don't have a chance to deeply investigate the problem to write a proper
> > bug report.
> 
> Do you run the latest resource-agents (3.9.5)? Then you can
> trace the resource agent, like this:
> 
> primitive r ocf:heartbeat:exportfs \
>   params ... \
>   op stop trace_ra=1
> 
> The trace files will be generated per call in
> $HA_VARLIB/trace_ra//..
> 
> HA_VARLIB is usually, I think, /var/lib/heartbeat.

Thanks, that is valuable information. Is it safe to only upgrade the
resource-agents while keeping corosync (1.4.2) and pacemaker (1.1.7) at
their current version?

Thanks,
Roman




Re: [Linux-HA] RA heartbeat/exportfs hangs sporadically

2013-03-11 Thread Roman Haefeli
On Fri, 2013-03-08 at 14:15 +0100, Dejan Muhamedagic wrote:
> Hi,
> 
> On Fri, Mar 08, 2013 at 01:39:27PM +0100, Roman Haefeli wrote:
> > On Fri, 2013-03-08 at 13:28 +0100, Roman Haefeli wrote:
> > > On Fri, 2013-03-08 at 12:02 +0100, Lars Marowsky-Bree wrote:
> > > > On 2013-03-08T11:56:12, Roman Haefeli  wrote:
> > > > 
> > > > > Googling "TrackedProcTimeoutFunction exportfs" didn't reveal any
> > > > > results, which makes me think we are alone with this specific problem.
> > > > > Is it the RA that hangs or the command 'exportfs' which is executed by
> > > > > this RA? 
> 
> It is most probably the exportfs program. Unless you hit the
> "rmtab growing indefinitely" issue.

No, this is with a later version of the RA.

> > From the log:
> > Mar  8 03:10:54 vicestore1 lrmd: [1550]: WARN: p_exportfs_virtual:stop
> > process (PID 5528) timed out (try 2).  Killing with signal SIGKILL (9)
> 
> This means that the process didn't leave after being sent the
> TERM signal. I think that KILL takes place five seconds later.
> Was this with the "rmtab problem"?

I still don't fully understand. Is this lrmd trying to kill the RA or
the 'exportfs' process with the given PID?

> > What would be valuable for me to know: is lrmd trying to kill the
> > 'exportfs' process or the resource agent process here?
> 
> The resource agent instance.
> 
> > I mean, is 'exportfs' broken on said machine?
> 
> Name resolution taking long perhaps?

We use IP addresses everywhere, so I assume it's not related to name
resolution. 

What can I do about a broken 'exportfs'? It happens so seldom that I
don't have a chance to deeply investigate the problem to write a proper
bug report.

Roman



Re: [Linux-HA] RA heartbeat/exportfs hangs sporadically

2013-03-08 Thread Roman Haefeli
On Fri, 2013-03-08 at 13:28 +0100, Roman Haefeli wrote:
> On Fri, 2013-03-08 at 12:02 +0100, Lars Marowsky-Bree wrote:
> > On 2013-03-08T11:56:12, Roman Haefeli  wrote:
> > 
> > > Googling "TrackedProcTimeoutFunction exportfs" didn't reveal any
> > > results, which makes me think we are alone with this specific problem.
> > > Is it the RA that hangs or the command 'exportfs' which is executed by
> > > this RA? 

From the log:
Mar  8 03:10:54 vicestore1 lrmd: [1550]: WARN: p_exportfs_virtual:stop
process (PID 5528) timed out (try 2).  Killing with signal SIGKILL (9)

What would be valuable for me to know: is lrmd trying to kill the
'exportfs' process or the resource agent process here?

I mean, is 'exportfs' broken on said machine?

Roman




Re: [Linux-HA] RA heartbeat/exportfs hangs sporadically

2013-03-08 Thread Roman Haefeli
On Fri, 2013-03-08 at 12:02 +0100, Lars Marowsky-Bree wrote:
> On 2013-03-08T11:56:12, Roman Haefeli  wrote:
> 
> > Googling "TrackedProcTimeoutFunction exportfs" didn't reveal any
> > results, which makes me think we are alone with this specific problem.
> > Is it the RA that hangs or the command 'exportfs' which is executed by
> > this RA? 
> 
> What resource-agents version? There was one which had a bug with a file
> it used growing so large that operations would time out (or the disk
> would fill up) ...
> 

Ah, thanks for reminding me. Actually, the version of the
resource-agents package from Debian backports is 3.9.2, which comes with
an exportfs RA that still suffers from the problem of .rmtab growing
indefinitely. I replaced it with a version from git [1] that fixes that
problem. The problem I'm experiencing now is with the updated version,
though I don't know whether the previous version with the .rmtab bug
exhibited the same hang, because I didn't run it for long enough.

This raises the question of whether it is valid to update just a single
RA while keeping the rest the same.
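
In case anyone wants to reproduce the single-RA swap, it amounts to
something like this (a sketch from memory; the paths are the Debian
ones, and the ocf-tester smoke test is optional with placeholder
parameter values):

---
# back up the packaged RA and drop in the patched exportfs from the
# resource-agents git tree
cp /usr/lib/ocf/resource.d/heartbeat/exportfs \
   /usr/lib/ocf/resource.d/heartbeat/exportfs.dist
cp resource-agents/heartbeat/exportfs \
   /usr/lib/ocf/resource.d/heartbeat/exportfs
chmod 755 /usr/lib/ocf/resource.d/heartbeat/exportfs
# quick smoke test outside the cluster (placeholder parameters)
ocf-tester -n exportfs_test -o directory=/virtual \
  -o clientspec=10.10.10.0/24 -o fsid=1 \
  /usr/lib/ocf/resource.d/heartbeat/exportfs
---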

Roman

[1]
https://github.com/ClusterLabs/resource-agents/commit/dcf6eac59f3ea2f77bf6069d79218f539186c9f7



[Linux-HA] RA heartbeat/exportfs hangs sporadically

2013-03-08 Thread Roman Haefeli
Hi all

In a two-node cluster that manages an NFS server on top of a DRBD
replication, we experienced fencing of the active node due to exportfs
monitor failure: 

Mar  8 03:08:42 vicestore1 lrmd: [1550]: WARN: p_exportfs_virtual:monitor 
process (PID 4120) timed out (try 1).  Killing with signal SIGTERM (15).
Mar  8 03:08:47 vicestore1 lrmd: [1550]: WARN: p_exportfs_virtual:monitor 
process (PID 4120) timed out (try 2).  Killing with signal SIGKILL (9).
Mar  8 03:08:52 vicestore1 lrmd: [1550]: ERROR: TrackedProcTimeoutFunction: 
p_exportfs_virtual:monitor process (PID 4120) will not die!
Mar  8 03:08:53 vicestore1 lrmd: [1550]: WARN: operation monitor[47] on 
p_exportfs_virtual for client 1553: pid 4120 timed out
Mar  8 03:08:53 vicestore1 crmd: [1553]: ERROR: process_lrm_event: LRM 
operation p_exportfs_virtual_monitor_5000 (47) Timed Out (timeout=12ms)

Later on, the cluster tries to stop this specific RA and this also
triggers a timeout:

Mar  8 03:10:49 vicestore1 lrmd: [1550]: WARN: p_exportfs_virtual:stop process 
(PID 5528) timed out (try 1).  Killing with signal SIGTERM (15).
Mar  8 03:10:54 vicestore1 lrmd: [1550]: WARN: p_exportfs_virtual:stop process 
(PID 5528) timed out (try 2).  Killing with signal SIGKILL (9).
Mar  8 03:10:59 vicestore1 lrmd: [1550]: ERROR: TrackedProcTimeoutFunction: 
p_exportfs_virtual:stop process (PID 5528) will not die!
Mar  8 03:11:29 vicestore1 crmd: [1553]: WARN: action_timer_callback: Timer 
popped (timeout=2, abort_level=0, complete=false)
Mar  8 03:11:29 vicestore1 crmd: [1553]: ERROR: print_elem: Aborting 
transition, action lost: [Action 7]: Completed (id: p_drbd_nfs:1_monitor_15000, 
loc: vicestore1, priority: 0)
Mar  8 03:11:29 vicestore1 crmd: [1553]: info: abort_transition_graph: 
action_timer_callback:535 - Triggered transition abort (complete=0) : Action 
lost
Mar  8 03:11:47 vicestore1 lrmd: [1550]: WARN: operation stop[50] on 
p_exportfs_v   

As you can see, the last message is cut off in the middle, because the
node got fenced at that exact moment. This happens quite rarely, i.e.
every few months.

Googling "TrackedProcTimeoutFunction exportfs" didn't reveal any
results, which makes me think we are alone with this specific problem.
Is it the RA that hangs or the command 'exportfs' which is executed by
this RA? 
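
One thing I could do to narrow it down is to time a bare 'exportfs' call
outside the cluster and log the slow ones, something like this (the
threshold and log path are arbitrary):

---
# log any exportfs invocation that takes suspiciously long
while true; do
    start=$(date +%s)
    exportfs -v > /dev/null
    elapsed=$(( $(date +%s) - start ))
    [ "$elapsed" -ge 5 ] && \
        echo "$(date) exportfs took ${elapsed}s" >> /var/log/exportfs-timing.log
    sleep 10
done
---

If that ever shows multi-second runtimes, it would at least confirm that
it is the external command and not the RA shell code that hangs.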

This is on Debian Squeeze with pacemaker 1.1.7.

Any hints are welcome

Roman



Re: [Linux-HA] How to identify reason for fencing

2013-02-07 Thread Roman Haefeli
On Wed, 2013-02-06 at 11:24 +0100, Michael Schwartzkopff wrote:
> Am Mittwoch, 6. Februar 2013, 11:06:23 schrieb Roman Haefeli:
> > Hi all
> > 
> > We are running a pacemaker/corosync cluster with three nodes that
> > manages ~30 OpenVZ containers.
> > 
> > We recently had the situation where one node fenced the other two
> > nodes (sbd is configured as a stonith device). In the system logs I was
> > able to spot the line where the node gives the death pill to the others.
> > However, I have difficulties finding the original reason for the
> > decision to fence the other nodes.
> > 
> > Before I spam the list with logs, I'd like to ask if there is something
> > particular I should look for. Is there any advice on how to proceed
> > in such a situation?
> > 
> > Many thanks in advance.
> > 
> > Roman
> 
> The reason should be in the logs above the fencing event. Something like
> 
> corosync: lost connection.
> 
> If you want help from the list paste your logs (the relevant parts only!) to 
> pastebin and mail the link.

I wasn't sure which parts are the relevant ones. However, in the
meantime we were able to explain the situation. As so often, it was a
whole chain of circumstances that eventually led to the nodes being fenced.

Here is the whole story (for those interested):

Each node has two NICs which form a network bond. On this bond, two
VLANs are configured, one for the DMZ and one for internal use (corosync
ring and NFS traffic). Some containers have their virtual eth devices
bridged to the internal VLAN for NFS access. Whenever a container starts
or stops, its veth device joins or leaves the bridge on the internal
VLAN. This wouldn't generally be a problem, but it is when the kernel is
the Debian OpenVZ kernel. With this kernel, a bridge always uses the
lowest MAC address among its members as its own. When the container's
MAC address is lower than that of the physical NIC, the MAC address of
the bridge changes whenever that container is started or stopped. This
MAC switching caused network lags on the bridge that also carries the
corosync ring. This finally broke the corosync ring, which in turn led
to the fencing of two nodes.

Any of the following would have prevented it:
* a non-Debian OpenVZ kernel (different scheme for assigning a MAC to a
bridge)
* giving a higher MAC to the veth of the container (see the sketch below)
* running the corosync ring on its own VLAN
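
For the second option, it would look roughly like this (the CT ID,
interface, bridge name and MAC addresses are placeholders; the host-side
MAC is the one the bridge looks at, so that is the one set to a
deliberately high FE:... value):

---
# sketch: recreate the container's veth with an explicitly high host-side
# MAC, so the bridge never adopts it as its own address
vzctl set 101 --netif_del eth0 --save
vzctl set 101 --netif_add eth0,00:18:51:AA:BB:CC,veth101.0,FE:FF:AA:BB:CC:DD,vmbr1 --save
---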

Roman
  








[Linux-HA] How to identify reason for fencing

2013-02-06 Thread Roman Haefeli
Hi all

We are running a pacemaker/corosync cluster with three nodes that
manages ~30 OpenVZ containers.

We recently had the situation where one node fenced the other two
nodes (sbd is configured as a stonith device). In the system logs I was
able to spot the line where the node gives the death pill to the others.
However, I have difficulties finding the original reason for the
decision to fence the other nodes.

Before I spam the list with logs, I'd like to ask if there is something
particular I should look for. Is there any advice on how to proceed
in such a situation?

Many thanks in advance.

Roman



[Linux-HA] prevent unwanted fencing

2012-04-04 Thread Roman Haefeli
Hi all

We're running a two-node cluster with a bunch of OpenVZ containers as
resources and use SBD as the fencing method. We're still in testing mode
and ran some IO benchmarks on NFS with tiobench. While we were
performing those tests, the node fenced itself as soon as tiobench had
finished. We looked for the reason and found the following line in the
syslog:

---
Apr  2 16:44:07 hostname sbd: [2790]: WARN: Latency: No liveness for 4 s 
exceeds threshold of 3 s (healthy servants: 0)
---

I assume this could be prevented by setting sbd's -5 flag to a higher
value than the default of 3s. But what is a good value?
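
In case it matters, this is how I would check, and if needed enlarge,
the timeouts stored on the SBD device itself (a sketch only; the values
are placeholders, and re-creating the header wipes the slots, so the
cluster has to be down for that):

---
# show the timeouts stored in the sbd header (watchdog/allocate/loop/msgwait)
sbd -d /dev/sdd dump
# sketch: re-initialize the header with larger watchdog and msgwait timeouts
sbd -d /dev/sdd -1 20 -4 40 create
---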

However, the NFS share we were performing the tests on is provided by a
different host than the one that provides the iSCSI device used by SBD.
How come those two interfere?

The next time, we monitored the sbd access time during the benchmark
with this command:

$ while true; do (time sbd -d /dev/sdd list) 2>&1 | grep real; sleep 1; done 

During the test it was usually ~0.030s. However, just as the test
finished, it was much higher, around 2-4s.

Actually, we are not so much concerned about this particular incident,
but we would like to make sure that a single container doing heavy IO
cannot cause the whole node to be fenced. How can this be safely
prevented?

Roman




Re: [Linux-HA] order troubles

2012-03-26 Thread Roman Haefeli
On Thu, 2012-03-22 at 15:06 +0100, Florian Haas wrote:
> On Thu, Mar 22, 2012 at 10:34 AM, Lars Ellenberg
>  wrote:
> >> order o_nfs_before_vz 0: cl_fs_nfs cl_vz
> >> order o_vz_before_ve992 0: cl_vz ve992
> >
> > a score of "0" is roughly equivalent to
> > "if you happen do plan to do both operations
> >  in the same transition, would you please consider
> >  to do them in this order, pretty please, if you see fit"
> 
> Lars beat me to this, as the post turned out to be a little more
> elaborate than expected, but here's a bit of background info for
> additional clarification:
> 
> http://www.hastexo.com/resources/hints-and-kinks/mandatory-and-advisory-ordering-pacemaker

Thanks for this article. That is valuable information.

> >> This is on / with:
> >> * Debian 6.0.4
> >> * corosync 1.4.2 (from Debian backports)
> >> * pacemaker 1.0.9.1 (from Debian backports)
> 
> squeeze-backports is on 1.1.6, and you _really_ want to upgrade.

Oh... that's embarrassing. I wasn't actually testing with the backports
version all along!

I have upgraded now and things look more promising. I can now do
'/etc/init.d/corosync restart', and all resources are migrated to the
remaining node and then back to their original node without failures.
However, it seems I need to stick with an order score of 0 for this and
similar constraints:

---
order o_vz_before_ve992 0: cl_vz ve992
---

When I set the score to 'inf' and the second node comes back, not only
the OpenVZ containers that are supposed to be migrated are restarted,
but all containers. I was told on IRC in #linux-ha that some pacemaker
versions suffer from a problem where cloned resources (in my case the vz
service [lsb:vz] and the NFS mount [ocf:heartbeat:Filesystem]) are
restarted too often, and I was advised to use an advisory score. This
seems to work correctly for me now: when the offline node comes back,
only the containers that are migrated are restarted.
The drawback is that the containers are not stopped when I stop the
filesystem resource. But since I don't have to do that, it is not a
problem.
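
(For the archives: another thing I might try later is marking the clones
as interleaved, which as far as I understand makes each dependent only
care about the clone instance on its own node, so an instance starting
on the returning node shouldn't restart containers elsewhere. Untested
on this setup, but it would look roughly like this:)

---
clone cl_fs_nfs p_fs_nfs \
        meta interleave="true" target-role="Started"
clone cl_vz p_vz \
        meta interleave="true" target-role="Started"
order o_nfs_before_vz inf: cl_fs_nfs cl_vz
order o_vz_before_ve992 inf: cl_vz ve992
---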

Roman




[Linux-HA] order troubles

2012-03-22 Thread Roman Haefeli
Hi all

I only started diving into Linux-HA recently and am currently working
on a test setup where a 2-node cluster manages a bunch of OpenVZ
containers. The containers are running on a cloned NFS mount, which is
also managed by the cluster. In normal operation everything works fine:
I can put one node into 'standby' and all resources are migrated to the
remaining node. If I set it back to 'online', all resources with a
location preference for that node are moved back to their original
node. So far, so good.

However, if nodeB is not running corosync and I start the corosync
service on nodeB while it is already set to 'online' in the crm, it will
try to start the resources (as if it were going from 'standby' to
'online'), but starting the containers then fails. The crm shows
this message for each container that failed to start:

---
ve992_start_0 (node=nodeB, call=19, rc=1, status=complete): unknown error
---

In the corosync log (on Debian: /var/log/syslog), I find this message:

---
Mar 22 09:59:18 nodeB ManageVE[26793]: ERROR: Starting container ... Container 
is mounted Invalid kernel, or some kernel modules are not loaded Container 
start failed Container is unmounted
---

This is the error message vzctl prints when the service provided by
/etc/init.d/vz is not running.

From what I understand, lsb:vz should already be running when the
containers (ocf:heartbeat:ManageVE) are started, according to the order
constraints I put in place:

---
order o_nfs_before_vz 0: cl_fs_nfs cl_vz
order o_vz_before_ve992 0: cl_vz ve992
---

Shouldn't this ensure that the vz service is only started after nfs was
started, and that the containers are only started _after_ vz was
started? It seems to work for:
 
  'standby' -> 'online'

or:

  'offline' -> 'standby' -> 'online'

but not for:

  'offline' -> 'online'

This is on / with:
* Debian 6.0.4
* corosync 1.4.2 (from Debian backports)
* pacemaker 1.0.9.1 (from Debian backports)

My current cib is attached.

Roman

 

node virtue4 \
attributes standby="off"
node virtue5 \
attributes standby="off"
primitive p_fs_nfs ocf:heartbeat:Filesystem \
params device="10.10.10.201:/vol/virtueprivate/virtueprivate" 
directory="/virtual" fstype="nfs" options="rsize=32767,wsize=32767" \
op monitor interval="30" OCF_CHECK_LEVEL="20"
primitive p_sbd lsb:sbd \
op monitor interval="30"
primitive p_vz lsb:vz \
op monitor interval="30"
primitive stonith_sbd stonith:external/sbd \
params sbd_device="/dev/disk/by-id/scsi-360a98000572d4c73526f696553506366" \
meta target-role="Stopped"
primitive ve1104 ocf:heartbeat:ManageVE \
params veid="1104" \
op monitor interval="30" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
meta target-role="Started"
primitive ve1105 ocf:heartbeat:ManageVE \
params veid="1105" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
op monitor interval="30" \
meta allow-migrate="true" target-role="Started"
primitive ve2010 ocf:heartbeat:ManageVE \
params veid="2010" \
op monitor interval="30" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
meta allow-migrate="true" target-role="started"
primitive ve2100 ocf:heartbeat:ManageVE \
params veid="2100" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
op monitor interval="30" \
meta target-role="started"
primitive ve2101 ocf:heartbeat:ManageVE \
params veid="2101" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
op monitor interval="30" \
meta target-role="Started"
primitive ve2102 ocf:heartbeat:ManageVE \
params veid="2102" \
op monitor interval="30" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
meta target-role="Started"
primitive ve991 ocf:heartbeat:ManageVE \
params veid="991" \
op monitor interval="30" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
meta target-role="Started"
primitive ve992 ocf:heartbeat:ManageVE \
params veid="992" \
op monitor interval="30" \
op start interval="0" timeout="75s" \
op stop interval="0" timeout="75s" \
meta allow-migrate="true" target-role="Started"
clone cl_fs_nfs p_fs_nfs \
meta target-role="Started"
clone cl_sbd p_sbd \
meta target-role="started"
clone cl_vz p_vz \
meta target-role="Started"
location cli-prefer-stonith_sbd stonith_sbd \
rule $id="cli-prefer-rule-stonith_sbd" inf: #uname eq virtue4
location cli-prefer-ve1104 ve1104 \
rule $id="cli-prefer-rule-ve1104" inf: #uname eq virtue5
location cli-prefer-ve1105 ve1105 \
rule $id="cli-prefer-rule-v