Re: [Linux-HA] Filesystem thinks it is run as a clone

2011-04-13 Thread Andrew Beekhof
On Wed, Apr 13, 2011 at 10:57 AM, Christoph Bartoschek wrote:
> On 13.04.2011 08:26, Andrew Beekhof wrote:
>> On Tue, Apr 12, 2011 at 5:17 PM, Christoph Bartoschek wrote:
>>> Hi,
>>>
>>> today we tested some NFS cluster scenarios and the first test failed.
>>> The first test was to put the current master node into standby. Stopping
>>> the services worked, but starting them on the other node failed: the
>>> ocf:heartbeat:Filesystem resource failed to start. In the logfile we see:
>>>
>>> Apr 12 14:08:42 laplace Filesystem[10772]: [10820]: INFO: Running start
>>> for /dev/home-data/afs on /srv/nfs/afs
>>> Apr 12 14:08:42 laplace Filesystem[10772]: [10822]: ERROR: DANGER! ext4
>>> on /dev/home-data/afs is NOT cluster-aware!
>>> Apr 12 14:08:42 laplace Filesystem[10772]: [10824]: ERROR: DO NOT RUN IT
>>> AS A CLONE!
>>
>> To my eye the Filesystem agent looks confused
>
> The agent is confused because OCF_RESKEY_CRM_meta_clone is set (non-empty).
> Is this something that can happen?

Not unless the resource has been cloned - and looking at the config
this did not seem to be the case.
Or did I miss something?
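
If you want to double-check what the cluster itself thinks, you can query
the resource definition straight from the live CIB. A quick sketch (both
tools ship with pacemaker; the grep is only a crude filter):

# print p_fs_afs as the cluster sees it - there should be
# no surrounding <clone> element
crm_resource --query-xml --resource p_fs_afs

# or dump the whole configuration and look for clone wrappers
cibadmin --query | grep -i '<clone'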

>
>>
>>> The message comes from the following code in ocf:heartbeat:Filesystem:
>>>
>>> case $FSTYPE in
>>> ocfs2)  ocfs2_init
>>>         ;;
>>> nfs|smbfs|none|gfs2)    : # this is kind of safe too
>>>         ;;
>>> *)      if [ -n "$OCF_RESKEY_CRM_meta_clone" ]; then
>>>                 ocf_log err "DANGER! $FSTYPE on $DEVICE is NOT cluster-aware!"
>>>                 ocf_log err "DO NOT RUN IT AS A CLONE!"
>>>                 ocf_log err "Politely refusing to proceed to avoid data corruption."
>>>                 exit $OCF_ERR_CONFIGURED
>>>         fi
>>>         ;;
>>> esac
>>>
>>>
>>> The message is only printed if the variable OCF_RESKEY_CRM_meta_clone is
>>> set (non-empty). Our configuration, however, does not run the filesystem
>>> as a clone. Somehow the OCF_RESKEY_CRM_meta_clone variable leaked into
>>> the start of the Filesystem resource. Is this a known bug?
>>>
>>> Or is there a configuration error on our side? Here is the current
>>> configuration:
>>>
>>>
>>> node laplace \
>>>          attributes standby="off"
>>> node ries \
>>>          attributes standby="off"
>>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>>          params ip="192.168.143.228" cidr_netmask="24" \
>>>          op monitor interval="30s" \
>>>          meta target-role="Started"
>>> primitive p_drbd_nfs ocf:linbit:drbd \
>>>          params drbd_resource="home-data" \
>>>          op monitor interval="15" role="Master" \
>>>          op monitor interval="30" role="Slave"
>>> primitive p_exportfs_afs ocf:heartbeat:exportfs \
>>>          params fsid="1" directory="/srv/nfs/afs" \
>>>          options="rw,no_root_squash,mountpoint" \
>>>          clientspec="192.168.143.0/255.255.255.0" \
>>>          wait_for_leasetime_on_stop="false" \
>>>          op monitor interval="30s"
>>> primitive p_fs_afs ocf:heartbeat:Filesystem \
>>>          params device="/dev/home-data/afs" directory="/srv/nfs/afs" \
>>>          fstype="ext4" \
>>>          op monitor interval="10s" \
>>>          meta target-role="Started"
>>> primitive p_lsb_nfsserver lsb:nfs-kernel-server \
>>>          op monitor interval="30s"
>>> primitive p_lvm_nfs ocf:heartbeat:LVM \
>>>          params volgrpname="home-data" \
>>>          op monitor interval="30s"
>>> group g_nfs p_lvm_nfs p_fs_afs p_exportfs_afs ClusterIP
>>> ms ms_drbd_nfs p_drbd_nfs \
>>>          meta master-max="1" master-node-max="1" clone-max="2" \
>>>          clone-node-max="1" notify="true" target-role="Started"
>>> clone cl_lsb_nfsserver p_lsb_nfsserver
>>> colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
>>> order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start
>>> property $id="cib-bootstrap-options" \
>>>          dc-version="1.0.9-unknown" \
>>>          cluster-infrastructure="openais" \
>>>          expected-quorum-votes="2" \
>>>          stonith-enabled="false" \
>>>          no-quorum-policy="ignore" \
>>>          last-lrm-refresh="1302610197"
>>> rsc_defaults $id="rsc-options" \
>>>          resource-stickiness="200"
>>>
>>>
>>> Thanks
>>> Christoph


Re: [Linux-HA] Filesystem thinks it is run as a clone

2011-04-13 Thread Christoph Bartoschek
On 13.04.2011 08:26, Andrew Beekhof wrote:
> On Tue, Apr 12, 2011 at 5:17 PM, Christoph Bartoschek wrote:
>> Hi,
>>
>> today we tested some NFS cluster scenarios and the first test failed.
>> The first test was to put the current master node into standby. Stopping
>> the services worked, but starting them on the other node failed: the
>> ocf:heartbeat:Filesystem resource failed to start. In the logfile we see:
>>
>> Apr 12 14:08:42 laplace Filesystem[10772]: [10820]: INFO: Running start
>> for /dev/home-data/afs on /srv/nfs/afs
>> Apr 12 14:08:42 laplace Filesystem[10772]: [10822]: ERROR: DANGER! ext4
>> on /dev/home-data/afs is NOT cluster-aware!
>> Apr 12 14:08:42 laplace Filesystem[10772]: [10824]: ERROR: DO NOT RUN IT
>> AS A CLONE!
>
> To my eye the Filesystem agent looks confused

The agent is confused because OCF_RESKEY_CRM_meta_clone is set (non-empty).
Is this something that can happen?
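
For what it's worth, the check in the agent is [ -n ... ], i.e. "non-empty",
and clone instance numbers start at 0, so even a value of "0" would trip it.
A tiny plain-shell demo:

val=""
[ -n "$val" ] && echo set || echo unset   # prints: unset
val="0"
[ -n "$val" ] && echo set || echo unset   # prints: set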

>
>> The message comes from the following code in ocf:heartbeat:Filesystem:
>>
>> case $FSTYPE in
>> ocfs2)  ocfs2_init
>>         ;;
>> nfs|smbfs|none|gfs2)    : # this is kind of safe too
>>         ;;
>> *)      if [ -n "$OCF_RESKEY_CRM_meta_clone" ]; then
>>                 ocf_log err "DANGER! $FSTYPE on $DEVICE is NOT cluster-aware!"
>>                 ocf_log err "DO NOT RUN IT AS A CLONE!"
>>                 ocf_log err "Politely refusing to proceed to avoid data corruption."
>>                 exit $OCF_ERR_CONFIGURED
>>         fi
>>         ;;
>> esac
>>
>>
>> The message is only printed if the variable OCF_RESKEY_CRM_meta_clone is
>> set (non-empty). Our configuration, however, does not run the filesystem
>> as a clone. Somehow the OCF_RESKEY_CRM_meta_clone variable leaked into
>> the start of the Filesystem resource. Is this a known bug?
>>
>> Or is there a configuration error on our side? Here is the current
>> configuration:
>>
>>
>> node laplace \
>>         attributes standby="off"
>> node ries \
>>         attributes standby="off"
>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>>         params ip="192.168.143.228" cidr_netmask="24" \
>>         op monitor interval="30s" \
>>         meta target-role="Started"
>> primitive p_drbd_nfs ocf:linbit:drbd \
>>         params drbd_resource="home-data" \
>>         op monitor interval="15" role="Master" \
>>         op monitor interval="30" role="Slave"
>> primitive p_exportfs_afs ocf:heartbeat:exportfs \
>>         params fsid="1" directory="/srv/nfs/afs" \
>>         options="rw,no_root_squash,mountpoint" \
>>         clientspec="192.168.143.0/255.255.255.0" \
>>         wait_for_leasetime_on_stop="false" \
>>         op monitor interval="30s"
>> primitive p_fs_afs ocf:heartbeat:Filesystem \
>>         params device="/dev/home-data/afs" directory="/srv/nfs/afs" \
>>         fstype="ext4" \
>>         op monitor interval="10s" \
>>         meta target-role="Started"
>> primitive p_lsb_nfsserver lsb:nfs-kernel-server \
>>         op monitor interval="30s"
>> primitive p_lvm_nfs ocf:heartbeat:LVM \
>>         params volgrpname="home-data" \
>>         op monitor interval="30s"
>> group g_nfs p_lvm_nfs p_fs_afs p_exportfs_afs ClusterIP
>> ms ms_drbd_nfs p_drbd_nfs \
>>         meta master-max="1" master-node-max="1" clone-max="2" \
>>         clone-node-max="1" notify="true" target-role="Started"
>> clone cl_lsb_nfsserver p_lsb_nfsserver
>> colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
>> order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.9-unknown" \
>>         cluster-infrastructure="openais" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore" \
>>         last-lrm-refresh="1302610197"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="200"
>>
>>
>> Thanks
>> Christoph


Re: [Linux-HA] Filesystem thinks it is run as a clone

2011-04-12 Thread Andrew Beekhof
On Tue, Apr 12, 2011 at 5:17 PM, Christoph Bartoschek wrote:
> Hi,
>
> today we tested some NFS cluster scenarios and the first test failed.
> The first test was to put the current master node into standby. Stopping
> the services worked, but starting them on the other node failed: the
> ocf:heartbeat:Filesystem resource failed to start. In the logfile we see:
>
> Apr 12 14:08:42 laplace Filesystem[10772]: [10820]: INFO: Running start
> for /dev/home-data/afs on /srv/nfs/afs
> Apr 12 14:08:42 laplace Filesystem[10772]: [10822]: ERROR: DANGER! ext4
> on /dev/home-data/afs is NOT cluster-aware!
> Apr 12 14:08:42 laplace Filesystem[10772]: [10824]: ERROR: DO NOT RUN IT
> AS A CLONE!

To my eye the Filesystem agent looks confused
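
For comparison, that variable should only ever be populated when the
resource really is wrapped in a clone, i.e. something like this
(hypothetical - it is not in your configuration):

clone cl_fs_afs p_fs_afs \
        meta clone-max="2" clone-node-max="1"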

> The message comes from the following code in ocf:heartbeat:Filesystem:
>
> case $FSTYPE in
> ocfs2)  ocfs2_init
>         ;;
> nfs|smbfs|none|gfs2)    : # this is kind of safe too
>         ;;
> *)      if [ -n "$OCF_RESKEY_CRM_meta_clone" ]; then
>                 ocf_log err "DANGER! $FSTYPE on $DEVICE is NOT cluster-aware!"
>                 ocf_log err "DO NOT RUN IT AS A CLONE!"
>                 ocf_log err "Politely refusing to proceed to avoid data corruption."
>                 exit $OCF_ERR_CONFIGURED
>         fi
>         ;;
> esac
>
>
> The message is only printed if the variable OCF_RESKEY_CRM_meta_clone is
> set (non-empty). Our configuration, however, does not run the filesystem
> as a clone. Somehow the OCF_RESKEY_CRM_meta_clone variable leaked into
> the start of the Filesystem resource. Is this a known bug?
>
> Or is there a configuration error on our side? Here is the current
> configuration:
>
>
> node laplace \
>         attributes standby="off"
> node ries \
>         attributes standby="off"
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>         params ip="192.168.143.228" cidr_netmask="24" \
>         op monitor interval="30s" \
>         meta target-role="Started"
> primitive p_drbd_nfs ocf:linbit:drbd \
>         params drbd_resource="home-data" \
>         op monitor interval="15" role="Master" \
>         op monitor interval="30" role="Slave"
> primitive p_exportfs_afs ocf:heartbeat:exportfs \
>         params fsid="1" directory="/srv/nfs/afs" \
>         options="rw,no_root_squash,mountpoint" \
>         clientspec="192.168.143.0/255.255.255.0" \
>         wait_for_leasetime_on_stop="false" \
>         op monitor interval="30s"
> primitive p_fs_afs ocf:heartbeat:Filesystem \
>         params device="/dev/home-data/afs" directory="/srv/nfs/afs" \
>         fstype="ext4" \
>         op monitor interval="10s" \
>         meta target-role="Started"
> primitive p_lsb_nfsserver lsb:nfs-kernel-server \
>         op monitor interval="30s"
> primitive p_lvm_nfs ocf:heartbeat:LVM \
>         params volgrpname="home-data" \
>         op monitor interval="30s"
> group g_nfs p_lvm_nfs p_fs_afs p_exportfs_afs ClusterIP
> ms ms_drbd_nfs p_drbd_nfs \
>         meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true" target-role="Started"
> clone cl_lsb_nfsserver p_lsb_nfsserver
> colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
> order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-unknown" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1302610197"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="200"
>
>
> Thanks
> Christoph


[Linux-HA] Filesystem thinks it is run as a clone

2011-04-12 Thread Christoph Bartoschek
Hi,

today we tested some NFS cluster scenarios and the first test failed.
The first test was to put the current master node into standby. Stopping
the services worked, but starting them on the other node failed: the
ocf:heartbeat:Filesystem resource failed to start. In the logfile we see:

Apr 12 14:08:42 laplace Filesystem[10772]: [10820]: INFO: Running start 
for /dev/home-data/afs on /srv/nfs/afs
Apr 12 14:08:42 laplace Filesystem[10772]: [10822]: ERROR: DANGER! ext4 
on /dev/home-data/afs is NOT cluster-aware!
Apr 12 14:08:42 laplace Filesystem[10772]: [10824]: ERROR: DO NOT RUN IT 
AS A CLONE!

The message comes from the following code in ocf:heartbeat:Filesystem:

case $FSTYPE in
ocfs2)  ocfs2_init
        ;;
nfs|smbfs|none|gfs2)    : # this is kind of safe too
        ;;
*)      if [ -n "$OCF_RESKEY_CRM_meta_clone" ]; then
                ocf_log err "DANGER! $FSTYPE on $DEVICE is NOT cluster-aware!"
                ocf_log err "DO NOT RUN IT AS A CLONE!"
                ocf_log err "Politely refusing to proceed to avoid data corruption."
                exit $OCF_ERR_CONFIGURED
        fi
        ;;
esac


The message is only printed if the variable OCF_RESKEY_CRM_meta_clone is
set (non-empty). Our configuration, however, does not run the filesystem
as a clone. Somehow the OCF_RESKEY_CRM_meta_clone variable leaked into
the start of the Filesystem resource. Is this a known bug?
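
For reference, the refusal should be reproducible outside the cluster by
exporting the meta variable and calling the agent by hand (a sketch; the
paths are the usual defaults and exit code 6 is OCF_ERR_CONFIGURED):

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_device=/dev/home-data/afs
export OCF_RESKEY_directory=/srv/nfs/afs
export OCF_RESKEY_fstype=ext4
export OCF_RESKEY_CRM_meta_clone=0   # simulate the leaked meta attribute
/usr/lib/ocf/resource.d/heartbeat/Filesystem start; echo $?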

Or is there a configuration error on our side? Here is the current 
configuration:


node laplace \
        attributes standby="off"
node ries \
        attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.143.228" cidr_netmask="24" \
        op monitor interval="30s" \
        meta target-role="Started"
primitive p_drbd_nfs ocf:linbit:drbd \
        params drbd_resource="home-data" \
        op monitor interval="15" role="Master" \
        op monitor interval="30" role="Slave"
primitive p_exportfs_afs ocf:heartbeat:exportfs \
        params fsid="1" directory="/srv/nfs/afs" \
        options="rw,no_root_squash,mountpoint" \
        clientspec="192.168.143.0/255.255.255.0" \
        wait_for_leasetime_on_stop="false" \
        op monitor interval="30s"
primitive p_fs_afs ocf:heartbeat:Filesystem \
        params device="/dev/home-data/afs" directory="/srv/nfs/afs" \
        fstype="ext4" \
        op monitor interval="10s" \
        meta target-role="Started"
primitive p_lsb_nfsserver lsb:nfs-kernel-server \
        op monitor interval="30s"
primitive p_lvm_nfs ocf:heartbeat:LVM \
        params volgrpname="home-data" \
        op monitor interval="30s"
group g_nfs p_lvm_nfs p_fs_afs p_exportfs_afs ClusterIP
ms ms_drbd_nfs p_drbd_nfs \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true" target-role="Started"
clone cl_lsb_nfsserver p_lsb_nfsserver
colocation c_nfs_on_drbd inf: g_nfs ms_drbd_nfs:Master
order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-unknown" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1302610197"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"


Thanks
Christoph