Re: [ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

2018-09-12 Thread Daniel Ragle

Thanks for the comments. Replies within.

On 9/11/2018 1:52 PM, Ken Gaillot wrote:

On Fri, 2018-09-07 at 16:07 -0400, Dan Ragle wrote:

On an active-active two node cluster with DRBD, dlm, filesystem
mounts, a Web Server, and some crons I can't figure out how to have
the crons jump from node to node in the correct order. Specifically,
I have two crontabs (managed via symlink creation/deletion)
which normally will run one on node1 and the other on node2. When a
node goes down, I want both to run on the remaining node until
the original node comes back up, at which time they should split the
nodes again. However, when returning to the original node the
crontab that is being moved must wait until the underlying FS mount
is done on the original node before jumping.

DRBD, dlm, the filesystem mounts and the Web Server are all working
as expected; when I mark the second node as standby Apache
stops, the FS unmounts, dlm stops, and DRBD stops on the node; and
when I mark that same node unstandby the reverse happens as
expected. All three of those are cloned resources.

The crontab resources are not cloned and create symlinks, one
resource preferring the first node and the other preferring the
second. Each is colocated and order dependent on the filesystem
mounts (which in turn are colocated and dependent on dlm, which in
turn is colocated and dependent on DRBD promotion). I thought this
would be sufficient, but when the original node is marked
unstandby the crontab that prefers to be on that node attempts to
jump over immediately before the FS is mounted on that node. Of
course the crontab link fails because the underlying filesystem
hasn't been mounted yet.

pcs version is 0.9.162.

Here's the obfuscated detailed list of commands for the config. I'm
still trying to set it up so it's not production-ready yet, but
want to get this much sorted before I add too much more.

# pcs config export pcs-commands
#!/usr/bin/sh
# sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
# invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
# targeting system: ('linux', 'centos', '7.5.1804', 'Core')
# using interpreter: CPython 2.7.5
pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
pcs cluster setup --name MyCluster \
    node1.mydomain.com node2.mydomain.com --transport udpu
pcs cluster start --all --wait=60
pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml property set no-quorum-policy=freeze
pcs -f tmp-cib.xml resource defaults resource-stickiness=100


Just a note, scores are all added together, and highest wins. For
example, if (resource-stickiness + location preference for current node)
is greater than (colocation with resource on different node), then the
colocation will be ignored.


I don't think that's what's happening here; as the resource is moving 
*where* I want/expect it to, just not in the right order.





pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
    op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
    timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
    start interval=0s timeout=240 stop interval=0s timeout=100
pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
    allow_stonith_disabled=1 \
    op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
    timeout=100
pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
    device=/dev/drbd1 directory=/var/www fstype=gfs2 \
    options=_netdev,nodiratime,noatime \
    op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
    interval=0s timeout=120s stop interval=0s timeout=120s
pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
    configfile=/etc/httpd/conf/httpd.conf \
    statusurl=http://localhost/server-status \
    op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
    timeout=60s
pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
    link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
    interval=0s timeout=15


Another note, I seem to remember some implementations of the cron
daemon refuse to work from symlinks, and some require a restart when a
cron is changed outside of the crontab command. That may or may not
apply in your situation; the system or cron daemon logs should show
whether the change took effect when the resource is started/stopped.



Yup. We're actually already doing this much in production, just not with 
any type of cluster based management. It's working well. Creating and 
deleting the symlinks works fine (crond picks up the change without a 
problem). When updating the underlying cron definitions you do however 
need to touch -h the symlink file itself; just updating the underlying 
file isn't enough to get crond to notice. We just 

Re: [ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

2018-09-11 Thread Ken Gaillot
On Fri, 2018-09-07 at 16:07 -0400, Dan Ragle wrote:
> On an active-active two node cluster with DRBD, dlm, filesystem
> mounts, a Web Server, and some crons I can't figure out how to have 
> the crons jump from node to node in the correct order. Specifically,
> I have two crontabs (managed via symlink creation/deletion) 
> which normally will run one on node1 and the other on node2. When a
> node goes down, I want both to run on the remaining node until 
> the original node comes back up, at which time they should split the
> nodes again. However, when returning to the original node the 
> crontab that is being moved must wait until the underlying FS mount
> is done on the original node before jumping.
> 
> DRBD, dlm, the filesystem mounts and the Web Server are all working
> as expected; when I mark the second node as standby Apache 
> stops, the FS unmounts, dlm stops, and DRBD stops on the node; and
> when I mark that same node unstandby the reverse happens as 
> expected. All three of those are cloned resources.
> 
> The crontab resources are not cloned and create symlinks, one
> resource preferring the first node and the other preferring the 
> second. Each is colocated and order dependent on the filesystem
> mounts (which in turn are colocated and dependent on dlm, which in 
> turn is colocated and dependent on DRBD promotion). I thought this
> would be sufficient, but when the original node is marked 
> unstandby the crontab that prefers to be on that node attempts to
> jump over immediately before the FS is mounted on that node. Of 
> course the crontab link fails because the underlying filesystem
> hasn't been mounted yet.
> 
> pcs version is 0.9.162.
> 
> Here's the obfuscated detailed list of commands for the config. I'm
> still trying to set it up so it's not production-ready yet, but 
> want to get this much sorted before I add too much more.
> 
> # pcs config export pcs-commands
> #!/usr/bin/sh
> # sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
> # invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
> # targeting system: ('linux', 'centos', '7.5.1804', 'Core')
> # using interpreter: CPython 2.7.5
> pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
> pcs cluster setup --name MyCluster \
>    node1.mydomain.com node2.mydomain.com --transport udpu
> pcs cluster start --all --wait=60
> pcs cluster cib tmp-cib.xml
> cp tmp-cib.xml tmp-cib.xml.deltasrc
> pcs -f tmp-cib.xml property set stonith-enabled=false
> pcs -f tmp-cib.xml property set no-quorum-policy=freeze
> pcs -f tmp-cib.xml resource defaults resource-stickiness=100

Just a note, scores are all added together, and highest wins. For
example, if (resource-stickiness + location preference for current node)
is greater than (colocation with resource on different node), then the
colocation will be ignored.
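
The score arithmetic described above can be illustrated with a small shell sketch. The numbers and the helper names (node_score, pick_node) are made up for illustration and are not taken from this cluster's configuration; the sketch only covers finite scores and breaks ties in favor of node1, which is a simplification.

```shell
#!/bin/sh
# Hypothetical score arithmetic; all values below are illustrative assumptions.

# node_score STICKINESS LOCATION COLOCATION -> prints the summed score for one node
node_score() {
    echo $(( $1 + $2 + $3 ))
}

# pick_node SCORE_NODE1 SCORE_NODE2 -> prints the node with the higher score
# (ties go to node1 here; this is a simplification for the sketch)
pick_node() {
    if [ "$1" -ge "$2" ]; then echo node1; else echo node2; fi
}

# The resource currently runs on node1: stickiness 100 and a location
# preference of 50 both count toward node1.
current=$(node_score 100 50 0)    # 150

# A finite colocation score of 120 pulls it toward node2, where the
# resource it is colocated with is running.
other=$(node_score 0 0 120)       # 120

pick_node "$current" "$other"     # prints node1: the colocation is outweighed
```

With a colocation score above 150 in this sketch, node2 would win instead.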

> pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
>    op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
>    timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
>    start interval=0s timeout=240 stop interval=0s timeout=100
> pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
>    allow_stonith_disabled=1 \
>    op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
>    timeout=100
> pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
>    device=/dev/drbd1 directory=/var/www fstype=gfs2 \
>    options=_netdev,nodiratime,noatime \
>    op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
>    interval=0s timeout=120s stop interval=0s timeout=120s
> pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
>    configfile=/etc/httpd/conf/httpd.conf \
>    statusurl=http://localhost/server-status \
>    op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
>    timeout=60s
> pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
>    link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
>    op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>    interval=0s timeout=15

Another note, I seem to remember some implementations of the cron
daemon refuse to work from symlinks, and some require a restart when a
cron is changed outside of the crontab command. That may or may not
apply in your situation; the system or cron daemon logs should show
whether the change took effect when the resource is started/stopped.

An alternative design for working around those issues is to have all
the crons always active (on host storage) on both nodes, but the cron
jobs check somehow whether they're on the active node or not and exit
when not where they need to be.
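
A minimal sketch of that alternative, assuming the "am I the active node" check is the presence of a marker file on local (non-shared) storage that some cluster resource creates and removes. The marker path, the wrapper name and the job path are all hypothetical; any other reliable check could be substituted.

```shell
#!/bin/sh
# Guard wrapper for cron jobs that are installed on every node.
# Assumption: a cluster-managed marker file exists only on the node
# where the job is supposed to run.

# is_active_here MARKER -> succeeds if the marker path exists on this node
is_active_here() {
    [ -e "$1" ]
}

# run_if_active MARKER COMMAND [ARGS...]
# Runs COMMAND only when the marker is present; otherwise returns 0
# quietly so that cron does not report a failure.
run_if_active() {
    marker=$1
    shift
    if is_active_here "$marker"; then
        "$@"
    else
        return 0
    fi
}

# Example /etc/cron.d entry (hypothetical paths), identical on both nodes:
# */5 * * * * root /usr/local/bin/run-if-active /run/cluster/user-server1.active /usr/local/bin/job
```

Because the crontab files never change, this sidesteps both the symlink question and the need for the cron daemon to notice changes; the trade-off is that every job must go through the guard.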

> pcs -f tmp-cib.xml resource create SharedUserCrons
> ocf:heartbeat:symlink \
>    link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
>    op monitor interval=60 timeout=15 start interval=0s timeout=15
> stop \
>    

Re: [ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

2018-09-11 Thread Dan Ragle



On 9/11/2018 9:20 AM, Dan Ragle wrote:



On 9/11/2018 1:59 AM, Andrei Borzenkov wrote:

07.09.2018 23:07, Dan Ragle wrote:

On an active-active two node cluster with DRBD, dlm, filesystem mounts,
a Web Server, and some crons I can't figure out how to have the crons
jump from node to node in the correct order. Specifically, I have two
crontabs (managed via symlink creation/deletion) which normally will run
one on node1 and the other on node2. When a node goes down, I want both
to run on the remaining node until the original node comes back up, at
which time they should split the nodes again. However, when returning to
the original node the crontab that is being moved must wait until the
underlying FS mount is done on the original node before jumping.

DRBD, dlm, the filesystem mounts and the Web Server are all working as
expected; when I mark the second node as standby Apache stops, the FS
unmounts, dlm stops, and DRBD stops on the node; and when I mark that
same node unstandby the reverse happens as expected. All three of those
are cloned resources.

The crontab resources are not cloned and create symlinks, one resource
preferring the first node and the other preferring the second. Each is
colocated and order dependent on the filesystem mounts (which in turn
are colocated and dependent on dlm, which in turn is colocated and
dependent on DRBD promotion). I thought this would be sufficient, but
when the original node is marked unstandby the crontab that prefers to
be on that node attempts to jump over immediately before the FS is
mounted on that node. Of course the crontab link fails because the
underlying filesystem hasn't been mounted yet.

pcs version is 0.9.162.

Here's the obfuscated detailed list of commands for the config. I'm
still trying to set it up so it's not production-ready yet, but want to
get this much sorted before I add too much more.

# pcs config export pcs-commands
#!/usr/bin/sh
# sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
# invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
# targeting system: ('linux', 'centos', '7.5.1804', 'Core')
# using interpreter: CPython 2.7.5
pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
pcs cluster setup --name MyCluster \
   node1.mydomain.com node2.mydomain.com --transport udpu
pcs cluster start --all --wait=60
pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml property set no-quorum-policy=freeze
pcs -f tmp-cib.xml resource defaults resource-stickiness=100
pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
   op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
   timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
   start interval=0s timeout=240 stop interval=0s timeout=100
pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
   allow_stonith_disabled=1 \
   op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
   timeout=100
pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
   device=/dev/drbd1 directory=/var/www fstype=gfs2 \
   options=_netdev,nodiratime,noatime \
   op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
   interval=0s timeout=120s stop interval=0s timeout=120s
pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
   configfile=/etc/httpd/conf/httpd.conf \
   statusurl=http://localhost/server-status \
   op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
   timeout=60s
pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15
pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15
pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
   resource create SecondaryUserCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
   resource clone dlm clone-max=2 clone-node-max=1 interleave=true
pcs -f tmp-cib.xml resource clone WWWMount interleave=true
pcs -f tmp-cib.xml resource clone WebServer interleave=true
pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
pcs -f tmp-cib.xml resource clone SharedUserCrons interleave=true
pcs -f 

Re: [ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

2018-09-11 Thread Dan Ragle



On 9/11/2018 1:59 AM, Andrei Borzenkov wrote:

07.09.2018 23:07, Dan Ragle wrote:

On an active-active two node cluster with DRBD, dlm, filesystem mounts,
a Web Server, and some crons I can't figure out how to have the crons
jump from node to node in the correct order. Specifically, I have two
crontabs (managed via symlink creation/deletion) which normally will run
one on node1 and the other on node2. When a node goes down, I want both
to run on the remaining node until the original node comes back up, at
which time they should split the nodes again. However, when returning to
the original node the crontab that is being moved must wait until the
underlying FS mount is done on the original node before jumping.

DRBD, dlm, the filesystem mounts and the Web Server are all working as
expected; when I mark the second node as standby Apache stops, the FS
unmounts, dlm stops, and DRBD stops on the node; and when I mark that
same node unstandby the reverse happens as expected. All three of those
are cloned resources.

The crontab resources are not cloned and create symlinks, one resource
preferring the first node and the other preferring the second. Each is
colocated and order dependent on the filesystem mounts (which in turn
are colocated and dependent on dlm, which in turn is colocated and
dependent on DRBD promotion). I thought this would be sufficient, but
when the original node is marked unstandby the crontab that prefers to
be on that node attempts to jump over immediately before the FS is
mounted on that node. Of course the crontab link fails because the
underlying filesystem hasn't been mounted yet.

pcs version is 0.9.162.

Here's the obfuscated detailed list of commands for the config. I'm
still trying to set it up so it's not production-ready yet, but want to
get this much sorted before I add too much more.

# pcs config export pcs-commands
#!/usr/bin/sh
# sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
# invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
# targeting system: ('linux', 'centos', '7.5.1804', 'Core')
# using interpreter: CPython 2.7.5
pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
pcs cluster setup --name MyCluster \
   node1.mydomain.com node2.mydomain.com --transport udpu
pcs cluster start --all --wait=60
pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml property set no-quorum-policy=freeze
pcs -f tmp-cib.xml resource defaults resource-stickiness=100
pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
   op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
   timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
   start interval=0s timeout=240 stop interval=0s timeout=100
pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
   allow_stonith_disabled=1 \
   op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
   timeout=100
pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
   device=/dev/drbd1 directory=/var/www fstype=gfs2 \
   options=_netdev,nodiratime,noatime \
   op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
   interval=0s timeout=120s stop interval=0s timeout=120s
pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
   configfile=/etc/httpd/conf/httpd.conf \
   statusurl=http://localhost/server-status \
   op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
   timeout=60s
pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15
pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15
pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
   resource create SecondaryUserCrons ocf:heartbeat:symlink \
   link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
   interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
   resource clone dlm clone-max=2 clone-node-max=1 interleave=true
pcs -f tmp-cib.xml resource clone WWWMount interleave=true
pcs -f tmp-cib.xml resource clone WebServer interleave=true
pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
pcs -f tmp-cib.xml resource clone SharedUserCrons interleave=true
pcs -f tmp-cib.xml \
   resource master DRBDClone DRBD 

Re: [ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

2018-09-10 Thread Andrei Borzenkov
07.09.2018 23:07, Dan Ragle wrote:
> On an active-active two node cluster with DRBD, dlm, filesystem mounts,
> a Web Server, and some crons I can't figure out how to have the crons
> jump from node to node in the correct order. Specifically, I have two
> crontabs (managed via symlink creation/deletion) which normally will run
> one on node1 and the other on node2. When a node goes down, I want both
> to run on the remaining node until the original node comes back up, at
> which time they should split the nodes again. However, when returning to
> the original node the crontab that is being moved must wait until the
> underlying FS mount is done on the original node before jumping.
> 
> DRBD, dlm, the filesystem mounts and the Web Server are all working as
> expected; when I mark the second node as standby Apache stops, the FS
> unmounts, dlm stops, and DRBD stops on the node; and when I mark that
> same node unstandby the reverse happens as expected. All three of those
> are cloned resources.
> 
> The crontab resources are not cloned and create symlinks, one resource
> preferring the first node and the other preferring the second. Each is
> colocated and order dependent on the filesystem mounts (which in turn
> are colocated and dependent on dlm, which in turn is colocated and
> dependent on DRBD promotion). I thought this would be sufficient, but
> when the original node is marked unstandby the crontab that prefers to
> be on that node attempts to jump over immediately before the FS is
> mounted on that node. Of course the crontab link fails because the
> underlying filesystem hasn't been mounted yet.
> 
> pcs version is 0.9.162.
> 
> Here's the obfuscated detailed list of commands for the config. I'm
> still trying to set it up so it's not production-ready yet, but want to
> get this much sorted before I add too much more.
> 
> # pcs config export pcs-commands
> #!/usr/bin/sh
> # sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
> # invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
> # targeting system: ('linux', 'centos', '7.5.1804', 'Core')
> # using interpreter: CPython 2.7.5
> pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
> pcs cluster setup --name MyCluster \
>   node1.mydomain.com node2.mydomain.com --transport udpu
> pcs cluster start --all --wait=60
> pcs cluster cib tmp-cib.xml
> cp tmp-cib.xml tmp-cib.xml.deltasrc
> pcs -f tmp-cib.xml property set stonith-enabled=false
> pcs -f tmp-cib.xml property set no-quorum-policy=freeze
> pcs -f tmp-cib.xml resource defaults resource-stickiness=100
> pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
>   op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
>   timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
>   start interval=0s timeout=240 stop interval=0s timeout=100
> pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
>   allow_stonith_disabled=1 \
>   op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
>   timeout=100
> pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
>   device=/dev/drbd1 directory=/var/www fstype=gfs2 \
>   options=_netdev,nodiratime,noatime \
>   op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
>   interval=0s timeout=120s stop interval=0s timeout=120s
> pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
>   configfile=/etc/httpd/conf/httpd.conf \
>   statusurl=http://localhost/server-status \
>   op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
>   timeout=60s
> pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
>   link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
>   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>   interval=0s timeout=15
> pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
>   link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
>   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>   interval=0s timeout=15
> pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
>   link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
>   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>   interval=0s timeout=15 meta resource-stickiness=0
> pcs -f tmp-cib.xml \
>   resource create SecondaryUserCrons ocf:heartbeat:symlink \
>   link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
>   op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
>   interval=0s timeout=15 meta resource-stickiness=0
> pcs -f tmp-cib.xml \
>   resource clone dlm clone-max=2 clone-node-max=1 interleave=true
> pcs -f tmp-cib.xml resource clone WWWMount interleave=true
> pcs -f tmp-cib.xml resource clone WebServer interleave=true
> pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
> pcs -f tmp-cib.xml 

[ClusterLabs] Non-cloned resource moves before cloned resource startup on unstandby

2018-09-07 Thread Dan Ragle
On an active-active two-node cluster with DRBD, dlm, filesystem mounts, a Web Server, and some crons, I can't figure out how to have 
the crons jump from node to node in the correct order. Specifically, I have two crontabs (managed via symlink creation/deletion) 
which normally will run one on node1 and the other on node2. When a node goes down, I want both to run on the remaining node until 
the original node comes back up, at which time they should split the nodes again. However, when returning to the original node the 
crontab that is being moved must wait until the underlying FS mount is done on the original node before jumping.


DRBD, dlm, the filesystem mounts and the Web Server are all working as expected; when I mark the second node as standby Apache 
stops, the FS unmounts, dlm stops, and DRBD stops on the node; and when I mark that same node unstandby the reverse happens as 
expected. All three of those are cloned resources.


The crontab resources are not cloned and create symlinks, one resource preferring the first node and the other preferring the 
second. Each is colocated and order dependent on the filesystem mounts (which in turn are colocated and dependent on dlm, which in 
turn is colocated and dependent on DRBD promotion). I thought this would be sufficient, but when the original node is marked 
unstandby the crontab that prefers to be on that node attempts to jump over immediately before the FS is mounted on that node. Of 
course the crontab link fails because the underlying filesystem hasn't been mounted yet.
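
For readers unfamiliar with how such a dependency is usually written, here is a hedged sketch in pcs 0.9 syntax. The constraint commands actually used in this cluster are cut off in the listing in this thread, so the clone name WWWMount-clone, the score 500 and the helper print_constraints are assumptions for illustration only. The helper prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run helper: prints (does not execute) the pcs commands that would make
# a non-cloned resource prefer a node, stay with a clone, and start after it.
# All names and scores passed in below are illustrative assumptions.

# print_constraints CIB RESOURCE CLONE NODE SCORE
print_constraints() {
    cib=$1 res=$2 clone=$3 node=$4 score=$5
    # Location preference: which node the resource would like to run on
    echo "pcs -f $cib constraint location $res prefers $node=$score"
    # Colocation: only run where an instance of the clone is active
    echo "pcs -f $cib constraint colocation add $res with $clone INFINITY"
    # Ordering: start the clone instance before the resource
    echo "pcs -f $cib constraint order start $clone then start $res"
}

print_constraints tmp-cib.xml PrimaryUserCrons WWWMount-clone node1.mydomain.com 500
print_constraints tmp-cib.xml SecondaryUserCrons WWWMount-clone node2.mydomain.com 500
```

This only shows the general shape of location, colocation and ordering constraints; it is not a claim about what the configuration above contains or about what resolves the ordering problem described.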


pcs version is 0.9.162.

Here's the obfuscated detailed list of commands for the config. I'm still trying to set it up so it's not production-ready yet, but 
want to get this much sorted before I add too much more.


# pcs config export pcs-commands
#!/usr/bin/sh
# sequence generated on 2018-09-07 15:21:15 with: clufter 0.77.0
# invoked as: ['/usr/sbin/pcs', 'config', 'export', 'pcs-commands']
# targeting system: ('linux', 'centos', '7.5.1804', 'Core')
# using interpreter: CPython 2.7.5
pcs cluster auth node1.mydomain.com node2.mydomain.com <> /dev/tty
pcs cluster setup --name MyCluster \
  node1.mydomain.com node2.mydomain.com --transport udpu
pcs cluster start --all --wait=60
pcs cluster cib tmp-cib.xml
cp tmp-cib.xml tmp-cib.xml.deltasrc
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml property set no-quorum-policy=freeze
pcs -f tmp-cib.xml resource defaults resource-stickiness=100
pcs -f tmp-cib.xml resource create DRBD ocf:linbit:drbd drbd_resource=r0 \
  op demote interval=0s timeout=90 monitor interval=60s notify interval=0s \
  timeout=90 promote interval=0s timeout=90 reload interval=0s timeout=30 \
  start interval=0s timeout=240 stop interval=0s timeout=100
pcs -f tmp-cib.xml resource create dlm ocf:pacemaker:controld \
  allow_stonith_disabled=1 \
  op monitor interval=60s start interval=0s timeout=90 stop interval=0s \
  timeout=100
pcs -f tmp-cib.xml resource create WWWMount ocf:heartbeat:Filesystem \
  device=/dev/drbd1 directory=/var/www fstype=gfs2 \
  options=_netdev,nodiratime,noatime \
  op monitor interval=20 timeout=40 notify interval=0s timeout=60 start \
  interval=0s timeout=120s stop interval=0s timeout=120s
pcs -f tmp-cib.xml resource create WebServer ocf:heartbeat:apache \
  configfile=/etc/httpd/conf/httpd.conf \
  statusurl=http://localhost/server-status \
  op monitor interval=1min start interval=0s timeout=40s stop interval=0s \
  timeout=60s
pcs -f tmp-cib.xml resource create SharedRootCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/root-shared target=/var/www/crons/root-shared \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15
pcs -f tmp-cib.xml resource create SharedUserCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/User-shared target=/var/www/crons/User-shared \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15
pcs -f tmp-cib.xml resource create PrimaryUserCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/User-server1 target=/var/www/crons/User-server1 \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
  resource create SecondaryUserCrons ocf:heartbeat:symlink \
  link=/etc/cron.d/User-server2 target=/var/www/crons/User-server2 \
  op monitor interval=60 timeout=15 start interval=0s timeout=15 stop \
  interval=0s timeout=15 meta resource-stickiness=0
pcs -f tmp-cib.xml \
  resource clone dlm clone-max=2 clone-node-max=1 interleave=true
pcs -f tmp-cib.xml resource clone WWWMount interleave=true
pcs -f tmp-cib.xml resource clone WebServer interleave=true
pcs -f tmp-cib.xml resource clone SharedRootCrons interleave=true
pcs -f tmp-cib.xml resource clone SharedUserCrons interleave=true
pcs -f tmp-cib.xml \
  resource master DRBDClone DRBD master-node-max=1 clone-max=2 master-max=2 \
  interleave=true notify=true clone-node-max=1
pcs -f