On Tue, Dec 12, 2023 at 4:50 PM Artem <tyom...@gmail.com> wrote:
>
> Is there a detailed explanation for resource monitor and start timeouts and
> intervals with examples, for dummies?
>
> my resource is configured as follows:
> [root@lustre-mds1 ~]# pcs resource show MDT00
> Warning: This command is deprecated and will be removed. Please use 'pcs
> resource config' instead.
> Resource: MDT00 (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: MDT00-instance_attributes
>     device=/dev/mapper/mds00
>     directory=/lustre/mds00
>     force_unmount=safe
>     fstype=lustre
>   Operations:
>     monitor: MDT00-monitor-interval-20s
>       interval=20s
>       timeout=40s
>     start: MDT00-start-interval-0s
>       interval=0s
>       timeout=60s
>     stop: MDT00-stop-interval-0s
>       interval=0s
>       timeout=60s
>
> I issued manual failover with the following commands:
> crm_resource --move -r MDT00 -H lustre-mds1
>
> resource tried but returned back with the entries in pacemaker.log like these:
> Dec 12 15:53:23 Filesystem(MDT00)[1886100]: INFO: Running start for
> /dev/mapper/mds00 on /lustre/mds00
> Dec 12 15:53:45 Filesystem(MDT00)[1886100]: ERROR: Couldn't mount device
> [/dev/mapper/mds00] as /lustre/mds00
>
> tried again with the same result:
> Dec 12 16:11:04 Filesystem(MDT00)[1891333]: INFO: Running start for
> /dev/mapper/mds00 on /lustre/mds00
> Dec 12 16:11:26 Filesystem(MDT00)[1891333]: ERROR: Couldn't mount device
> [/dev/mapper/mds00] as /lustre/mds00
>
> Why it cannot move?
>

Because it failed to start this resource on the node selected to run
this resource. Maybe the device is missing, maybe the mount point is
missing, maybe something else.

> Does this 20 sec interval (between start and error) have anything to do with
> monitor interval settings?
>
> [root@lustre-mgs ~]# pcs constraint show --full
> Location Constraints:
>   Resource: MDT00
>     Enabled on:
>       Node: lustre-mds1 (score:100) (id:location-MDT00-lustre-mds1-100)
>       Node: lustre-mds2 (score:100) (id:location-MDT00-lustre-mds2-100)
>     Disabled on:
>       Node: lustre-mgs (score:-INFINITY) (id:location-MDT00-lustre-mgs--INFINITY)
>       Node: lustre1 (score:-INFINITY) (id:location-MDT00-lustre1--INFINITY)
>       Node: lustre2 (score:-INFINITY) (id:location-MDT00-lustre2--INFINITY)
>       Node: lustre3 (score:-INFINITY) (id:location-MDT00-lustre3--INFINITY)
>       Node: lustre4 (score:-INFINITY) (id:location-MDT00-lustre4--INFINITY)
> Ordering Constraints:
>   start MGT then start MDT00 (kind:Optional) (id:order-MGT-MDT00-Optional)
>   start MDT00 then start OST1 (kind:Optional) (id:order-MDT00-OST1-Optional)
>   start MDT00 then start OST2 (kind:Optional) (id:order-MDT00-OST2-Optional)
>
> with regards to ordering constraint: OST1 and OST2 are started now, while I'm
> exercising MDT00 failover.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
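
The reply above names a missing device or a missing mount point as possible
causes of the failed start. A minimal sketch of how one might check those two
things by hand on the target node is shown here. The function name
check_mount_prereqs is hypothetical (it is not part of pcs, Pacemaker, or the
Filesystem agent), and the check covers only those two guesses, not the
"something else" case.

```shell
#!/bin/sh
# Hypothetical helper: verify that a block device and a mount point
# directory both exist before a mount is attempted.
# Usage: check_mount_prereqs DEVICE DIRECTORY
check_mount_prereqs() {
    dev=$1
    dir=$2
    rc=0

    # -b is true only if the path exists and is a block device
    # (device-mapper paths are symlinks, which the test follows).
    if [ ! -b "$dev" ]; then
        echo "device missing or not a block device: $dev"
        rc=1
    fi

    # -d is true only if the path exists and is a directory.
    if [ ! -d "$dir" ]; then
        echo "mount point missing: $dir"
        rc=1
    fi

    if [ "$rc" -eq 0 ]; then
        echo "device and mount point present"
    fi
    return "$rc"
}
```

With the values from the resource configuration quoted above, this could be
run on the node the resource is being moved to as
`check_mount_prereqs /dev/mapper/mds00 /lustre/mds00`. If both checks pass,
the cause is something else and the system logs on that node are the next
place to look.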