Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 04.05.2021 18:43, Matthew Schumacher wrote:
> On 5/3/21 7:19 AM, Andrei Borzenkov wrote:
>> This was already asked for the same reason. No, there is not. The goal of monitor is to find out whether the resource is active or not. If prerequisite resources are not there, the resource cannot be active.
>>
>>> Is there a way to have a delayed start?
>>>
>>> At the end of the day, the way VirtualDomain works has been very troublesome for me. The second that the config file isn't available, pacemaker thinks that the domain is down and starts kicking the stool from under things, even if the domain is running just fine. It seems to
>>
>> You misunderstand what happens. Probes check whether a specific resource is running on a specific node (which allows pacemaker to skip the resource start if it is already active, e.g. after the pacemaker service was restarted). Then pacemaker recomputes resource distribution. It does this every time something changes. So when node2 came back and pacemaker reevaluated resource placement, node2 became the preferred choice. The choice is preferred because "crm resource move vm-testvm node2" creates a constraint that tells it exactly that - resource vm-testvm MUST run on node2 if node2 is available to run resources.
>>
>> Pacemaker did exactly what you told it to do.
>>
>> See "crm resource clear" for the way to remove such constraints.
>
> Thanks for the help, I found the issue. The problem was a boot order thing. I was starting libvirt after the cluster, so the VM resetting on unfencing wasn't because of the monitor; it was because it tried to migrate, found that it couldn't, failed, then stopped the resource and tried again, but the second time libvirt was up, so it worked.
>
> I have another question:

You are really better off starting a new thread with a meaningful subject that will also be visible in the archives, instead of burying a completely unrelated question in the middle of an old thread.

> How do I access the 'reload' action on my custom resource using crmsh?
>
> I don't see a 'crm resource reload', and it appears that a restart causes all of the resources that depend on it to stop/start. My resource does have a reload action and 'crm ra info' shows the action; I just can't seem to call it.

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_reloading_services_after_a_definition_change.html

I do not think you can call any operation manually. With pacemaker you do not request an operation to be performed - you request a desired state, and pacemaker decides what operations are necessary to achieve it.
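To make the two answers above concrete, a hedged crmsh sketch; the resource name vm-testvm comes from the thread, while my-custom-rsc and its loglevel parameter are made up for illustration:

    # Remove the location constraint that "crm resource move" left behind,
    # so the cluster is again free to place the resource on any node:
    crm resource clear vm-testvm

    # There is no "crm resource reload" command. Pacemaker decides on its
    # own to reload rather than restart: if the agent advertises a reload
    # action and the only changed parameters are ones its metadata does not
    # mark as unique, the cluster schedules "reload" instead of stop/start.
    # Changing such a parameter (the name here is hypothetical) is therefore
    # what triggers it:
    crm resource param my-custom-rsc set loglevel debug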
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 5/3/21 7:19 AM, Andrei Borzenkov wrote:
> This was already asked for the same reason. No, there is not. The goal of monitor is to find out whether the resource is active or not. If prerequisite resources are not there, the resource cannot be active.
>
>> Is there a way to have a delayed start?
>>
>> At the end of the day, the way VirtualDomain works has been very troublesome for me. The second that the config file isn't available, pacemaker thinks that the domain is down and starts kicking the stool from under things, even if the domain is running just fine. It seems to
>
> You misunderstand what happens. Probes check whether a specific resource is running on a specific node (which allows pacemaker to skip the resource start if it is already active, e.g. after the pacemaker service was restarted). Then pacemaker recomputes resource distribution. It does this every time something changes. So when node2 came back and pacemaker reevaluated resource placement, node2 became the preferred choice. The choice is preferred because "crm resource move vm-testvm node2" creates a constraint that tells it exactly that - resource vm-testvm MUST run on node2 if node2 is available to run resources.
>
> Pacemaker did exactly what you told it to do.
>
> See "crm resource clear" for the way to remove such constraints.

Thanks for the help, I found the issue. The problem was a boot order thing. I was starting libvirt after the cluster, so the VM resetting on unfencing wasn't because of the monitor; it was because it tried to migrate, found that it couldn't, failed, then stopped the resource and tried again, but the second time libvirt was up, so it worked.

I have another question:

How do I access the 'reload' action on my custom resource using crmsh?

I don't see a 'crm resource reload', and it appears that a restart causes all of the resources that depend on it to stop/start. My resource does have a reload action and 'crm ra info' shows the action; I just can't seem to call it.

Thanks,
Matt
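Since the root cause turned out to be boot ordering, one common way to guarantee that libvirtd is up before the cluster stack starts is a systemd drop-in; a minimal sketch, assuming both services run under systemd with their usual unit names:

    # Order pacemaker after libvirtd at boot (and pull libvirtd in), so
    # VirtualDomain resources are never started before libvirt is available.
    mkdir -p /etc/systemd/system/pacemaker.service.d
    cat > /etc/systemd/system/pacemaker.service.d/after-libvirtd.conf <<'EOF'
    [Unit]
    Wants=libvirtd.service
    After=libvirtd.service
    EOF
    systemctl daemon-reload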
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 03.05.2021 16:12, Matthew Schumacher wrote:
...
> You are right Andrei. Looking at the logs:
...
> May 03 03:02:41 node2 pacemaker-controld [1283] (do_lrm_rsc_op) info: Performing key=7:1:7:b8b0100c-2951-4d07-83da-27cfc1225718 op=vm-testvm_monitor_0

This is the probe operation.

> May 03 03:02:41 node2 pacemaker-controld [1283] (action_synced_wait) info: VirtualDomain_meta-data_0[1288] exited with status 0
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/8)
> May 03 03:02:41 node2 pacemaker-execd [1280] (process_lrmd_get_rsc_info) info: Agent information for 'fence-datastore' not in cache
> May 03 03:02:41 node2 pacemaker-execd [1280] (process_lrmd_rsc_register) info: Cached agent information for 'fence-datastore'
> May 03 03:02:41 node2 pacemaker-controld [1283] (do_lrm_rsc_op) info: Performing key=8:1:7:b8b0100c-2951-4d07-83da-27cfc1225718 op=fence-datastore_monitor_0
> May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: Configuration file /datastore/vm/testvm/testvm.xml not readable during probe.
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: Diff: --- 0.1608.23 2
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: Diff: +++ 0.1608.24 (null)
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: + /cib: @num_updates=24
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: ++ /cib/status/node_state[@id='2']:
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: ++
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: ++ id="status-2-.node-unfenced" name="#node-unfenced" value="1620010887"/>
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: ++
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: ++
> May 03 03:02:41 node2 pacemaker-based [1278] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=node1/attrd/16, version=0.1608.24)
> May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: environment is invalid, resource considered stopped
>
> When node2 comes back from being fenced (testing a hard failure), it checks the status of vm-testvm because I previously did a "crm resource move vm-testvm node2", so it's trying to put the VirtualDomain resource back on node2, but calling monitor finds that the config file is missing because the NFS mount isn't up yet, so it assumes the resource is stopped (it's not),

The resource must be stopped on node2. How can it be started if the node just rebooted? Do you start resources manually, outside of pacemaker?

> then it's confused:
>
> May 03 03:02:45 VirtualDomain(vm-testvm)[2576]: INFO: Virtual domain testvm currently has no state, retrying.
> May 03 03:02:46 VirtualDomain(vm-testvm)[2576]: INFO: Domain testvm already stopped.
>
> Eventually it does end up stopped on node1 and started on node2.

It does exactly what you told it to do.

> Is there a way to configure the order so that we don't even run monitor until the dependent resource is running?

This was already asked for the same reason. No, there is not. The goal of monitor is to find out whether the resource is active or not. If prerequisite resources are not there, the resource cannot be active.

> Is there a way to have a delayed start?
>
> At the end of the day, the way VirtualDomain works has been very troublesome for me. The second that the config file isn't available, pacemaker thinks that the domain is down and starts kicking the stool from under things, even if the domain is running just fine. It seems to

You misunderstand what happens. Probes check whether a specific resource is running on a specific node (which allows pacemaker to skip the resource start if it is already active, e.g. after the pacemaker service was restarted). Then pacemaker recomputes resource distribution. It does this every time something changes. So when node2 came back and pacemaker reevaluated resource placement, node2 became the preferred choice. The choice is preferred because "crm resource move vm-testvm node2" creates a constraint that tells it exactly that - resource vm-testvm MUST run on node2 if node2 is available to run resources.

Pacemaker did exactly what you told it to do.

See "crm resource clear" for the way to remove such constraints.

> me that reading the config file is a poor way to test if it's working, as it surely can be up even if the config file is missing, and because it's generated lots of false positives for me. I wonder why it was written this way. Wouldn't it make more sense for monitor to get a status from virsh,

monitor does call virsh. The configuration file is checked earlier, during validation.

> and th
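For anyone wanting to see exactly what such a probe would report on a particular node, crm_resource can run the agent's check by hand, outside the cluster's normal scheduling; a small sketch using the resource name from this thread:

    # Run the resource agent's check for vm-testvm on the local node and
    # print the result. With the NFS mount absent, this reproduces the
    # "environment is invalid, resource considered stopped" outcome seen
    # in the probe above.
    crm_resource --resource vm-testvm --force-check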
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 5/2/21 11:10 PM, Andrei Borzenkov wrote:
> On 03.05.2021 06:27, Matthew Schumacher wrote:
>> On 4/30/21 12:08 PM, Matthew Schumacher wrote:
>>> On 4/30/21 11:51 AM, Ken Gaillot wrote:
>>>> On Fri, 2021-04-30 at 16:20, Strahil Nikolov wrote:
>>>>> Ken meant to use a 'Filesystem' resource for mounting that NFS server and then clone that resource.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>>>
>>> I'm currently working on understanding and implementing this suggestion from Andrei:
>>>
>>> Which is exactly what clones are for. Clone the NFS mount and order VirtualDomain after the clone. Just do not forget to set interleave=true so that VirtualDomain considers only the local clone instance.
>>
>> I tried to use this config, but it's not working for me.
>>
>> I have a group that puts together a ZFS mount (which starts an NFS share), configures some iscsi stuff, and binds a failover IP address:
>>
>> group IP-ZFS-iSCSI fence-datastore zfs-datastore ZFSiSCSI failover-ip
>>
>> Then, I made a mount to that NFS server as a resource:
>>
>> primitive mount-datastore-nfs Filesystem \
>>     params device=":/datastore" directory="/datastore" fstype=nfs op monitor timeout=40s interval=20s
>>
>> Then I made a clone of this:
>>
>> clone clone-mount-datastore-nfs mount-datastore-nfs meta interleave=true target-role=Started
>>
>> So, in theory, the ZFS/NFS server is mounted on all of the nodes with the clone config. Now I define some orders to make sure stuff comes up in order:
>>
>> order mount-datastore-before-vm-testvm Mandatory: clone-mount-datastore-nfs vm-testvm
>> order zfs-datastore-before-mount-datastore Mandatory: IP-ZFS-iSCSI clone-mount-datastore-nfs
>>
>> In theory, when a node comes online, it should check to make sure IP-ZFS-iSCSI is running somewhere in the cluster, then check the local instance of mount-datastore-nfs to make sure we have the NFS mounts we need, then start vm-testvm; however, that doesn't work. If I kill pacemaker on one node, it's fenced, rebooted, and when it comes back I note this in the log:
>>
>> # grep -v pacemaker /var/log/pacemaker/pacemaker.log
>> May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: Configuration file /datastore/vm/testvm/testvm.xml not readable during probe.
>> May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: environment is invalid, resource considered stopped
>> May 03 03:02:42 Filesystem(mount-datastore-nfs)[1442]: INFO: Running start for 172.25.253.110:/dev/datastore-nfs-stub on /datastore
>> May 03 03:02:45 VirtualDomain(vm-testvm)[2576]: INFO: Virtual domain testvm currently has no state, retrying.
>> May 03 03:02:46 VirtualDomain(vm-testvm)[2576]: INFO: Domain testvm already stopped.
>
> It is impossible to comment based on a couple of random lines from the log. You need to provide the full log from the DC and the node in question, from the moment pacemaker was restarted.
>
> But the obvious answer: pacemaker runs probes when it starts, and these probes run asynchronously. So this may simply be output of the resource agent doing a probe, in which case the result is correct - the probe found that the domain was not running.

You are right Andrei. Looking at the logs:

May 03 03:02:41 node2 pacemaker-attrd [1281] (attrd_peer_update) notice: Setting #node-unfenced[node2]: (unset) -> 1620010887 | from node1
May 03 03:02:41 node2 pacemaker-execd [1280] (process_lrmd_get_rsc_info) info: Agent information for 'vm-testvm' not in cache
May 03 03:02:41 node2 pacemaker-execd [1280] (process_lrmd_rsc_register) info: Cached agent information for 'vm-testvm'
May 03 03:02:41 node2 pacemaker-controld [1283] (do_lrm_rsc_op) info: Performing key=7:1:7:b8b0100c-2951-4d07-83da-27cfc1225718 op=vm-testvm_monitor_0
May 03 03:02:41 node2 pacemaker-controld [1283] (action_synced_wait) info: VirtualDomain_meta-data_0[1288] exited with status 0
May 03 03:02:41 node2 pacemaker-based [1278] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/8)
May 03 03:02:41 node2 pacemaker-execd [1280] (process_lrmd_get_rsc_info) info: Agent information for 'fence-datastore' not in cache
May 03 03:02:41 node2 pacemaker-execd [1280] (process_lrmd_rsc_register) info: Cached agent information for 'fence-datastore'
May 03 03:02:41 node2 pacemaker-controld [1283] (do_lrm_rsc_op) info: Performing key=8:1:7:b8b0100c-2951-4d07-83da-27cfc1225718 op=fence-datastore_monitor_0
May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: Configuration file /datastore/vm/testvm/testvm.xml not readable during probe.
May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: Diff: --- 0.1608.23 2
May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: Diff: +++ 0.1608.24 (null)
May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: + /cib: @num_updates=24
May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: ++ /cib/status/node_state[@id='2']:
May 03 03:02:41 node2 pacemaker-based [1278] (cib_perform_op) info: ++
May 03 03:02:41
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 03.05.2021 06:27, Matthew Schumacher wrote:
> On 4/30/21 12:08 PM, Matthew Schumacher wrote:
>> On 4/30/21 11:51 AM, Ken Gaillot wrote:
>>> On Fri, 2021-04-30 at 16:20, Strahil Nikolov wrote:
>>>> Ken meant to use a 'Filesystem' resource for mounting that NFS server and then clone that resource.
>>>>
>>>> Best Regards,
>>>> Strahil Nikolov
>>
>> I'm currently working on understanding and implementing this suggestion from Andrei:
>>
>> Which is exactly what clones are for. Clone the NFS mount and order VirtualDomain after the clone. Just do not forget to set interleave=true so that VirtualDomain considers only the local clone instance.
>
> I tried to use this config, but it's not working for me.
>
> I have a group that puts together a ZFS mount (which starts an NFS share), configures some iscsi stuff, and binds a failover IP address:
>
> group IP-ZFS-iSCSI fence-datastore zfs-datastore ZFSiSCSI failover-ip
>
> Then, I made a mount to that NFS server as a resource:
>
> primitive mount-datastore-nfs Filesystem \
>     params device=":/datastore" directory="/datastore" fstype=nfs op monitor timeout=40s interval=20s
>
> Then I made a clone of this:
>
> clone clone-mount-datastore-nfs mount-datastore-nfs meta interleave=true target-role=Started
>
> So, in theory, the ZFS/NFS server is mounted on all of the nodes with the clone config. Now I define some orders to make sure stuff comes up in order:
>
> order mount-datastore-before-vm-testvm Mandatory: clone-mount-datastore-nfs vm-testvm
> order zfs-datastore-before-mount-datastore Mandatory: IP-ZFS-iSCSI clone-mount-datastore-nfs
>
> In theory, when a node comes online, it should check to make sure IP-ZFS-iSCSI is running somewhere in the cluster, then check the local instance of mount-datastore-nfs to make sure we have the NFS mounts we need, then start vm-testvm; however, that doesn't work. If I kill pacemaker on one node, it's fenced, rebooted, and when it comes back I note this in the log:
>
> # grep -v pacemaker /var/log/pacemaker/pacemaker.log
> May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: Configuration file /datastore/vm/testvm/testvm.xml not readable during probe.
> May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: environment is invalid, resource considered stopped
> May 03 03:02:42 Filesystem(mount-datastore-nfs)[1442]: INFO: Running start for 172.25.253.110:/dev/datastore-nfs-stub on /datastore
> May 03 03:02:45 VirtualDomain(vm-testvm)[2576]: INFO: Virtual domain testvm currently has no state, retrying.
> May 03 03:02:46 VirtualDomain(vm-testvm)[2576]: INFO: Domain testvm already stopped.

It is impossible to comment based on a couple of random lines from the log. You need to provide the full log from the DC and the node in question, from the moment pacemaker was restarted.

But the obvious answer: pacemaker runs probes when it starts, and these probes run asynchronously. So this may simply be output of the resource agent doing a probe, in which case the result is correct - the probe found that the domain was not running.

> Looks like the VirtualDomain resource vm-testvm is started before the Filesystem resource clone-mount-datastore-nfs, even though I have this:
>
> order mount-datastore-before-vm-testvm Mandatory: clone-mount-datastore-nfs vm-testvm
>
> I'm not sure what I'm missing. I need to make sure this NFS mount is started on the local node before starting VirtualDomain on that same node. Should I use the resource instead of the clone in the order statement? Like this:
>
> order mount-datastore-before-vm-testvm Mandatory: mount-datastore-nfs vm-testvm
>
> Any suggestions appreciated.
>
> Matt
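For gathering the "full log from the DC and the node in question" being asked for here, Pacemaker's crm_report tool is the usual route; a hedged sketch, with a time window and output name made up for illustration:

    # Collect logs, the CIB and other cluster state from all nodes for the
    # period around the failed start, into a tarball that can be shared:
    crm_report -f "2021-05-03 03:00:00" -t "2021-05-03 03:10:00" /tmp/vm-testvm-probe-issue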
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 4/30/21 12:08 PM, Matthew Schumacher wrote:
> On 4/30/21 11:51 AM, Ken Gaillot wrote:
>> On Fri, 2021-04-30 at 16:20, Strahil Nikolov wrote:
>>> Ken meant to use a 'Filesystem' resource for mounting that NFS server and then clone that resource.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>
> I'm currently working on understanding and implementing this suggestion from Andrei:
>
> Which is exactly what clones are for. Clone the NFS mount and order VirtualDomain after the clone. Just do not forget to set interleave=true so that VirtualDomain considers only the local clone instance.

I tried to use this config, but it's not working for me.

I have a group that puts together a ZFS mount (which starts an NFS share), configures some iscsi stuff, and binds a failover IP address:

group IP-ZFS-iSCSI fence-datastore zfs-datastore ZFSiSCSI failover-ip

Then, I made a mount to that NFS server as a resource:

primitive mount-datastore-nfs Filesystem \
    params device=":/datastore" directory="/datastore" fstype=nfs op monitor timeout=40s interval=20s

Then I made a clone of this:

clone clone-mount-datastore-nfs mount-datastore-nfs meta interleave=true target-role=Started

So, in theory, the ZFS/NFS server is mounted on all of the nodes with the clone config. Now I define some orders to make sure stuff comes up in order:

order mount-datastore-before-vm-testvm Mandatory: clone-mount-datastore-nfs vm-testvm
order zfs-datastore-before-mount-datastore Mandatory: IP-ZFS-iSCSI clone-mount-datastore-nfs

In theory, when a node comes online, it should check to make sure IP-ZFS-iSCSI is running somewhere in the cluster, then check the local instance of mount-datastore-nfs to make sure we have the NFS mounts we need, then start vm-testvm; however, that doesn't work. If I kill pacemaker on one node, it's fenced, rebooted, and when it comes back I note this in the log:

# grep -v pacemaker /var/log/pacemaker/pacemaker.log
May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: Configuration file /datastore/vm/testvm/testvm.xml not readable during probe.
May 03 03:02:41 VirtualDomain(vm-testvm)[1300]: INFO: environment is invalid, resource considered stopped
May 03 03:02:42 Filesystem(mount-datastore-nfs)[1442]: INFO: Running start for 172.25.253.110:/dev/datastore-nfs-stub on /datastore
May 03 03:02:45 VirtualDomain(vm-testvm)[2576]: INFO: Virtual domain testvm currently has no state, retrying.
May 03 03:02:46 VirtualDomain(vm-testvm)[2576]: INFO: Domain testvm already stopped.

Looks like the VirtualDomain resource vm-testvm is started before the Filesystem resource clone-mount-datastore-nfs, even though I have this:

order mount-datastore-before-vm-testvm Mandatory: clone-mount-datastore-nfs vm-testvm

I'm not sure what I'm missing. I need to make sure this NFS mount is started on the local node before starting VirtualDomain on that same node. Should I use the resource instead of the clone in the order statement? Like this:

order mount-datastore-before-vm-testvm Mandatory: mount-datastore-nfs vm-testvm

Any suggestions appreciated.

Matt
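One point worth keeping in mind next to this configuration (an observation added here for illustration, not something stated in the thread): an order constraint only sequences starts; it does not by itself keep the VM off a node whose local mount-clone instance is not running. A colocation with the clone, sketched with the names from the post, would express that extra requirement:

    # Only allow vm-testvm on nodes where the local instance of the
    # NFS-mount clone is active; together with the existing order
    # constraint this means "mount first, and only where the mount is up".
    colocation vm-testvm-with-mount inf: vm-testvm clone-mount-datastore-nfs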
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
Ken meant to use a 'Filesystem' resource for mounting that NFS server and then clone that resource.

Best Regards,
Strahil Nikolov

On Fri, Apr 30, 2021 at 18:44, Matthew Schumacher wrote:
> On 4/30/21 8:11 AM, Ken Gaillot wrote:
>>> 2. Make the nfs mount itself a resource and make VirtualDomain resources depend on it. In order for this to work each node would need its own nfs mount resource, and VirtualDomain resources that can run on any node would need to depend on the nfs mount resource of whatever node they decide to run on, but not the nfs mount resource of any other node. I'm not sure how to make this work because the dependency changes with what node the VirtualDomain resource is started on.
>>
>> If each VM needs a particular mount, you can clone the NFS server, create separate groups where groupN = (mountN, vmN), and colocate/order the groups relative to the clone.
>
> Thanks for the reply, but I'm not sure I follow. Why would I clone the NFS server? I only need one server, and I only need the singular NFS mount on each node.
>
> Do you have documentation you could point to so I can catch up in my understanding?
>
> Matt
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 4/30/21 11:51 AM, Ken Gaillot wrote:
> On Fri, 2021-04-30 at 16:20, Strahil Nikolov wrote:
>> Ken meant to use a 'Filesystem' resource for mounting that NFS server and then clone that resource.
>>
>> Best Regards,
>> Strahil Nikolov

I tried my best to explain in the original post, but like much of this stuff, it's complex and hard to explain. I'll try again.

I have a singular NFS server that is a resource in my cluster. It can run on any node. Each node needs to mount this NFS server before VirtualDomain resources can start.

I run into a cart-before-the-horse problem because if I mount NFS before starting pacemaker on a booting node, it works fine when that node is connecting to a running cluster, as the NFS server is already running, but it doesn't work when that node is the first node starting a cold cluster, because the mount fails since we haven't yet started pacemaker, which brings up the NFS server. When pacemaker starts, it starts the VirtualDomain resources, which fail because the NFS mount isn't mounted.

The idea would be to make the NFS mount on each node a resource using Filesystem and make VirtualDomain depend on that, but then the dependency would need to change based on which node VirtualDomain runs on, as it would depend on node1-Filesystem if VirtualDomain runs on node1 and node2-Filesystem if VirtualDomain runs on node2.

I'm currently working on understanding and implementing this suggestion from Andrei:

Which is exactly what clones are for. Clone the NFS mount and order VirtualDomain after the clone. Just do not forget to set interleave=true so that VirtualDomain considers only the local clone instance.

It sounds like what I want to do.

Matt
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On Fri, 2021-04-30 at 16:20, Strahil Nikolov wrote:
> Ken meant to use a 'Filesystem' resource for mounting that NFS server and then clone that resource.
>
> Best Regards,
> Strahil Nikolov

OK, now I'm thinking everyone who reads this has a different interpretation. :)

Matthew, can you give more details about your use case?

If you only need one NFS server, then it's fine not to clone it. The groups can be ordered after the NFS server without requiring them to be colocated with it.

> On Fri, Apr 30, 2021 at 18:44, Matthew Schumacher wrote:
>> On 4/30/21 8:11 AM, Ken Gaillot wrote:
>>>> 2. Make the nfs mount itself a resource and make VirtualDomain resources depend on it. In order for this to work each node would need its own nfs mount resource, and VirtualDomain resources that can run on any node would need to depend on the nfs mount resource of whatever node they decide to run on, but not the nfs mount resource of any other node. I'm not sure how to make this work because the dependency changes with what node the VirtualDomain resource is started on.
>>>
>>> If each VM needs a particular mount, you can clone the NFS server, create separate groups where groupN = (mountN, vmN), and colocate/order the groups relative to the clone.
>>
>> Thanks for the reply, but I'm not sure I follow. Why would I clone the NFS server? I only need one server, and I only need the singular NFS mount on each node.
>>
>> Do you have documentation you could point to so I can catch up in my understanding?
>>
>> Matt

--
Ken Gaillot
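A minimal sketch of that last point, with hypothetical resource names (assuming the NFS server lives in a group somewhere in the cluster and each VM sits in its own group): the VM group is only ordered after the NFS server group, not colocated with it, so the two can run on different nodes.

    # vm-group may be placed on any node; it merely waits until
    # nfs-server-group has been started somewhere in the cluster.
    order nfs-server-before-vm-group Mandatory: nfs-server-group vm-group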
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 30.04.2021 17:26, Matthew Schumacher wrote:
> I have an issue that I'm not sure how to resolve, so feedback is welcome.
>
> I need to mount a local NFS file system on my node before I start a VirtualDomain resource which depends on it; however, the NFS server is itself a resource on the cluster.
>
> This puts the cart before the horse. If I insert a node into a running cluster, this is pretty simple: mount nfs before starting pacemaker, and if a VirtualDomain resource is immediately started, we already have what we need. But that doesn't work on a cold cluster, because if I try to mount NFS on the node before the cluster starts the NFS server, the mount fails. If I always mount NFS after I start pacemaker, then pacemaker will usually try to start VirtualDomain resources before I can get further in the boot and mount NFS, which causes the VirtualDomain resource to fail to start.
>
> I think I need one of the following fixes:
>
> 1. Delayed start on VirtualDomain resources so that we give time to get the file system mounted, which feels hackish as it's the old sleep fix for race conditions.
>
> 2. Make the nfs mount itself a resource and make VirtualDomain resources depend on it. In order for this to work each node would need its own nfs mount resource, and VirtualDomain resources that can run on any node would need to depend on the nfs mount resource of whatever node they decide to run on, but not the nfs mount resource of any other node.

Which is exactly what clones are for. Clone the NFS mount and order VirtualDomain after the clone. Just do not forget to set interleave=true so that VirtualDomain considers only the local clone instance.

> I'm not sure how to make this work because the dependency changes with what node the VirtualDomain resource is started on.
>
> 3. Make the VirtualDomain resource call a script on start/migrate that simply looks for the nfs mount and, if missing, tries to mount it. This seems less hackish, but will ensure that we always try to get the nfs mount going the first time the resource is moved/started there.
>
> Any ideas or thoughts would be very helpful and appreciated.
>
> Matt
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On 4/30/21 8:11 AM, Ken Gaillot wrote:
>> 2. Make the nfs mount itself a resource and make VirtualDomain resources depend on it. In order for this to work each node would need its own nfs mount resource, and VirtualDomain resources that can run on any node would need to depend on the nfs mount resource of whatever node they decide to run on, but not the nfs mount resource of any other node. I'm not sure how to make this work because the dependency changes with what node the VirtualDomain resource is started on.
>
> If each VM needs a particular mount, you can clone the NFS server, create separate groups where groupN = (mountN, vmN), and colocate/order the groups relative to the clone.

Thanks for the reply, but I'm not sure I follow. Why would I clone the NFS server? I only need one server, and I only need the singular NFS mount on each node.

Do you have documentation you could point to so I can catch up in my understanding?

Matt
Re: [ClusterLabs] Resolving cart before the horse with mounted filesystems.
On Fri, 2021-04-30 at 07:26 -0700, Matthew Schumacher wrote:
> I have an issue that I'm not sure how to resolve, so feedback is welcome.
>
> I need to mount a local NFS file system on my node before I start a VirtualDomain resource which depends on it; however, the NFS server is itself a resource on the cluster.
>
> This puts the cart before the horse. If I insert a node into a running cluster, this is pretty simple: mount nfs before starting pacemaker, and if a VirtualDomain resource is immediately started, we already have what we need. But that doesn't work on a cold cluster, because if I try to mount NFS on the node before the cluster starts the NFS server, the mount fails. If I always mount NFS after I start pacemaker, then pacemaker will usually try to start VirtualDomain resources before I can get further in the boot and mount NFS, which causes the VirtualDomain resource to fail to start.
>
> I think I need one of the following fixes:
>
> 1. Delayed start on VirtualDomain resources so that we give time to get the file system mounted, which feels hackish as it's the old sleep fix for race conditions.
>
> 2. Make the nfs mount itself a resource and make VirtualDomain resources depend on it. In order for this to work each node would need its own nfs mount resource, and VirtualDomain resources that can run on any node would need to depend on the nfs mount resource of whatever node they decide to run on, but not the nfs mount resource of any other node. I'm not sure how to make this work because the dependency changes with what node the VirtualDomain resource is started on.

If each VM needs a particular mount, you can clone the NFS server, create separate groups where groupN = (mountN, vmN), and colocate/order the groups relative to the clone.

> 3. Make the VirtualDomain resource call a script on start/migrate that simply looks for the nfs mount and, if missing, tries to mount it. This seems less hackish, but will ensure that we always try to get the nfs mount going the first time the resource is moved/started there.
>
> Any ideas or thoughts would be very helpful and appreciated.
>
> Matt

--
Ken Gaillot
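One literal reading of that suggestion, sketched with hypothetical resource names (as noted elsewhere in the thread, the remark was open to interpretation, so treat this only as an illustration of the constraint shape, assuming primitives nfs-server, mount1 and vm1 already exist):

    # Clone the NFS server, bundle each VM with its own mount in a group,
    # then order and colocate every group relative to the clone.
    clone nfs-server-clone nfs-server
    group group1 mount1 vm1
    order nfs-before-group1 Mandatory: nfs-server-clone group1
    colocation group1-with-nfs inf: group1 nfs-server-clone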