Re: [ClusterLabs] Antw: [EXT] Re: RA hangs when called by crm_resource (resending text format)
On January 11, 2023 2:26:57 a.m. EST, Ulrich Windl wrote:
> In VirtualDomain RA I found a similar command (assuming that works):
>
> virsh $VIRSH_OPTIONS dumpxml --inactive --security-info ${DOMAIN_NAME} > ${CFGTMP}
>
> virsh is somewhat strange; libvirtd is running, right?

Yes, I can call the RA directly, then immediately call crm_resource, or reverse order, always the same results. Again, same calls work fine when enabling, disabling, etc. So weird...

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
[ClusterLabs] Antw: [EXT] Re: RA hangs when called by crm_resource (resending text format)
>>> Madison Kelly wrote on 11.01.2023 at 06:21 in message <74df2c8e-1cff-ba07-7f4a-070be296b...@alteeve.com>:
>> In the log file, I see (from line 20 of the super-simple-test-script):
>>
>> Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; /usr/bin/echo return_code:0 |]

In VirtualDomain RA I found a similar command (assuming that works):

virsh $VIRSH_OPTIONS dumpxml --inactive --security-info ${DOMAIN_NAME} > ${CFGTMP}

virsh is somewhat strange; libvirtd is running, right?
[ClusterLabs] Antw: [EXT] RA hangs when called by crm_resource
Depending on the language your RA is written in, you could debug it, or try ocf-tester to debug your RA. For shell scripts you could add some "ocf_log debug ..." statements.

>>> Madison Kelly wrote on 11.01.2023 at 06:11 in message <06935f6a-a858-c8fe-7b81-168157e5c...@alteeve.com>:
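A minimal sketch of the "ocf_log debug ..." suggestion for a shell-based agent: wrap the suspect command so that a debug line is logged before it starts and after it returns. This assumes the resource-agents package provides ocf-shellfuncs at the usual location; validate_step and the fallback ocf_log are hypothetical names for illustration only and are not taken from the agent discussed in this thread.

```shell
#!/bin/sh
# Sketch only: debug logging around a command inside a shell resource agent.
# Assumption: ocf-shellfuncs lives under ${OCF_ROOT}/lib/heartbeat.
: "${OCF_ROOT:=/usr/lib/ocf}"
: "${OCF_FUNCTIONS_DIR:=${OCF_ROOT}/lib/heartbeat}"

if [ -r "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs" ]; then
    . "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs"
else
    # Hypothetical fallback so the sketch also runs outside a cluster node.
    ocf_log() { level="$1"; shift; echo "${level}: $*" >&2; }
fi

# validate_step is a hypothetical helper: log, run the command, log its exit code.
validate_step() {
    ocf_log debug "validate-all: about to run: $*"
    rc=0
    "$@" || rc=$?
    ocf_log debug "validate-all: '$1' exited with rc=${rc}"
    return "$rc"
}
```

If the "about to run" line shows up in the log but the "exited with" line never does, the wrapped command is where the agent is stuck.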
Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)
On Tue, Jan 10, 2023 at 10:14 PM Vladislav Bogdanov wrote:
>
> I suspect that validate action is run as a non-root user.

As far as I know, both the direct command and crm_resource **should** be running the agent as the same user, as long as Madison is running both commands as the same user.

For what it's worth, I copied your test script to my machine (Fedora 36 using the current upstream main of Pacemaker) and it worked fine both directly and via crm_resource. At the moment I'm not able to dig very deeply, but I do wonder if it's either a bug that's since been fixed, or perhaps an environment issue. To try to rule out the former, do you have a test environment where you can try to reproduce it on the latest Pacemaker from upstream?
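One way to test both the "same user" and the "environment issue" ideas is to have the agent record the context it runs in, once when called directly and once when called through crm_resource, and then diff the two files. The following is only a sketch under that assumption; snapshot_context and the output paths are hypothetical names, and the file descriptor listing relies on /proc being available.

```shell
#!/bin/sh
# Sketch only: write the caller-dependent context of the running agent to a file.
# snapshot_context is a hypothetical helper, not part of Pacemaker or the agent.
snapshot_context() {
    out="$1"
    {
        echo "== identity =="
        id
        echo "== environment =="
        env | sort
        echo "== open file descriptors =="
        # Lists the descriptors of this shell; falls back to a note without /proc.
        ls -l "/proc/$$/fd" 2>/dev/null || echo "(no /proc available)"
    } > "$out"
}
```

Calling something like snapshot_context "/tmp/ra-context.$$" near the top of the test agent, running it both ways, and diffing the two resulting files would show whether the user, an environment variable, or an inherited descriptor differs between the two invocations.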
Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)
I suspect that the validate action is run as a non-root user.

Madison Kelly wrote on January 11, 2023 at 07:06:55:
> OK, so more info... Knowing now that it's a problem with the virsh call
> specifically (but only when validating, existing VMs monitor, enable,
> disable fine, all which repeatedly call virsh), I noticed a few things.
Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)
On 2023-01-11 00:21, Madison Kelly wrote:
> After sending this, I tried having my "RA" call 'hostname', and that
> worked fine. I switched back to 'virsh list --all', and that hangs. So
> it seems to somehow be related to call 'virsh' specifically.

OK, so more info... Knowing now that it's a problem with the virsh call specifically (but only when validating; existing VMs monitor, enable, and disable fine, all of which repeatedly call virsh), I noticed a few things.

First, I see in the logs:

Jan 11 00:30:43 mk-a07n02.digimer.ca libvirtd[2937]: Cannot recv data: Connection reset by peer

So with this, I further simplified my test script to this:

https://pastebin.com/Ey8FdL1t

Then when I ran my test script directly, the strace output is:

Good: https://pastebin.com/Trbq67ub

When my script is called via crm_resource, the strace is this:

Bad: https://pastebin.com/jtbzHrUM

The first difference I can see happens around line 929 in the good paste: the line "futex(0x7f48b0001ca0, FUTEX_WAKE_PRIVATE, 1) = 0" exists there, but not in the bad paste. Shortly after, around line 959 in the bad paste, I start seeing:

line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]
line: [brk(NULL) = 0x562b7877d000]
line: [brk(0x562b787aa000) = 0x562b787aa000]
line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]

There are more brk() lines, and not long after the output stops.

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/
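Comparing two strace logs by eye is error-prone because heap and mmap addresses change from run to run. A rough sketch of one way to find the first real divergence: replace hex addresses with a placeholder, then let diff report the first hunk. normalize_trace and first_divergence are hypothetical helper names, and the sketch assumes both traces were saved to local files; it does not normalize PIDs or timestamps.

```shell
#!/bin/sh
# Sketch only: locate the first differing line between two strace logs.

# Replace hex addresses, which vary between runs, with a fixed placeholder.
normalize_trace() {
    sed -E 's/0x[0-9a-fA-F]+/ADDR/g' "$1"
}

# Print the first diff hunk header (for example "929d928"), or nothing if
# the normalized traces are identical.
first_divergence() {
    good_norm=$(mktemp) || return 1
    bad_norm=$(mktemp) || { rm -f "$good_norm"; return 1; }
    normalize_trace "$1" > "$good_norm"
    normalize_trace "$2" > "$bad_norm"
    diff "$good_norm" "$bad_norm" | head -n 1
    rm -f "$good_norm" "$bad_norm"
}
```

The number before the letter in the hunk header is a line number in the first (good) file, and the number after it is a line number in the second (bad) file.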
Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)
On 2023-01-11 00:14, Madison Kelly wrote:
> Any help is appreciated, I am stumped. :/

After sending this, I tried having my "RA" call 'hostname', and that worked fine. I switched back to 'virsh list --all', and that hangs. So it seems to be related to calling 'virsh' specifically.

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/
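When swapping commands in and out like this ('hostname' versus 'virsh list --all'), each hanging attempt costs the roughly 20 seconds that crm_resource waits before reporting a timeout. A possible shortcut, sketched below, is to bound each candidate command inside the test agent. This assumes GNU coreutils timeout is installed (it exits with status 124 when the limit is hit); run_bounded is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: run a command with a time limit and report when it was cut off.
run_bounded() {
    limit="$1"; shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    # GNU timeout exits with 124 when the command exceeded the limit.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

For example, run_bounded 5 /usr/bin/virsh list --all would give up after five seconds and leave a clear marker in the agent's stderr, while a command that finishes in time passes its own exit code through.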
[ClusterLabs] RA hangs when called by crm_resource (resending text format)
Hi all,

Edit: Last message was in HTML format, sorry about that.

I've got a hell of a weird problem, and I am absolutely stumped on what's going on.

The short of it is; if my RA is called from the command line, it's fine. If a resource exists, monitor, enable, disable, all that stuff works just fine. If I try to create a resource, it hangs on the validate stage. Specifically, it hangs when 'pcs' calls:

crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=

Specifically, it hangs when it tries to make a shell call (to virsh, specifically, but that doesn't matter). So to debug, I started stripping down my RA simpler and simpler until I was left with the very most basic of programs;

https://pastebin.com/VtSpkwMr

That is literally the simplest program I could write that made the shell call. The 'open()' call is where it hangs.

When I call directly;

time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server srv04-test; echo rc:$?

real 0m0.061s
user 0m0.037s
sys 0m0.014s
rc:0

It's just fine. I can see in the log the output from the 'virsh' call as well. However, when I call from crm_resource;

time crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test; echo rc:$?

provider="alteeve">
execution_message="Timed Out" reason="Resource agent did not exit within specified timeout"/>
crm_resource: Error performing operation: Error occurred

real 0m20.521s
user 0m0.022s
sys 0m0.010s
rc:1

In the log file, I see (from line 20 of the super-simple-test-script):

Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; /usr/bin/echo return_code:0 |]

Then nothing else.

The strace output is: https://pastebin.com/raw/UCEUdBeP

Environment;

* selinux is permissive
* Pacemaker 2.1.5-4.el8
* pcs 0.10.15
* 4.18.0-408.el8.x86_64
* CentOS Stream release 8

Any help is appreciated, I am stumped. :/

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/
[ClusterLabs] RA hangs when called by crm_resource
Hi all, I've got a hell of a weird problem, and I am absolutely stumped on what's going on. The short of it is; if my RA is called from the command line, it's fine. If a resource exists, monitor, enable, disable, all that stuff works just fine. If I try to create a resource, it hangs on the validate stage. Specifically, it hangs when 'pcs' calls: crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name= Specifically, it hangs when it tries to make a shell call (to virsh, specifically, but that doesn't matter). So to debug, I started stripping down my RA simpler and simpler until I was left with the very most basic of programs; https://pastebin.com/VtSpkwMr That is literally the simplest program I could write that made the shell call. The 'open()' call is where it hangs. When I call directly; time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server srv04-test; echo rc:$? real 0m0.061s user 0m0.037s sys 0m0.014s rc:0 It's just fine. I can see in the log the output from the 'virsh' call as well. However, when I call from crm_resource; time crm_resource --validate --output-as xml --class ocf --agent server --provider alteeve --option name=srv04-test; echo rc:$? crm_resource: Error performing operation: Error occurred real 0m20.521s user 0m0.022s sys 0m0.010s rc:1 In the log file, I see (from line 20 of the super-simple-test-script): Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; /usr/bin/echo return_code:0 |] Then nothing else. The strace output is: https://pastebin.com/raw/UCEUdBeP Environment; * selinux is permissive * Pacemaker 2.1.5-4.el8 * pcs 0.10.15 * 4.18.0-408.el8.x86_64 * CentOS Stream release 8 Any help is appreciated, I am stumped. :/ -- Madison Kelly Alteeve's Niche! Chief Technical Officer c: +1-647-471-0951 https://alteeve.com/ ___ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/