Re: [ClusterLabs] Antw: [EXT] Re: RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Madison Kelly



On January 11, 2023 2:26:57 a.m. EST, Ulrich Windl wrote:
> Madison Kelly wrote on 11.01.2023 at 06:21 in message
><74df2c8e-1cff-ba07-7f4a-070be296b...@alteeve.com>:
>>> In the log file, I see (from line 20 of the super-simple-test-script):
>>> 
>>> 
>>> Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; 
>>> /usr/bin/echo return_code:0 |]
>>> 
>
>In VirtualDomain RA I found a similar command (assuming that works):
> virsh $VIRSH_OPTIONS dumpxml --inactive --security-info ${DOMAIN_NAME} >
> ${CFGTMP}
>
>virsh is somewhat strange; libvirtd is running, right?

Yes. I can call the RA directly and then immediately call crm_resource, or do it in
the reverse order; the results are always the same.

Again, the same calls work fine when enabling, disabling, etc. So weird...

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Re: RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Ulrich Windl
>>> Madison Kelly wrote on 11.01.2023 at 06:21 in message
<74df2c8e-1cff-ba07-7f4a-070be296b...@alteeve.com>:
>> In the log file, I see (from line 20 of the super-simple-test-script):
>> 
>> 
>> Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; 
>> /usr/bin/echo return_code:0 |]
>> 

In VirtualDomain RA I found a similar command (assuming that works):
 virsh $VIRSH_OPTIONS dumpxml --inactive --security-info ${DOMAIN_NAME} >
 ${CFGTMP}

virsh is somewhat strange; libvirtd is running, right?



[ClusterLabs] Antw: [EXT] RA hangs when called by crm_resource

2023-01-10 Thread Ulrich Windl
Depending on the language your RA is written in, you could debug it, or try 
ocf-tester to debug your RA.
For shell scripts you could add some "ocf_log debug ..." statements.
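
A minimal sketch of the "ocf_log debug" suggestion for a shell-based agent. The function name validate_all and the stand-in ocf_log are hypothetical; in a real agent ocf_log comes from sourcing ${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs, and the fallback below only exists so the sketch runs on a machine without resource-agents installed. The idea is to log before and after each external command, so the last debug line in the log shows where the agent stalled:

```shell
#!/bin/sh
# Stand-in for ocf_log, used only when the real one has not been sourced
# from ocf-shellfuncs. It prints "<level>: <message>" to stderr.
command -v ocf_log >/dev/null 2>&1 || ocf_log() {
    level=$1
    shift
    echo "$level: $*" >&2
}

# Hypothetical validate step: runs the command given as arguments and
# brackets it with debug messages.
validate_all() {
    ocf_log debug "validate-all: start, name=${OCF_RESKEY_name:-unset}"
    ocf_log debug "validate-all: about to call: $*"
    "$@"
    rc=$?
    ocf_log debug "validate-all: command returned rc=$rc"
    return "$rc"
}
```

With something like "validate_all virsh list --all", a log that ends at "about to call" would point at the external command rather than at the agent's own logic.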

>>> Madison Kelly wrote on 11.01.2023 at 06:11 in message
<06935f6a-a858-c8fe-7b81-168157e5c...@alteeve.com>:


Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Reid Wahl
On Tue, Jan 10, 2023 at 10:14 PM Vladislav Bogdanov wrote:
>
> I suspect that the validate action is run as a non-root user.

As far as I know, both the direct command and crm_resource **should**
be running the agent as the same user, as long as Madison is running
both commands as the same user.

For what it's worth, I copied your test script to my machine (Fedora
36 using the current upstream main of Pacemaker) and it worked fine
both directly and via crm_resource. At the moment I'm not able to dig
very deeply, but I do wonder if it's either a bug that's since been
fixed, or perhaps an environment issue.

To try to rule out the former, do you have a test environment where
you can try to reproduce it on the latest Pacemaker from upstream?
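
One way to test the "same user" and "environment issue" ideas is to have the agent record its own context under both invocations and compare the results. A minimal sketch, assuming a shell helper can be called from the agent; dump_context and the file paths mentioned afterwards are hypothetical names, and the /proc listing assumes Linux:

```shell
#!/bin/sh
# Write the effective user, working directory, open file descriptors and
# sorted environment of the calling shell to the file named in $1.
dump_context() {
    # Capture the descriptor list first, before stdout is redirected below.
    fds=$(ls -l "/proc/$$/fd" 2>/dev/null)
    {
        echo "uid=$(id -u) user=$(id -un 2>/dev/null)"
        echo "cwd=$(pwd)"
        echo "--- open file descriptors ---"
        echo "$fds"
        echo "--- environment ---"
        env | sort
    } > "$1"
}
```

Writing to, say, /tmp/ra-context.direct for the direct run and /tmp/ra-context.crm for the crm_resource run, then running diff on the two files, would show any difference in UID, environment variables or standard file descriptors.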


Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Vladislav Bogdanov

I suspect that the validate action is run as a non-root user.



Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Madison Kelly

OK, so more info... Knowing now that it's a problem with the virsh call 
specifically (but only when validating; existing VMs monitor, enable and 
disable fine, all of which repeatedly call virsh), I noticed a few things.


First, I see in the logs:


Jan 11 00:30:43 mk-a07n02.digimer.ca libvirtd[2937]: Cannot recv data: 
Connection reset by peer



So with this, I further simplified my test script to this:

https://pastebin.com/Ey8FdL1t

Then when I ran my test script directly, the strace output is:

Good: https://pastebin.com/Trbq67ub

When my script is called via crm_resource, the strace is this:

Bad: https://pastebin.com/jtbzHrUM

The first difference I can see is around line 929 in the good 
paste: the line "futex(0x7f48b0001ca0, FUTEX_WAKE_PRIVATE, 1) = 0" 
appears there but not in the bad paste. Shortly after, I start seeing:



line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]
line: [brk(NULL)   = 0x562b7877d000]
line: [brk(0x562b787aa000) = 0x562b787aa000]
line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]


Around line 959 in the bad paste. There are more brk() lines, and not 
long after the output stops.
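
Comparing two strace logs by eye is error-prone because addresses and PIDs change on every run. A minimal sketch of normalizing both traces before diffing them; the function names are hypothetical, and it assumes hex addresses and leading PID prefixes are the only run-to-run noise, which may not hold for every trace:

```shell
#!/bin/sh
# Replace values that differ between any two runs with fixed tokens:
# "[pid N] " prefixes, bare leading PIDs, and hex addresses.
normalize_strace() {
    sed -E \
        -e 's/^\[pid +[0-9]+\] //' \
        -e 's/^[0-9]+ +//' \
        -e 's/0x[0-9a-f]+/ADDR/g' \
        "$1"
}

# Print the first lines of the diff between two normalized traces.
# $1 = good trace, $2 = bad trace, $3 = number of diff lines (default 10).
first_divergence() {
    good=$(mktemp) && bad=$(mktemp) || return 1
    normalize_strace "$1" > "$good"
    normalize_strace "$2" > "$bad"
    diff "$good" "$bad" | head -n "${3:-10}"
    rm -f "$good" "$bad"
}
```

Saving the two pastes locally and running "first_divergence good.txt bad.txt" would list the earliest lines present in one trace but not the other, such as the futex() call mentioned above.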


--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/



Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Madison Kelly

After sending this, I tried having my "RA" call 'hostname', and that 
worked fine. I switched back to 'virsh list --all', and that hangs. So 
it seems to be related to calling 'virsh' specifically.
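
To compare commands such as 'hostname' and 'virsh list --all' without waiting for the 20-second crm_resource timeout, each call can be given its own short time limit. A minimal sketch using coreutils 'timeout', which exits with status 124 when the limit is reached; run_with_limit is a hypothetical helper name:

```shell
#!/bin/sh
# Run a command with a time limit in seconds and report on stderr whether it
# finished (with its exit status) or was stopped by the limit.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s: $*" >&2
    else
        echo "rc=$rc: $*" >&2
    fi
    return "$rc"
}
```

Called from the agent as "run_with_limit 5 hostname" and "run_with_limit 5 virsh list --all", the stderr lines would show which command hangs under crm_resource.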


--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/



[ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Madison Kelly

Hi all,

Edit: Last message was in HTML format, sorry about that.

  I've got a hell of a weird problem, and I am absolutely stumped on 
what's going on.


  The short of it is: if my RA is called from the command line, it's 
fine. If a resource exists, monitor, enable, disable, all that stuff 
works just fine. If I try to create a resource, it hangs on the validate 
stage. Specifically, it hangs when 'pcs' calls:


crm_resource --validate --output-as xml --class ocf --agent server 
--provider alteeve --option name=


  It hangs when it tries to make a shell call (to virsh, 
specifically, but that doesn't matter). So to debug, I kept stripping 
my RA down until I was left with the most basic of programs:


https://pastebin.com/VtSpkwMr

  That is literally the simplest program I could write that made the 
shell call. The 'open()' call is where it hangs.


When I call it directly:

time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server 
srv04-test; echo rc:$?



real    0m0.061s
user    0m0.037s
sys     0m0.014s
rc:0


It's just fine. I can see in the log the output from the 'virsh' call as 
well. However, when I call it from crm_resource:


time crm_resource --validate --output-as xml --class ocf --agent server 
--provider alteeve --option name=srv04-test; echo rc:$?




[XML output; element tags lost in archiving. Surviving fragments:]
  ... provider="alteeve">
  ... execution_message="Timed Out" reason="Resource agent did not exit within 
      specified timeout"/>
  crm_resource: Error performing operation: Error occurred


real    0m20.521s
user    0m0.022s
sys     0m0.010s
rc:1


In the log file, I see (from line 20 of the super-simple-test-script):


Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1; 
/usr/bin/echo return_code:0 |]



Then nothing else.

The strace output is: https://pastebin.com/raw/UCEUdBeP

Environment:

* SELinux is permissive
* Pacemaker 2.1.5-4.el8
* pcs 0.10.15
* Kernel 4.18.0-408.el8.x86_64
* CentOS Stream release 8

Any help is appreciated, I am stumped. :/
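
The "Calling: [...]" log line above suggests the script runs the command with stderr folded into stdout and then echoes a return-code marker. A minimal shell sketch of that pattern, not the pastebin script itself; call_and_tag is a hypothetical name:

```shell
#!/bin/sh
# Run a command with stderr merged into stdout, then append the exit status
# as a final "return_code:N" line.
call_and_tag() {
    "$@" 2>&1
    echo "return_code:$?"
}
```

Because the marker is echoed only after the command exits, a log that stops at "Calling:" with no "return_code:" line is consistent with the command never returning.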
--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/


[ClusterLabs] RA hangs when called by crm_resource

2023-01-10 Thread Madison Kelly

  
  
Hi all,

  I've got a hell of a weird problem, and I am absolutely stumped
on what's going on.

  The short of it is; if my RA is called from the command line,
it's fine. If a resource exists, monitor, enable, disable, all
that stuff works just fine. If I try to create a resource, it
hangs on the validate stage. Specifically, it hangs when 'pcs'
calls:

crm_resource --validate --output-as xml --class ocf --agent
server --provider alteeve --option name=

  Specifically, it hangs when it tries to make a shell call (to
virsh, specifically, but that doesn't matter). So to debug, I
started stripping down my RA simpler and simpler until I was left
with the very most basic of programs;

https://pastebin.com/VtSpkwMr

  That is literally the simplest program I could write that made
the shell call. The 'open()' call is where it hangs.

When I call directly;

time /usr/lib/ocf/resource.d/alteeve/server --validate-all
--server srv04-test; echo rc:$?

real    0m0.061s
user    0m0.037s
sys     0m0.014s
rc:0

It's just fine. I can see in the log the output from the 'virsh'
call as well. However, when I call from crm_resource;

time crm_resource --validate --output-as xml --class ocf --agent
server --provider alteeve --option name=srv04-test; echo rc:$?

[XML output; markup lost in archiving. Surviving text:]
  crm_resource: Error performing operation: Error occurred

real    0m20.521s
user    0m0.022s
sys     0m0.010s
rc:1

In the log file, I see (from line 20 of the
super-simple-test-script):

Calling: [/usr/bin/virsh dumpxml --inactive srv04-test
2>&1; /usr/bin/echo return_code:0 |]

Then nothing else.

The strace output is: https://pastebin.com/raw/UCEUdBeP

Environment;

* selinux is permissive
* Pacemaker 2.1.5-4.el8
* pcs 0.10.15
* 4.18.0-408.el8.x86_64
* CentOS Stream release 8

Any help is appreciated, I am stumped. :/

-- 
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/
  
