We run AWX across two Kubernetes clusters that share a common external 
PostgreSQL database, with container instances as execution environments.

When we bring up only one cluster, everything works fine.

When we bring up the second cluster, “awx.main.wsrelay” tries to connect 
from pods in cluster 1 to pods in cluster 2 (and the other way around).
Because it cannot reach the other cluster's pods, the coroutine 
'WebSocketRelayManager.cleanup_offline_host' fails, the wsrelay process 
exits, and the pod ends up marking itself as failing.

In the end, all task pods are restarted until they hit the restart back-off.

Can we somehow isolate the WebSocket relay system (“Heartbeat” and 
“Wsrelay”) and group the pods per cluster?

Or is this behaviour a bug? (
https://github.com/ansible/awx/blob/devel/docs/websockets.md)
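
To make the first question concrete, this is roughly the kind of grouping we have in mind. It is only a sketch and not AWX code: the helper name group_by_cluster is made up, the pod names are placeholders shaped like the ones in the logs, and it assumes the deployment name prefix (awx-test1-, awx-test2-) is enough to tell the clusters apart.

```python
from collections import defaultdict


def group_by_cluster(hostnames, cluster_prefixes):
    """Group pod hostnames by the first matching deployment prefix.

    Hypothetical helper for illustration only; AWX does not ship this.
    """
    groups = defaultdict(list)
    for name in hostnames:
        # Pick the first cluster whose prefix matches, else "unknown".
        cluster = next(
            (c for c, prefix in cluster_prefixes.items() if name.startswith(prefix)),
            "unknown",
        )
        groups[cluster].append(name)
    return dict(groups)


# Placeholder pod names, not the real (redacted) ones from the logs.
web_pods = ["awx-test1-web-aaa", "awx-test2-web-bbb", "awx-test2-web-ccc"]
prefixes = {"cluster1": "awx-test1-", "cluster2": "awx-test2-"}

print(group_by_cluster(web_pods, prefixes))
```

With a grouping like this, a relay in cluster 2 would only try to reach the web pods listed under "cluster2".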

 *Logs:* 

awx-test2-task 2023-07-20 10:34:45,625 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
awx-test2-task 2023-07-20 10:34:45,625 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
awx-test2-task 2023-07-20 10:34:46,739 INFO     [-] awx.main.commands.run_callback_receiver Callback receiver started with pid=50
awx-test2-task 2023-07-20 10:34:46,764 INFO     [-] awx.main.wsrelay Active instance with hostname awx-test2-task-5<>7bsn8 is registered.
awx-test2-task 2023-07-20 10:34:46,807 WARNING  [-] awx.main.dispatch.periodic periodic beat started
awx-test2-task 2023-07-20 10:34:46,832 INFO     [-] awx.main.dispatch Running worker dispatcher listening to queues ['tower_broadcast_all', 'tower_settings_change', 'awx-test2-task-<>-7bsn8']
awx-test2-task 2023-07-20 10:34:56,776 INFO     [-] awx.main.wsrelay Adding {'awx-test2-web-6<>d7-tzscp', 'awx-test2-web-6<>7cdd7-xqw29', 'awx-test1-web-6<>c-29wzn', 'awx-test1-web-6<>8
awx-test2-task 2023-07-20 10:34:56,794 INFO     [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 198.0.0.0 established.
awx-test2-task 2023-07-20 10:34:56,795 INFO     [-] awx.main.wsrelay Starting producer for metrics
awx-test2-task 2023-07-20 10:34:56,798 INFO     [-] awx.main.wsrelay Connection from awx-test2-task-584bdc44f5-7bsn8 to 198.0.0.0 established.
awx-test2-task 2023-07-20 10:34:56,798 INFO     [-] awx.main.wsrelay Starting producer for metrics
awx-test2-task 2023-07-20 10:35:06,780 INFO     [-] awx.main.wsrelay Removing {'awx-test1-web-6<>c-29wzn', 'awx-test1-web-68<>fc-zx8sf'} from websocket broadcast list
awx-test2-task /usr/lib64/python3.9/asyncio/events.py:80: RuntimeWarning: coroutine 'WebSocketRelayManager.cleanup_offline_host' was never awaited
awx-test2-task   self._context.run(self._callback, *self._args)
awx-test2-task RuntimeWarning: Enable tracemalloc to get the object allocation traceback
awx-test2-task 2023-07-20 10:35:06,789 WARNING  [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 172.0.0.x cancelled.    ->> Cluster1
awx-test2-task 2023-07-20 10:35:06,790 WARNING  [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 172.x.x.x.x cancelled.    ->> Cluster1
awx-test2-task 2023-07-20 10:35:06,791 WARNING  [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 198.x.x.x cancelled.    ->> Cluster2
awx-test2-task 2023-07-20 10:35:06,793 WARNING  [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 198.x.x.x cancelled.    ->> Cluster2

awx-test2-task Traceback (most recent call last):
awx-test2-task   File "/usr/bin/awx-manage", line 8, in <module>
awx-test2-task     sys.exit(manage())
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/__init__.py", line 200, in manage
awx-test2-task     execute_from_command_line(sys.argv)
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
awx-test2-task     utility.execute()
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 436, in execute
awx-test2-task     self.fetch_command(subcommand).run_from_argv(self.argv)
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 412, in run_from_argv
awx-test2-task     self.execute(*args, **cmd_options)
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 458, in execute
awx-test2-task     output = self.handle(*args, **options)
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/management/commands/run_wsrelay.py", line 168, in handle
awx-test2-task     asyncio.run(websocket_relay_manager.run())
awx-test2-task   File "/usr/lib64/python3.9/asyncio/runners.py", line 44, in run
awx-test2-task     return loop.run_until_complete(main)
awx-test2-task   File "/usr/lib64/python3.9/asyncio/base_events.py", line 647, in run_until_complete
awx-test2-task     return future.result()
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", line 330, in run
awx-test2-task     await asyncio.gather(self.cleanup_offline_host(h) for h in deleted_remote_hosts)
awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", line 330, in <genexpr>
awx-test2-task     await asyncio.gather(self.cleanup_offline_host(h) for h in deleted_remote_hosts)
awx-test2-task RuntimeError: Task got bad yield: <coroutine object WebSocketRelayManager.cleanup_offline_host at 0x<>40>
awx-test2-task 2023-07-20 10:35:08,314 WARN exited: wsrelay (exit status 1; not expected)
awx-test2-task 2023-07-20 10:35:08,314 WARN exited: wsrelay (exit status 1; not expected)
awx-test2-task 2023-07-20 10:35:09,317 INFO spawned: 'wsrelay' with pid 133
awx-test2-task 2023-07-20 10:35:09,317 INFO spawned: 'wsrelay' with pid 133
awx-test2-task 2023-07-20 10:35:11,359 INFO     [-] awx.main.wsrelay Active instance with hostname awx-test2-task-58<>5-7bsn8 is registered.

 

 

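This repeats N times, and then supervisor gives up on wsrelay and the task 
instance shuts down (the last log line below is cut off at "removed self 
from capacit"):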

 

2023-07-20 11:00:48,825 INFO gave up: wsrelay entered FATAL state, too many start retries too quickly
awx-test2-task Processing Event: ver:3.0 server:supervisor serial:0 pool:superwatcher poolserial:0 eventname:PROCESS_STATE_FATAL len:64
awx-test2-task 2023-07-20 11:00:49,827 WARN received SIGQUIT indicating exit request
awx-test2-task 2023-07-20 11:00:49,827 WARN received SIGQUIT indicating exit request
awx-test2-task 2023-07-20 11:00:49,827 INFO waiting for superwatcher, dispatcher, callback-receiver to die
awx-test2-task 2023-07-20 11:00:49,827 INFO waiting for superwatcher, dispatcher, callback-receiver to die
awx-test2-task 2023-07-20 11:00:49,829 WARNING  [24ff42c8c9c64921a6097197bec680a3] awx.main.dispatch received SIGTERM, stopping
awx-test2-task 2023-07-20 11:00:49,828 WARNING  [-] awx.main.commands.run_callback_receiver received SIGTERM, stopping
awx-test2-task 2023-07-20 11:00:49,893 WARNING  [24ff42c8c9c64921a6097197bec680a3] awx.main.tasks.system Normal shutdown signal for instance awx-test2-task-584bdc44f5-qfs4d, removed self from capacit
awx-test2-task 2023-07-20 11:00:50,432 INFO stopped: dispatcher (exit status 0)
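
Looking at the traceback again, the crash itself seems to come from how asyncio.gather is called at wsrelay.py line 330 rather than from the failed connection as such: gather receives a single generator expression instead of the coroutines as separate arguments. The small script below tries to reproduce that outside AWX. It is only a sketch: cleanup_offline_host here is a stand-in we wrote, not the real method, and the host names are placeholders. On Python 3.9 the generator form should fail with "Task got bad yield" as in our log; newer Python versions reject it with a TypeError instead, so the script accepts both.

```python
import asyncio


async def cleanup_offline_host(hostname):
    # Stand-in for WebSocketRelayManager.cleanup_offline_host (assumption:
    # the real method is a coroutine taking one hostname).
    await asyncio.sleep(0)
    return f"cleaned {hostname}"


async def broken(hosts):
    # Same call shape as the traceback: gather() gets ONE argument,
    # a generator object, not the coroutines themselves.
    return await asyncio.gather(cleanup_offline_host(h) for h in hosts)


async def fixed(hosts):
    # Unpack with * so each coroutine is its own positional argument.
    return await asyncio.gather(*(cleanup_offline_host(h) for h in hosts))


def run_broken(hosts):
    """Return the name of the exception raised by the broken form, if any."""
    try:
        asyncio.run(broken(hosts))
    except (RuntimeError, TypeError) as exc:
        return type(exc).__name__
    return None


# Placeholder names for the hosts being removed from the broadcast list.
deleted_remote_hosts = ["awx-test1-web-a", "awx-test1-web-b"]

print("broken form raised:", run_broken(deleted_remote_hosts))
print("fixed form returned:", asyncio.run(fixed(deleted_remote_hosts)))
```

If this reading is right, the error would only show up when there is at least one host to remove, which would match the crash appearing right after the "Removing {...} from websocket broadcast list" line.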
