shubhamraj-git opened a new pull request, #67455:
URL: https://github.com/apache/airflow/pull/67455

     The OTel integration test methods (test_export_legacy_metric_names) failed 
intermittently with `Failed: Timeout >90.0s` at `scheduler_process.wait()`.     
                                                                                
                                        
                                                                                
                                                                             
     **Root cause:** The `execution_timeout(90)` was sized to match 
`wait_for_dag_run(max_wait_time=90)` but ignored the fixed overheads around it: 
      
                                                                                
                                                          
       10s  startup sleep  (start_scheduler)        
       90s  dag run wait   (wait_for_dag_run max)    
       10s  post-run sleep (span_status propagation)     
       Xs   scheduler shutdown — includes OTel atexit flush (force_flush,  up 
to 10s)
       ───                      
       110s+ worst case > 90s limit   
                                                                                
                                                                             
     When the dag run took ~65–70s, only 5–10s remained for scheduler shutdown 
— not enough for force_flush(). The subprocess.wait() call had no timeout, 
blocking in os.waitpid() until SIGALRM fired. The 60→90 bump in #67170 was 
still too small for the same reason.                                            
                                           
                                                                                
                                                                             
     **Changes:**                                                               
                                                                             
     - Replace bare subprocess.wait() with wait(timeout=30) + TimeoutExpired / 
kill() / wait() in both finally blocks.                                         
                                              
     - Raise execution_timeout from 90→160 on all three test methods.           
                                                                             
       Budget: 10 + 90 + 10 + 30 = 140s worst case + 20s CI buffer = 160s.      
                                                                             
                                                                                
                                                                             
     ---                                                                        
                                                                             
                                                                                
                                                                             
     ##### Was generative AI tooling used to co-author this PR?                 
                                                                             
                                                                                
                                                                             
     - [X] Yes — Claude Code (claude-sonnet-4-6)
                                                                                
                                                                             
     Generated-by: Claude Code (claude-sonnet-4-6) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to