shunping opened a new pull request, #38594:
URL: https://github.com/apache/beam/pull/38594

   The heartbeat and retry logic in SubprocessServer works as follows:
   
   - A server start is retried at most 3 times.
   - Each retry allows a generous maximum wait of 5 minutes (300s) to
     accommodate slow network downloads (such as a pip download during SDK
     staging).
   - Each retry also tracks a unified process heartbeat timestamp
     (`_last_heartbeat_time`), which is updated by two liveness criteria:
      - STDOUT/STDERR activity: the Python subprocess runs unbuffered
        (`PYTHONUNBUFFERED=1`) so its output is piped in real time. Whenever
        output is printed, the log reader thread flushes the log and
        immediately updates the heartbeat timestamp.
      - CPU activity: the main thread polls the subprocess's accumulated CPU
        time (user + system) every 5 seconds (via Unix `ps`). If the CPU time
        has advanced, the heartbeat is updated.
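
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `Heartbeat`, `parse_ps_time`, `cpu_seconds`, `pump_output`, `wait_for_server`, `start_once` and `start_with_retries` are hypothetical, as is the `is_ready` callback. The sketch also assumes that the 300s limit is measured from the last heartbeat and that "at most 3 retries" means three attempts in total; the description does not state either point.

```python
import os
import subprocess
import threading
import time


class Heartbeat:
    """Thread-safe record of the last sign of life (hypothetical helper)."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._last_heartbeat_time = clock()

    def beat(self):
        with self._lock:
            self._last_heartbeat_time = self._clock()

    def idle_seconds(self):
        with self._lock:
            return self._clock() - self._last_heartbeat_time


def parse_ps_time(text):
    """Convert ps cumulative CPU time ([[DD-]HH:]MM:SS[.ff]) to seconds."""
    text = text.strip()
    days = 0
    if '-' in text:
        day_part, text = text.split('-', 1)
        days = int(day_part)
    seconds = 0.0
    for field in text.split(':'):
        seconds = seconds * 60 + float(field)
    return days * 86400 + seconds


def cpu_seconds(pid):
    """Accumulated CPU time of pid via Unix ps, or None if unavailable."""
    try:
        out = subprocess.check_output(
            ['ps', '-o', 'time=', '-p', str(pid)], text=True)
        return parse_ps_time(out)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def pump_output(stream, heartbeat, sink):
    """Log reader thread body: forward each line and refresh the heartbeat."""
    for line in iter(stream.readline, ''):
        sink(line.rstrip('\n'))
        heartbeat.beat()


def wait_for_server(proc, heartbeat, is_ready, max_wait=300.0, poll_interval=5.0):
    """True once is_ready() holds; False on exit or max_wait without a heartbeat.

    Assumption: max_wait is counted from the last heartbeat.
    """
    last_cpu = cpu_seconds(proc.pid)
    while heartbeat.idle_seconds() < max_wait:
        if is_ready():
            return True
        if proc.poll() is not None:  # subprocess exited before becoming ready
            return False
        time.sleep(poll_interval)
        cpu = cpu_seconds(proc.pid)
        if cpu is not None:
            if last_cpu is not None and cpu > last_cpu:
                heartbeat.beat()  # CPU time advanced, so the process is working
            last_cpu = cpu
    return False


def start_once(cmd, is_ready, log=print):
    """One start attempt; returns the running process or None on failure."""
    env = dict(os.environ, PYTHONUNBUFFERED='1')  # pipe output in real time
    proc = subprocess.Popen(
        cmd, env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        text=True, bufsize=1)
    heartbeat = Heartbeat()
    threading.Thread(
        target=pump_output, args=(proc.stdout, heartbeat, log),
        daemon=True).start()
    if wait_for_server(proc, heartbeat, is_ready):
        return proc
    proc.kill()
    proc.wait()
    return None


def start_with_retries(cmd, is_ready, attempts=3):
    """Assumption: 'at most 3 retries' is read as three attempts in total."""
    for _ in range(attempts):
        proc = start_once(cmd, is_ready)
        if proc is not None:
            return proc
    raise RuntimeError('server failed to start after %d attempts' % attempts)
```

Keeping both signals behind a single timestamp means the waiting loop needs only one stall check, regardless of whether the last sign of life came from log output or from CPU progress.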

