shunping opened a new pull request, #38594:
URL: https://github.com/apache/beam/pull/38594
The heartbeat and retry logic in SubprocessServer works as follows:
- A server start is retried at most 3 times.
- Each retry allows a generous maximum wait time of 5 minutes (300s) to
  fully accommodate slow network downloads (such as a pip download during SDK
  staging).
- Each retry also maintains a unified process heartbeat timestamp
  (`_last_heartbeat_time`), which is updated dynamically by two liveness
  criteria:
  - STDOUT/STDERR activity: The subprocess runs with unbuffered Python
    output (`PYTHONUNBUFFERED=1`) so that its output is piped in real time.
    Whenever any output is printed, the log reader thread flushes the log and
    immediately updates the heartbeat timestamp.
  - CPU activity: The main thread polls the subprocess's accumulated CPU
    time (user + system) every 5 seconds (via the Unix `ps` command). If the
    CPU time has advanced since the previous poll, the heartbeat is updated.
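
The two liveness criteria above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `HeartbeatMonitor`, `parse_ps_time`, `cpu_seconds`, and `pump_output` are hypothetical names, and the sketch assumes CPU time is read with `ps -o time= -p <pid>`, which may differ from the exact `ps` invocation used in the change.

```python
import subprocess
import threading
import time


class HeartbeatMonitor:
    """Hypothetical helper that tracks the last sign of life of a subprocess."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._last_heartbeat_time = clock()
        self._last_cpu_seconds = None

    def beat(self):
        # Called by the log reader thread on output, or on CPU progress.
        with self._lock:
            self._last_heartbeat_time = self._clock()

    def observe_cpu(self, cpu_seconds):
        """Record a CPU-time sample; beat only if it advanced since the last one."""
        if cpu_seconds is None:
            return False
        progressed = (
            self._last_cpu_seconds is not None
            and cpu_seconds > self._last_cpu_seconds)
        self._last_cpu_seconds = cpu_seconds
        if progressed:
            self.beat()
        return progressed

    def idle_seconds(self):
        with self._lock:
            return self._clock() - self._last_heartbeat_time


def parse_ps_time(text):
    """Convert ps TIME output such as '[dd-]hh:mm:ss' or 'mm:ss.ff' to seconds."""
    text = text.strip()
    days = 0
    if '-' in text:
        day_part, text = text.split('-', 1)
        days = int(day_part)
    seconds = 0.0
    for part in text.split(':'):
        seconds = seconds * 60 + float(part)
    return days * 86400 + seconds


def cpu_seconds(pid):
    """Return accumulated CPU time of pid via ps, or None if unavailable."""
    try:
        out = subprocess.run(
            ['ps', '-o', 'time=', '-p', str(pid)],
            capture_output=True, text=True, check=True).stdout
        return parse_ps_time(out)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def pump_output(stream, monitor, sink):
    """Log reader loop: forward each line, flush, and update the heartbeat."""
    for line in iter(stream.readline, ''):
        sink.write(line)
        sink.flush()
        monitor.beat()
```

In this sketch, a caller would start the subprocess with `PYTHONUNBUFFERED=1` in its environment, run `pump_output` on a background thread, and call `monitor.observe_cpu(cpu_seconds(proc.pid))` every 5 seconds from the main thread. The first CPU sample only establishes a baseline, so it does not count as progress.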