Copilot commented on code in PR #64995:
URL: https://github.com/apache/airflow/pull/64995#discussion_r3067136785
##########
task-sdk/tests/task_sdk/execution_time/test_supervisor.py:
##########
@@ -540,6 +540,38 @@ def subprocess_main():
assert "Last chance exception handler" in captured.err
assert "RuntimeError: Fake syntax error" in captured.err
+ def test_start_retries_psutil_lookup_on_no_such_process(self,
client_with_ti_start, monkeypatch):
+ real_process = psutil.Process
+ call_count = 0
+
+ def flaky_process(pid=None):
+ nonlocal call_count
+ if pid is not None:
+ call_count += 1
+ if call_count == 1:
+ raise psutil.NoSuchProcess(pid=pid, msg="process PID not
found")
+ return real_process(pid)
+ return real_process()
+
+
monkeypatch.setattr("airflow.sdk.execution_time.supervisor.psutil.Process",
flaky_process)
+
monkeypatch.setattr("airflow.sdk.execution_time.supervisor.time.sleep", lambda
_: None)
Review Comment:
The monkeypatch targets here use a dotted-string that looks like a nested
attribute path ("airflow.sdk.execution_time.supervisor.psutil.Process" /
".time.sleep"). With pytest's monkeypatch, the string form imports everything
up to the last dot as a module, so this will try to import
`airflow.sdk.execution_time.supervisor.psutil` and
`airflow.sdk.execution_time.supervisor.time` (which don’t exist) and the test
will error. Patch the attributes on the already-imported supervisor module
objects instead (e.g. import `airflow.sdk.execution_time.supervisor` and set
`supervisor.psutil.Process` / `supervisor.time.sleep`).
##########
task-sdk/src/airflow/sdk/execution_time/supervisor.py:
##########
@@ -441,6 +441,14 @@ def exit(n: int) -> NoReturn:
exit(125)
+def _psutil_process(pid: int) -> psutil.Process:
+ try:
+ return psutil.Process(pid)
+ except psutil.NoSuchProcess:
+ time.sleep(0.1)
+ return psutil.Process(pid)
Review Comment:
`time.sleep(0.1)` is a new hard-coded retry delay. Given this module already
centralizes timing knobs as constants (e.g. `MIN_HEARTBEAT_INTERVAL`,
`SOCKET_CLEANUP_TIMEOUT`), consider extracting this delay (and possibly retry
count/timeout) into a named constant or config-backed value so it’s easier to
tune and reason about if the race needs longer retries on some platforms.
##########
task-sdk/src/airflow/sdk/execution_time/supervisor.py:
##########
@@ -441,6 +441,14 @@ def exit(n: int) -> NoReturn:
exit(125)
+def _psutil_process(pid: int) -> psutil.Process:
+ try:
+ return psutil.Process(pid)
+ except psutil.NoSuchProcess:
+ time.sleep(0.1)
+ return psutil.Process(pid)
Review Comment:
The PR description says this change replaces the immediate psutil lookup
with an `os.waitpid`/`os.kill`-backed lightweight handle, but the
implementation here still wraps the child PID in `psutil.Process` (with a
retry). Either update the PR description to match the actual approach, or
implement the described non-psutil handle so reviewers/users aren’t misled
about the behavior and dependencies.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]