andreahlert commented on PR #61627:
URL: https://github.com/apache/airflow/pull/61627#issuecomment-3910446088

   > I would really like @ashb @kaxil and @amoghrajesh to review this and 
figure out what the desired signal handling in this case should be.
   > 
   > It might have quite a profound impact on how our signal handling is 
implemented in Airflow.
   > 
   > So far, for every component except `celery worker`, we have relied on 
signal propagation to all processes in the process group via the `dumb-init` 
entrypoint process - all our images had their entrypoint (and thus init 
process) set to `dumb-init`.
   > 
   > Our images have `dumb-init` set as the entrypoint, and its default 
behaviour is that signals received by `dumb-init` are forwarded to all 
processes in the main process group. This means that if a process does not 
create a new process group when it forks its children, all such children will 
receive the signal.
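To make the process-group point concrete, here is a minimal, illustrative Python sketch (not Airflow or `dumb-init` code) showing that a forked child stays in its parent's process group unless it explicitly starts a new session:

```python
import os
import subprocess
import sys

# A child spawned normally inherits our process group, so a signal sent to
# the whole group (as dumb-init does in its default setsid mode) reaches it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
same_group = os.getpgid(child.pid) == os.getpgid(0)

# With start_new_session=True the child gets its own session and process
# group, so group-directed signals would NOT reach it.
detached = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    start_new_session=True,
)
detached_same_group = os.getpgid(detached.pid) == os.getpgid(0)

# Clean up both children.
for p in (child, detached):
    p.terminate()
    p.wait()
```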
   > 
   > This was a deliberate decision, because a lot of potential tasks and 
parsing-process libraries might not properly propagate signals. A typical 
example is bash tasks: bash silently swallows signals and does not propagate 
them to its child processes unless specific handlers are configured in the 
bash script. A lot of people do not know this happens and start bash 
subprocesses unaware that signals will not be propagated - and since Airflow 
is pretty versatile and you can generally run any code, we decided to send 
signals to all the processes in the group (which is the default `dumb-init` 
behaviour).
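This bash behaviour is easy to reproduce. In the following illustrative sketch (not from Airflow), SIGTERM is sent only to the bash process itself, the way a naive supervisor might do it, and bash's `sleep` child keeps running:

```python
import os
import signal
import subprocess
import time

# Run bash in its own process group; '& wait' stops bash from exec-ing sleep.
proc = subprocess.Popen(["bash", "-c", "sleep 300 & wait"],
                        start_new_session=True)
time.sleep(0.5)  # give bash a moment to spawn the sleep child

# Signal only bash itself - NOT the whole group as dumb-init would.
os.kill(proc.pid, signal.SIGTERM)
proc.wait(timeout=5)
time.sleep(0.2)

# bash is gone, but its sleep child is still alive in the old process group:
try:
    os.killpg(proc.pid, 0)  # signal 0 is an existence check only
    sleep_survived = True
except ProcessLookupError:
    sleep_survived = False

os.killpg(proc.pid, signal.SIGKILL)  # clean up the orphaned sleep
```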
   > 
   > The only exception is the celery worker, because the celery worker has 
deliberate signal handling - the first signal it receives puts the worker into 
offline mode, where it does not accept any new tasks and waits for all running 
tasks to complete. This is used, for example, by the KEDA autoscaler to 
automatically scale celery workers down. Celery handles propagating signals to 
its subprocesses, including escalation with timeouts in case a signal is not 
acted upon - so even if a bash subprocess swallows the signal, celery will 
eventually escalate to SIGKILL and kill such tasks (but if the tasks handle 
signals, it will stop them nicely and cleanly before escalation).
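The escalation pattern described here can be shown with a small self-contained sketch (this is not Celery's actual code, just the SIGTERM-then-SIGKILL idea with a grace period):

```python
import signal
import subprocess
import sys
import time

# A stubborn child that ignores SIGTERM, standing in for a task that
# swallows signals (like the bash example above).
child = subprocess.Popen([
    sys.executable, "-c",
    "import signal, time;"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
    "time.sleep(300)",
])
time.sleep(0.5)  # let the child install its SIG_IGN disposition

child.terminate()            # polite request: SIGTERM
try:
    child.wait(timeout=2)    # grace period
    escalated = False
except subprocess.TimeoutExpired:
    child.kill()             # escalate: SIGKILL cannot be ignored
    child.wait()
    escalated = True
```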
   > 
   > So instead of sending signals to the process group, celery is configured 
so that only the master worker process receives signals and takes care of the 
propagation - `DUMB_INIT_SETSID` should be set to `0` for celery pods. This is 
how it is implemented in our helm chart as well, and the Airflow Docker image 
documentation describes this behaviour:
   > 
   > 
https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
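As a config fragment, selecting the single-process signalling mode for a Celery worker looks roughly like this (the image tag is just an example):

```shell
# For Celery workers only: deliver signals to dumb-init's direct child
# (the worker master process) instead of the whole process group;
# Celery then handles fan-out and escalation to its task processes itself.
docker run -e DUMB_INIT_SETSID=0 apache/airflow:2.10.4 celery worker
```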
   > 
   > I am not sure what the thinking about the supervisor's behaviour was when 
it was implemented - whether it handles propagation of signals to child 
processes itself, or whether it relies on the process group receiving the 
signals - but I guess the latter. It would, however, be great to document the 
behaviour and expectations for signal handling in this case.
   > 
   > I see two assumptions here that are wrong:
   > 
   > 1. The supervisor is assumed to be the init process - which I think is 
pretty wrong. A Python supervisor should never be an init process unless it 
implements everything an init process should do. An init process must do more 
than signal propagation: it should also reap orphaned processes left behind by 
other subprocesses (so-called zombies), and it should wait for all the 
processes it sent signals to before exiting - otherwise they will not be 
cleanly terminated by the kernel, which might mean filesystem corruption, for 
example. This is why all docker containers (including the airflow reference 
image) should have a "real" init process as their entrypoint - in our case 
this is `dumb-init` - and it is absolutely not recommended for any production 
image to have an `init` process that is not a "real" init process (some 
people set a bash interpreter as the entrypoint, and that is absolutely 
wrong).
   > 2. The init process is assumed not to forward the signal to the whole 
process group of the forked command - which is exactly what does happen in the 
Airflow reference image.
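To illustrate why "more than signal propagation" matters: a real init must reap any child it inherits, or zombies pile up. A minimal sketch of just the reaping duty (illustrative only - this is one of the things a Python supervisor would have to take on to act as init, and a reason it normally should not):

```python
import os
import signal

def reap_children(signum, frame):
    """Reap every exited child, non-blocking, so no zombies accumulate.

    A real PID-1 init must also do this for orphans reparented to it;
    skipping it is how zombie processes pile up inside a container.
    """
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children left at all
            break
        if pid == 0:               # children exist but none have exited yet
            break

# Run the reaper whenever any child changes state.
signal.signal(signal.SIGCHLD, reap_children)
```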
   > 
   > I think before we continue with this PR we should at the very least get 
these assumptions stated and agreed upon.
   
   Hey @potiuk, thanks for weighing in - this makes sense, and I agree we 
should align on assumptions first.
   
   My intent with this PR is only to guarantee that `on_kill()` is reached 
when the supervisor receives SIGTERM in Kubernetes worker scenarios. I did not 
intend to make the supervisor an init replacement.
   
   I propose we explicitly document the expected model for this path:
   1) `dumb-init` remains responsible for process-group signal behavior
   2) the supervisor's expectations for forwarding signals to task 
subprocesses are documented
   3) we confirm whether this PR should rely only on process-group propagation, 
or keep explicit forwarding as a safety mechanism
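For point 3, the "explicit forwarding as a safety mechanism" option could look roughly like this hypothetical sketch (the task command and handler name are made up for illustration; escalation and `on_kill()` wiring are omitted):

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the task subprocess the supervisor watches.
task = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(300)"])

def forward_sigterm(signum, frame):
    # Forward to the direct child explicitly, instead of relying on the
    # whole process group having already received the signal. Escalation
    # to SIGKILL after a timeout would live elsewhere in the supervisor.
    task.send_signal(signal.SIGTERM)

signal.signal(signal.SIGTERM, forward_sigterm)
```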
   
   If you all agree, I can update this PR with docs/tests aligned to the chosen 
model.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
