Oh yikes. Thanks for catching this @Jarek! @Greg Neiheisel <g...@astronomer.io> were there a lot of caveats to deploying from tini or would this be a fairly straightforward fix?
On Sun, Mar 10, 2019 at 5:13 PM Greg Neiheisel <g...@astronomer.io> wrote: > I can confirm that wrapping airflow with tini/dumb-init works pretty well. > We've been relying on it at Astronomer for the past year. We run > exclusively on k8s and are restarting airflow pods very frequently. Here's > our production 1.10.2 image using tini on alpine, for example > > https://github.com/astronomer/astronomer/blob/master/docker/airflow/1.10.2/Dockerfile#L102 > . > > On Sun, Mar 10, 2019 at 3:49 PM Jarek Potiuk <jarek.pot...@polidea.com> > wrote: > > > NOTE! I changed the subject to not pollute the AIP-12 thread > > > > Fokko, > > > > I think I know why TERM signals do not work in the current POD > > implementation. I already experienced that several times - dockerized > app > > not receiving TERM signal. The reason was always the same. It is not a > bug > > actually - it is expected behaviour in case your ENTRYPOINT is in the > SHELL > > form ("/binary arg1 arg2") or when you use shell script as ENTRYPOINT > first > > argument in [ "shell script", "arg" ] form. > > > > In those cases "shell" process becomes the "0" init process. > Unfortunately > > shell process is not prepared to do all the stuff that proper init > process > > should be doing: > > > > * Inherit orphaned child processes > > * reap them > > * Handle and propagate signals properly > > * Wait until all subprocesses are terminated before terminating itself > > > > What happens is that the TERM signal just kills the init "shell" process > > but, then the signal does not reach any of its children and the container > > continues to run. It's well known problem in docker world and there are a > > number of solutions (including exec-ing from shell or - better - using a > > dumb-init/tini in your ENTRYPOINT- very tiny "proper" init > implementations > > that do what they should do. > > > > You can read more for example here: > > https://hynek.me/articles/docker-signals > > . > > > > J. > > > > On Sun, Mar 10, 2019 at 7:29 PM Driesprong, Fokko <fo...@driesprong.frl> > > wrote: > > > > Ps. Jarek, interesting idea. It shouldn't be too hard to make Airflow > more > > > k8s native. You could package your dags within your container, and do a > > > rolling update. Add the DAGs as the last layer, and then point the DAGs > > > folder to the same location. The hard part here is that you need to > > > gracefuly restart the workers. Currently AFAIK the signals given to the > > pod > > > aren't respected. So when the scheduler/webserver/worker receives a > > > SIGTERM, it should stop the jobs nicely and then exit the container, > > before > > > k8s kills the container using a SIGKILL. This will be challenging with > > the > > > workers, which they are potentially long-running. Maybe stop kicking > off > > > new jobs, and let the old ones finish, will be good enough, but then we > > > need to increase the standard kill timeout substantially. Having this > > would > > > also enable the autoscaling of the workers > > > > > > > > > -- > > > > Jarek Potiuk > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > M: +48 660 796 129 <+48660796129> > > E: jarek.pot...@polidea.com > > > > > -- > *Greg Neiheisel* / CTO Astronomer.io >