Good point, Eamon. Maxing out connections is definitely something to watch for. We recently added pgbouncer to our Helm charts to pool database connections across all the different Airflow processes. Here's our chart for reference: https://github.com/astronomerio/helm.astronomer.io/tree/master/charts/airflow
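
To put rough numbers on the connection concern, here is a back-of-the-envelope sketch. It assumes one database connection per worker pod and a pooler that caps server-side connections at a fixed pool size. The function names, the pod count, and the pool size are all made up for illustration; none of them are Airflow or pgbouncer settings.

```python
# Back-of-the-envelope connection budget. All names and numbers here are
# illustrative assumptions, not Airflow or pgbouncer settings.

def direct_connections(worker_pods: int, conns_per_pod: int = 1,
                       core_processes: int = 2) -> int:
    """Connections Postgres sees when every process connects directly.

    core_processes stands in for the scheduler and webserver, assumed to
    hold one connection each (likely an undercount in practice).
    """
    return worker_pods * conns_per_pod + core_processes


def pooled_connections(client_connections: int, pool_size: int) -> int:
    """Server-side connections when a pooler caps them at pool_size."""
    return min(client_connections, pool_size)


def fits(server_connections: int, max_connections: int) -> bool:
    """True if the server-side connection count stays within the limit."""
    return server_connections <= max_connections


if __name__ == "__main__":
    max_connections = 100  # low end of the 100-300 range mentioned in the thread
    clients = direct_connections(worker_pods=150)  # hypothetical pod count
    print("direct:", clients, "fits:", fits(clients, max_connections))
    pooled = pooled_connections(clients, pool_size=20)  # hypothetical pool size
    print("pooled:", pooled, "fits:", fits(pooled, max_connections))
```

The point is only that without a pooler the server-side count grows with the number of running task pods, while with one it is bounded by the pool size, whatever you configure that to be.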

On Thu, Aug 30, 2018 at 1:17 PM Kyle Hamlin <hamlin...@gmail.com> wrote:

> Thanks for your responses! Glad to hear that tasks can run independently
> if something happens.
>
> On Thu, Aug 30, 2018 at 1:13 PM Eamon Keane <eamon.kea...@gmail.com> wrote:
>
> > Adding to Greg's point, if you're using the k8s executor and for some
> > reason the k8s executor worker pod fails to launch within 120 seconds
> > (e.g. pending due to scaling up a new node), this counts as a task
> > failure. Also, if the k8s executor pod has already launched a pod
> > operator but is killed (e.g. manually or due to node upgrade), the pod
> > operator it launched is not killed and runs to completion, so if using
> > retries, you need to ensure idempotency. The worker pods update the db
> > per my understanding, with each requiring a separate connection to the
> > db; this can tax your connection budget (100-300 for small postgres
> > instances on gcp or aws).
> >
> > On Thu, Aug 30, 2018 at 6:04 PM Greg Neiheisel <g...@astronomer.io> wrote:
> >
> > > Hey Kyle, the task pods will continue to run even if you reboot the
> > > scheduler and webserver and the status does get updated in the
> > > airflow db, which is great.
> > >
> > > I know the scheduler subscribes to the Kubernetes watch API to get an
> > > event stream of pods completing and it keeps a checkpoint so it can
> > > resubscribe when it comes back up.
> > >
> > > I forget if the worker pods update the db or if the scheduler is
> > > doing that, but it should work out.
> > >
> > > On Thu, Aug 30, 2018, 9:54 AM Kyle Hamlin <hamlin...@gmail.com> wrote:
> > >
> > > > gentle bump
> > > >
> > > > On Wed, Aug 22, 2018 at 5:12 PM Kyle Hamlin <hamlin...@gmail.com> wrote:
> > > >
> > > > > I'm about to make the switch to Kubernetes with Airflow, but am
> > > > > wondering what happens when my CI/CD pipeline redeploys the
> > > > > webserver and scheduler and there are still long-running tasks
> > > > > (pods). My intuition is that since the database holds all state,
> > > > > the tasks are in charge of updating their own state, and the UI
> > > > > only renders what it sees in the database, this is not so much
> > > > > of a problem. To be sure, however, here are my questions:
> > > > >
> > > > > Will task pods continue to run?
> > > > > Can task pods continue to poll the external system they are
> > > > > running tasks on while being "headless"?
> > > > > Can the task pods change/update state in the database while
> > > > > being "headless"?
> > > > > Will the UI/Scheduler still be aware of the tasks (pods) once
> > > > > they are live again?
> > > > >
> > > > > Is there anything else that might cause issues when deploying
> > > > > while tasks (pods) are running that I'm not thinking of here?
> > > > >
> > > > > Kyle Hamlin
> > > >
> > > > --
> > > > Kyle Hamlin
>
> --
> Kyle Hamlin

--
*Greg Neiheisel* / CTO Astronomer.io