Interesting, Greg. Do you know if using pgbouncer would allow you to have more than 100 running k8s executor tasks at one time if, e.g., there is a 100-connection limit on the gcp instance?
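My rough understanding (not verified against Astronomer's chart, so treat this as a sketch) is that pgbouncer multiplexes many client connections onto a small pool of real Postgres connections, so the number of concurrently running task pods can exceed the instance's connection cap as long as they aren't all holding a transaction at the same instant. Something along these lines in pgbouncer.ini, with made-up host and sizes:

    [databases]
    ; "airflow" is a placeholder alias; point it at your actual Cloud SQL / RDS host
    airflow = host=10.0.0.5 port=5432 dbname=airflow

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; transaction pooling lets many short-lived task-pod sessions share few server connections
    pool_mode = transaction
    ; up to 1000 airflow clients (scheduler, webserver, worker pods) can connect to pgbouncer...
    max_client_conn = 1000
    ; ...but pgbouncer only ever opens ~20 real connections to Postgres
    default_pool_size = 20

With numbers like these the GCP-side limit of 100 would only constrain the pool size, not the number of task pods, though I'd be curious whether transaction pooling causes any issues with the way Airflow's SQLAlchemy sessions are used.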
On Thu, Aug 30, 2018 at 6:39 PM Greg Neiheisel <g...@astronomer.io> wrote:

> Good point Eamon, maxing connections out is definitely something to look
> out for. We recently added pgbouncer to our helm charts to pool connections
> to the database for all the different airflow processes. Here's our chart
> for reference -
>
> https://github.com/astronomerio/helm.astronomer.io/tree/master/charts/airflow
>
> On Thu, Aug 30, 2018 at 1:17 PM Kyle Hamlin <hamlin...@gmail.com> wrote:
>
> > Thanks for your responses! Glad to hear that tasks can run independently
> > if something happens.
> >
> > On Thu, Aug 30, 2018 at 1:13 PM Eamon Keane <eamon.kea...@gmail.com> wrote:
> >
> > > Adding to Greg's point, if you're using the k8s executor and for some
> > > reason the k8s executor worker pod fails to launch within 120 seconds
> > > (e.g. pending due to scaling up a new node), this counts as a task
> > > failure. Also, if the k8s executor pod has already launched a pod
> > > operator but is killed (e.g. manually or due to a node upgrade), the
> > > pod operator it launched is not killed and runs to completion, so if
> > > you're using retries you need to ensure idempotency. The worker pods
> > > update the db per my understanding, with each requiring a separate
> > > connection to the db, so this can tax your connection budget (100-300
> > > for small postgres instances on gcp or aws).
> > >
> > > On Thu, Aug 30, 2018 at 6:04 PM Greg Neiheisel <g...@astronomer.io> wrote:
> > >
> > > > Hey Kyle, the task pods will continue to run even if you reboot the
> > > > scheduler and webserver, and the status does get updated in the
> > > > airflow db, which is great.
> > > >
> > > > I know the scheduler subscribes to the Kubernetes watch API to get
> > > > an event stream of pods completing, and it keeps a checkpoint so it
> > > > can resubscribe when it comes back up.
> > > >
> > > > I forget if the worker pods update the db or if the scheduler is
> > > > doing that, but it should work out.
> > > >
> > > > On Thu, Aug 30, 2018, 9:54 AM Kyle Hamlin <hamlin...@gmail.com> wrote:
> > > >
> > > > > gentle bump
> > > > >
> > > > > On Wed, Aug 22, 2018 at 5:12 PM Kyle Hamlin <hamlin...@gmail.com> wrote:
> > > > >
> > > > > > I'm about to make the switch to Kubernetes with Airflow, but am
> > > > > > wondering what happens when my CI/CD pipeline redeploys the
> > > > > > webserver and scheduler and there are still long-running tasks
> > > > > > (pods). My intuition is that since the database holds all state,
> > > > > > the tasks are in charge of updating their own state, and the UI
> > > > > > only renders what it sees in the database, this is not so much of
> > > > > > a problem. To be sure, however, here are my questions:
> > > > > >
> > > > > > Will task pods continue to run?
> > > > > > Can task pods continue to poll the external system they are
> > > > > > running tasks on while being "headless"?
> > > > > > Can the task pods change/update state in the database while being
> > > > > > "headless"?
> > > > > > Will the UI/Scheduler still be aware of the tasks (pods) once
> > > > > > they are live again?
> > > > > >
> > > > > > Is there anything else that might cause issues when deploying
> > > > > > while tasks (pods) are running that I'm not thinking of here?
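Following up on Eamon's idempotency point above: since a retry can start while an orphaned pod-operator pod from the previous attempt is still running to completion, the pattern we're planning on is to key every side effect on the execution date, so two overlapping attempts converge on the same output instead of duplicating it. A rough sketch of what I mean (the DAG name, bucket, and paths are made up, not from anyone's setup), using a plain PythonOperator on Airflow 1.x:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    default_args = {
        "owner": "airflow",
        "retries": 2,                         # a retry may overlap an orphaned pod
        "retry_delay": timedelta(minutes=5),
    }

    dag = DAG(
        "idempotent_export",                  # hypothetical DAG, for illustration only
        default_args=default_args,
        start_date=datetime(2018, 8, 1),
        schedule_interval="@daily",
    )

    def export_partition(ds, **_):
        """Write to a path derived from the execution date, so a second run of
        the same task instance overwrites rather than appends."""
        output_path = "gs://my-bucket/exports/{}/data.csv".format(ds)  # placeholder bucket
        # ... produce the file and upload it to output_path ...
        print("would write", output_path)

    export = PythonOperator(
        task_id="export_partition",
        python_callable=export_partition,
        provide_context=True,   # exposes `ds` (execution date) to the callable in 1.x
        dag=dag,
    )

The same idea should carry over to a KubernetesPodOperator: pass the execution date into the pod's arguments and have the container write to a date-keyed location, so a retried attempt and a still-running orphaned pod end up producing the same artifact.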