tanuj241088 commented on issue #29131: URL: https://github.com/apache/airflow/issues/29131#issuecomment-1402557912
@Taragolis I was also investigating in the same direction. Here is an update: I don't think this is related to the Airflow version upgrade. We have a liveness probe that checks the scheduler heartbeat every 30 seconds and lets Kubernetes restart the scheduler after 5 minutes of failures, and it looks like that probe is failing. I then found out that our Kubernetes team upgraded k8s from 1.9 to 1.21 today. I still wonder why the liveness probe would fail with the new k8s version (1.21).

Here is my liveness probe script:

    # If the scheduler stops heartbeating for 5 minutes (10*30s) kill the
    # scheduler and let Kubernetes restart it
    livenessProbe:
      failureThreshold: 10
      periodSeconds: 30
      exec:
        command:
        - python
        - -Wignore
        - -c
        - |
          import os
          os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
          os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

          from airflow.jobs.scheduler_job import SchedulerJob
          from airflow.utils.db import create_session
          from airflow.utils.net import get_hostname
          import sys

          #job = SchedulerJob.most_recent_job()
          #sys.exit(0 if job.is_alive() and job.hostname == get_hostname() else 1)
          ## Commented out the lines above because they caused the scheduler to send SIGTERM to longer-running tasks

          with create_session() as session:
              job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                  SchedulerJob.latest_heartbeat.desc()).limit(1).first()

          sys.exit(0 if job.is_alive() else 1)
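For anyone reading along, the decision the probe script makes can be sketched without Airflow or a database. This is only an illustration under stated assumptions: `JobRecord`, `latest_job_for_host`, `probe_exit_code`, `restart_window_seconds`, and the 30-second heartbeat threshold are hypothetical names and values, not Airflow's API and not its actual `is_alive()` implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass
class JobRecord:
    """Hypothetical stand-in for one scheduler job row."""
    hostname: str
    latest_heartbeat: datetime


def latest_job_for_host(jobs: Sequence[JobRecord], hostname: str) -> Optional[JobRecord]:
    """Return the job with the newest heartbeat on this host, or None if there is none."""
    candidates = [job for job in jobs if job.hostname == hostname]
    return max(candidates, key=lambda job: job.latest_heartbeat, default=None)


def probe_exit_code(
    jobs: Sequence[JobRecord],
    hostname: str,
    now: datetime,
    threshold: timedelta = timedelta(seconds=30),  # assumed value, for illustration only
) -> int:
    """Return 0 (healthy) if the newest job on this host heartbeated recently, else 1."""
    job = latest_job_for_host(jobs, hostname)
    if job is None:
        # No job recorded for this host: report failure explicitly.
        return 1
    return 0 if now - job.latest_heartbeat < threshold else 1


def restart_window_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Seconds of consecutive probe failures before the container is restarted."""
    return failure_threshold * period_seconds


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    jobs = [JobRecord("scheduler-0", now - timedelta(seconds=5))]
    print(probe_exit_code(jobs, "scheduler-0", now))
    print(restart_window_seconds(10, 30))
```

With `failureThreshold: 10` and `periodSeconds: 30`, `restart_window_seconds(10, 30)` gives the 300 seconds (5 minutes) mentioned in the config comment. The explicit `None` branch makes the "no job found for this host" case a deliberate failure instead of an unhandled exception.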