csp98 opened a new issue, #27140:
URL: https://github.com/apache/airflow/issues/27140

   ### Official Helm Chart version
   
   1.7.0 (latest released)
   
   ### Apache Airflow version
   
   2.3.4
   
   ### Kubernetes Version
   
   1.22.12-gke.1200     
   
   ### Helm Chart configuration
   
   ```yaml
     dagProcessor:
       enabled: true
   ```
   
   ### Docker Image customisations
   
   ```dockerfile
   FROM apache/airflow:2.3.4-python3.9
   
   USER root
   RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
   RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
   RUN apt-get update && apt-get install -y google-cloud-cli
   RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
   RUN install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
   USER airflow
   ```
   
   ### What happened
   
   Current DAG Processor livenessProbe is the following:
   ```
   CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
       airflow jobs check --hostname $(hostname)
   ```
   This command checks the metadata DB for an active job whose hostname matches the current pod's (_airflow-dag-processor-xxxx_).
   However, even after the dag-processor pod has been running for more than an hour, there are no jobs with the processor's hostname in the jobs table.
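   For context, `airflow jobs check --hostname` passes only when the `job` table contains a running job with a recent heartbeat for that hostname. Below is a minimal sketch of that condition; the grace period, function name, and tuple layout are my assumptions for illustration, not Airflow's exact implementation:

   ```python
   from datetime import datetime, timedelta

   # Assumed grace period; Airflow derives its real threshold from its
   # health-check settings, so this value is illustrative only.
   HEARTBEAT_GRACE = timedelta(seconds=30)

   def probe_would_pass(jobs, expected_hostname, now):
       """Sketch of the check `airflow jobs check --hostname <pod>` performs:
       is there at least one running job for this hostname with a recent
       heartbeat? `jobs` is a list of (state, hostname, latest_heartbeat)
       tuples standing in for rows of the metadata `job` table."""
       return any(
           state == "running"
           and hostname == expected_hostname
           and now - latest_heartbeat < HEARTBEAT_GRACE
           for state, hostname, latest_heartbeat in jobs
       )
   ```

   Because the standalone DAG processor never writes such a row, this condition can never hold for its hostname, which matches the constant restarts described above.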
   
![image](https://user-images.githubusercontent.com/28935464/196711859-98dadb8f-3273-42ec-a4db-958890db34b7.png)
   
![image](https://user-images.githubusercontent.com/28935464/196711947-5a0fc5d7-4b91-4e82-9ff0-c721e6a4c1cd.png)
   
   As a consequence, the livenessProbe fails and the pod is constantly restarting.
   
   After investigating the code, I found that neither DagFileProcessorManager nor DagFileProcessor creates jobs in the metadata DB, so the livenessProbe is not valid.
   
   ### What you think should happen instead
   
   The livenessProbe should be refactored so that it truly checks that the DAG Processor is up and running.
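   
   Until the default probe is fixed, one possible workaround is a crude process-based check instead of the jobs-table query. This assumes your chart version lets you override the dag processor probe's `command` (verify with `helm show values`; older chart versions may not expose this field):
   
   ```yaml
     dagProcessor:
       enabled: true
       livenessProbe:
         # Hypothetical override: only verifies the process is alive, not that
         # it is making progress; a real fix needs a heartbeat the probe can
         # actually query.
         command:
           - sh
           - -c
           - pgrep -f "airflow dag-processor" > /dev/null
   ```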
   
   ### How to reproduce
   
   1. Deploy airflow with a standalone dag-processor.
   2. Wait for ~5 minutes.
   3. Check that the livenessProbe has been failing for 5 minutes and that the pod has been restarted.
   
   ### Anything else
   
   I think this behavior is inherited from the non-standalone dag-processor mode, where the livenessProbe checks for a SchedulerJob that in fact encompasses the "DagProcessorJob".
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
