GitHub user rtrindvg edited a discussion: Need help understanding total number of Dags oscillating on UI
I am in the middle of a migration from Airflow running on a virtual machine to a Kubernetes cluster, currently in a staging environment. After a lot of configuration adjustments in the Helm values.yaml, the cluster seems stable and working fine. But for some reason, the UI sometimes shows fewer DAGs than are actually available. For example, we have a total of 93 DAGs. After the initial load, which takes a couple of minutes, the count is stable for some time. Then it drops to a smaller number (like 64), and after a couple of minutes it starts to climb back up, eventually returning to 93. We confirmed this is not any kind of browser cache. There were no pod restarts in the meantime, no changes to the cluster, and no DAGs were changed either.

We are using git-sync with non-persistent storage, as recommended in the docs. We enabled its debug logs and it seems to be working fine, only downloading changes when the DAGs branch changes, and those changes seem to propagate quickly to all relevant pods. The scheduler logs showed no errors that could explain the drop in the total number of DAGs.

Another fix we tried was enabling the standalone (non-default) DAG processor, but the behavior is the same. I tried enabling the processor's verbose mode via the env parameter, unsuccessfully; its logs are mostly blank, so I have no clue whether the DAG processor is the culprit. We also replaced the CeleryExecutor with the KubernetesExecutor, because it is better suited to our purposes. We did not expect this to be related to the issue and, as expected, the behavior persists.

Since I am from the cloud-infra team and have no previous experience with Airflow, can someone help me understand what the issue could be and suggest next steps for diagnosing our environment?

We are using Airflow 2.9.3 (since it is the most recent version in the latest Helm chart available) and Python 3.12 in a custom Dockerfile. We are not extending the image, we are really customizing it, since we need to perform a couple of compilations and it was more efficient to do this before the Airflow pip installs, to make rebuilds faster and the final image smaller. I did not know whether it was safe to point the image to the latest Airflow available (I assumed an updated Helm chart would have been published if that were the case), so we kept using this one. Embedding the DAGs into the image is not an option, since they change constantly, and rebuilding the image and redeploying the cluster several times a day is not viable for us. If updating the cluster to 2.10.3 is safe and addresses any known issues related to this behavior, please point me in the right direction. Thanks for any tips!

GitHub link: https://github.com/apache/airflow/discussions/44495
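For reference, a minimal sketch of the chart settings in question, assuming the official apache-airflow Helm chart; the repo URL, branch, and subPath below are placeholders, not our real values:

```yaml
# values.yaml (excerpt) -- official apache-airflow Helm chart

executor: "KubernetesExecutor"        # switched from CeleryExecutor

dags:
  persistence:
    enabled: false                    # non-persistent storage, as recommended with git-sync
  gitSync:
    enabled: true
    repo: "https://github.com/example-org/airflow-dags.git"   # placeholder
    branch: "main"                                            # placeholder
    subPath: "dags"                                           # placeholder

# standalone DAG processor (the non-default processor mentioned above)
dagProcessor:
  enabled: true

# extra environment variables used to raise log verbosity
env:
  - name: AIRFLOW__LOGGING__LOGGING_LEVEL
    value: "DEBUG"
```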