potiuk commented on issue #42542: URL: https://github.com/apache/airflow/issues/42542#issuecomment-2553919773
> I’m encountering the same issue on Airflow 2.10.2. Specifically, after removing certain helper Python files and even some DAG files from my local dags/ directory (which is mounted into the Airflow containers), these files continue to appear inside the running containers. Although they’re deleted on the host machine, they still show up in the container as if they remain present and are being parsed by the scheduler or DAG processor.
>
> It feels like there’s some form of caching or delayed sync at play. How can I ensure that once files are removed locally, they are also removed from the container’s view and no longer recognized by Airflow? Any guidance or known workarounds would be greatly appreciated.

It's entirely dependent on your deployment - which is totally outside of the Airflow realm. We have `git-sync` in our reference chart, but how syncing of the files works in other charts or deployments is not something that Airflow 2 controls. You need to look at the details of how this mounting/syncing is done, because that depends entirely on the deployment manager.

In Airflow 3 this will change when you use DAG Bundles (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816356): syncing will be controlled inside Airflow, and then it becomes more of an Airflow problem. Until then it's a matter of how Airflow is deployed and what syncing mechanisms you implemented.

We can try to help here if you describe your case, but when you use, for example, a 3rd-party chart like @hpereira98 does, the question is better answered and discussed there. If you, @klisira, describe your deployment in detail and it's different from the other chart, it might be a good idea to explain all the details, and maybe we can find something.

There are several things that could happen. One of them is that you have **some** filesystem caching that keeps the files visible in the container after they are deleted locally.

Another one is that pre-compiled bytecode files remain where they were: they are a) generated and b) not removed when git-sync swaps the directory (if you are using git-sync). In this case, setting the variable https://docs.python.org/3/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE and cleaning those .pyc files might help.

Other than that, the investigation should happen in the container that is used for DAG parsing. That could be the scheduler or, if you are using a standalone DAG processor, the processor. You have to exec into those containers and see whether the Python files (or .pyc files) are there and whether they seem to be parsed by the scheduler/DAG processor (you will see it in the DAG file processor logs).

Finally (and this happened to a few of our users), you might have some old version of the scheduler/DAG file processor still running, with old un-synced folders and connected to the same database. In such a case this "additional" scheduler/DAG file processor will continue scanning and adding those files even if they are removed in the "regular" scheduler/DAG file processor.

Those are all ideas/hypotheses that come to my mind that you could explore - but most of them are just wild guesses and, well, hypotheses that only you can verify.
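
As a rough sketch of the .pyc check described above, something like the following could be run inside the scheduler or DAG processor container. This is a standalone script, not an Airflow feature; the function name `find_orphaned_pyc` is hypothetical, and the `/opt/airflow/dags` path in the usage note is an assumption that should be replaced with wherever your DAGs are actually mounted.

```python
import importlib.util
from pathlib import Path


def find_orphaned_pyc(dags_folder):
    """Return .pyc files under dags_folder whose .py source no longer exists."""
    orphans = []
    for pyc in Path(dags_folder).rglob("*.pyc"):
        if pyc.parent.name == "__pycache__":
            try:
                # Maps __pycache__/mod.cpython-311.pyc back to mod.py
                source = Path(importlib.util.source_from_cache(str(pyc)))
            except ValueError:
                # File name does not follow the __pycache__ naming scheme;
                # skip it rather than guess at its source file
                continue
        else:
            # Legacy layout: mod.pyc sits in the same directory as mod.py
            source = pyc.with_suffix(".py")
        if not source.exists():
            orphans.append(pyc)
    return sorted(orphans)
```

For example, `print(*find_orphaned_pyc("/opt/airflow/dags"), sep="\n")` would list the candidates. The function only reports files and deletes nothing, so you can review the list before cleaning up.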
