potiuk commented on issue #42542:
URL: https://github.com/apache/airflow/issues/42542#issuecomment-2553919773

   > I’m encountering the same issue on Airflow 2.10.2. Specifically, after 
removing certain helper Python files and even some DAG files from my local 
dags/ directory (which is mounted into the Airflow containers), these files 
continue to appear inside the running containers. Although they’re deleted on 
the host machine, they still show up in the container as if they remain present 
and are being parsed by the scheduler or DAG processor.
   > 
   > It feels like there’s some form of caching or delayed sync at play. How 
can I ensure that once files are removed locally, they are also removed from 
the container’s view and no longer recognized by Airflow? Any guidance or known 
workarounds would be greatly appreciated.
   
   It's entirely dependent on your deployment, which is totally outside of 
the Airflow realm. We have `git-sync` in our reference chart, but how syncing 
of the files works for other charts or various deployments is not something 
that Airflow 2 controls. You need to look at the details of how this 
mounting/syncing is done, because that depends entirely on the deployment 
manager.
   
   In Airflow 3 this will change when you use DAG Bundles 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816356 - 
where syncing will be controlled inside Airflow and then it's more of an 
Airflow problem. Until then it's more a matter of how Airflow is deployed and 
what syncing mechanism you implemented. We can try to help here if you describe 
your case, but when you use, for example, a 3rd-party chart like @hpereira98, 
the question is better answered and discussed there. If you, @klisira, describe 
your deployment in detail and it's different from the other chart, it might be 
a good idea to explain all the details here and maybe we can find something.
   
   There are several things that could happen. One of them is that you have 
**some** filesystem caching that keeps the files visible in the container after 
they are deleted locally. Another one is that pre-compiled bytecode files 
remain where they were: they are a) generated and b) not removed when git-sync 
swaps the directory (if you are using git-sync). In this case setting the 
`PYTHONDONTWRITEBYTECODE` variable 
(https://docs.python.org/3/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE) 
and cleaning those .pyc files might help.
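   
   For the bytecode cleanup, something along these lines could work. It is only 
a rough sketch: the function name `remove_bytecode` is made up for illustration, 
and the DAG folder path you pass in depends entirely on your deployment. It 
deletes every `.pyc` file under the folder and then removes any `__pycache__` 
directories left empty.

```python
from pathlib import Path


def remove_bytecode(dags_dir):
    """Delete every .pyc file under dags_dir and return the removed paths.

    `remove_bytecode` is a hypothetical helper, not part of Airflow.
    """
    root = Path(dags_dir)
    # Materialize the list first so we do not delete while still walking.
    pyc_files = list(root.rglob("*.pyc"))
    for pyc in pyc_files:
        pyc.unlink()
    # Remove __pycache__ directories that are now empty.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir() and not any(cache_dir.iterdir()):
            cache_dir.rmdir()
    return pyc_files
```

   You would run it inside the container that parses DAGs, for example 
`remove_bytecode("/opt/airflow/dags")` (that path is an assumption, use your 
own). With `PYTHONDONTWRITEBYTECODE=1` set in the container environment, Python 
should not write new `.pyc` files afterwards.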
   
   Other than that, the investigation should happen in the container that is 
used for DAG parsing. That could be the scheduler or, if you are using a 
standalone DAG processor, the processor. You have to exec into those containers 
and see if the python files (or .pyc files) are there and whether they seem to 
be parsed by the scheduler/DAG processor (you will see it in the DAG file 
processor logs). Finally (and this happened to a few of our users), you might 
have some old version of the scheduler/DAG file processor running, with old 
un-synced folders and connected to the same database. In such a case this 
"additional" scheduler/DAG file processor will continue scanning and adding 
those files even if they are removed in the "regular" scheduler/DAG file 
processor.
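   
   To compare what the host and the container actually see, a small script 
like the one below might help. Again, this is just a sketch under assumptions: 
`list_dag_files` and `only_in_container` are hypothetical helper names, and you 
would run the listing once on the host and once inside the parsing container 
(after exec-ing into it), then compare the two results.

```python
from pathlib import Path


def list_dag_files(dags_dir):
    """Return sorted relative paths of all .py and .pyc files under dags_dir.

    Hypothetical helper for illustration, not an Airflow API.
    """
    root = Path(dags_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in {".py", ".pyc"}
    )


def only_in_container(host_listing, container_listing):
    """Return files the container still sees but the host no longer has."""
    return sorted(set(container_listing) - set(host_listing))
```

   Anything returned by `only_in_container` is a file that was deleted locally 
but is still visible to the parsing process, which points at the mounting or 
syncing layer (or leftover bytecode) rather than at Airflow itself.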
   
   Those are all ideas/hypotheses that come to my mind that you could explore - 
but most of them are just wild guesses and well, hypotheses, that only you can 
verify.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
