potiuk commented on PR #56044: URL: https://github.com/apache/airflow/pull/56044#issuecomment-3425483616
I think we shoudl also understand the context better. We are not talking about "generic" utility but about specific ways Airflow uses forking where forking interacts with SQLAlchemy DB initialization. There are several places where we fork in Airflow (and in Airflow 3 this is quite a bit different than in Airflow 2): 1) Workers. Here, the thing is that worker does not even have connection configured. It's not supposed to access the DB at all. So we have no issue here. 2) Local Executor - > here executors are only forked at the start of the scheduler in a single process that forks/spawns executor processes, so clossing the DB conneciton before that is quite OK - because it is long before anything else happens and there are (I think) no side effects of it. 3) Dag Processor - again DB There is only initialized in the internal API server. The Processor Manager itself and Dag Processor parsing processes should not even attempt to initialize database. 4) Triggerer - same as DagProcessor. It should not access the DB in the process that forks sub-processes. 5) Webserver - there I am not 100% sure what is the mechanism used by uvicorn now, but I would not see the need for the main process to use database at all, database should also only be inititalized and used in the forks (which are worker processes that handle async loops of starlette - I guess) Now 3), 4) and 5) should be verified, if that's the case for sure (and fixed if not) - which leaves only Scheduler case where both forking and forked process need a database. I think once we verify those assumptions, your concern about > Disposing of the engine means temporarily disconnecting all connections in the pool, which could potentially trigger race conditions that are difficult to predict. might not be valid - and then the only thing we **really** need to do is dispose connections before we fork Local Executor processes and re-create the pools. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
