Hello everyone,

As we work on finishing off the code-level separation of Task SDK and Core (scheduler etc.), we have come across some situations where we would like to share code between the two.
However, it’s not as straightforward as “just put it in a common dist they both depend upon”, because one of the goals of the Task SDK separation was to have 100% complete version independence between the two, ideally even if they are built into the same image and venv. Most of the reason why this isn’t straightforward comes down to backwards compatibility: if we make a change to the common/shared distribution, it has to keep working with every version of Core and Task SDK that could be installed alongside it.

We’ve listed the options we have thought about in https://github.com/apache/airflow/issues/51545 (but that covers some more things that I don’t want to get into in this discussion, such as possibly separating operators and executors out of a single provider dist).

To give a concrete example of some code I would like to share: the logging config in https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py. Another thing we will want to share is the AirflowConfigParser class from airflow.configuration (but notably: only the parser class, _not_ the default config values; again, let’s not dwell on the specifics of that).

So to bring the options listed in the issue here for discussion, broadly speaking there are two high-level approaches:

1. A single shared distribution
2. No shared package and copy/duplicate code

The advantage of Approach 1 is that we only have the code in one place. The downside for me, at least in this specific case of the logging config or the AirflowConfigParser class, is that backwards compatibility is much, much harder.

The main advantage of Approach 2 is that the code is released with/embedded in the dist (i.e. apache-airflow-task-sdk would contain the right version of the logging config, ConfigParser etc.). The downside is that either the code will need to be duplicated in the repo, or, better yet, it would live in a single place in the repo and some tooling (TBD) would automatically handle the duplication, either at commit time or, my preference, at release time.
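To make the “tooling (TBD)” part a little more concrete, here is a rough sketch of what a copy step for Approach 2 could look like. It is only an illustration under assumptions, not the actual tooling: the function name `vendor_package`, the shared package name `airflow_common`, and the idea of rewriting `from ... import` lines with a regex are all hypothetical. It copies one shared package into a dist’s vendor directory and rewrites absolute `from airflow_common...` imports to point at the vendored copy.

```python
import re
import shutil
from pathlib import Path


def vendor_package(
    shared_src: Path, package: str, dest_pkg_dir: Path, dest_import_prefix: str
) -> list[Path]:
    """Copy ``shared_src/package`` into ``dest_pkg_dir/package``.

    Hypothetical helper: absolute ``from <package>... import`` lines in the
    copied files are rewritten to ``from <dest_import_prefix>.<package>...``.
    Plain ``import <package>`` statements are deliberately left untouched,
    because rewriting them would change the name they bind.
    """
    source = shared_src / package
    target = dest_pkg_dir / package
    if target.exists():
        shutil.rmtree(target)  # always start from a clean copy
    shutil.copytree(source, target, ignore=shutil.ignore_patterns("__pycache__"))

    # Make sure the vendor directory itself is an importable package.
    (dest_pkg_dir / "__init__.py").touch()

    # Match "from <package>" only when followed by "." or whitespace, so a
    # similarly named package (e.g. "<package>2") is not rewritten.
    pattern = re.compile(
        rf"^([ \t]*)from\s+{re.escape(package)}(?=[.\s])", re.MULTILINE
    )
    copied = sorted(target.rglob("*.py"))
    for path in copied:
        text = path.read_text(encoding="utf-8")
        rewritten = pattern.sub(rf"\1from {dest_import_prefix}.{package}", text)
        path.write_text(rewritten, encoding="utf-8")
    return copied
```

Something like `vendor_package(Path("airflow-common/src"), "airflow_common", Path("task-sdk/src/airflow/sdk/_vendor"), "airflow.sdk._vendor")` could then run as part of the build, so the repo keeps a single copy while each shipped dist carries its own. The paths in that call are assumptions too.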
For this kind of shared “utility” code I am very strongly leaning towards option 2 with automation. Otherwise I think the backwards compatibility requirements would make it unworkable (very quickly over time the combinations we would have to test would just be unreasonable), and I don’t feel confident we could keep things as stable as we need to really deliver the version separation/independence I want to deliver with AIP-72.

So unless someone feels very strongly about this, I will come up with a draft PR for further discussion that will implement code sharing via “vendoring” it at build time. I have an idea of how I can achieve this so that we have a single version in the repo and it works there, but at build time we vendor it into the shipped dist so that at runtime it lives at something like `airflow.sdk._vendor` etc.

In terms of repo layout, this likely means we would end up with:

    airflow-core/pyproject.toml
    airflow-core/src/
    airflow-core/tests/
    task-sdk/pyproject.toml
    task-sdk/src/
    task-sdk/tests/
    airflow-common/src/
    airflow-common/tests/
    # Possibly no airflow-common/pyproject.toml, as deps would be included in the downstream projects. TBD.

Thoughts and feedback welcomed.