I support option 2 with proper automation & CI - the reasoning you've
given for it makes sense to me.


Shahar


On Wed, Jul 2, 2025 at 3:36 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> Hello everyone,
>
> As we work on finishing off the code-level separation of Task SDK and Core
> (scheduler etc) we have come across some situations where we would like to
> share code between these.
>
> However it’s not as straightforward as “just put it in a common dist they
> both depend upon” because one of the goals of the Task SDK separation was
> to have 100% complete version independence between the two, ideally even if
> they are built into the same image and venv. Most of the reason why this
> isn’t straightforward comes down to backwards compatibility - if we make
> a change to the common/shared distribution
>
>
> We’ve listed the options we have thought about in
> https://github.com/apache/airflow/issues/51545 (but that covers some more
> things that I don’t want to get into in this discussion, such as possibly
> separating operators and executors out of a single provider dist.)
>
> To give a concrete example of some code I would like to share
> https://github.com/apache/airflow/blob/84897570bf7e438afb157ba4700768ea74824295/airflow-core/src/airflow/_logging/structlog.py
> — logging config. Another thing we will want to share will be the
> AirflowConfigParser class from airflow.configuration (but notably: only the
> parser class, _not_ the default config values; again, let’s not dwell on the
> specifics of that).
>
> So to bring the options listed in the issue here for discussion, broadly
> speaking there are two high-level approaches:
>
> 1. A single shared distribution
> 2. No shared package and copy/duplicate code
>
> The advantage of Approach 1 is that we only have the code in one place.
> However, for me the drawback, at least in this specific case of the logging
> config or the AirflowConfigParser class, is that backwards compatibility is
> much, much harder.
>
> The main advantage of Approach 2 is that the code is released with/embedded
> in the dist (i.e. apache-airflow-task-sdk would contain the right version
> of the logging config and ConfigParser etc). The downside is that either
> the code will need to be duplicated in the repo, or better yet it would
> live in a single place in the repo, but some tooling (TBD) will
> automatically handle the duplication, either at commit time or, my
> preference, at release time.
>
> For this kind of shared “utility” code I am very strongly leaning towards
> option 2 with automation, as otherwise I think the backwards compatibility
> requirements would make it unworkable (very quickly over time the
> combinations we would have to test would just be unreasonable) and I don’t
> feel confident we can have things as stable as we need to really deliver
> the version separation/independence I want to deliver with AIP-72.
>
> So unless someone feels very strongly about this, I will come up with a
> draft PR for further discussion that will implement code sharing via
> “vendoring” it at build time. I have an idea of how I can achieve this so
> we have a single version in the repo and it’ll work there, but at runtime
> we vendor it into the shipped dist so it lives at something like
> `airflow.sdk._vendor` etc.
>
> In terms of repo layout, this likely means we would end up with:
>
> airflow-core/pyproject.toml
> airflow-core/src/
> airflow-core/tests/
> task-sdk/pyproject.toml
> task-sdk/src/
> task-sdk/tests/
> airflow-common/src/
> airflow-common/tests/
> # Possibly no airflow-common/pyproject.toml, as deps would be included in
> # the downstream projects. TBD.
>
> Thoughts and feedback welcomed.
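
To make the "vendor at build time" idea from the quoted message a bit more concrete, here is a minimal sketch of what such a step might look like. It is only an illustration under assumptions, not the tooling Ash has in mind (that is still TBD): the function name vendor_shared_code, the file name logging_config.py, and the exact directory arguments are hypothetical. Only the airflow-common/src location and the `airflow.sdk._vendor` target come from the message above.

```python
import shutil
import tempfile
from pathlib import Path


def vendor_shared_code(shared_src: Path, target_pkg: Path, vendor_name: str = "_vendor") -> Path:
    """Copy the shared source tree into ``target_pkg/<vendor_name>``.

    Hypothetical helper: a real build hook would also have to deal with
    import rewriting, licensing files, and so on.
    """
    vendor_dir = target_pkg / vendor_name
    # Start from a clean directory so stale files from an earlier build never ship.
    if vendor_dir.exists():
        shutil.rmtree(vendor_dir)
    shutil.copytree(
        shared_src,
        vendor_dir,
        ignore=shutil.ignore_patterns("__pycache__", "*.pyc"),
    )
    # Make sure the vendored tree is importable as a package.
    (vendor_dir / "__init__.py").touch()
    return vendor_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Assumed layout: shared code lives once in the repo under airflow-common/src.
        shared = root / "airflow-common" / "src"
        shared.mkdir(parents=True)
        (shared / "logging_config.py").write_text("LEVEL = 'INFO'\n")  # hypothetical module

        # Assumed target: the Task SDK package that ships in the dist.
        sdk_pkg = root / "task-sdk" / "src" / "airflow" / "sdk"
        sdk_pkg.mkdir(parents=True)

        vendored = vendor_shared_code(shared, sdk_pkg)
        print(sorted(p.name for p in vendored.iterdir()))
```

Running the same step against airflow-core would give each dist its own frozen copy, which is what lets the two be released and versioned independently.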
