Oh super - nice. I am definitely going to take a look (at both) :D

On Tue, Apr 28, 2026 at 9:55 AM Tzu-ping Chung via dev <
[email protected]> wrote:

> Hi all,
>
> I’ve merged the ADRs and the previous LC message (in the other thread)
> into a document formatted as an AIP:
>
> AIP-108 Java Task SDK and the Language Coordinator Layer - Airflow -
> Apache Software Foundation <https://cwiki.apache.org/confluence/x/pY4mGQ>
>
>
> Various sections are taken mostly directly from the other documents, so
> you can probably glance over them if you’ve already read them. If you
> haven’t, the AIP document would be a good place to start since it removes
> some more detailed descriptions and focuses on the high level interfaces.
> More details are still available in the ADR documents included in the PRs
> Jason opened.
>
> TP
>
>
> On 28 Apr 2026, at 11:12, Zhe-You(Jason) Liu <[email protected]> wrote:
>
> Hi Jens,
>
> The ADRs are now available here [1]; I hope they help clarify some of your
> concerns.
>
> > As I understand it, Java means statically compiled JAR files, with no
> > on-the-fly compilation from a Java source tree (correct?)
>
> That's correct -- users compile the JARs themselves and point the
> `[java] bundles_folder` config at the directory containing those JAR files.
>
> > So the Dag parsing concept is then quite "static", and once generated
> > there is no need to re-parse the Dag in Java mode?
> >
> > Still, if static JARs are deployed, what does the deploy lifecycle look
> > like? Would you need to restart the Dag parser and respective workers
> > with each new deploy?
>
> The lifecycle for dag-parsing will be the same as how the current
> `DagFileProcessorProcess` acts. The coordinator comes into play before we
> start the actual parse file entrypoint [2]. Regardless of what language the
> parse file subprocess is implemented in, as long as it returns a valid
> serialized Dag JSON in msgpack over IPC, the behavior will remain the same.
> So there is no need to restart the `airflow dag-processor`, and the
> `airflow worker` will not be involved in dag processing at all.
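
A minimal sketch of that contract, for illustration only: the parent launches whatever command the coordinator supplies and only checks that a serialized Dag comes back. JSON on stdout stands in for msgpack over the real IPC channel, and `run_parse_subprocess` and the payload keys are hypothetical, not Airflow's implementation.

```python
import json
import subprocess
import sys


def run_parse_subprocess(command: list[str], timeout: float = 30.0) -> dict:
    """Run a (hypothetical) parse command and decode the Dag payload it prints.

    The real dag processor exchanges msgpack frames over IPC; JSON on stdout
    is used here only to keep the sketch self-contained.
    """
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout, check=True
    )
    payload = json.loads(result.stdout)
    # The parent validates only the shape of the result, not the child's language.
    if "dag_id" not in payload or "tasks" not in payload:
        raise ValueError("subprocess did not return a serialized Dag")
    return payload


if __name__ == "__main__":
    # A Python one-liner stands in for e.g. a `java -classpath ...` command.
    child = (
        "import json; "
        "print(json.dumps({'dag_id': 'example', 'tasks': ['extract', 'load']}))"
    )
    dag = run_parse_subprocess([sys.executable, "-c", child])
    print(dag["dag_id"], dag["tasks"])  # example ['extract', 'load']
```

Swapping the Python one-liner for a JVM command would not change the parent side at all, which is the property described above.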
>
> > Is it really realistic that a "LocalExecutor" needs to be supported, or
> > can we limit it to, e.g., only remote executors to reduce coupling and
> > complexity of operating the core?
>
> The coordinator is the interface that decides how we want to launch the
> subprocess for both dag-processing and workload-execution. This means it
> will support **any** executor out of the box, as we integrate the
> coordinator at the TaskSDK level for workload-execution [3].
>
> > What overhead does the "Coordinator layer" generate compared to a
> > Java-specific supervisor implementation?
>
> The "Coordinator layer" is the interface for Airflow-Core to interact with
> the target language subprocess. We still need a Java-specific supervisor
> implementation, which is the first PR I mentioned in another thread -- Java
> SDK [4].
>
> > It involves not only a new / additional process but also IPC. And I have
> > at least seen performance problems, e.g. with very large XComs, where
> > even heartbeats are lost due to long-running IPC
>
> I would consider this out of scope of the "Java SDK and the Coordinator
> Layer" AIP, as the current Python-native TaskSDK supervisor will encounter
> the same issue. The multi-language support here follows the same protocol
> that the current TaskSDK uses.
>
> [1] https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0
> [2] https://github.com/apache/airflow/pull/65958/changes#diff-564fd0a8fbe4cc47864a8043fcc1389b33120c88bb35852b26f45c36b902f70bR540-R573
> [3] https://github.com/apache/airflow/pull/65958/changes#diff-5bef10ab2956abf7360dbf9b509b6e1113407874d24abcc1b276475051f13abfR1991-R2001
> [4] https://github.com/apache/airflow/pull/65956
>
> Thanks.
>
> Best,
> Jason
>
> On Tue, Apr 28, 2026 at 3:03 AM Jens Scheffler <[email protected]>
> wrote:
>
> Thanks TP for raising this!
>
> I would need to sleep on the information described here, and I might have
> some detailed questions just to ensure I understood it correctly. An ADR or
> AIP document might be a better place to comment than an email thread.
>
> Things that come to mind (more will probably come as I think about it):
>
>  * As I understand it, Java means statically compiled JAR files, with no
>    on-the-fly compilation from a Java source tree (correct?). In this case
>    the Dags are also "static until re-deploy", so the Dag parsing concept
>    is quite "static" and, once generated, there is no need to re-parse the
>    Dag in Java mode?
>  * Still, if static JARs are deployed, what does the deploy lifecycle look
>    like? Would you need to restart the Dag parser and respective workers
>    with each new deploy?
>  * Is it really realistic that a "LocalExecutor" needs to be supported,
>    or can we limit it to, e.g., only remote executors to reduce coupling
>    and complexity of operating the core?
>  * What overhead does the "Coordinator layer" generate compared to a
>    Java-specific supervisor implementation? It involves not only a new /
>    additional process but also IPC. And I have at least seen performance
>    problems, e.g. with very large XComs, where even heartbeats are lost
>    due to long-running IPC
>    (https://github.com/apache/airflow/issues/64628)
>  * (There might be more coming :-) )
>
> Jens
>
> P.S.: These questions do not mean rejection; I would just like to understand
> what complexity and overhead we add with all this.
>
> On 27.04.26 15:21, Jarek Potiuk wrote:
>
> Also I would like to point out one thing.
>
> This should not be `LAZY CONSENSUS` just yet; this is quite a big thing to
> discuss. I missed that the subject already had it.
>
> At this stage this is really a discussion (I renamed the thread), because
> we have never discussed it before.
>
> LAZY CONSENSUS should really be called for only after the initial
> discussion (on the devlist) shows that we are actually reaching consensus.
>
> While this one is unlikely to cause much controversy (I think), sufficient
> time for people to discuss and digest it before calling for lazy consensus
> is a necessary prerequisite. We simply need time to build consensus.
>
> We have a few processes where we build a "general" consensus first, and
> then we apply it only to particular cases (such as new providers). But in
> most cases when we have a "big" thing to discuss, we need to build
> consensus on the devlist first. While in some cases people discuss things
> off-list and come to some conclusions (which is perfectly fine), bringing
> it to the list as a consensus when we do not know if we have achieved it
> yet is, I think, a bit premature.
>
> See the lazy consensus explanation [1] and "consensus building" [2].
>
> [1] Lazy consensus -
> https://community.apache.org/committers/decisionMaking.html#lazy-consensus
>
> [2] Consensus building -
> https://community.apache.org/committers/decisionMaking.html#consensus-building
>
>
> J.
>
> On Mon, Apr 27, 2026 at 1:27 PM Aritra Basu<[email protected]>
> wrote:
>
> Hey TP,
>
> Overall +1. This is quite an interesting implementation. A couple of
> questions: is a provider the right place for the coordinator? I don't have
> strong opinions or alternatives, but I am curious.
>
> Also, for the parser, I wanted to understand a bit better how it works. I
> tried going through the SDK but wasn't able to fully understand it. Also +1
> to Jarek's recommendation for documentation.
>
>
>
> --
> Regards,
> Aritra Basu
>
> On Mon, 27 Apr 2026, 11:39 am Tzu-ping Chung via dev, <
> [email protected]> wrote:
>
> Hi all,
>
> As mentioned in the latest dev call, we have been developing a Java SDK
> with changes to Airflow in a separate fork [1]. We plan to start merging
> the Java SDK work back into the OSS repository.
>
> We see this as a natural step following the initial work in AIP-72 [2],
> which created “a clean language agnostic interface for task execution,
> with support for multiple language bindings” (quoted from the proposal).
>
> The Java SDK also uses Ash’s addition of @task.stub [3] for the Go SDK to
> declare a task in a DAG to be “implemented elsewhere” (not in the annotated
> function). Similar to the Go SDK, we also created a Java library that users
> can use to write task implementations for Airflow to execute at runtime.
>
>
> [1]: https://github.com/astronomer/airflow/tree/feature/java-all
> [2]: https://cwiki.apache.org/confluence/x/xgmTEg
> [3]: https://github.com/apache/airflow/pull/56055
>
> The user-facing syntax for a stub task would be the same as implemented by
> the Go SDK:
>
>     @task.stub(queue="java-tasks")
>     def my_task(): ...
>
> Along with a new configuration option that maps the tasks in a queue to a
> specific SDK for execution:
>
>     [sdk]
>     queue_to_sdk = {"java-tasks": "java"}
>
> The configuration is needed for some executors the Go SDK currently does
> not support. The Go SDK currently relies on each executor worker process
> to specify which queues it listens to, but this is not always viable,
> since some executors (LocalExecutor, for example) do not have the concept
> of worker processes.
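
A minimal sketch of how such a mapping could be consulted when a task is dispatched, assuming the option value is parsed as JSON and that tasks on unmapped queues fall back to the Python SDK. `parse_queue_to_sdk`, `resolve_sdk` and the `"python"` default are hypothetical, not Airflow's actual code.

```python
import json

DEFAULT_SDK = "python"  # assumption: unmapped queues keep running Python tasks


def parse_queue_to_sdk(raw: str) -> dict[str, str]:
    """Parse the [sdk] queue_to_sdk option (assumed to be a JSON object)."""
    mapping = json.loads(raw) if raw.strip() else {}
    if not isinstance(mapping, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()
    ):
        raise ValueError("queue_to_sdk must map queue names to SDK names")
    return mapping


def resolve_sdk(queue: str, mapping: dict[str, str]) -> str:
    """Return the SDK that should execute a task submitted to `queue`."""
    return mapping.get(queue, DEFAULT_SDK)


mapping = parse_queue_to_sdk('{"java-tasks": "java"}')
print(resolve_sdk("java-tasks", mapping))  # java
print(resolve_sdk("default", mapping))     # python
```

Because the lookup depends only on the task's queue, it works the same way whether or not the executor has separate worker processes.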
>
> The Coordinator Layer
> =====================
>
> When the Go SDK was implemented, it left out runtime Airflow plugins as a
> future topic. This includes custom XCom backends, secrets backend lookups
> for connections and variables, etc. These components are implemented in
> Python, and a Java task cannot easily use the feature unless we also
> implement the lookup logic in Java. We don’t want to do that since it
> introduces significant overhead to writing plugins, and the overhead
> multiplies with each new language SDK.
>
> Fortunately, the current execution-time task runner already uses a
> two-layer design. When an executor wants to run a task, it starts a
> (Python) task runner process that talks to Airflow Core through the
> Execution API, and *forks* another (Python) process, which talks to the
> task runner through TCP, to run the actual task code. Airflow plugins
> simply go into the task runner process.
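
The reason plugins can stay in Python is that the task process never resolves connections or variables itself; it asks the runner, which answers using whatever backends are configured on the Python side. The sketch below shows that request/response shape over a local socket pair (standing in for the TCP connection). It is an illustration under assumptions: the newline-delimited JSON framing, the `GetVariable` message name and the dict standing in for a secrets backend are all hypothetical, not the Task SDK's actual protocol.

```python
import json
import socket
import threading

# Hypothetical stand-in for a Python-side secrets backend plugin.
SECRETS_BACKEND = {"api_url": "https://example.invalid"}


def supervise(conn: socket.socket) -> None:
    """Runner side: answer one lookup request from the task process."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        if request.get("type") == "GetVariable":
            value = SECRETS_BACKEND.get(request["key"])
            response = {"type": "VariableResult", "value": value}
        else:
            response = {"type": "Error", "detail": "unknown request"}
        stream.write(json.dumps(response) + "\n")
        stream.flush()


def task_process(conn: socket.socket, key: str):
    """Task side (any language in practice): send a request, wait for the reply."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        stream.write(json.dumps({"type": "GetVariable", "key": key}) + "\n")
        stream.flush()
        return json.loads(stream.readline())["value"]


runner_end, task_end = socket.socketpair()
worker = threading.Thread(target=supervise, args=(runner_end,))
worker.start()
print(task_process(task_end, "api_url"))  # https://example.invalid
worker.join()
```

Only `supervise` touches the backend, so a Java task process would need nothing beyond the ability to send and receive these messages.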
>
> This design works well for us since it keeps all the Airflow plugins in
> Python. The only thing missing is an abstraction for the task runner
> process to run tasks in any language. We are calling this new layer the
> **Coordinator**.
>
> When a DAG bundle is loaded, it not only tells Airflow how to find the
> DAGs (and the tasks in them), but also how to *run* each task. Current
> Python tasks use the Python Coordinator, running tasks by forking as
> previously described. A new JVM Coordinator will instruct the task runner
> how to run tasks packaged in JAR files.
>
> Each coordinator implements a base interface (BaseRuntimeCoordinator)
> that handles three concerns:
>
> - Discovery: determining whether a given file belongs to this coordinator
> (e.g. JAR files for Java).
> - DAG parsing: returning a runtime-specific subprocess command to parse
> DAG files in the target language.
> - Task execution: returning a runtime-specific subprocess command to
> execute tasks in the target runtime.
>
> The base class owns the full bridge lifecycle (TCP servers, subprocess
> management, and cleanup), so language providers only need to implement
> these three methods.
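
The division of labour described above resembles a template-method design: the base class drives the subprocess lifecycle and each language provider only supplies the three hooks. Below is a hedged Python sketch of that shape. Only the name `BaseRuntimeCoordinator` comes from the proposal; the method names (`can_handle`, `dag_parsing_command`, `task_execution_command`), their signatures, the `run` helper, the main class names and `JavaCoordinator` are hypothetical, and the TCP bridging the real base class owns is omitted.

```python
import subprocess
from abc import ABC, abstractmethod
from pathlib import Path


class BaseRuntimeCoordinator(ABC):
    """Illustrative shape only; not the interface in the Airflow PRs."""

    @abstractmethod
    def can_handle(self, path: Path) -> bool:
        """Discovery: does this file belong to this coordinator?"""

    @abstractmethod
    def dag_parsing_command(self, path: Path) -> list[str]:
        """DAG parsing: command that parses `path` in the target runtime."""

    @abstractmethod
    def task_execution_command(self, path: Path, task_id: str) -> list[str]:
        """Task execution: command that runs `task_id` in the target runtime."""

    def run(self, command: list[str]) -> int:
        # The base class owns process management, so subclasses never
        # handle subprocesses directly (TCP bridging omitted here).
        proc = subprocess.Popen(command)
        try:
            return proc.wait()
        finally:
            if proc.poll() is None:  # only if wait() was interrupted
                proc.kill()


class JavaCoordinator(BaseRuntimeCoordinator):
    """Hypothetical JVM coordinator: JAR files are its unit of discovery."""

    def can_handle(self, path: Path) -> bool:
        return path.suffix == ".jar"

    def dag_parsing_command(self, path: Path) -> list[str]:
        # Placeholder main class, not the Java SDK's real entry point.
        return ["java", "-classpath", str(path), "example.DagParserMain"]

    def task_execution_command(self, path: Path, task_id: str) -> list[str]:
        return ["java", "-classpath", str(path), "example.TaskRunnerMain", task_id]


coordinator = JavaCoordinator()
print(coordinator.can_handle(Path("bundles/etl.jar")))  # True
print(coordinator.can_handle(Path("dags/etl.py")))      # False
```

A new language provider would subclass the base once and inherit `run` unchanged, which is where the "only three methods" claim comes from.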
>
> The coordinator translates the DagFileParseRequest (for DAG parsing) and
> StartupDetails (for task execution) data models (as declared in Airflow)
> into the appropriate commands for the target runtime. In this case, for
> example, that is a “java -classpath ... /path/to/MainClass ...” subprocess
> command that points to the correct JAR file and main class.
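
As a hedged illustration of that translation step, the sketch below turns a simplified task description into a `java -classpath <jar> <main class> ...` argument list. `TaskStartup` is a hypothetical stand-in for `StartupDetails` (the real model carries different fields), and the `--dag-id` / `--task-id` arguments are invented for clarity; they say nothing about how the JVM coordinator actually hands data to the JVM.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TaskStartup:
    """Hypothetical, simplified stand-in for the StartupDetails data model."""

    dag_id: str
    task_id: str
    jar_path: Path
    main_class: str


def to_java_command(details: TaskStartup, java_bin: str = "java") -> list[str]:
    """Build an argument list (not a shell string) to avoid quoting issues."""
    return [
        java_bin,
        "-classpath",
        str(details.jar_path),
        details.main_class,
        "--dag-id",   # hypothetical flag
        details.dag_id,
        "--task-id",  # hypothetical flag
        details.task_id,
    ]


details = TaskStartup("etl", "load", Path("bundles/etl.jar"), "com.example.Main")
print(shlex.join(to_java_command(details)))
```

Returning a list rather than a single string lets the caller pass it straight to `subprocess` without involving a shell.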
>
> Coordinators as Airflow Providers
> =================================
>
> The base coordinator interface and the Python coordinator will live in
> “airflow.sdk.execution_time”. Other coordinators (for foreign languages)
> are registered through the existing Airflow provider mechanism. Each SDK
> provider declares its coordinator in its provider.yaml under a
> “coordinators” extension point. Both ProvidersManager (airflow-core) and
> ProvidersManagerTaskRuntime (task-sdk) discover coordinators through this
> extension point. This means adding a new language runtime requires only a
> provider package. No changes to Airflow Core are needed.
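
A sketch of how a provider manager might turn a `coordinators` entry into a class, assuming each entry is a dotted import path, as is common for existing provider.yaml extension points. The parsed provider metadata is written as a Python dict to avoid a YAML dependency, and `import_string` and `load_coordinators` here are hypothetical helpers. A standard-library class stands in for a real coordinator so the example runs anywhere.

```python
import importlib

# Parsed provider.yaml content, shown as a dict to keep the sketch stdlib-only.
# The entry below is a placeholder; a real provider would list its own class.
PROVIDER_INFO = {
    "package-name": "apache-airflow-providers-example",
    "coordinators": ["collections.OrderedDict"],
}


def import_string(dotted_path: str) -> type:
    """Import `pkg.module.ClassName` and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted path")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def load_coordinators(provider_info: dict) -> list[type]:
    """Resolve every class listed under the `coordinators` key, if any."""
    return [import_string(path) for path in provider_info.get("coordinators", [])]


print(load_coordinators(PROVIDER_INFO))  # [<class 'collections.OrderedDict'>]
```

Since discovery only reads provider metadata, installing a provider package is enough to make its coordinator visible, matching the "no changes to Airflow Core" point above.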
>
> The new JVM-based coordinator will live in the namespace
> “airflow.providers.sdk.java”. This is not the most accurate name
> (technically it should be “jvm” instead), but in practice most users will
> recognize it, and (from my understanding) users of other JVM languages
> (e.g. Kotlin, Scala) are already familiar enough with Java
> interoperability to understand that “java” means JVM in this context.
>
> Writing DAGs in Java
> ====================
>
> This is not strictly connected to AIP-72, but we consider it a natural next
> step since we can now implement tasks in a foreign language.
>
> Being able to define the DAG in the same language as the task
> implementation is useful, since writing Python, even with only minimal
> syntax, is still a hurdle for those not already familiar with it, or not
> even allowed to run it. There are mainly three things we need on top of
> the task implementation interface:
>
> - DAG flags (e.g. schedule, max_active_tasks)
> - Task flags (e.g. trigger_rule, weight_rule)
> - Task dependencies
>
> A proof-of-concept implementation is included with other changes proposed
> elsewhere in this document.
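
To illustrate the three pieces a Java-side DAG definition has to carry, here is a small Python model of them, with a cycle check on the dependencies via the standard library's `graphlib`. `TaskSpec`, `DagSpec`, the field names and the JSON layout are hypothetical and much simpler than Airflow's real serialized Dag format; the sketch only shows why dependencies must form an acyclic graph before the result is handed back to the parser.

```python
import json
from dataclasses import asdict, dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Optional


@dataclass
class TaskSpec:
    """Hypothetical task record: task flags plus upstream dependencies."""

    task_id: str
    trigger_rule: str = "all_success"
    upstream: list[str] = field(default_factory=list)


@dataclass
class DagSpec:
    """Hypothetical DAG record: DAG flags plus its tasks."""

    dag_id: str
    schedule: Optional[str] = None
    max_active_tasks: Optional[int] = None  # hypothetical: None means "unset"
    tasks: list[TaskSpec] = field(default_factory=list)

    def execution_order(self) -> list[str]:
        """Topologically sort task IDs; raises CycleError on circular deps."""
        graph = {task.task_id: set(task.upstream) for task in self.tasks}
        return list(TopologicalSorter(graph).static_order())

    def to_json(self) -> str:
        """Validate the dependency graph, then serialize (toy format)."""
        self.execution_order()
        return json.dumps(asdict(self))


etl = DagSpec(
    dag_id="etl",
    schedule="@daily",
    tasks=[
        TaskSpec("extract"),
        TaskSpec("transform", upstream=["extract"]),
        TaskSpec("load", upstream=["transform"]),
    ],
)
print(etl.execution_order())  # ['extract', 'transform', 'load']

broken = DagSpec("broken", tasks=[TaskSpec("a", upstream=["b"]), TaskSpec("b", upstream=["a"])])
try:
    broken.to_json()
except CycleError as err:
    print("rejected:", type(err).__name__)  # rejected: CycleError
```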
>
> Lazy Consensus Topics
> =====================
>
> We’re calling for lazy consensus on the following topics:
>
> - A new “queue_to_sdk” configuration option to route tasks to a specific
> language SDK.
> - A new coordinator layer in the SDK to route implementations at execution
> time.
> - New providers under airflow.providers.sdk to provide additional language
> support.
> - Develop the Go SDK to support the proposed model and a provider package
> for the coordinator. (Existing features stay as-is; no breaking changes.)
> - Add the new Java SDK and the corresponding provider package.
>
> TP
>
>