Hi Jens, The ADRs are now available here [1] , hope that helps clarify some of your concerns.
> As I understood Java is static compiled JAR files, no on-the-fly compile from Java source tree (correct?) That's correct -- the user will compile themselves and set the `[java] bundles_folder` config to point to the directory of those JAR files. > so actually the Dag parsing concept then is quite "static" and once generated actually no need to re-parse the Dag in Java mode? > Still if static JAR deployed how does the deploy lifecacly look like? Would you need to restart with new deploy the Dag Parsr and respective workers? The lifecycle for dag-parsing will be the same as how the current `DagFileProcessorProcess` acts. The coordinator comes into play before we start the actual parse file entrypoint [2]. Regardless of what language the parse file subprocess is implemented in, as long as it returns a valid serialized Dag JSON in msgpack over IPC, the behavior will remain the same. So there is no need to restart the `airflow dag-processor`, and the `airflow worker` will not be involved in dag processing at all. > Is it really realistic that a "LocalExecutor" needs to be supported or can we limit it to e-g- only remote executors to reduce coupling and complexity of operating the core? The coordinator is the interface that decides how we want to launch the subprocess for both dag-processing and workload-execution. This means it will support **any** executor out of the box, as we integrate the coordinator at the TaskSDK level for workload-execution [3]. > What overhead does the "Coordinator layer" generate compared to a Java specific supervisor implementation? The "Coordinator layer" is the interface for Airflow-Core to interact with the target language subprocess. We still need a Java-specific supervisor implementation, which is the first PR I mentioned in another thread -- Java SDK [4]. > Is it not only a new / additional process but also IPC involved then. And at least I saw also performance problems e.g. using very large XComs where even heartbeats are lost due to long running IPC I would consider this out of scope of the "Java SDK and the Coordinator Layer" AIP, as the current Python-native TaskSDK supervisor will encounter the same issue. The multi-language support here follows the same protocol that the current TaskSDK uses. [1] https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0 [2] https://github.com/apache/airflow/pull/65958/changes#diff-564fd0a8fbe4cc47864a8043fcc1389b33120c88bb35852b26f45c36b902f70bR540-R573 [3] https://github.com/apache/airflow/pull/65958/changes#diff-5bef10ab2956abf7360dbf9b509b6e1113407874d24abcc1b276475051f13abfR1991-R2001 [4] https://github.com/apache/airflow/pull/65956 Thanks. Best, Jason Best, Jason On Tue, Apr 28, 2026 at 3:03 AM Jens Scheffler <[email protected]> wrote: > Thanks TP for raising this! > > I would need a sleep-over the block of information in the described > details and might have some detail questions just to ensure I understood > right. So in a ADR or AIP document might be better to comment than in an > email thread. > > Things that jump into my head but there would be more coming thinking > about it: > > * As I understood Java is static compiled JAR files, no on-the-fly > compile from Java source tree (correct?) - in this case also the > Dags are "static until re-deploy" - so actually the Dag parsing > concept then is quite "static" and once generated actually no need > to re-parse the Dag in Java mode? > * Still if static JAR deployed how does the deploy lifecacly look > like? Would you need to restart with new deploy the Dag Parsr and > respective workers? > * Is it really realistic that a "LocalExecutor" needs to be supported > or can we limit it to e-g- only remote executors to reduce coupling > and complexity of operating the core? > * What overhead does the "Coordinator layer" generate compared to a > Java specific supervisor implementation? Is it not only a new / > additional process but also IPC involved then. And at least I saw > also performance problems e.g. using very large XComs where even > heartbeats are lost due to long running IPC > (https://github.com/apache/airflow/issues/64628) > * (There might be more coming :-) ) > > Jens > > P.S.: Questions here also do not mean rejection but like to understand > which complexity and overhead we have adding all this. > > On 27.04.26 15:21, Jarek Potiuk wrote: > > Also I would like to point out one thing. > > > > This should not be `LAZY CONSENSUS` just yet, this is quite a big thing > to > > discuss. I missed the subject already had it. > > > > At this stage this is really a discussion (I renamed the thread. Because > > ... we have never discussed it before. > > > > LAZY CONSENSUS should be really called for after initial discussion (on > > devlist) points to us actually reaching the consensus. > > > > While this one is unlikely to cause much controversy (I think), > sufficient > > time for people to discuss and digest it before calling for lazy > consensus > > is a necessary prerequisite. We simply need time to build consensus. > > > > We have a few processes where we build a "general" consensus first, and > > then we apply it only to particular cases (such as new providers). But in > > most cases when we have a "big" thing to discuss, we need to build > > consensus on devlist first. While in some cases people discussed things > > off-list and came to some conclusions (whch is perfectly fine) - bringing > > it to the list as a consensus, where we do not know if we achieved it > yet, > > is - I think - a bit premature. > > > > See the lazy consensus explanation [1] and "consenesus building [2] > > > > [1] Lazy consensus - > > > https://community.apache.org/committers/decisionMaking.html#lazy-consensus > > [2] Consensus building - > > > https://community.apache.org/committers/decisionMaking.html#consensus-building > > > > J. > > > > On Mon, Apr 27, 2026 at 1:27 PM Aritra Basu<[email protected]> > > wrote: > > > >> Hey TP, > >> > >> Overall +1, This is quite an interesting implementation. A couple > >> questions, is provider the right place for the coordinator? Don't have > >> strong opinions or alternatives, but I am curious. > >> > >> Also for the parser wanted to understand a bit better how it works? I > tried > >> going through the SDK but wasn't able to fully understand it. Also +1 to > >> Jarek's recommendation for documentation. > >> > >> > >> > >> -- > >> Regards, > >> Aritra Basu > >> > >> On Mon, 27 Apr 2026, 11:39 am Tzu-ping Chung via dev, < > >> [email protected]> wrote: > >> > >>> Hi all, > >>> > >>> As mentioned in the latest dev call, we have been developing a Java SDK > >>> with changes to Airflow in a separate fork[1]. We plan to start merging > >> the > >>> Java SDK work back into the OSS repository. > >>> > >>> We see this as a natural step following initial work in AIP-72[2], > which > >>> created “a clean language agnostic interface for task execution, with > >>> support for multiple language bindings” (quoted from the proposal). > >>> > >>> The Java SDK also uses Ash’s addition of @task.stub[3] for the Go SDK, > to > >>> declare a task in a DAG to be “implemented elsewhere” (not in the > >> annotated > >>> function). Similar to the Go SDK, we also created a Java library that > >> users > >>> can use to write task implementations for Airflow to execute at > runtime. > >>> > >>> [1]:https://github.com/astronomer/airflow/tree/feature/java-all > >>> [2]:https://cwiki.apache.org/confluence/x/xgmTEg > >>> [3]:https://github.com/apache/airflow/pull/56055 > >>> > >>> The user-facing syntax for a stub task would be the same as implemented > >> by > >>> the Go SDK: > >>> > >>> @task.stub(queue="java-tasks") > >>> def my_task(): ... > >>> > >>> With a new configuration option to map tasks in a pool to be executed > by > >> a > >>> specific SDK: > >>> > >>> [sdk] > >>> queue_to_sdk = {"java-tasks": "java"} > >>> > >>> The configuration is needed for some executors the Go SDK currently > does > >>> not support. The Go SDK currently relies on each executor worker > process > >> to > >>> specify which queues they listen to, but this is not always viable, > since > >>> some executors—LocalExecutor, for example—do not have the concept of > >> worker > >>> processes. > >>> > >>> The Coordinator Layer > >>> ===================== > >>> > >>> When the Go SDK was implemented, it left out Runtime Airflow plugins > as a > >>> future topic. This includes custom XCom backends, secrets backends > lookup > >>> for connections and variables, etc. These components are implemented in > >>> Python, and a Java task cannot easily use the feature unless we also > >>> implement the lookup logic in Java. We don’t want to do that since it > >>> introduces significant overhead to writing plugins, and the overhead > >>> multiplies with each new language SDK. > >>> > >>> Fortunately, the current execution-time task runner already uses a > >>> two-layer design. When an executor wants to run a task, it starts a > >>> (Python) task runner process that talks to Airflow Core through the > >>> Execution API, and *forks* another (Python) process, which talks to the > >>> task runner through TCP, to run the actual task code. Airflow plugins > >>> simply go into the task runner process. > >>> > >>> This design works well for us since it keeps all the Airflow plugins in > >>> Python. The only thing missing is an abstraction for the task runner > >>> process to run tasks in any language. We are calling this new layer the > >>> **Coordinator**. > >>> > >>> When a DAG bundle is loaded, it not only tells Airflow how to find the > >>> DAGs (and the tasks in them), but also how to *run* each task. Current > >>> Python tasks use the Python Coordinator, running tasks by forking as > >>> previously described. A new JVM Coordinator will instruct the task > runner > >>> how to run tasks packaged in JAR files. > >>> > >>> Each coordinator implements a base interface (BaseRuntimeCoordinator) > >> that > >>> handles three concerns: > >>> > >>> - Discovery: determining whether a given file belongs to this > coordinator > >>> (e.g. JAR files for Java). > >>> - DAG parsing: returning a runtime-specific subprocess command to parse > >>> DAG files in the target language. > >>> - Task execution: returning a runtime-specific subprocess command to > >>> execute tasks in the target runtime. > >>> > >>> The base class owns the full bridge lifecycle—TCP servers, subprocess > >>> management, and cleanup—so language providers only need to implement > >> these > >>> three methods. > >>> > >>> The coordinator translates a DagFileParseRequest (for DAG parsing) and > >>> StartupDetails (for Task execution) data model (as declared in Airflow) > >>> into the appropriate commands for the target runtime. For example, a > >> “java > >>> -classpath ... /path/to/MainClass ...” subprocess command that points > to > >>> the correct JAR file and main class in this case. > >>> > >>> Coordinators as Airflow Providers > >>> ================================= > >>> > >>> The base coordinator interface and the Python coordinator will live in > >>> “airflow.sdk.execution_time”. Other coordinators (for foreign > languages) > >>> are registered through the existing Airflow provider mechanism. Each > SDK > >>> provider declares its coordinator in its provider.yaml under a > >>> “coordinators” extension point. Both ProvidersManager (airflow-core) > and > >>> ProvidersManagerTaskRuntime (task-sdk) discover coordinators through > this > >>> extension point. This means adding a new language runtime requires > only a > >>> provider package. No changes to Airflow Core are needed. > >>> > >>> The new JVM-based coordinator will live in the namespace > >>> “airflow.providers.sdk.java”. This is not the most accurate name > >>> (technically it should be “jvm” instead), but in practice most users > will > >>> recognize it, and (from my understanding) other JVM language users > (e.g. > >>> Kotlin, Scala) are already well-versed enough dealing with Java > >>> interoperability to understand “java” means JVM in this context. > >>> > >>> Writing DAGs in Java > >>> ==================== > >>> > >>> This is not strictly connected to AIP-72, but considered by us as a > >>> natural next step since we can now implement tasks in a foreign > language. > >>> Being able to define the DAG in the same language as the task > >>> implementation is useful since writing Python, even if only with > minimal > >>> syntax, is still a hurdle for those not already familiar with, or even > >>> allowed to run it. There are mainly three things we need on top of the > >> task > >>> implementation interface: > >>> > >>> - DAG flags (e.g. schedule, max_active_tasks) > >>> - Task flags (e.g. trigger_rule, weight_rule) > >>> - Task dependencies > >>> > >>> A proof-of-concept implementation is included with other changes > proposed > >>> elsewhere in this document. > >>> > >>> Lazy Consensus Topics > >>> ===================== > >>> > >>> We’re calling for lazy consensus for the following topics > >>> > >>> - A new “queue_to_sdk” configuration option to route tasks to a > specific > >>> language SDK > >>> - A new coordinator layer in the SDK to route implementations at > >> execution > >>> time. > >>> - New providers under airflow.providers.sdk to provide additional > >> language > >>> support. > >>> - Develop the Go SDK to support the proposed model and a provider > package > >>> for the coordinator. (Existing features stay as-is; no breaking > changes.) > >>> - Add the new Java SDK and the corresponding provider package. > >>> > >>> TP > >>> > >>> > >>> > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail:[email protected] > >>> For additional commands, e-mail:[email protected] > >>> > >>>
