Oh super - nice. I am definitely going to take a look (at both) :D On Tue, Apr 28, 2026 at 9:55 AM Tzu-ping Chung via dev < [email protected]> wrote:
> Hi all, > > I’ve merged the ADRs and the previous LC message (in the other thread) > into a document formatted as an AIP: > > AIP-108 Java Task SDK and the Language Coordinator Layer - Airflow - > Apache Software Foundation <https://cwiki.apache.org/confluence/x/pY4mGQ> > cwiki.apache.org <https://cwiki.apache.org/confluence/x/pY4mGQ> > [image: favicon.ico] <https://cwiki.apache.org/confluence/x/pY4mGQ> > <https://cwiki.apache.org/confluence/x/pY4mGQ> > > > Various sections are taken mostly directly from the other documents, so > you can probably glance over them if you’ve already read them. If you > haven’t, the AIP document would be a good place to start since it removes > some more detailed descriptions and focuses on the high level interfaces. > More details are still available in the ADR documents included in the PRs > Jason opened. > > TP > > > On 28 Apr 2026, at 11:12, Zhe-You(Jason) Liu <[email protected]> wrote: > > Hi Jens, > > The ADRs are now available here [1] , hope that helps clarify some of your > concerns. > > As I understood Java is static compiled JAR files, no on-the-fly compile > > from Java source tree (correct?) > > That's correct -- the user will compile themselves and set the `[java] > bundles_folder` config to point to the directory of those JAR files. > > so actually the Dag parsing concept then is quite "static" and once > > generated actually no need to re-parse the Dag in Java mode? > > Still if static JAR deployed how does the deploy lifecacly look like? > > Would you need to restart with new deploy the Dag Parsr and respective > workers? > > The lifecycle for dag-parsing will be the same as how the current > `DagFileProcessorProcess` acts. The coordinator comes into play before we > start the actual parse file entrypoint [2]. Regardless of what language the > parse file subprocess is implemented in, as long as it returns a valid > serialized Dag JSON in msgpack over IPC, the behavior will remain the same. > So there is no need to restart the `airflow dag-processor`, and the > `airflow worker` will not be involved in dag processing at all. > > Is it really realistic that a "LocalExecutor" needs to be supported or > > can we limit it to e-g- only remote executors to reduce coupling and > complexity of operating the core? > > The coordinator is the interface that decides how we want to launch the > subprocess for both dag-processing and workload-execution. This means it > will support **any** executor out of the box, as we integrate the > coordinator at the TaskSDK level for workload-execution [3]. > > What overhead does the "Coordinator layer" generate compared to a Java > > specific supervisor implementation? > > The "Coordinator layer" is the interface for Airflow-Core to interact with > the target language subprocess. We still need a Java-specific supervisor > implementation, which is the first PR I mentioned in another thread -- Java > SDK [4]. > > Is it not only a new / additional process but also IPC involved then. And > > at least I saw also performance problems e.g. using very large XComs where > even heartbeats are lost due to long running IPC > > I would consider this out of scope of the "Java SDK and the Coordinator > Layer" AIP, as the current Python-native TaskSDK supervisor will encounter > the same issue. The multi-language support here follows the same protocol > that the current TaskSDK uses. > > [1] > > https://github.com/apache/airflow/pull/65956/changes/876179ab55d3b31486ee52f4c27abd4e215b0fd0 > [2] > > https://github.com/apache/airflow/pull/65958/changes#diff-564fd0a8fbe4cc47864a8043fcc1389b33120c88bb35852b26f45c36b902f70bR540-R573 > [3] > > https://github.com/apache/airflow/pull/65958/changes#diff-5bef10ab2956abf7360dbf9b509b6e1113407874d24abcc1b276475051f13abfR1991-R2001 > [4] https://github.com/apache/airflow/pull/65956 > > Thanks. > > Best, > Jason > > Best, > Jason > > On Tue, Apr 28, 2026 at 3:03 AM Jens Scheffler <[email protected]> > wrote: > > Thanks TP for raising this! > > I would need a sleep-over the block of information in the described > details and might have some detail questions just to ensure I understood > right. So in a ADR or AIP document might be better to comment than in an > email thread. > > Things that jump into my head but there would be more coming thinking > about it: > > * As I understood Java is static compiled JAR files, no on-the-fly > compile from Java source tree (correct?) - in this case also the > Dags are "static until re-deploy" - so actually the Dag parsing > concept then is quite "static" and once generated actually no need > to re-parse the Dag in Java mode? > * Still if static JAR deployed how does the deploy lifecacly look > like? Would you need to restart with new deploy the Dag Parsr and > respective workers? > * Is it really realistic that a "LocalExecutor" needs to be supported > or can we limit it to e-g- only remote executors to reduce coupling > and complexity of operating the core? > * What overhead does the "Coordinator layer" generate compared to a > Java specific supervisor implementation? Is it not only a new / > additional process but also IPC involved then. And at least I saw > also performance problems e.g. using very large XComs where even > heartbeats are lost due to long running IPC > (https://github.com/apache/airflow/issues/64628) > * (There might be more coming :-) ) > > Jens > > P.S.: Questions here also do not mean rejection but like to understand > which complexity and overhead we have adding all this. > > On 27.04.26 15:21, Jarek Potiuk wrote: > > Also I would like to point out one thing. > > This should not be `LAZY CONSENSUS` just yet, this is quite a big thing > > to > > discuss. I missed the subject already had it. > > At this stage this is really a discussion (I renamed the thread. Because > ... we have never discussed it before. > > LAZY CONSENSUS should be really called for after initial discussion (on > devlist) points to us actually reaching the consensus. > > While this one is unlikely to cause much controversy (I think), > > sufficient > > time for people to discuss and digest it before calling for lazy > > consensus > > is a necessary prerequisite. We simply need time to build consensus. > > We have a few processes where we build a "general" consensus first, and > then we apply it only to particular cases (such as new providers). But in > most cases when we have a "big" thing to discuss, we need to build > consensus on devlist first. While in some cases people discussed things > off-list and came to some conclusions (whch is perfectly fine) - bringing > it to the list as a consensus, where we do not know if we achieved it > > yet, > > is - I think - a bit premature. > > See the lazy consensus explanation [1] and "consenesus building [2] > > [1] Lazy consensus - > > https://community.apache.org/committers/decisionMaking.html#lazy-consensus > > [2] Consensus building - > > > https://community.apache.org/committers/decisionMaking.html#consensus-building > > > J. > > On Mon, Apr 27, 2026 at 1:27 PM Aritra Basu<[email protected]> > wrote: > > Hey TP, > > Overall +1, This is quite an interesting implementation. A couple > questions, is provider the right place for the coordinator? Don't have > strong opinions or alternatives, but I am curious. > > Also for the parser wanted to understand a bit better how it works? I > > tried > > going through the SDK but wasn't able to fully understand it. Also +1 to > Jarek's recommendation for documentation. > > > > -- > Regards, > Aritra Basu > > On Mon, 27 Apr 2026, 11:39 am Tzu-ping Chung via dev, < > [email protected]> wrote: > > Hi all, > > As mentioned in the latest dev call, we have been developing a Java SDK > with changes to Airflow in a separate fork[1]. We plan to start merging > > the > > Java SDK work back into the OSS repository. > > We see this as a natural step following initial work in AIP-72[2], > > which > > created “a clean language agnostic interface for task execution, with > support for multiple language bindings” (quoted from the proposal). > > The Java SDK also uses Ash’s addition of @task.stub[3] for the Go SDK, > > to > > declare a task in a DAG to be “implemented elsewhere” (not in the > > annotated > > function). Similar to the Go SDK, we also created a Java library that > > users > > can use to write task implementations for Airflow to execute at > > runtime. > > > [1]:https://github.com/astronomer/airflow/tree/feature/java-all > [2]:https://cwiki.apache.org/confluence/x/xgmTEg > [3]:https://github.com/apache/airflow/pull/56055 > > The user-facing syntax for a stub task would be the same as implemented > > by > > the Go SDK: > > @task.stub(queue="java-tasks") > def my_task(): ... > > With a new configuration option to map tasks in a pool to be executed > > by > > a > > specific SDK: > > [sdk] > queue_to_sdk = {"java-tasks": "java"} > > The configuration is needed for some executors the Go SDK currently > > does > > not support. The Go SDK currently relies on each executor worker > > process > > to > > specify which queues they listen to, but this is not always viable, > > since > > some executors—LocalExecutor, for example—do not have the concept of > > worker > > processes. > > The Coordinator Layer > ===================== > > When the Go SDK was implemented, it left out Runtime Airflow plugins > > as a > > future topic. This includes custom XCom backends, secrets backends > > lookup > > for connections and variables, etc. These components are implemented in > Python, and a Java task cannot easily use the feature unless we also > implement the lookup logic in Java. We don’t want to do that since it > introduces significant overhead to writing plugins, and the overhead > multiplies with each new language SDK. > > Fortunately, the current execution-time task runner already uses a > two-layer design. When an executor wants to run a task, it starts a > (Python) task runner process that talks to Airflow Core through the > Execution API, and *forks* another (Python) process, which talks to the > task runner through TCP, to run the actual task code. Airflow plugins > simply go into the task runner process. > > This design works well for us since it keeps all the Airflow plugins in > Python. The only thing missing is an abstraction for the task runner > process to run tasks in any language. We are calling this new layer the > **Coordinator**. > > When a DAG bundle is loaded, it not only tells Airflow how to find the > DAGs (and the tasks in them), but also how to *run* each task. Current > Python tasks use the Python Coordinator, running tasks by forking as > previously described. A new JVM Coordinator will instruct the task > > runner > > how to run tasks packaged in JAR files. > > Each coordinator implements a base interface (BaseRuntimeCoordinator) > > that > > handles three concerns: > > - Discovery: determining whether a given file belongs to this > > coordinator > > (e.g. JAR files for Java). > - DAG parsing: returning a runtime-specific subprocess command to parse > DAG files in the target language. > - Task execution: returning a runtime-specific subprocess command to > execute tasks in the target runtime. > > The base class owns the full bridge lifecycle—TCP servers, subprocess > management, and cleanup—so language providers only need to implement > > these > > three methods. > > The coordinator translates a DagFileParseRequest (for DAG parsing) and > StartupDetails (for Task execution) data model (as declared in Airflow) > into the appropriate commands for the target runtime. For example, a > > “java > > -classpath ... /path/to/MainClass ...” subprocess command that points > > to > > the correct JAR file and main class in this case. > > Coordinators as Airflow Providers > ================================= > > The base coordinator interface and the Python coordinator will live in > “airflow.sdk.execution_time”. Other coordinators (for foreign > > languages) > > are registered through the existing Airflow provider mechanism. Each > > SDK > > provider declares its coordinator in its provider.yaml under a > “coordinators” extension point. Both ProvidersManager (airflow-core) > > and > > ProvidersManagerTaskRuntime (task-sdk) discover coordinators through > > this > > extension point. This means adding a new language runtime requires > > only a > > provider package. No changes to Airflow Core are needed. > > The new JVM-based coordinator will live in the namespace > “airflow.providers.sdk.java”. This is not the most accurate name > (technically it should be “jvm” instead), but in practice most users > > will > > recognize it, and (from my understanding) other JVM language users > > (e.g. > > Kotlin, Scala) are already well-versed enough dealing with Java > interoperability to understand “java” means JVM in this context. > > Writing DAGs in Java > ==================== > > This is not strictly connected to AIP-72, but considered by us as a > natural next step since we can now implement tasks in a foreign > > language. > > Being able to define the DAG in the same language as the task > implementation is useful since writing Python, even if only with > > minimal > > syntax, is still a hurdle for those not already familiar with, or even > allowed to run it. There are mainly three things we need on top of the > > task > > implementation interface: > > - DAG flags (e.g. schedule, max_active_tasks) > - Task flags (e.g. trigger_rule, weight_rule) > - Task dependencies > > A proof-of-concept implementation is included with other changes > > proposed > > elsewhere in this document. > > Lazy Consensus Topics > ===================== > > We’re calling for lazy consensus for the following topics > > - A new “queue_to_sdk” configuration option to route tasks to a > > specific > > language SDK > - A new coordinator layer in the SDK to route implementations at > > execution > > time. > - New providers under airflow.providers.sdk to provide additional > > language > > support. > - Develop the Go SDK to support the proposed model and a provider > > package > > for the coordinator. (Existing features stay as-is; no breaking > > changes.) > > - Add the new Java SDK and the corresponding provider package. > > TP > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail:[email protected] > For additional commands, e-mail:[email protected] > > > >
