> On 10 May 2026, at 02:35, Jarek Potiuk <[email protected]> wrote:
> 
> > Coordinators are Python. They are imported into Airflow. Not separate 
> > processes. Tasks run in Python, and the coordinator knows how to talk to 
> > them. How the messages are exchanged (not the messages themselves) is 
> > purely between the coordinator (Python) and Java, and the same goes for 
> > other language coordinator-SDK pairs. It is not public and thus not 
> > needed for the specification.
> 
> Oh, I think that's a misunderstanding then. I do understand that the 
> "coordinator" part runs in Python on the Scheduler. My question is more 
> about where the task process executes, and whether it is long-running or 
> started for every task (and where).
> Just to compare the two executors:
> 
> * LocalExecutor: The process for the task runs on the same "machine" 
> (container etc.) as the Scheduler.
> * CeleryExecutor: The process runs on whatever machine hosts the Celery 
> worker that pulls the task from the queue.
> 
> So my assumption (and correct me if I am wrong, or if this is already 
> explained and I missed it):
> 
> DagProcessor with Java Coordinator:
> a) A new Java process will start every time a new DAG is created, i.e. when 
> the Java file is about to be parsed?

One process for one round to parse one file, same as for Python DAG files. (In 
some sense it’s more like the compiled JAR “parses itself” and returns the 
result.)
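To make the per-round model concrete, here is a minimal Python-side sketch of what such a coordinator could do. Everything here is my invention for illustration (the function name, the command shape, and the JSON-on-stdout wire format); the actual coordinator/SDK protocol is private and deliberately not specified by the AIP:

```python
# Hypothetical sketch only -- the real coordinator/SDK protocol is private.
# One short-lived child process per parse round; the bundle "parses itself"
# and reports the result on stdout (JSON is an assumed format here).
import json
import subprocess

def parse_dag_bundle(cmd: list[str], timeout: float = 60.0) -> dict:
    """Run one parse round in a fresh process and return its result.

    For a Java bundle the command could be something like
    ["java", "-jar", "/path/to/dags.jar"] -- the JVM starts, parses,
    prints the serialized DAGs, and exits.
    """
    proc = subprocess.run(
        cmd,
        capture_output=True,
        timeout=timeout,
        check=True,  # non-zero exit means the parse round failed
    )
    return json.loads(proc.stdout)
```

Note how nothing here cares whether the child is a JVM, a Go binary, or anything else; that is what keeps the per-language details out of the public specification.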


> b) Or will a long-running Java process run locally that the DagProcessor will 
> communicate with? 
> 
> Both approaches are possible, each with different characteristics 
> (performance, caching, warm-up time, JIT, potential pollution between several 
> independent Dags parsed by the same DagFileProcessor). 

Correct, but from my research the JVM does not make reusing a process easy 
without serious restrictions on what libraries people can use, and/or on how 
global state can be manipulated. All in all, not suitable for using Java in 
Airflow, IMO.

I am not ruling out the possibility though (my understanding of the JVM 
ecosystem has a lot of gaps, to put it mildly), so the AIP does not really 
commit either way. It doesn’t need to, though, since the decision whether to 
launch long-running worker processes is entirely internal to the Java 
coordinator, and does not require a public interface change.

Approach (b) may also make sense for some other runtimes, but that’s out of 
scope for now.


> 
> Task Execution with Java Coordinator:
> 
> a) Will all Java tasks run locally as a new Java process on the same machine 
> (container/machine) as the scheduler?
> b) Or will there be a long-running Java process (like in DagFileProcessor) 
> that the Scheduler communicates with to execute the task? 
> c) Or will it depend on the Executor? If we use CeleryExecutor with a Java 
> Coordinator, does that mean the task will run as a Python task on a Celery 
> worker, and that Python task will create a Java process? 
> d) Or, is there a long-running Java process started on the Celery worker in 
> this case? 
> e) How about Edge Executor? Same question regarding long-running versus new 
> processes?

It runs in the same environment as it would if the task were written in 
Python. Where that is depends on the executor.
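As a rough illustration of "same environment as a Python task" (again with invented names; the AIP does not prescribe this interface): whichever worker the executor schedules the task onto, the Python side there simply spawns the task's process:

```python
# Hypothetical sketch: on whichever worker the executor chose
# (LocalExecutor: same machine as the scheduler; CeleryExecutor: the
# Celery worker that pulled the task), the Python runtime launches one
# process per task and propagates its exit status.
import subprocess

def run_task_process(cmd: list[str]) -> int:
    """Launch the task process and return its exit code.

    For a Java task the command might be something like
    ["java", "-jar", "tasks.jar", "--task-id", "my_task"] -- the flag
    name is invented here for illustration.
    """
    return subprocess.run(cmd).returncode
```

The executor question then answers itself: the Python wrapper is placed wherever the executor would place any Python task, and the Java process is a child of it, so no new per-executor machinery is required.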


> 
> I was under the impression that one of the promises of "run all languages, 
> everywhere" was that we could have a standalone "language" component running 
> tasks in a given language—somewhere—that executes individual tasks without 
> the overhead of "python interpreter" starting every time the task is started. 
> This is what the talk "Run Airflow Tasks on a Coffee Machine" basically 
> promised: 
> https://airflowsummit.org/sessions/2025/run-airflow-tasks-on-your-coffee-machine
>  - and this is what the "edge" executor and its API promised to provide 
> "eventually". And we might depart from this vision... But I think a diagram 
> where we see processes running (Python / Java) and how long they are running 
> (a new process/interpreter per task - or long-running) would be useful to 
> understand what we are proposing here.

Having your coffee machine able to run Airflow tasks is a great idea, but not 
something needed by most prospective Airflow users, nor by existing Airflow 
users who want to leverage Java. Those people want existing Airflow features 
to work with Java, and running Python alongside their Java program is the best 
way to achieve this. Your coffee machine, on the other hand, can’t afford to 
run a Python interpreter, but probably also doesn’t need its own secrets 
backend or custom XCom.

This is also why the AIP makes no mention of the Edge Executor, while the 
linked talk considers it one of the big components. These are seemingly 
similar, but ultimately very different goals, and they require different 
solutions. This AIP only attempts to solve one of them.






