> Coordinators are Python. They are imported into Airflow. Not separate
> processes. Tasks run in Python, and the coordinator knows how to talk to
> them. How the messages are exchanged (not the messages themselves) is
> purely between the coordinator (Python) and Java, and the same goes for
> other language coordinator-SDKs. It is not public and thus not needed for
> specification.

Oh, I think that's a misunderstanding then. I do understand that the
"coordinator" part runs in Python on the Scheduler. My question is more
about where the task process executes, and whether it is long-running or
started for every task (and where).
Just to compare the two executors:

* LocalExecutor: The process for the task runs on the same "machine"
(container, etc.) as the Scheduler.
* CeleryExecutor: The process runs on whichever Celery worker pulls the
task from the queue.
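For my own clarity, here is a toy, non-Airflow sketch of that difference; an in-process queue stands in for the Celery broker, and all the names are mine, not Airflow's:

```python
# Toy model (not Airflow code): where does the task process start?
import queue
import subprocess
import sys

def local_executor_run(task_cmd):
    # LocalExecutor-style: the scheduler machine itself forks the task process.
    return subprocess.run(task_cmd, capture_output=True, text=True).stdout

task_queue = queue.Queue()  # stand-in for the Celery broker

def celery_executor_submit(task_cmd):
    # CeleryExecutor-style, scheduler side: only a message crosses the boundary.
    task_queue.put(task_cmd)

def celery_worker_loop_once():
    # Worker side (potentially a different machine): pull the message,
    # then fork the task process locally on the worker.
    task_cmd = task_queue.get()
    return subprocess.run(task_cmd, capture_output=True, text=True).stdout

cmd = [sys.executable, "-c", "print('hello from task')"]
print(local_executor_run(cmd), end="")    # ran next to the "scheduler"
celery_executor_submit(cmd)
print(celery_worker_loop_once(), end="")  # ran wherever the "worker" lives
```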

So here are my assumptions (and correct me if I am wrong, or if this is
already explained and I simply missed it):

DagProcessor with Java Coordinator:
a) Will a new Java process start every time a Java DAG file is about to be
parsed?
b) Or will a long-running Java process run locally that the DagProcessor
communicates with?

Both approaches are possible, and each has different characteristics
(performance, caching, warm-up time, JIT, potential pollution between
several independent DAGs parsed by the same DagFileProcessor).
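To make that contrast concrete, here is a hedged sketch: the class name and the line-based protocol are invented for illustration (they are not the AIP's API), and a Python one-liner stands in for something like `java -jar dag-parser.jar`:

```python
# Sketch of options (a) and (b) for the DagProcessor <-> Java parser boundary.
import json
import subprocess
import sys

# Stand-in for invoking a Java parser binary; echoes a fake serialized DAG.
ONESHOT_CMD = [sys.executable, "-c",
               "import sys, json; print(json.dumps({'dag_id': sys.argv[1]}))"]
SERVER_CODE = ("import sys, json\n"
               "for line in sys.stdin:\n"
               "    print(json.dumps({'dag_id': line.strip()}))\n"
               "    sys.stdout.flush()\n")

def parse_one_shot(dag_file):
    # Option (a): a fresh process per parse -- fully isolated, but pays
    # process (JVM) startup and JIT warm-up for every DAG file.
    out = subprocess.run(ONESHOT_CMD + [dag_file],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

class LongRunningParser:
    # Option (b): one persistent process the DagProcessor talks to over
    # pipes -- warm caches and JIT, but independent DAG files share a
    # process and can pollute each other.
    def __init__(self):
        self.proc = subprocess.Popen([sys.executable, "-c", SERVER_CODE],
                                     stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)

    def parse(self, dag_file):
        self.proc.stdin.write(dag_file + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()

print(parse_one_shot("etl.java"))      # option (a): new process per parse
server = LongRunningParser()
print(server.parse("etl.java"))        # option (b): first parse
print(server.parse("reports.java"))    # option (b): same warm process
server.close()
```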

Task Execution with Java Coordinator:

a) Will all Java tasks run locally as a new Java process on the same
machine (container/machine) as the scheduler?
b) Or will there be a long-running Java process (like in DagFileProcessor)
that the Scheduler communicates with to execute the task?
c) Or will it depend on the Executor? If we use CeleryExecutor with a Java
Coordinator, does that mean the Celery worker will run a Python task, and
that Python task will create a Java process?
d) Or, is there a long-running Java process started on the Celery worker in
this case?
e) How about Edge Executor? Same question regarding long-running versus new
processes?
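As a reference point for option (c), here is a minimal hedged sketch of the "thin Python task that spawns a language process per execution" model; the wrapper name is made up, and a Python one-liner stands in for something like `java -jar task-runner.jar`:

```python
# Option (c) sketch: a Python task on the Celery worker spawns a fresh
# language process for every task execution (startup cost paid each time).
import subprocess
import sys

def run_language_task(task_payload):
    # Hypothetical thin wrapper the Python worker would execute; in the
    # real proposal this command would launch the Java task runner.
    cmd = [sys.executable, "-c", f"print('ran: ' + {task_payload!r})"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()

print(run_language_task("extract_step"))  # -> ran: extract_step
```

Option (d) would instead keep one such process alive per worker, trading isolation for warm-up cost, much like the DagProcessor question above.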

I was under the impression that one of the promises of "run all languages,
everywhere" was that we could have a standalone "language" component,
running somewhere, that executes individual tasks in a given language
without the overhead of starting a Python interpreter every time a task
starts. This is what the talk "Run Airflow Tasks on a Coffee Machine"
basically promised:
https://airflowsummit.org/sessions/2025/run-airflow-tasks-on-your-coffee-machine
- and this is what the Edge Executor and its API promised to provide
"eventually". We might depart from this vision... But I think a diagram
where we can see which processes are running (Python / Java) and how long
they run (a new process/interpreter per task, or long-running) would be
useful for understanding what we are proposing here.


J.

On Sat, May 9, 2026 at 7:26 PM Tzu-ping Chung <[email protected]> wrote:

> I’m replying to the 5th point separately since IMO it is quite obvious in
> the AIP. I also made some edits to the document so it is ABUNDANTLY clear.
>
> Coordinators are Python. They are imported into Airflow. Not separate
> processes. Tasks run in Python, and the coordinator knows how to talk to
> them. How the messages are exchanged (not the messages themselves) is
> purely between the coordinator (Python) and Java, and the same goes for
> other language coordinator-SDKs. It is not public and thus not needed for
> specification.
>
> Once the message is in the coordinator, it is a Pydantic object (this is
> why we need to standardise the messages themselves, as discussed
> previously) that the dag processor or executor can use in memory.
>
>
> *5. Distribution of Jars and Java processes running *-> We should specify
> how we envision the distribution of code Jars working. As I understand
> coordinators have two sides: "can_handle_dag" (Python implementation so we
> do not run Java code in the scheduler?) and is it continuously running?
> Started on demand? How? Process runs on Java: one for DagFileProcessor (so
> that it can potentially parse the Dag Definition from the Java definition
> if the whole Dag is defined in Java?). Can this Java process live elsewhere
> and should the Dag Processor communicate with it? Or will it run as a
> subprocess of DagProcessor? Will it be one process or many processes? Does
> it start per DAG or run continuously? Similarly, consider the workers:
> will those use the same jars as the DagFileProcessor, or different ones?
> How will they relate to the Dag Bundle? Are jars always present in the
> DagBundle and distributed? I think at least a rough outline of the
> deployment "process" assumption is needed. Maybe it's already there and I
> badly missed it - but those are the questions that immediately come to mind
> when I see the proposal.
>
>
>
