I think we should then make it explicit in the proposal. It's not spelled
out currently.

On Sat, May 9, 2026 at 9:23 PM Tzu-ping Chung <[email protected]> wrote:

>
> On 10 May 2026, at 02:35, Jarek Potiuk <[email protected]> wrote:
>
> > Coordinators are Python. They are imported into Airflow. Not separate
> processes. Tasks run in Python, and the coordinator knows how to talk to
> them. How the messages are exchanged (not the messages themselves) is
> purely between the coordinator (Python) and the Java SDK, and the same
> goes for the coordinator-SDK pairs of other languages. It is not public
> and thus not needed for specification.
>
> Oh, I think that's a misunderstanding then. I do understand that the
> "coordinator" part runs in Python on the Scheduler. I think it's more about
> where the task process executes, and whether it is long-running or started
> for every task (and where).
> Just to compare the two executors:
>
> * LocalExecutor: The process for the task runs on the same "machine"
> (container etc.) as the Scheduler.
> * CeleryExecutor: The process runs on a Celery worker, which pulls the
> task from the queue.
>
> So my assumption (and correct me if I am wrong, or if this is
> already explained and I missed it):
>
> DagProcessor with Java Coordinator:
> a) Will a new Java process start every time a Java DAG file is about to
> be parsed?
>
>
> One process for one round to parse one file. Same as Python DAG files. (In
> some sense it’s more like the compiled JAR “parses itself” and returns the
> result.)
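To make "one process per parse round" concrete, here is a rough Python-side sketch. The `--parse` flag, the JSON-on-stdout convention, and both function names are invented for illustration only; the AIP does not specify any of them.

```python
# Illustrative sketch only: the AIP does not define this interface.
# Assumptions (invented): the compiled JAR accepts "--parse <file>" and
# prints the serialized parse result as JSON on stdout.
import json
import subprocess


def build_parse_command(jar_path: str, dag_file: str) -> list[str]:
    """Argv for one short-lived JVM that parses a single DAG file."""
    return ["java", "-jar", jar_path, "--parse", dag_file]


def parse_java_dag_file(jar_path: str, dag_file: str) -> dict:
    """One process, one round, one file, mirroring Python DAG parsing."""
    proc = subprocess.run(
        build_parse_command(jar_path, dag_file),
        capture_output=True, text=True, check=True, timeout=60,
    )
    # The JAR "parses itself" and returns the result on stdout.
    return json.loads(proc.stdout)
```

The short-lived process keeps parse rounds isolated from each other, at the cost of paying JVM startup for every file.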
>
>
> b) Or will a long-running Java process run locally that the DagProcessor
> will communicate with?
>
> Both approaches are possible, and each has different characteristics
> (performance, caching, warm-up time, JIT, potential pollution between
> several independent DAGs parsed by the same DagFileProcessor).
>
>
> Correct, but from my research the JVM does not make reusing a process
> easy without serious restrictions on what libraries people can use,
> and/or how global state can be manipulated. All in all, not suitable for
> using Java in Airflow, IMO.
>
> I am not ruling out the possibility though (my understanding of the JVM
> ecosystem has a lot of gaps, to put it mildly), so the AIP also does not
> really commit either way. It doesn’t need to, though, since whether to
> launch long-running worker processes is still entirely a decision within
> the Java coordinator, and does not require a public interface change.
>
> Approach b may also make sense for some other runtimes, but that’s out
> of scope for now.
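To illustrate why this choice need not be public: both lifecycles can sit behind the same Python-side call, so nothing outside the coordinator has to know which one was picked. All class names below are invented for illustration and do not appear in the AIP.

```python
# Rough sketch with invented names; the AIP defines no such classes.
# Both strategies expose the same run() call, so choosing between a fresh
# process per task and a long-running worker stays internal to the
# coordinator.
import subprocess
from abc import ABC, abstractmethod


class JavaCoordinator(ABC):
    """Hypothetical Python-side surface that Airflow would talk to."""

    @abstractmethod
    def run(self, payload: str) -> str: ...


class SpawnPerTask(JavaCoordinator):
    """Approach a: start a fresh process for every payload."""

    def __init__(self, argv: list[str]):
        self.argv = argv

    def run(self, payload: str) -> str:
        proc = subprocess.run(self.argv, input=payload,
                              capture_output=True, text=True, check=True)
        return proc.stdout.strip()


class LongRunningWorker(JavaCoordinator):
    """Approach b: keep one worker alive and feed it payloads line by line."""

    def __init__(self, argv: list[str]):
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)

    def run(self, payload: str) -> str:
        self.proc.stdin.write(payload + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()
```

For demonstration, either class can be pointed at a trivial echo process (e.g. `["cat"]` on POSIX) instead of a real JVM; a real coordinator would of course launch `java` with its own protocol.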
>
>
>
> Task Execution with Java Coordinator:
>
> a) Will all Java tasks run locally, as a new Java process, on the same
> machine (or container) as the Scheduler?
>
> b) Or will there be a long-running Java process (like in DagFileProcessor)
> that the Scheduler communicates with to execute the task?
> c) Or will it depend on the Executor? If we use CeleryExecutor with a Java
> Coordinator, does that mean the Celery worker will run a Python task, and
> that Python task will create a Java process?
> d) Or, is there a long-running Java process started on the Celery worker
> in this case?
> e) How about Edge Executor? Same question regarding long-running versus
> new processes?
>
>
> It runs in the same environment as a task written in Python would.
> Where that is depends on the executor.
>
>
>
> I was under the impression that one of the promises of "run all languages,
> everywhere" was that we could have a standalone "language" component
> running tasks in a given language—somewhere—that executes individual tasks
> without the overhead of starting a Python interpreter every time a task
> is started. This is what the talk "Run Airflow Tasks on a Coffee Machine"
> basically promised:
> https://airflowsummit.org/sessions/2025/run-airflow-tasks-on-your-coffee-machine
> - and this is what the "edge" executor and its API promised to provide
> "eventually". We might depart from this vision... But I think a diagram
> where we can see which processes are running (Python / Java) and how long
> they run (a new process/interpreter per task, or long-running) would be
> useful for understanding what we are proposing here.
>
>
> Having your coffee machine able to run Airflow tasks is a great idea, but
> not something needed by most prospective and existing Airflow users who
> want to leverage Java. Those people want existing Airflow things to work
> with Java, and running Python alongside their Java program is the best way
> to achieve this. On the other hand, your coffee machine can’t afford to
> run a Python interpreter, but probably also doesn’t need its own secrets
> backend or custom XCom.
>
> This is also why the AIP makes no mention of the Edge Executor, while the
> linked talk considers it one of the big components. These are seemingly
> similar, but ultimately very different goals, and they require different
> solutions. This AIP only attempts to solve one of them.
