uranusjr commented on code in PR #65956: URL: https://github.com/apache/airflow/pull/65956#discussion_r3286832108
########## java-sdk/adr/0004-dag-parsing.md: ########## @@ -0,0 +1,370 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# ADR-0002: DAG Parsing — Language-Specific DAG File Processing + +## Status + +Accepted + +## Context + +Airflow's standard DAG file processor only understands Python files. To support DAGs defined in other languages (Java, Go, Rust, etc.), the pipeline needs an extension point where a language-specific processor can intercept the parsing request, delegate to an external runtime, and return a result in the same format the Airflow scheduler expects. + +This ADR details the DAG parsing side of the coordinator architecture described in [ADR-0001](0001-java-sdk-airflow-integration.md). It starts with the generic model — the abstract contracts and expected behavior that any language must implement — then walks through Java as a concrete example. + +## Decision + +### Extension Point: `BaseCoordinator` + +A single abstract base class — `BaseCoordinator` — handles both DAG parsing and task execution. It is registered in `provider.yaml` under `coordinators`. For DAG parsing, a subclass must implement two methods: + +| Method | Signature | Responsibility | +|---|---|---| +| `can_handle_dag_file` | `(bundle_name, path) -> bool` | Return `True` if this coordinator should handle the given file. Default returns `False`; subclasses add language-specific checks (e.g., "is this a JAR with a Main-Class?"). | +| `dag_parsing_cmd` | `(dag_file_path, bundle_name, bundle_path, comm_addr, logs_addr) -> list[str]` | Return the full command to launch the language runtime. `comm_addr` and `logs_addr` are `host:port` strings the process must connect to. | + +### Registration + +In the provider's `provider.yaml`: + +```yaml +process-coordinators: + - airflow.providers.sdk.<lang>.coordinator.<LangCoordinator> +``` + +A single registration covers both DAG parsing and task execution — there are no separate `dag-file-processors` or `task-coordinators` keys. + +### Discovery: `_resolve_processor_target()` + +When `DagFileProcessorProcess.start()` needs to parse a file: + +``` +_resolve_processor_target(path, bundle_name, bundle_path) + for each class_path in ProvidersManager().coordinators: + coordinator_cls = import_string(class_path) + if coordinator_cls.can_handle_dag_file(bundle_name, path): + return functools.partial(coordinator_cls.run_dag_parsing, path=..., bundle_name=..., bundle_path=...) + return None # fall back to default Python parser +``` + +The first coordinator whose `can_handle_dag_file()` returns `True` wins. If none match, the default Python `_parse_file_entrypoint` runs. + +### What the Base Class Handles Automatically + +The matched coordinator's `run_dag_parsing()` (a concrete method on `BaseCoordinator`) delegates to `_runtime_subprocess_entrypoint()`, which handles all the TCP/process plumbing: + +1. Creates two TCP servers on `127.0.0.1` with random ports (comm + logs) Review Comment: This ADR is moved back to Proposed since it’s not included in AIP-108. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
