> IMHO - if we do not want to support DB access at all from workers,
> triggerers and DAG file processors, we should replace the current "DB"
> bound interface with a new one specifically designed for this
> bi-directional direct communication Executor <-> Workers,

That is exactly what I was thinking too (both that "no DB access" should be 
the only option in v3, and that we need a purpose-designed bidirectional 
interface), and I am working up the details.

One of the key features of this will be giving each task try a "strong 
identity" that the API server can use to identify and trust the requests, 
likely some form of signed JWT.
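As an illustration only (the claim names and the stdlib HMAC scheme below are my assumptions, not the actual design, which would likely use a standard JWT library), such a per-try identity token could work roughly like this:

```python
# Hypothetical sketch of a signed per-task-try identity token that an API
# server could verify. Claim names and the HMAC signing scheme are
# illustrative; a real implementation would likely use standard JWTs.
import base64
import hashlib
import hmac
import json
import time

SECRET = b"shared-signing-key"  # assumed shared between issuer and API server


def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, as used in JWT encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(dag_id: str, task_id: str, try_number: int) -> str:
    """Mint a token identifying one specific task try."""
    payload = json.dumps({
        "dag_id": dag_id,
        "task_id": task_id,
        "try_number": try_number,
        "exp": int(time.time()) + 3600,  # illustrative one-hour validity
    }).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def verify_token(token: str) -> dict:
    """Check the signature and expiry, returning the claims if valid."""
    payload_b64, sig_b64 = token.split(".")
    payload = _unb64(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(_unb64(sig_b64), expected):
        raise ValueError("bad signature")
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

The point of the strong identity is that the API server can scope every request to exactly one task try, rather than trusting anything that can reach the database.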

I just need to finish off some other work before I can move over to focus 
on Airflow fully.

-a

On 7 June 2024 18:01:56 BST, Jarek Potiuk <ja...@potiuk.com> wrote:
>I added some comments here, and I think there is one big thing that should
>be clarified when we get to "task isolation" - mainly its dependence on
>AIP-44.
>
>The Internal gRPC API (AIP-44) was designed the way it was only to allow
>the same codebase to be used with or without DB access. It is based on the
>assumption that a limited set of changes would be needed (that was
>underestimated) to support both DB and gRPC communication between
>workers/triggerers/DAG file processors at the same time. That was a basic
>assumption of AIP-44 - that we would want to keep both ways and maximum
>backwards compatibility (including the "pull" model of the worker getting
>connections and variables, and updating task state in the Airflow DB). We
>are still using the DB as the way to communicate between those components,
>and this does not change with AIP-44.
>
>But for Airflow 3 the whole context changes. If we go with the assumption
>that Airflow 3 will only have isolated tasks and no "DB option", I
>personally think using AIP-44 for that is a mistake. AIP-44 is merely a
>wrapper over existing DB calls, designed to be kept updated together with
>the DB code, and the whole synchronisation of state, heartbeats, variables
>and connection access still uses the same "DB communication" model; there
>is basically no way we can make it more scalable this way. We will still
>have the same limitations on the DB - a number of DB connections simply
>gets replaced with a number of gRPC connections. Essentially, more
>scalability and performance have never been the goal of AIP-44 - all the
>assumptions are that it only brings isolation and that nothing more will
>change. So I think it does not address some of the fundamental problems
>stated in this "isolation" document.
>
>Essentially AIP-44 merely exposes a small-ish number of methods (bigger
>than initially anticipated) that only wrap around the existing DB
>mechanism. From the performance and scalability point of view, we do not
>get much more than we currently get when using pgbouncer, which turns a
>big number of connections coming from workers into a smaller number of
>pooled connections that pgbouncer manages internally and multiplexes the
>calls over. The difference is that, unlike the AIP-44 Internal API server,
>pgbouncer does not limit the operations you can do from the
>worker/triggerer/DAG file processor - that is the main difference between
>using pgbouncer and using our own Internal API server.
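For reference, the pgbouncer setup being described is roughly the following - a hypothetical pgbouncer.ini fragment (all names and values illustrative) in which many client connections from workers are multiplexed over a small server-side pool:

```ini
; Illustrative pgbouncer.ini fragment, not a recommended configuration
[databases]
airflow = host=postgres port=5432 dbname=airflow

[pgbouncer]
pool_mode = transaction   ; multiplex at transaction granularity
max_client_conn = 1000    ; many worker connections accepted
default_pool_size = 20    ; mapped onto few real DB connections
```

The pool shrinks the connection count but, as noted above, places no limit on what SQL a worker may run.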
>
>IMHO - if we do not want to support DB access at all from workers,
>triggerers and DAG file processors, we should replace the current "DB"
>bound interface with a new one specifically designed for this
>bi-directional direct communication Executor <-> Workers, more in line
>with what Jens described in AIP-69 (for example, WebSocket and
>asynchronous communication immediately come to mind if we do not have to
>use the DB for that communication). This is also why I put AIP-67 on
>hold: IF we go in the direction of a "new" interface between worker,
>triggerer and DAG file processor, it might be way easier (and safer) to
>introduce multi-team in Airflow 3 rather than 2 (or we can implement it
>differently in Airflow 2 and differently in Airflow 3).
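A sketch of what such bi-directional, message-based Executor <-> Worker communication could look like - all message and function names here are purely illustrative assumptions, and two asyncio queues stand in for the real transport (e.g. a WebSocket):

```python
# Illustrative sketch (not a proposed design) of bidirectional, message-based
# Executor <-> Worker communication replacing direct DB access.
import asyncio
import json


async def executor(to_worker: asyncio.Queue, from_worker: asyncio.Queue):
    # Push a task activation to the worker.
    await to_worker.put(json.dumps(
        {"type": "run_task", "dag_id": "d", "task_id": "t"}))
    # Serve the worker's requests until it reports a final state.
    while True:
        msg = json.loads(await from_worker.get())
        if msg["type"] == "get_variable":
            # The executor side answers from the DB; the worker never touches it.
            await to_worker.put(json.dumps({"type": "variable", "value": "42"}))
        elif msg["type"] == "task_state":
            return msg["state"]


async def worker(to_executor: asyncio.Queue, from_executor: asyncio.Queue):
    msg = json.loads(await from_executor.get())
    assert msg["type"] == "run_task"
    # Ask the executor side for a variable instead of reading the DB.
    await to_executor.put(json.dumps(
        {"type": "get_variable", "key": "threshold"}))
    reply = json.loads(await from_executor.get())
    # ... run user code using reply["value"] ...
    await to_executor.put(json.dumps(
        {"type": "task_state", "state": "success"}))


async def main():
    a, b = asyncio.Queue(), asyncio.Queue()  # the two directions of the channel
    state, _ = await asyncio.gather(executor(a, b), worker(b, a))
    return state
```

The essential property is that every piece of state the worker needs flows through one narrow, auditable channel rather than arbitrary SQL.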
>
>
>
>On Tue, Jun 4, 2024 at 3:58 PM Vikram Koka <vik...@astronomer.io.invalid>
>wrote:
>
>> Fellow Airflowers,
>>
>> I am following up on some of the proposed changes in the Airflow 3 proposal
>> <
>> https://docs.google.com/document/d/1MTr53101EISZaYidCUKcR6mRKshXGzW6DZFXGzetG3E/
>> >,
>> where more information was requested by the community, specifically around
>> the injection of Task Execution Secrets. This topic has been discussed at
>> various times with a variety of names, but here is a holistic proposal
>> around the whole task context mechanism.
>>
>> This is not yet a full fledged AIP, but is intended to facilitate a
>> structured discussion, which will then be followed up with a formal AIP
>> within the next two weeks. I have included most of the text here, but
>> please give detailed feedback in the attached document
>> <
>> https://docs.google.com/document/d/1BG8f4X2YdwNgHTtHoAyxA69SC_X0FFnn17PlzD65ljA/
>> >,
>> so that we can have a contextual discussion around specific points which
>> may need more detail.
>> ---
>> Motivation
>>
>> Historically, Airflow’s task execution context has been oriented around
>> local execution within a relatively trusted networking cluster.
>>
>> This includes:
>>
>>    - the interaction between the Executor and the process of launching a
>>      task on Airflow Workers,
>>    - the interaction between the Workers and the Airflow meta-database for
>>      connection and environment information as part of initial task
>>      startup,
>>    - the interaction between the Airflow Workers and the rest of Airflow
>>      for heartbeat information, and so on.
>>
>> This has been accomplished by colocating all of the Airflow task execution
>> code with the user task code in the same container and process.
>>
>>
>>
>> For Airflow users at scale i.e. supporting multiple data teams, this has
>> posed many operational challenges:
>>
>>    - Dependency conflicts for administrators supporting data teams using
>>      different versions of providers, libraries, or Python packages
>>    - Security challenges in running customer-defined code (task code
>>      within the DAGs) for multiple customers within the same operating
>>      environment and service accounts
>>    - Scalability of Airflow, since one of the core Airflow scalability
>>      limitations has been the number of concurrent database connections
>>      supported by the underlying database instance. To alleviate this
>>      problem, we have consistently, as an Airflow community, recommended
>>      the use of PgBouncer for connection pooling as part of an Airflow
>>      deployment.
>>    - Operational issues caused by unintentional reliance on internal
>>      Airflow constructs within the DAG/Task code, which only and
>>      unexpectedly show up as part of Airflow production operations,
>>      coincidentally with, but not limited to, upgrades and migrations.
>>    - Operational management, based on the above, for Airflow platform
>>      teams at scale, because different data teams naturally operate at
>>      different velocities. Attempting to support these different teams
>>      with a common Airflow environment is unnecessarily challenging.
>>
>>
>>
>> The internal API to reduce the need for interaction between the Airflow
>> Workers and the metadatabase is a big and necessary step forward. However,
>> it doesn’t fully address the above challenges. The proposal below builds on
>> the internal API proposal and goes significantly further to not only
>> address these challenges above, but also enable the following key use
>> cases:
>>
>>    1. Ensure that this interface reduces the interaction between the code
>>       running within the Task and the rest of Airflow. This is to address
>>       unintended ripple effects from core Airflow changes, which have
>>       caused numerous Airflow upgrade issues because Task (i.e. DAG) code
>>       relied on core Airflow abstractions. This has been a common problem
>>       pointed out by numerous Airflow users, including early adopters.
>>    2. Enable quick, performant execution of tasks on local, trusted
>>       networks, without requiring the Airflow workers / tasks to connect
>>       to the Airflow database to obtain all the information required for
>>       task startup.
>>    3. Enable remote execution of Airflow tasks across network boundaries,
>>       by establishing a clean interface for Airflow workers on remote
>>       networks to be able to connect back to a central Airflow service to
>>       access all information needed for task execution. This is
>>       foundational work for remote execution.
>>    4. Enable a clean, language-agnostic interface for task execution,
>>       with support for multiple language bindings, so that Airflow tasks
>>       can be written in languages beyond Python.
>> Proposal
>>
>> The proposal here has multiple parts as detailed below.
>>
>>    1. Formally split out the Task Execution Interface as the Airflow Task
>>       SDK (possibly named the Airflow SDK), which would be the only
>>       interface between Airflow Task user code and the Airflow system
>>       components, including the meta-database, Airflow Executor, etc.
>>    2. Disable all direct database interaction between the Airflow Workers
>>       (including Tasks being run on those Airflow Workers) and the
>>       Airflow meta-database.
>>    3. The Airflow Task SDK will include interfaces to:
>>       - Access needed Airflow Connections, Variables, and XCom values
>>       - Report heartbeats
>>       - Record logs
>>       - Report metrics
>>    4. The Airflow Task SDK will support a Push mechanism for speedy local
>>       execution in trusted environments.
>>    5. The Airflow Task SDK will also support a Pull mechanism for remote
>>       Task execution environments to access information from an Airflow
>>       instance over network boundaries.
>>    6. The Airflow Task SDK will be designed to support multiple language
>>       bindings, with the first language binding of course being Python.
>>
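From Python, the SDK surface described above might look roughly like the following sketch; every name here (TaskRuntime, get_connection, etc.) is an illustrative assumption, not the proposed API:

```python
# Hypothetical sketch of a Task SDK surface covering the interfaces listed
# in the proposal: connections/variables/XCom access, heartbeats, logs,
# and metrics. All names are illustrative, not the actual proposed API.
from dataclasses import dataclass
from typing import Any, Optional, Protocol


@dataclass
class Connection:
    """Minimal stand-in for an Airflow Connection handed to task code."""
    conn_id: str
    conn_type: str
    host: Optional[str] = None
    login: Optional[str] = None
    password: Optional[str] = None


class TaskRuntime(Protocol):
    """The only interface task code would use to talk to the rest of Airflow.

    An implementation could be backed by a local Push channel or by a remote
    Pull client; task code would not know or care which.
    """

    def get_connection(self, conn_id: str) -> Connection: ...
    def get_variable(self, key: str) -> str: ...
    def xcom_pull(self, task_id: str, key: str = "return_value") -> Any: ...
    def xcom_push(self, key: str, value: Any) -> None: ...
    def heartbeat(self) -> None: ...
    def log(self, message: str) -> None: ...
    def emit_metric(self, name: str, value: float) -> None: ...
```

Keeping task code against a protocol like this, rather than against internal Airflow models, is what makes the multiple-language-binding goal plausible: the same message contract can back a binding in any language.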
>>
>> Assumption: The existing AIP for Internal API covers the interaction
>> between the Airflow workers and Airflow metadatabase for heartbeat
>> information, persisting XComs, and so on.
>> --
>>
>> Best regards,
>>
>> Vikram Koka, Ash Berlin-Taylor, Kaxil Naik, and Constance Martineau
>>
