IMHO this is not splitting the scheduler but rewriting it completely
from scratch. If you decouple the database from the scheduler there is
nothing left, because roughly 80% of its code is bound to SQLAlchemy
and a relational database.
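
To make the coupling concrete: as I understand it, the scheduler relies
on row-level locks (SELECT ... FOR UPDATE) so that several schedulers
never pick up the same work. Any replacement store would have to offer
an equivalent atomic "claim" operation. Below is a minimal sketch of
what such an interface might look like. Every name in it (MetadataStore,
claim_queued, InMemoryStore) is hypothetical and made up for
illustration; none of it is Airflow code or an existing Airflow API.

```python
from __future__ import annotations

import threading
from abc import ABC, abstractmethod


class MetadataStore(ABC):
    """Hypothetical pluggable store; NOT part of Airflow."""

    @abstractmethod
    def add_task(self, task_id: str) -> None:
        """Register a task as queued and unclaimed."""

    @abstractmethod
    def claim_queued(self, scheduler_id: str, limit: int) -> list[str]:
        """Atomically claim up to `limit` unclaimed tasks for one scheduler.

        A relational backend would typically do this with row-level locks;
        any other backend needs an equivalent guarantee.
        """


class InMemoryStore(MetadataStore):
    """Toy implementation: a single process-wide lock stands in for the DB."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owner: dict[str, str | None] = {}  # task_id -> claiming scheduler

    def add_task(self, task_id: str) -> None:
        with self._lock:
            self._owner.setdefault(task_id, None)

    def claim_queued(self, scheduler_id: str, limit: int) -> list[str]:
        with self._lock:
            # Select unclaimed tasks in insertion order, then mark them owned.
            free = [t for t, owner in self._owner.items() if owner is None]
            claimed = free[:limit]
            for task_id in claimed:
                self._owner[task_id] = scheduler_id
            return claimed


store = InMemoryStore()
for i in range(5):
    store.add_task(f"t{i}")

first = store.claim_queued("scheduler-A", 3)   # takes t0, t1, t2
second = store.claim_queued("scheduler-B", 3)  # only t3, t4 are left
```

The two schedulers end up with disjoint task lists, which is the one
property the sketch is meant to show. The real scheduler logic sitting
on top of that guarantee is the part that would have to be rewritten.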

So I would say this post should be named "Should we start building a
new Airflow from scratch?". But this is just my opinion; I might be
biased and very wrong on that.

J,

On Thu, Aug 3, 2023 at 6:18 PM Huang Junyao <[email protected]> wrote:
>
> Description
>
> In the beginning, the Airflow community took integrity as its first
> priority:
>
>    - use Celery as the task scheduling framework
>    - use PostgreSQL, MySQL, or MSSQL as the metadata database backend
>
> The community then split providers out of the core architecture, which
> brought a large number of providers
> <https://airflow.apache.org/docs/apache-airflow-providers/> into Airflow.
>
> Airflow has now become a popular distributed, cloud-native workflow
> management platform.
>
> I think maybe we can make the scheduler pluggable.
>
> Now we have the following constraints:
>
>    1. Scheduler Database Requirements
>    <https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#database-requirements>
>    bring some performance bottlenecks
>    2. SQL-compatible metadata database backend requirements
>
> In fact, the Airflow platform relies on these dependencies:
>
>    1. A task queue (message broker) that the Celery framework relies on,
>    commonly Redis or RabbitMQ. It is optional now that the Kubernetes
>    Executor
>    <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html#>
>    is available as an option.
>    2. metadata storage
>    3. distributed lock (maybe we can partition scheduler/executor in the
>    future)
>
> Today, items 2 and 3 are both bound to the SQL-compatible metadata
> database backend.
>
> If we make these 3 dependencies pluggable, we could use a
> Kubernetes-native solution such as *etcd*, which can take on all 3
> duties without bringing new external dependencies into the k8s
> environment.
>
> I am a newcomer to the community, though, and all of the above is only
> my early thinking.
>
> Please correct me if I am wrong.
>
> I am eager to learn more about architectural thinking in our
> community.
>
> Use case/motivation
>
>    1. further decouple Airflow from any specific metadata database
>    backend implementation
>    2. bring in etcd as a metadata database backend/task queue, which may
>    benefit Airflow's cloud-native roadmap
>    3. make the scheduler pluggable

