This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push: new 89f1737afb Make `graphviz` dependency optional (#36647) 89f1737afb is described below commit 89f1737afb27f6e708c2e83e3d8e751d9a36f91e Author: Jarek Potiuk <ja...@potiuk.com> AuthorDate: Sun Jan 7 17:02:32 2024 +0100 Make `graphviz` dependency optional (#36647) The `graphviz` dependency has been problematic as Airflow required dependency - especially for ARM-based installations. Graphviz packages require binary graphviz libraries - which is already a limitation, but they also require to install graphviz Python bindings to be build and installed. This does not work for older Linux installation but - more importantly - when you try to install Graphviz libraries for Python 3.8, 3.9 for ARM M1 MacBooks, the packages fail to install because Python bindings compilation for M1 can only work for Python 3.10+. There is not an easy solution for that except commenting out graphviz dependency from setup.py, when you want to install Airflow for Python 3.8, 3.9 for MacBook M1. However Graphviz is really used in two places: * when you want to render DAGs wia airflow CLI - either to an image or directly to terminal (for terminals/systems supporting imgcat) * when you want to render ER diagram after you modified Airflow models The latter is a development-only feature, the former is production feature, however it is a very niche one. This PR turns rendering of the images in Airflow in optional feature (only working when graphviz python bindings are installed) and effectively turns graphviz into an optional extra (and removes it from requirements). This is not a breaking change technically - the CLIs to render the DAGs is still there and IF you already have graphviz installed, it will continue working as it did before. The only problem when it does not work is where you do not have graphviz installed for fresh installation and it will raise an error and inform that you need it. Graphviz will remain to be installed for most users: * the Airflow Image will still contain graphviz library, because it is added there as extra * when previous version of Airflow has been installed already, then graphviz library is already installed there and Airflow will continue working as it did The only change will be a new installation of new version of Airflow from the scratch, where graphviz will need to be specified as extra or installed separately in order to enable DAG rendering option. Taking into account this behaviour (which only requires to install a graphviz package), this should not be considered as a breaking change. Extracted from: #36537 --- CONTRIBUTING.rst | 14 +++++++------- Dockerfile | 2 +- INSTALL | 14 +++++++------- airflow/utils/dot_renderer.py | 15 ++++++++++++++- dev/breeze/src/airflow_breeze/global_constants.py | 1 + docs/apache-airflow/extra-packages-ref.rst | 2 ++ docs/docker-stack/build-arg-ref.rst | 1 + docs/spelling_wordlist.txt | 1 + images/breeze/output_prod-image_build.txt | 2 +- newsfragments/36647.significant.rst | 23 +++++++++++++++++++++++ setup.cfg | 1 - setup.py | 3 +++ 12 files changed, 61 insertions(+), 18 deletions(-) diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 469f45c7ce..a9ffa69a38 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -856,13 +856,13 @@ arangodb, asana, async, atlas, atlassian.jira, aws, azure, cassandra, celery, cg cncf.kubernetes, cohere, common.io, common.sql, crypto, databricks, datadog, dbt.cloud, deprecated_api, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc, doc_gen, docker, druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github_enterprise, google, -google_auth, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, -ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql, -mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, -pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs, -salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, -spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate, -webhdfs, winrm, yandex, zendesk +google_auth, graphviz, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, +kubernetes, ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, +mssql, mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, +pagerduty, pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, +redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, +snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, +weaviate, webhdfs, winrm, yandex, zendesk .. END EXTRAS HERE Provider packages diff --git a/Dockerfile b/Dockerfile index c1ed35a61e..5468bcff35 100644 --- a/Dockerfile +++ b/Dockerfile @@ -35,7 +35,7 @@ # much smaller. # # Use the same builder frontend version for everyone -ARG AIRFLOW_EXTRAS="aiobotocore,amazon,async,celery,cncf.kubernetes,common.io,docker,elasticsearch,ftp,google,google_auth,grpc,hashicorp,http,ldap,microsoft.azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,virtualenv" +ARG AIRFLOW_EXTRAS="aiobotocore,amazon,async,celery,cncf.kubernetes,common.io,docker,elasticsearch,ftp,google,google_auth,graphviz,grpc,hashicorp,http,ldap,microsoft.azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,virtualenv" ARG ADDITIONAL_AIRFLOW_EXTRAS="" ARG ADDITIONAL_PYTHON_DEPS="" diff --git a/INSTALL b/INSTALL index 4b6da57ae8..e0778619dd 100644 --- a/INSTALL +++ b/INSTALL @@ -101,13 +101,13 @@ arangodb, asana, async, atlas, atlassian.jira, aws, azure, cassandra, celery, cg cncf.kubernetes, cohere, common.io, common.sql, crypto, databricks, datadog, dbt.cloud, deprecated_api, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc, doc_gen, docker, druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github_enterprise, google, -google_auth, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, -ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql, -mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, -pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs, -salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, -spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate, -webhdfs, winrm, yandex, zendesk +google_auth, graphviz, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, +kubernetes, ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, +mssql, mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, +pagerduty, pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, +redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, +snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, +weaviate, webhdfs, winrm, yandex, zendesk # END EXTRAS HERE # For installing Airflow in development environments - see CONTRIBUTING.rst diff --git a/airflow/utils/dot_renderer.py b/airflow/utils/dot_renderer.py index 41281fbbb1..4d44d1e2ec 100644 --- a/airflow/utils/dot_renderer.py +++ b/airflow/utils/dot_renderer.py @@ -19,9 +19,14 @@ """Renderer DAG (tasks and dependencies) to the graphviz object.""" from __future__ import annotations +import warnings from typing import TYPE_CHECKING, Any -import graphviz +try: + import graphviz +except ImportError: + warnings.warn("Could not import graphviz. Rendering graph to the graphical format will not be possible.") + graphviz = None from airflow.exceptions import AirflowException from airflow.models.baseoperator import BaseOperator @@ -151,6 +156,10 @@ def render_dag_dependencies(deps: dict[str, list[DagDependency]]) -> graphviz.Di :param deps: List of DAG dependencies :return: Graphviz object """ + if not graphviz: + raise AirflowException( + "Could not import graphviz. Install the graphviz python package to fix this error." + ) dot = graphviz.Digraph(graph_attr={"rankdir": "LR"}) for dag, dependencies in deps.items(): @@ -179,6 +188,10 @@ def render_dag(dag: DAG, tis: list[TaskInstance] | None = None) -> graphviz.Digr :param tis: List of task instances :return: Graphviz object """ + if not graphviz: + raise AirflowException( + "Could not import graphviz. Install the graphviz python package to fix this error." + ) dot = graphviz.Digraph( dag.dag_id, graph_attr={ diff --git a/dev/breeze/src/airflow_breeze/global_constants.py b/dev/breeze/src/airflow_breeze/global_constants.py index 4f08a42fd4..5de443d2ec 100644 --- a/dev/breeze/src/airflow_breeze/global_constants.py +++ b/dev/breeze/src/airflow_breeze/global_constants.py @@ -432,6 +432,7 @@ DEFAULT_EXTRAS = [ "ftp", "google", "google_auth", + "graphviz", "grpc", "hashicorp", "http", diff --git a/docs/apache-airflow/extra-packages-ref.rst b/docs/apache-airflow/extra-packages-ref.rst index 082595312e..5bab6fd7d9 100644 --- a/docs/apache-airflow/extra-packages-ref.rst +++ b/docs/apache-airflow/extra-packages-ref.rst @@ -52,6 +52,8 @@ python dependencies for the provided package. +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+ | google_auth | ``pip install 'apache-airflow[google_auth]'`` | Google auth backend | +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+ +| graphviz | ``pip install 'apache-airflow[graphviz]'`` | Graphviz renderer for converting DAG to graphical output | ++---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+ | kerberos | ``pip install 'apache-airflow[kerberos]'`` | Kerberos integration for Kerberized services (Hadoop, Presto, Trino) | +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+ | ldap | ``pip install 'apache-airflow[ldap]'`` | LDAP authentication for users | diff --git a/docs/docker-stack/build-arg-ref.rst b/docs/docker-stack/build-arg-ref.rst index a07760558e..73c30a3892 100644 --- a/docs/docker-stack/build-arg-ref.rst +++ b/docs/docker-stack/build-arg-ref.rst @@ -91,6 +91,7 @@ List of default extras in the production Dockerfile: * ftp * google * google_auth +* graphviz * grpc * hashicorp * http diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt index c1fe295997..191c787751 100644 --- a/docs/spelling_wordlist.txt +++ b/docs/spelling_wordlist.txt @@ -687,6 +687,7 @@ googleapiclient GoogleDisplayVideo gpu gpus +graphviz greenlet Groupalia groupId diff --git a/images/breeze/output_prod-image_build.txt b/images/breeze/output_prod-image_build.txt index b5499f9c68..ceeb24a418 100644 --- a/images/breeze/output_prod-image_build.txt +++ b/images/breeze/output_prod-image_build.txt @@ -1 +1 @@ -aa383b195f2991035d5333a6f37bebaa +f7a753d66923772bfb6250d5b87d1f51 diff --git a/newsfragments/36647.significant.rst b/newsfragments/36647.significant.rst new file mode 100644 index 0000000000..dc3f0faad8 --- /dev/null +++ b/newsfragments/36647.significant.rst @@ -0,0 +1,23 @@ +Graphviz dependency is now an optional one, not required one. + +The ``graphviz`` dependency has been problematic as Airflow required dependency - especially for +ARM-based installations. Graphviz packages require binary graphviz libraries - which is already a +limitation, but they also require to install graphviz Python bindings to be build and installed. +This does not work for older Linux installation but - more importantly - when you try to install +Graphviz libraries for Python 3.8, 3.9 for ARM M1 MacBooks, the packages fail to install because +Python bindings compilation for M1 can only work for Python 3.10+. + +This is not a breaking change technically - the CLIs to render the DAGs is still there and IF you +already have graphviz installed, it will continue working as it did before. The only problem when it +does not work is where you do not have graphviz installed it will raise an error and inform that you need it. + +Graphviz will remain to be installed for most users: + +* the Airflow Image will still contain graphviz library, because + it is added there as extra +* when previous version of Airflow has been installed already, then + graphviz library is already installed there and Airflow will + continue working as it did + +The only change will be a new installation of new version of Airflow from the scratch, where graphviz will +need to be specified as extra or installed separately in order to enable DAG rendering option. diff --git a/setup.cfg b/setup.cfg index d10e0bb6d9..6d7bb58e6b 100644 --- a/setup.cfg +++ b/setup.cfg @@ -107,7 +107,6 @@ install_requires = flask-wtf>=0.15 fsspec>=2023.10.0 google-re2>=1.0 - graphviz>=0.12 gunicorn>=20.1.0 httpx importlib_metadata>=1.7;python_version<"3.9" diff --git a/setup.py b/setup.py index 4b43a0add9..a5f29d694e 100644 --- a/setup.py +++ b/setup.py @@ -318,12 +318,14 @@ doc = [ ] doc_gen = [ "eralchemy2", + "graphviz>=0.12", ] flask_appbuilder_oauth = [ "authlib>=1.0.0", # The version here should be upgraded at the same time as flask-appbuilder in setup.cfg "flask-appbuilder[oauth]==4.3.10", ] +graphviz = ["graphviz>=0.12"] kerberos = [ "pykerberos>=1.1.13", "requests_kerberos>=0.10.0", @@ -593,6 +595,7 @@ CORE_EXTRAS_DEPENDENCIES: dict[str, list[str]] = { "deprecated_api": deprecated_api, "github_enterprise": flask_appbuilder_oauth, "google_auth": flask_appbuilder_oauth, + "graphviz": graphviz, "kerberos": kerberos, "ldap": ldap, "leveldb": leveldb,