This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new 89f1737afb Make `graphviz` dependency optional (#36647)
89f1737afb is described below

commit 89f1737afb27f6e708c2e83e3d8e751d9a36f91e
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Sun Jan 7 17:02:32 2024 +0100

    Make `graphviz` dependency optional (#36647)
    
    The `graphviz` dependency has been problematic as a required
    Airflow dependency, especially for ARM-based installations. The
    graphviz package requires the binary graphviz libraries, which is
    already a limitation, but it also requires the graphviz Python
    bindings to be built and installed. This does not work on older
    Linux installations and, more importantly, when you try to
    install the Graphviz libraries for Python 3.8 or 3.9 on ARM M1
    MacBooks, the packages fail to install because compilation of
    the Python bindings for M1 only works for Python 3.10+.
    
    There is no easy solution for that other than commenting out the
    graphviz dependency in setup.py when you want to install Airflow
    for Python 3.8 or 3.9 on a MacBook M1.
    
    However, Graphviz is really only used in two places:
    
    * when you want to render DAGs via the airflow CLI - either to an
      image or directly to the terminal (for terminals/systems
      supporting imgcat)
    
    * when you want to render the ER diagram after you have modified
      Airflow models
    
    The latter is a development-only feature; the former is a production
    feature, but a very niche one.
    
    This PR turns rendering of images in Airflow into an optional feature
    (only working when the graphviz Python bindings are installed) and
    effectively turns graphviz into an optional extra (and removes it
    from requirements).
    
    This is technically not a breaking change - the CLI commands that
    render DAGs are still there and IF you already have graphviz
    installed, they will continue working as before. The only case where
    rendering does not work is a fresh installation without graphviz:
    it will raise an error and inform you that you need it.
    
    Graphviz will remain installed for most users:
    
    * the Airflow image will still contain the graphviz library, because
      it is added there as an extra
    * when a previous version of Airflow has already been installed, the
      graphviz library is already installed there and Airflow will
      continue working as it did
    
    The only change affects a new installation of a new version of
    Airflow from scratch, where graphviz will need to be specified as an
    extra or installed separately in order to enable the DAG rendering
    option.
    
    Taking into account this behaviour (which only requires installing
    a graphviz package), this should not be considered a breaking
    change.
    
    Extracted from: #36537
---
 CONTRIBUTING.rst                                  | 14 +++++++-------
 Dockerfile                                        |  2 +-
 INSTALL                                           | 14 +++++++-------
 airflow/utils/dot_renderer.py                     | 15 ++++++++++++++-
 dev/breeze/src/airflow_breeze/global_constants.py |  1 +
 docs/apache-airflow/extra-packages-ref.rst        |  2 ++
 docs/docker-stack/build-arg-ref.rst               |  1 +
 docs/spelling_wordlist.txt                        |  1 +
 images/breeze/output_prod-image_build.txt         |  2 +-
 newsfragments/36647.significant.rst               | 23 +++++++++++++++++++++++
 setup.cfg                                         |  1 -
 setup.py                                          |  3 +++
 12 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index 469f45c7ce..a9ffa69a38 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -856,13 +856,13 @@ arangodb, asana, async, atlas, atlassian.jira, aws, azure, cassandra, celery, cg
 cncf.kubernetes, cohere, common.io, common.sql, crypto, databricks, datadog, dbt.cloud,
 deprecated_api, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc, doc_gen, docker,
 druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github_enterprise, google,
-google_auth, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes,
-ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql,
-mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty,
-pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs,
-salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake,
-spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate,
-webhdfs, winrm, yandex, zendesk
+google_auth, graphviz, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos,
+kubernetes, ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo,
+mssql, mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel,
+pagerduty, pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq,
+redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp,
+snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv,
+weaviate, webhdfs, winrm, yandex, zendesk
   .. END EXTRAS HERE
 
 Provider packages
diff --git a/Dockerfile b/Dockerfile
index c1ed35a61e..5468bcff35 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -35,7 +35,7 @@
 #                        much smaller.
 #
 # Use the same builder frontend version for everyone
-ARG AIRFLOW_EXTRAS="aiobotocore,amazon,async,celery,cncf.kubernetes,common.io,docker,elasticsearch,ftp,google,google_auth,grpc,hashicorp,http,ldap,microsoft.azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,virtualenv"
+ARG AIRFLOW_EXTRAS="aiobotocore,amazon,async,celery,cncf.kubernetes,common.io,docker,elasticsearch,ftp,google,google_auth,graphviz,grpc,hashicorp,http,ldap,microsoft.azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,virtualenv"
 ARG ADDITIONAL_AIRFLOW_EXTRAS=""
 ARG ADDITIONAL_PYTHON_DEPS=""
 
diff --git a/INSTALL b/INSTALL
index 4b6da57ae8..e0778619dd 100644
--- a/INSTALL
+++ b/INSTALL
@@ -101,13 +101,13 @@ arangodb, asana, async, atlas, atlassian.jira, aws, azure, cassandra, celery, cg
 cncf.kubernetes, cohere, common.io, common.sql, crypto, databricks, datadog, dbt.cloud,
 deprecated_api, devel, devel_all, devel_ci, devel_hadoop, dingding, discord, doc, doc_gen, docker,
 druid, elasticsearch, exasol, fab, facebook, ftp, gcp, gcp_api, github, github_enterprise, google,
-google_auth, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes,
-ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mssql,
-mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty,
-pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs,
-salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake,
-spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv, weaviate,
-webhdfs, winrm, yandex, zendesk
+google_auth, graphviz, grpc, hashicorp, hdfs, hive, http, imap, influxdb, jdbc, jenkins, kerberos,
+kubernetes, ldap, leveldb, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo,
+mssql, mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, otel,
+pagerduty, pandas, papermill, password, pgvector, pinecone, pinot, postgres, presto, rabbitmq,
+redis, s3, s3fs, salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp,
+snowflake, spark, sqlite, ssh, statsd, tableau, tabular, telegram, trino, vertica, virtualenv,
+weaviate, webhdfs, winrm, yandex, zendesk
 # END EXTRAS HERE
 
 # For installing Airflow in development environments - see CONTRIBUTING.rst
diff --git a/airflow/utils/dot_renderer.py b/airflow/utils/dot_renderer.py
index 41281fbbb1..4d44d1e2ec 100644
--- a/airflow/utils/dot_renderer.py
+++ b/airflow/utils/dot_renderer.py
@@ -19,9 +19,14 @@
 """Renderer DAG (tasks and dependencies) to the graphviz object."""
 from __future__ import annotations
 
+import warnings
 from typing import TYPE_CHECKING, Any
 
-import graphviz
+try:
+    import graphviz
+except ImportError:
+    warnings.warn("Could not import graphviz. Rendering graph to the graphical format will not be possible.")
+    graphviz = None
 
 from airflow.exceptions import AirflowException
 from airflow.models.baseoperator import BaseOperator
@@ -151,6 +156,10 @@ def render_dag_dependencies(deps: dict[str, list[DagDependency]]) -> graphviz.Di
     :param deps: List of DAG dependencies
     :return: Graphviz object
     """
+    if not graphviz:
+        raise AirflowException(
+            "Could not import graphviz. Install the graphviz python package to fix this error."
+        )
     dot = graphviz.Digraph(graph_attr={"rankdir": "LR"})
 
     for dag, dependencies in deps.items():
@@ -179,6 +188,10 @@ def render_dag(dag: DAG, tis: list[TaskInstance] | None = None) -> graphviz.Digr
     :param tis: List of task instances
     :return: Graphviz object
     """
+    if not graphviz:
+        raise AirflowException(
+            "Could not import graphviz. Install the graphviz python package to fix this error."
+        )
     dot = graphviz.Digraph(
         dag.dag_id,
         graph_attr={
diff --git a/dev/breeze/src/airflow_breeze/global_constants.py b/dev/breeze/src/airflow_breeze/global_constants.py
index 4f08a42fd4..5de443d2ec 100644
--- a/dev/breeze/src/airflow_breeze/global_constants.py
+++ b/dev/breeze/src/airflow_breeze/global_constants.py
@@ -432,6 +432,7 @@ DEFAULT_EXTRAS = [
     "ftp",
     "google",
     "google_auth",
+    "graphviz",
     "grpc",
     "hashicorp",
     "http",
diff --git a/docs/apache-airflow/extra-packages-ref.rst b/docs/apache-airflow/extra-packages-ref.rst
index 082595312e..5bab6fd7d9 100644
--- a/docs/apache-airflow/extra-packages-ref.rst
+++ b/docs/apache-airflow/extra-packages-ref.rst
@@ -52,6 +52,8 @@ python dependencies for the provided package.
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | google_auth         | ``pip install 'apache-airflow[google_auth]'``       | Google auth backend                                                        |
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
+| graphviz            | ``pip install 'apache-airflow[graphviz]'``          | Graphviz renderer for converting DAG to graphical output                   |
++---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | kerberos            | ``pip install 'apache-airflow[kerberos]'``          | Kerberos integration for Kerberized services (Hadoop, Presto, Trino)       |
 +---------------------+-----------------------------------------------------+----------------------------------------------------------------------------+
 | ldap                | ``pip install 'apache-airflow[ldap]'``              | LDAP authentication for users                                              |
diff --git a/docs/docker-stack/build-arg-ref.rst b/docs/docker-stack/build-arg-ref.rst
index a07760558e..73c30a3892 100644
--- a/docs/docker-stack/build-arg-ref.rst
+++ b/docs/docker-stack/build-arg-ref.rst
@@ -91,6 +91,7 @@ List of default extras in the production Dockerfile:
 * ftp
 * google
 * google_auth
+* graphviz
 * grpc
 * hashicorp
 * http
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index c1fe295997..191c787751 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -687,6 +687,7 @@ googleapiclient
 GoogleDisplayVideo
 gpu
 gpus
+graphviz
 greenlet
 Groupalia
 groupId
diff --git a/images/breeze/output_prod-image_build.txt b/images/breeze/output_prod-image_build.txt
index b5499f9c68..ceeb24a418 100644
--- a/images/breeze/output_prod-image_build.txt
+++ b/images/breeze/output_prod-image_build.txt
@@ -1 +1 @@
-aa383b195f2991035d5333a6f37bebaa
+f7a753d66923772bfb6250d5b87d1f51
diff --git a/newsfragments/36647.significant.rst b/newsfragments/36647.significant.rst
new file mode 100644
index 0000000000..dc3f0faad8
--- /dev/null
+++ b/newsfragments/36647.significant.rst
@@ -0,0 +1,23 @@
+Graphviz dependency is now an optional one, not required one.
+
+The ``graphviz`` dependency has been problematic as Airflow required dependency - especially for
+ARM-based installations. Graphviz packages require binary graphviz libraries - which is already a
+limitation, but they also require to install graphviz Python bindings to be build and installed.
+This does not work for older Linux installation but - more importantly - when you try to install
+Graphviz libraries for Python 3.8, 3.9 for ARM M1 MacBooks, the packages fail to install because
+Python bindings compilation for M1 can only work for Python 3.10+.
+
+This is not a breaking change technically - the CLIs to render the DAGs is still there and IF you
+already have graphviz installed, it will continue working as it did before. The only problem when it
+does not work is where you do not have graphviz installed it will raise an error and inform that you need it.
+
+Graphviz will remain to be installed for most users:
+
+* the Airflow Image will still contain graphviz library, because
+  it is added there as extra
+* when previous version of Airflow has been installed already, then
+  graphviz library is already installed there and Airflow will
+  continue working as it did
+
+The only change will be a new installation of new version of Airflow from the scratch, where graphviz will
+need to be specified as extra or installed separately in order to enable DAG rendering option.
diff --git a/setup.cfg b/setup.cfg
index d10e0bb6d9..6d7bb58e6b 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -107,7 +107,6 @@ install_requires =
     flask-wtf>=0.15
     fsspec>=2023.10.0
     google-re2>=1.0
-    graphviz>=0.12
     gunicorn>=20.1.0
     httpx
     importlib_metadata>=1.7;python_version<"3.9"
diff --git a/setup.py b/setup.py
index 4b43a0add9..a5f29d694e 100644
--- a/setup.py
+++ b/setup.py
@@ -318,12 +318,14 @@ doc = [
 ]
 doc_gen = [
     "eralchemy2",
+    "graphviz>=0.12",
 ]
 flask_appbuilder_oauth = [
     "authlib>=1.0.0",
     # The version here should be upgraded at the same time as flask-appbuilder in setup.cfg
     "flask-appbuilder[oauth]==4.3.10",
 ]
+graphviz = ["graphviz>=0.12"]
 kerberos = [
     "pykerberos>=1.1.13",
     "requests_kerberos>=0.10.0",
@@ -593,6 +595,7 @@ CORE_EXTRAS_DEPENDENCIES: dict[str, list[str]] = {
     "deprecated_api": deprecated_api,
     "github_enterprise": flask_appbuilder_oauth,
     "google_auth": flask_appbuilder_oauth,
+    "graphviz": graphviz,
     "kerberos": kerberos,
     "ldap": ldap,
     "leveldb": leveldb,
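
For readers who want to try the pattern outside Airflow, here is a minimal
standalone sketch of the optional-import guard that the
airflow/utils/dot_renderer.py hunk above applies. The names optional_import,
RenderingUnavailable and render_edges are hypothetical and exist only for this
illustration; the actual commit raises AirflowException inside render_dag and
render_dag_dependencies. The only graphviz calls assumed are graphviz.Digraph
and Digraph.edge.

```python
from __future__ import annotations

import importlib
import warnings
from types import ModuleType


def optional_import(name: str) -> ModuleType | None:
    """Return the named module, or None (after a warning) if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn(
            f"Could not import {name}. Features that depend on it are disabled.",
            stacklevel=2,
        )
        return None


class RenderingUnavailable(RuntimeError):
    """Hypothetical stand-in for AirflowException in this sketch."""


# Import once at module load; the module stays usable even when this is None.
graphviz = optional_import("graphviz")


def render_edges(edges: list[tuple[str, str]]):
    """Build a left-to-right Digraph from (upstream, downstream) pairs."""
    # Fail only when rendering is actually requested, not at import time.
    if graphviz is None:
        raise RenderingUnavailable(
            "Could not import graphviz. Install the graphviz python package to fix this error."
        )
    dot = graphviz.Digraph(graph_attr={"rankdir": "LR"})
    for upstream, downstream in edges:
        dot.edge(upstream, downstream)
    return dot
```

The point of the design is that a missing optional package costs a warning at
import time and an explicit error only at the call site that needs it, so the
rest of the module keeps working on installations without graphviz.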
