This is an automated email from the ASF dual-hosted git repository. eladkal pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push: new ccbd536d28f Added ADR document describing why the notion of dialects is introduced (#45456) ccbd536d28f is described below commit ccbd536d28fa6d5285d59b50915d832e02ba141d Author: David Blain <i...@dabla.be> AuthorDate: Fri Jan 24 09:45:27 2025 +0100 Added ADR document describing why the notion of dialects is introduced (#45456) * docs: Added a markdown ADR document describing why the notion of dialects was introduced in the common sql provider * docs: Added reference to the dialects in the Airflow common sql provider documentation * docs: Reformatted the ADR * refactor: Added dialects reference in mssql and postgres provider * refactor: Duplicated dialects.rst to mssql and postgres provider until we sort out how to redirect to the common sql provider one --------- Co-authored-by: David Blain <david.bl...@infrabel.be> --- .../dialects.rst | 55 +++++++++++++++++++ docs/apache-airflow-providers-common-sql/index.rst | 1 + .../dialects.rst | 55 +++++++++++++++++++ .../index.rst | 1 + .../apache-airflow-providers-postgres/dialects.rst | 55 +++++++++++++++++++ docs/apache-airflow-providers-postgres/index.rst | 1 + ...03-introduce-notion-of-dialects-in-dbapihook.md | 61 ++++++++++++++++++++++ 7 files changed, 229 insertions(+) diff --git a/docs/apache-airflow-providers-common-sql/dialects.rst b/docs/apache-airflow-providers-common-sql/dialects.rst new file mode 100644 index 00000000000..22feedc9454 --- /dev/null +++ b/docs/apache-airflow-providers-common-sql/dialects.rst @@ -0,0 +1,55 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +SQL Dialects +============= + +The :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` offers an abstraction layer between the +:class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` implementation and the database. For some database multiple +connection types are available, like native, ODBC and or JDBC. As the :class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` +and the :class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook` are generic hooks which allows you to interact with any +database that has a driver for it, it needed an abstraction layer which allows us to run specialized queries +depending of the database to which we connect and that's why dialects where introduced. + +The default :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class has following operations +available which underneath use SQLAlchemy to execute, but can be overloaded with specialized implementations +per database: + +- ``placeholder`` specifies the database specific placeholder used in prepared statements (default: ``%s``); +- ``inspector`` returns the SQLAlchemy inspector which allows us to retrieve database metadata; +- ``extract_schema_from_table`` allows us to extract the schema name from a string. +- ``get_column_names`` returns the column names for the given table and schema (optional) using the SQLAlchemy inspector. +- ``get_primary_keys`` returns the primary keys for the given table and schema (optional) using the SQLAlchemy inspector. +- ``get_target_fields`` returns the columns names that aren't identity or auto incremented columns, this will be used by the insert_rows method of the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` if the target_fields parameter wasn't specified and the Airflow property ``core.dbapihook_resolve_target_fields`` is set to True (default: False). +- ``reserved_words`` returns the reserved words in SQL for the target database using the SQLAlchemy inspector. +- ``generate_insert_sql`` generates the insert SQL statement for the target database. +- ``generate_replace_sql`` generates the upsert SQL statement for the target database. + +At the moment there are only 3 dialects available: + +- ``default`` :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` reuses the generic functionality that was already available in the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook`; +- ``mssql`` :class:`~airflow.providers.microsoft.mssql.dialects.mssql.MsSqlDialect` specialized for Microsoft SQL Server; +- ``postgresql`` :class:`~airflow.providers.postgres.dialects.postgres.PostgresDialect` specialized for PostgreSQL; + +The dialect to be used will be derived from the connection string, which sometimes won't be possible. There is always +the possibility to specify the dialect name through the extra options of the connection: + +.. code-block:: + + dialect_name: 'mssql' + +If a specific dialect isn't available for a database, the default one will be used, same when a non-existing dialect name is specified. diff --git a/docs/apache-airflow-providers-common-sql/index.rst b/docs/apache-airflow-providers-common-sql/index.rst index b24ee78510c..6659b1affbc 100644 --- a/docs/apache-airflow-providers-common-sql/index.rst +++ b/docs/apache-airflow-providers-common-sql/index.rst @@ -43,6 +43,7 @@ Python API <_api/airflow/providers/common/sql/index> Supported Database Types </supported-database-types> + Dialects <dialects> .. toctree:: :hidden: diff --git a/docs/apache-airflow-providers-microsoft-mssql/dialects.rst b/docs/apache-airflow-providers-microsoft-mssql/dialects.rst new file mode 100644 index 00000000000..22feedc9454 --- /dev/null +++ b/docs/apache-airflow-providers-microsoft-mssql/dialects.rst @@ -0,0 +1,55 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +SQL Dialects +============= + +The :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` offers an abstraction layer between the +:class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` implementation and the database. For some database multiple +connection types are available, like native, ODBC and or JDBC. As the :class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` +and the :class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook` are generic hooks which allows you to interact with any +database that has a driver for it, it needed an abstraction layer which allows us to run specialized queries +depending of the database to which we connect and that's why dialects where introduced. + +The default :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class has following operations +available which underneath use SQLAlchemy to execute, but can be overloaded with specialized implementations +per database: + +- ``placeholder`` specifies the database specific placeholder used in prepared statements (default: ``%s``); +- ``inspector`` returns the SQLAlchemy inspector which allows us to retrieve database metadata; +- ``extract_schema_from_table`` allows us to extract the schema name from a string. +- ``get_column_names`` returns the column names for the given table and schema (optional) using the SQLAlchemy inspector. +- ``get_primary_keys`` returns the primary keys for the given table and schema (optional) using the SQLAlchemy inspector. +- ``get_target_fields`` returns the columns names that aren't identity or auto incremented columns, this will be used by the insert_rows method of the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` if the target_fields parameter wasn't specified and the Airflow property ``core.dbapihook_resolve_target_fields`` is set to True (default: False). +- ``reserved_words`` returns the reserved words in SQL for the target database using the SQLAlchemy inspector. +- ``generate_insert_sql`` generates the insert SQL statement for the target database. +- ``generate_replace_sql`` generates the upsert SQL statement for the target database. + +At the moment there are only 3 dialects available: + +- ``default`` :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` reuses the generic functionality that was already available in the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook`; +- ``mssql`` :class:`~airflow.providers.microsoft.mssql.dialects.mssql.MsSqlDialect` specialized for Microsoft SQL Server; +- ``postgresql`` :class:`~airflow.providers.postgres.dialects.postgres.PostgresDialect` specialized for PostgreSQL; + +The dialect to be used will be derived from the connection string, which sometimes won't be possible. There is always +the possibility to specify the dialect name through the extra options of the connection: + +.. code-block:: + + dialect_name: 'mssql' + +If a specific dialect isn't available for a database, the default one will be used, same when a non-existing dialect name is specified. diff --git a/docs/apache-airflow-providers-microsoft-mssql/index.rst b/docs/apache-airflow-providers-microsoft-mssql/index.rst index 62089e86545..b574f05bef9 100644 --- a/docs/apache-airflow-providers-microsoft-mssql/index.rst +++ b/docs/apache-airflow-providers-microsoft-mssql/index.rst @@ -43,6 +43,7 @@ :caption: References Python API <_api/airflow/providers/microsoft/mssql/index> + Dialects <dialects> .. toctree:: :hidden: diff --git a/docs/apache-airflow-providers-postgres/dialects.rst b/docs/apache-airflow-providers-postgres/dialects.rst new file mode 100644 index 00000000000..22feedc9454 --- /dev/null +++ b/docs/apache-airflow-providers-postgres/dialects.rst @@ -0,0 +1,55 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +SQL Dialects +============= + +The :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` offers an abstraction layer between the +:class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` implementation and the database. For some database multiple +connection types are available, like native, ODBC and or JDBC. As the :class:`~airflow.providers.odbc.hooks.odbc.OdbcHook` +and the :class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook` are generic hooks which allows you to interact with any +database that has a driver for it, it needed an abstraction layer which allows us to run specialized queries +depending of the database to which we connect and that's why dialects where introduced. + +The default :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` class has following operations +available which underneath use SQLAlchemy to execute, but can be overloaded with specialized implementations +per database: + +- ``placeholder`` specifies the database specific placeholder used in prepared statements (default: ``%s``); +- ``inspector`` returns the SQLAlchemy inspector which allows us to retrieve database metadata; +- ``extract_schema_from_table`` allows us to extract the schema name from a string. +- ``get_column_names`` returns the column names for the given table and schema (optional) using the SQLAlchemy inspector. +- ``get_primary_keys`` returns the primary keys for the given table and schema (optional) using the SQLAlchemy inspector. +- ``get_target_fields`` returns the columns names that aren't identity or auto incremented columns, this will be used by the insert_rows method of the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook` if the target_fields parameter wasn't specified and the Airflow property ``core.dbapihook_resolve_target_fields`` is set to True (default: False). +- ``reserved_words`` returns the reserved words in SQL for the target database using the SQLAlchemy inspector. +- ``generate_insert_sql`` generates the insert SQL statement for the target database. +- ``generate_replace_sql`` generates the upsert SQL statement for the target database. + +At the moment there are only 3 dialects available: + +- ``default`` :class:`~airflow.providers.common.sql.dialects.dialect.Dialect` reuses the generic functionality that was already available in the :class:`~airflow.providers.common.sql.hooks.sql.DbApiHook`; +- ``mssql`` :class:`~airflow.providers.microsoft.mssql.dialects.mssql.MsSqlDialect` specialized for Microsoft SQL Server; +- ``postgresql`` :class:`~airflow.providers.postgres.dialects.postgres.PostgresDialect` specialized for PostgreSQL; + +The dialect to be used will be derived from the connection string, which sometimes won't be possible. There is always +the possibility to specify the dialect name through the extra options of the connection: + +.. code-block:: + + dialect_name: 'mssql' + +If a specific dialect isn't available for a database, the default one will be used, same when a non-existing dialect name is specified. diff --git a/docs/apache-airflow-providers-postgres/index.rst b/docs/apache-airflow-providers-postgres/index.rst index 0979b194b18..b4d698ceb72 100644 --- a/docs/apache-airflow-providers-postgres/index.rst +++ b/docs/apache-airflow-providers-postgres/index.rst @@ -43,6 +43,7 @@ :caption: References Python API <_api/airflow/providers/postgres/index> + Dialects <dialects> .. toctree:: :hidden: diff --git a/providers/src/airflow/providers/common/sql/doc/adr/0003-introduce-notion-of-dialects-in-dbapihook.md b/providers/src/airflow/providers/common/sql/doc/adr/0003-introduce-notion-of-dialects-in-dbapihook.md new file mode 100644 index 00000000000..09df3145e4e --- /dev/null +++ b/providers/src/airflow/providers/common/sql/doc/adr/0003-introduce-notion-of-dialects-in-dbapihook.md @@ -0,0 +1,61 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + --> + +# 3. Introduce notion of dialects in DbApiHook + +Date: 2025-01-07 + +## Status + +Accepted + +## Context + +This ADR describes the proposition why we wanted to introduce dialects in the ``DBAPIHook`` as we experienced +that the ``_insert_statement_format`` and ``_replace_statement_format`` string formatting properties used by the +``insert_rows`` method in the ``DbApiHook`` where lacking in some cases as the number of parameters passed to the +string format are hard-coded and aren't always sufficient when using different database through the +generic JBDC and ODBC connection types. + +That's why we wanted a generic approach in which the code isn't tied to a specific database hook. + +For example when using MsSQL through ODBC instead of the native ``MsSqlHook``, you won't have the merge into +(e.g. replace) functionality for MSSQL when using the ODBC connection type as that one was only available in +the native ``MsSqlHook``. + +That's where the notion of dialects come into play and allow us to benefit of the same functionalities +independently of which connection type you want to use (ODBC/JDBC or native if available) for a specific +database. + + +## Decision + +We decided the introduce the notion of dialects which allows us to implement database specific functionalities +independently of the used connection type (e.g. hook). That way when using for example the ``insert_rows`` method on +the ``DbApiHook`` for as well ODBC as JDBC as native connection types, it will always be possible to use the replace +into (e.g. merge into) functionality as that won't be tied to a specific implementation with a Hook an thus the +connection type. + + +## Consequences + +The consequence of this decision is that from now on database specific implementations should be done within the +dialect for that database instead of the specialized hook, unless the connection type is tied to the hook, +meaning that there is only one connection type possible and an ODBC/JDBC and in the future maybe even ADBC +(e.g. Apache Arrow) isn't available.