Joffreybvn commented on code in PR #36015:
URL: https://github.com/apache/airflow/pull/36015#discussion_r1413141467


##########
airflow/providers/common/sql/doc/adr/0002-return-common-data-structure-from-dbapihook-derived-hooks.md:
##########
@@ -0,0 +1,139 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# 2. Return common data structure from DBApiHook derived hooks
+
+Date: 2023-12-01
+
+## Status
+
+Accepted
+
+## Context
+
+Note: This ADR describes a decision that was made (but not recorded) when the common.sql provider was
+introduced in July 2022. It is recorded here in December 2023 so that the decision is documented.
+
+Before the common.sql provider, we had a number of DBApi-derived Hooks that interfaced with Python
+DBAPI-compliant databases. The format of the data returned from these hooks differed, and it depended on
+the implementation of the DBAPI interface as well as on the implementation of the Hooks and operators.
+We also had many very similar operators that performed similar tasks, such as querying, sensing the
+database, or checking columns. This led to a lot of code duplication, and we decided that we needed a common
+set of operators that could be used across all DBAPI-compliant databases.
+
+Unfortunately, there is no common standard for the data returned by the Python DBAPI interface. It is
+partially standardized by [PEP-0249](https://peps.python.org/pep-0249/), but the specification is not very
+strict and allows a lot of flexibility and interpretation. Consequently, the data returned by the DBAPI
+interface can contain tuples, named tuples, lists, dictionaries, or even custom objects, and there is no
+guarantee that it is directly serializable.
+
+Not having a common standard format made it difficult to implement the planned OpenLineage column-level
+integration with the database Operators and, at a later stage, with the Hooks.
+
+## Decision
+
+We decided to introduce a common.sql provider containing a set of operators that can be used across all
+DBAPI-compliant databases. We also decided that the data structure returned by the operators should be
+consistent across all of them. To simplify the transition, we chose the format returned by the `run` method
+of the ``DBApiHook`` class, which was very close to what most DBAPI-compliant Hooks already returned, even
+though it was not the most optimal format. In this case, backwards compatibility trumped the optimal format.
+
+It was also decided that more optimal formats (possibly some form of DataFrames) might be introduced in the
+future if we find a need for them. However, this is likely unnecessary, because Pandas and similar
+libraries already have excellent support for converting many of the formats returned by the DBAPI interface
+to DataFrames, and we already leverage this by using Pandas functionality directly via the `get_pandas_df`
+method of the ``DBApiHook`` class.
+
+The goal of the change was to standardize the format of the returned data so that it could be used
+directly by the existing DBApi Airflow operators with minimal backwards-compatibility problems. This made
+it possible to deprecate all the existing DBApi operators and redirect them to the new operators defined
+in the common.sql provider.
+
+Therefore, the format of data returned by the Hook can be one of:
+
+* base return value is a list of tuples where each tuple is a row returned by the query. The tuples

Review Comment:
   Rules here will apply to all Hooks based on the DBApiHook, right?
   
   For the rest: fewer, simpler and more 'standard' hooks and operators are easier for the maintainers to
   maintain and extend, and their behavior becomes more predictable for the DAG developer. So yes, I agree!
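
To make the point about heterogeneous row shapes concrete, here is a minimal sketch of the normalization the ADR implies: coercing rows from a DBAPI cursor into the "list of tuples" base format. This is not Airflow's implementation. `rows_to_tuples` and `dict_factory` are hypothetical names, and the stdlib `sqlite3` driver stands in for an arbitrary DBAPI driver, with its `row_factory` used to produce `sqlite3.Row` objects and dict rows.

```python
import sqlite3
from collections.abc import Mapping, Sequence
from typing import Any


def rows_to_tuples(rows: Sequence[Any]) -> list[tuple]:
    """Hypothetical helper: coerce DBAPI rows of varying shapes into plain tuples."""
    result: list[tuple] = []
    for row in rows:
        if isinstance(row, Mapping):
            # Dict-style rows: iterating a dict yields keys, so take the values explicitly.
            result.append(tuple(row.values()))
        else:
            # Tuples, named tuples, lists and sqlite3.Row all iterate over their values.
            result.append(tuple(row))
    return result


def dict_factory(cursor: sqlite3.Cursor, row: tuple) -> dict:
    """Hypothetical row factory that makes sqlite3 return dicts keyed by column name."""
    return {col[0]: value for col, value in zip(cursor.description, row)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

# Cursors created after setting row_factory return sqlite3.Row objects.
conn.row_factory = sqlite3.Row
row_objs = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

# Same query, but the driver now returns plain dicts.
conn.row_factory = dict_factory
dict_rows = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

print(rows_to_tuples(row_objs))
print(rows_to_tuples(dict_rows))
```

Mapping rows are checked first because iterating a `dict` yields its keys rather than its values. The resulting column order then follows the dict's insertion order, which only matches the query's column order if the driver builds the dict in `cursor.description` order, as `dict_factory` does here.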



-- 
This is an automated message from the Apache Git Service.