VladaZakharova commented on code in PR #66342:
URL: https://github.com/apache/airflow/pull/66342#discussion_r3257007947


##########
providers/openlineage/src/airflow/providers/openlineage/token_provider.py:
##########
@@ -0,0 +1,126 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from typing import Any
+
+from airflow.providers.common.compat.sdk import AirflowException, BaseHook
+
+AIRFLOW_CONNECTION_API_KEY_AUTH_TYPE = "airflow_connection_api_key"
+OPENLINEAGE_CONFIG_EXTRA_KEY = "openlineage_config"
+_DEFAULT_EXTRA_KEYS = ("apiKey", "api_key", "apikey", "token", "access_token")
+
+
+class OpenLineageAirflowConnectionAuthError(AirflowException):
+    """Raised when OpenLineage API key auth cannot be resolved from an Airflow connection."""
+
+
+class OpenLineageAirflowConnectionConfigError(AirflowException):
+    """Raised when OpenLineage config cannot be resolved from an Airflow connection."""

Review Comment:
   I agree that a dedicated OpenLineage connection type would be nicer for users. I’m just not sure we should add it in this PR, because it feels like a separate feature from loading the config from a connection.
   
   For now, I updated the docs and provider metadata to say that `config_conn_id` should point to a Generic Airflow connection. That at least gives users one clear choice instead of “pick any connection type”. We can add a proper OpenLineage connection type later in a separate PR. WDYT?
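
   For illustration, a minimal sketch of how such a Generic connection might be defined through an `AIRFLOW_CONN_*` environment variable, using Airflow's JSON connection format. The connection ID `openlineage_conn` and the console transport are hypothetical; only the extra shape (`openlineage_config` wrapping a `transport` object) follows the lookup order in this PR.

```python
import json

# Hypothetical connection ID; config_conn_id would be set to this value.
CONN_ID = "openlineage_conn"

# Generic connection whose extra holds the full OpenLineage client config
# under the "openlineage_config" key, which the provider checks first.
connection = {
    "conn_type": "generic",
    "extra": {
        "openlineage_config": {
            "transport": {"type": "console"},
        },
    },
}

# Airflow looks up AIRFLOW_CONN_<CONN_ID> (upper-cased); the value may be
# a JSON document instead of a URI.
env_var = f"AIRFLOW_CONN_{CONN_ID.upper()}"
export_line = f"export {env_var}='{json.dumps(connection)}'"
print(export_line)
```

   Because the provider also accepts a bare transport (`{"type": "console"}`) in the extra, the nested `openlineage_config` key is only needed when users want to pass the full client config.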



##########
providers/openlineage/src/airflow/providers/openlineage/token_provider.py:
##########
@@ -0,0 +1,126 @@
+
+
+class AirflowConnectionConfigProvider:
+    """
+    Resolve OpenLineage client configuration from an Airflow connection.
+
+    The connection extra can contain the full OpenLineage client config, for example
+    ``{"transport": {"type": "console"}}``. For convenience, it can also contain only the transport
+    config, for example ``{"type": "console"}``.
+    """
+
+    def __init__(self, conn_id: str) -> None:
+        if not conn_id:
+            raise OpenLineageAirflowConnectionConfigError(
+                "OpenLineage connection config requires a non-empty connection ID."
+            )
+        self.conn_id = conn_id
+
+    def get_config(self) -> dict[str, Any]:
+        connection = BaseHook.get_connection(self.conn_id)
+        extra = connection.extra_dejson
+        config = self._get_config_from_extra(extra)
+        if config is not None:
+            return config
+
+        raise OpenLineageAirflowConnectionConfigError(
+            "OpenLineage connection config could not find configuration in connection "
+            f"`{self.conn_id}`. Expected full OpenLineage config or transport config in connection extra."
+        )
+
+    def _get_config_from_extra(self, extra: dict[str, Any]) -> dict[str, Any] | None:
+        if OPENLINEAGE_CONFIG_EXTRA_KEY in extra:
+            return self._validate_config(extra[OPENLINEAGE_CONFIG_EXTRA_KEY])
+
+        if "transport" in extra:
+            return self._validate_config(extra)
+
+        if "type" in extra:
+            return {"transport": extra}
+
+        return None
+
+    def _validate_config(self, config: Any) -> dict[str, Any]:

Review Comment:
   I thought about using `OpenLineageClient(config=...)` for this, but I think it would be too heavy for validation here: it would create the client/transport once just to check the config, and then we would create it again later in the adapter.
   
   So for now I kept this check very small: the config taken from the Airflow connection extra must be a JSON object with a `transport` object. The OpenLineage client still does the real transport/auth validation when it is created. If the OpenLineage client gets a dedicated validation method later, we can switch to that.
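
   A minimal, Airflow-free sketch of the lookup order in `_get_config_from_extra` plus the kind of small shape check described above. The body of `_validate_config` is not shown in this hunk, so `validate_config` here is a hypothetical stand-in that only checks for a JSON object with a `transport` object, and it raises `ValueError` instead of `OpenLineageAirflowConnectionConfigError` to stay self-contained.

```python
from __future__ import annotations

from typing import Any

OPENLINEAGE_CONFIG_EXTRA_KEY = "openlineage_config"


def validate_config(config: Any) -> dict[str, Any]:
    """Hypothetical stand-in for _validate_config: shape check only."""
    if not isinstance(config, dict):
        raise ValueError("OpenLineage config must be a JSON object.")
    if not isinstance(config.get("transport"), dict):
        raise ValueError("OpenLineage config must contain a `transport` object.")
    # Transport/auth details are left for OpenLineageClient to validate later.
    return config


def get_config_from_extra(extra: dict[str, Any]) -> dict[str, Any] | None:
    # Same lookup order as AirflowConnectionConfigProvider._get_config_from_extra.
    if OPENLINEAGE_CONFIG_EXTRA_KEY in extra:
        return validate_config(extra[OPENLINEAGE_CONFIG_EXTRA_KEY])
    if "transport" in extra:
        return validate_config(extra)
    if "type" in extra:
        # Bare transport config: wrap it, as the PR code does (no extra check).
        return {"transport": extra}
    return None


print(get_config_from_extra({"type": "console"}))
# {'transport': {'type': 'console'}}
```

   Keeping the check this small means a mistake inside the transport itself (for example a wrong URL) still surfaces only when the client is built, which is the trade-off described above.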


