samuelkhtu opened a new issue, #53333:
URL: https://github.com/apache/airflow/issues/53333

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.10.5
   
   ### What happened?
   
   In `airflow/providers/microsoft/azure/fs/adls.py`, the `get_fs()` function 
constructs a dictionary of `options` by pulling connection information from 
Airflow's connection system and then passes these options to 
`AzureBlobFileSystem(**options)`.
   
   By default, the function constructs `account_url` using 
`parse_blob_account_url(conn.host, conn.login)`, which assumes the Azure Blob 
endpoint will use the standard `core.windows.net` domain. While this works for 
default endpoints, it does not support scenarios where users want to override 
the domain — for example, when using a private endpoint like 
`.core.mydomain.io`.
   
   The root issue is:
   
   * `account_host` (the correct field expected by `adlfs.AzureBlobFileSystem`) 
is not included in the list of parsed fields.
   * Even if the user provides `account_host` in the connection extras, 
`get_fs()` ignores it and always constructs `account_url` using the hardcoded 
domain logic.
   * `AzureBlobFileSystem` does not support `account_url` as a constructor 
parameter, so the custom domain is never applied — silently falling back to the 
default.
   
   As a result, there is **no way for users to override the account URL** via 
Airflow connection configuration, even though `adlfs.AzureBlobFileSystem` 
supports this through its `account_host` parameter.
   
   This blocks use cases such as:
   
   * Custom domains
   * Private endpoints
   * Sovereign or air-gapped cloud regions
   
   This limitation exists even though the underlying library (`adlfs`) already 
supports the necessary parameter (`account_host`).
   
   
   ### What you think should happen instead?
   
   The `get_fs()` function should support passing a user-defined `account_host` 
value from the Airflow connection extras directly to the `AzureBlobFileSystem` 
constructor.
   
   Specifically:
   
   * Add `"account_host"` to the list of fields extracted from `extras`.
   * If `account_host` is provided, it should be passed directly to 
`AzureBlobFileSystem` as a supported parameter.
   * Currently, `get_fs()` sets `account_url` using 
`parse_blob_account_url(...)`, but `account_url` is **not** a valid parameter 
for `AzureBlobFileSystem`. It can be removed or renamed to `account_host`.
   
   However, to maintain backward compatibility:
   
   * We could retain the existing `account_url` logic as a fallback.
   * But prefer `account_host` when it is explicitly defined in the extras.
   
   This would allow users to configure non-standard Azure Blob endpoints — such 
as custom domains or private links — via the standard Airflow connection 
mechanism, while maintaining compatibility with existing deployments.
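
The precedence described above could be sketched roughly as follows. This is only an illustration, not the provider's actual code: `resolve_endpoint_options` is a hypothetical helper name, and the fallback URL stands in for whatever `parse_blob_account_url(...)` returns today.

```python
from typing import Any


def resolve_endpoint_options(
    extras: dict[str, Any], fallback_account_url: str
) -> dict[str, Any]:
    """Hypothetical helper: choose which endpoint option to hand to the filesystem.

    An explicit, non-empty ``account_host`` in the connection extras wins.
    Otherwise the previously computed ``account_url`` is kept, so existing
    deployments see no change in behavior.
    """
    account_host = extras.get("account_host")
    if account_host:
        return {"account_host": account_host}
    return {"account_url": fallback_account_url}


# With an override in extras, only account_host is returned.
with_override = resolve_endpoint_options(
    {"account_host": "testaccountname.blob.core.customdomain.io"},
    "https://testaccountname.blob.core.windows.net/",
)

# Without an override, the existing account_url logic is used as the fallback.
without_override = resolve_endpoint_options(
    {}, "https://testaccountname.blob.core.windows.net/"
)
```

Returning only one of the two keys keeps the choice explicit, so the constructor never receives both an override and a default-domain URL.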
   
   
   ### How to reproduce
   
   
   
   1. Create an Airflow `Connection` object with a custom domain in the `extra` 
field:
   
      ```python
      from airflow.models import Connection
      from airflow.providers.microsoft.azure.fs.adls import get_fs
   
      conn = Connection(
          conn_id="testconn",
          conn_type="wasb",
          login="testaccountname",
          password="p",
          host="testaccountID",
          extra={
              "account_name": "n",
              "tenant_id": "t",
              "account_host": "https://testaccountname.blob.core.customdomain.io",
          },
      )
      # Insert or mock this connection in Airflow metadata
      ```
   
   2. Call `get_fs()` with this connection ID:
   
      ```python
      fs = get_fs("testconn")
      ```
   
   3. Observe that:
   
      * Despite `account_host` being set to a custom domain in extras, 
`get_fs()` ignores it.
      * The `options` passed to `adlfs.AzureBlobFileSystem` do **not** include 
`account_host`.
      * Instead, `get_fs()` builds and passes an `account_url` derived from the 
default domain based on `host` and `login`.
      * As a result, the custom domain override does not take effect.
   
   This confirms the current limitation that `account_host` cannot be used to 
override the default Azure Blob endpoint via Airflow’s connection system.
   
   
   ### Operating System
   
   macOS 15.5
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-microsoft-azure==12.5.0
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   k8s
   
   ### Anything else?
   
   
   
   * This is a backward-compatible enhancement since adding support for 
`account_host` does not remove or change existing parameters.
   
   * Supporting `account_host` enables Airflow to better integrate with Azure 
environments using private endpoints, custom domains, or sovereign clouds.
   
   * The underlying `adlfs.AzureBlobFileSystem` already supports 
`account_host`, so this change leverages existing functionality.
   
   * Implementing this will improve user experience and reduce the need for 
workarounds or custom patches.
   
   * I want to submit a PR but would appreciate suggestions on the best 
approach.
   
   * My current thinking is to simply add `"account_host"` to the existing 
`fields` list in `get_fs()` so that this block picks it up automatically:
   
     ```python
     fields = [
         "account_name",
         "account_key",
         "sas_token",
         "tenant_id",
         "managed_identity_client_id",
         "workload_identity_client_id",
         "workload_identity_tenant_id",
         "anon",
         "account_host",  # <- add here
     ]
     for field in fields:
         value = get_field(
             conn_id=conn_id, conn_type=conn_type, extras=extras, field_name=field
         )
         if value is not None:
             if value == "":
                 options.pop(field, "")
             else:
                 options[field] = value
     ```
   
   * Would this be the preferred way, or are there alternative approaches to 
consider?
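
To check the idea without an Airflow installation, the loop can be simulated in isolation. This is a minimal sketch under stated assumptions: `lookup_field` is a simplified stand-in for the provider's `get_field()` (it does a plain dictionary lookup only), and `collect_options` is a hypothetical wrapper around the loop shown above.

```python
from typing import Any, Optional

FIELDS = [
    "account_name",
    "account_key",
    "sas_token",
    "tenant_id",
    "managed_identity_client_id",
    "workload_identity_client_id",
    "workload_identity_tenant_id",
    "anon",
    "account_host",  # the proposed addition
]


def lookup_field(extras: dict[str, Any], field_name: str) -> Any:
    """Simplified stand-in for get_field(): a plain lookup in the extras dict."""
    return extras.get(field_name)


def collect_options(
    extras: dict[str, Any], base_options: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Hypothetical wrapper that mirrors the field loop from get_fs()."""
    options = dict(base_options or {})
    for field in FIELDS:
        value = lookup_field(extras, field)
        if value is not None:
            if value == "":
                # An empty string removes any previously set value.
                options.pop(field, None)
            else:
                options[field] = value
    return options


# account_host is picked up like any other listed field.
picked_up = collect_options(
    {"tenant_id": "t", "account_host": "testaccountname.blob.core.customdomain.io"}
)

# An empty string clears a value that was already present in the options.
cleared = collect_options({"account_host": ""}, {"account_host": "old-host"})
```

In this simulation the new field needs no special handling. Whether the `account_url` entry that `get_fs()` already sets should also be dropped when `account_host` is present remains an open design question.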
   
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
