KirillTsyganov commented on code in PR #46837:
URL: https://github.com/apache/arrow/pull/46837#discussion_r2155860652
##########
python/pyarrow/_azurefs.pyx:
##########
@@ -66,6 +66,15 @@ cdef class AzureFileSystem(FileSystem):
SAS token for the storage account, used as an alternative to
account_key. If sas_token
and account_key are None the default credential will be used. The
parameters
account_key and sas_token are mutually exclusive.
+ tenant_id : str, default None
+ Tenant ID for Azure Active Directory authentication. Must be provided
together with
+ `client_id` and `client_secret` to use ClientSecretCredential.
+ client_id : str, default None
+ Client ID for Azure Active Directory authentication. Must be provided
together with
+ `tenant_id` and `client_secret` to use ClientSecretCredential.
Review Comment:
Hi,
I can definitely can do that, this would future proof `pyarrow`, but for my
immediate need, that's not required because `ManagedIdentityCredential`
requires `client_id` of **user assigned identity**, which we can't make, but
this is just our internal policy.
## General notes, just fyi
Azure has a few different ways to authentic to it's resources. I'm primarily
concerned about AzureML (AML) service, which includes things like ACR, VMs,
keyvault and storage account as core services. Note that VM's are managed by
AzureML service and are not visible outside of AzureML. Also note that AzureML
"buckets" things into workspace, meaning two different workspace will have
different VM's and storage accounts at least.
Just in case, but I find this [diagram for credential
flow](https://learn.microsoft.com/en-au/azure/developer/python/sdk/authentication/credential-chains?tabs=dac#defaultazurecredential-overview)
useful.
Our internal setup is a little convoluted, but as far as I understand when
working interactively i.e on ComputeInstance (AzureML compute type)
```
from azure.identity import DefaultAzureCredential
credential = DefaultAzureCredential()
```
will resolve to managed identity (system assigned for us) of the workspace,
which has access to storage account. But when I'm running the same script
non-iteractively i.e as a AML job `DefaultAzureCredential` will resolve to
managed identity of the VM (ComputeCluster), which for us has no access to
storage account.
We have a separate Service Principle (SP) whose identity we allowed access
to storage account, hence why adding `ClientSecretCredential` which is how you
create credential for SP.
as an aside, more than happy to try and stay involved with pyarrow and Azure
related things
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]