KirillTsyganov commented on code in PR #46837:
URL: https://github.com/apache/arrow/pull/46837#discussion_r2155860652


##########
python/pyarrow/_azurefs.pyx:
##########
@@ -66,6 +66,15 @@ cdef class AzureFileSystem(FileSystem):
         SAS token for the storage account, used as an alternative to 
account_key. If sas_token
         and account_key are None the default credential will be used. The 
parameters 
         account_key and sas_token are mutually exclusive.
+    tenant_id : str, default None
+        Tenant ID for Azure Active Directory authentication. Must be provided 
together with
+        `client_id` and `client_secret` to use ClientSecretCredential.
+    client_id : str, default None
+        Client ID for Azure Active Directory authentication. Must be provided 
together with
+        `tenant_id` and `client_secret` to use ClientSecretCredential.

Review Comment:
   Hi, 
   
   I can definitely do that, this would future proof `pyarrow`, but for my 
immediate need, that's not required because `ManagedIdentityCredential` 
requires `client_id` of **user assigned identity**, which we can't make, but 
this is just our internal policy. 
   
   ## General notes, just fyi
   
   Azure has a few different ways to authentic to it's resources. I'm primarily 
concerned about AzureML (AML) service, which includes things like ACR, VMs, 
keyvault and storage account as core services. Note that VM's are managed by 
AzureML service and are not visible outside of AzureML. Also note that AzureML 
"buckets" things into workspace, meaning two different workspace will have 
different VM's and storage accounts at least.
   
   Just in case, but I find this [diagram for credential 
flow](https://learn.microsoft.com/en-au/azure/developer/python/sdk/authentication/credential-chains?tabs=dac#defaultazurecredential-overview)
 useful.
   
   Our internal setup is a little convoluted, but as far as I understand when 
working interactively i.e on ComputeInstance (AzureML compute type)
   
   ```
   from azure.identity import DefaultAzureCredential
   
   credential = DefaultAzureCredential()
   ```
   
   will resolve to managed identity (system assigned for us) of the workspace, 
which has access to storage account. But when I'm running the same script 
non-iteractively i.e as a AML job `DefaultAzureCredential` will resolve to 
managed identity of the VM (ComputeCluster), which for us has no access to 
storage account. 
   
   We have a separate Service Principle (SP) whose identity we allowed access 
to storage account,  hence why adding `ClientSecretCredential` which is how you 
create credential for SP.
   
   as an aside, more than happy to try and stay involved with pyarrow and Azure 
related things



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to