[ https://issues.apache.org/jira/browse/HADOOP-19620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18007723#comment-18007723 ]
Anuj Modi edited comment on HADOOP-19620 at 7/17/25 6:25 AM:
-------------------------------------------------------------

services.AbfsClient (AbfsRestOperation.java:signRequest(565)) - Authenticating request with OAuth2 access token
oauth2.AzureADAuthenticator (AzureADAuthenticator.java:getTokenUsingClientCreds(112)) - AADToken: starting to fetch token using client creds for client ID 3a700eba-b905-4b60-a303-1785e2348ed8
oauth2.AzureADAuthenticator (AzureADAuthenticator.java:getTokenCall(367)) - Retrying getTokenSingleCall. RetryCount = 1
oauth2.AzureADAuthenticator (AzureADAuthenticator.java:getTokenCall(367)) - Retrying getTokenSingleCall. RetryCount = 2
oauth2.AzureADAuthenticator (AzureADAuthenticator.java:getTokenCall(367)) - Retrying getTokenSingleCall. RetryCount = 3
oauth2.AzureADAuthenticator (AzureADAuthenticator.java:getTokenCall(367)) - Retrying getTokenSingleCall. RetryCount = 4
oauth2.AzureADAuthenticator (AzureADAuthenticator.java:getTokenCall(367)) - Retrying getTokenSingleCall. RetryCount = 5

The exception you shared above is thrown outside the retry loop, as follows:

Auth failure: HTTP Error -1; url='' AzureADAuthenticator.getTokenCall threw java.net.UnknownHostException : login.microsoftonline.com
org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator$HttpException: HTTP Error -1; url='https://login.microsoftonline.com/da654cf2-07d3-4fc4-a83f-d7c3372fe94a/oauth2/token' AzureADAuthenticator.getTokenCall threw java.net.UnknownHostException : login.microsoftonline.com
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:410)
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:323)
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:289)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.measureDurationOfInvocation(IOStatisticsBinding.java:494)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:465)

> [ABFS] AzureADAuthenticator should be able to retry on UnknownHostException
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-19620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19620
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.4.1
>            Reporter: Serhii Nesterov
>            Priority: Minor
>
> When Hadoop is requested to perform operations against ADLS Gen2 storage,
> *AbfsRestOperation* attempts to obtain an access token from Microsoft.
> Under the hood, it uses a simple *java.net.HttpURLConnection* HTTP client.
> Occasionally, environments may run into intermittent network issues,
> including DNS-related {*}UnknownHostException{*}. Technically, the HTTP
> client throws *IOException* whose cause is {*}UnknownHostException{*}.
> *AzureADAuthenticator*, in turn, catches {*}IOException{*}, sets
> *httperror = -1* and then checks whether the error is recoverable and can
> be retried. However, it's neither an instance of
> {*}MalformedURLException{*}, nor an instance of
> {*}FileNotFoundException{*}, nor a recoverable status code
> ({*}< 100 || == 408 || >= 500 && != 501 && != 505{*}), hence a retry never
> occurs. This is a sensitive issue for our project because it causes
> problems with state recovery.
> The final exception stack trace on the client side looks as follows (Apache
> Spark application, tenant ID is redacted):
> {code:java}
> Job aborted due to stage failure: Task 14 in stage 384.0 failed 4 times, most
> recent failure: Lost task 14.3 in stage 384.0 TID 3087 10.244.91.7 executor
> 29 : Status code: -1 error code: null error message: Auth failure: HTTP Error
> -1; url='https://login.microsoftonline.com/$TENANT_ID/oauth2/v2.0/token'
> AzureADAuthenticator.getTokenCall threw java.net.UnknownHostException:
> login.microsoftonline.com
> at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation AbfsRestOperation.java:321
> at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute AbfsRestOperation.java:263
> at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0 AbfsRestOperation.java:235
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.measureDurationOfInvocation IOStatisticsBinding.java:494
> at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation IOStatisticsBinding.java:465
> at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute AbfsRestOperation.java:233
> at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus AbfsClient.java:1099
> at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus AzureBlobFileSystemStore.java:1164
> at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus AzureBlobFileSystem.java:766
> at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus AzureBlobFileSystem.java:756
> at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath HadoopInputFile.java:39
> at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter ParquetFooterReader.java:39
> at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$lzycompute$1 ParquetFileFormat.scala:211
> at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.footerFileMetaData$1 ParquetFileFormat.scala:210
> at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2 ParquetFileFormat.scala:213
> ...{code}
> I can see that this exception is recovered from in other parts of the
> Hadoop project (e.g., {*}DefaultAMSProcessor{*}).
> We would like to have a similar retry mechanism for fetching tokens.
> Moreover, *AbfsRestOperation* already handles and retries
> *UnknownHostException*, but that part seems to apply only to storage
> communication, not token retrieval. I suppose the solution would be simple -
> just check whether the cause of the *IOException* is an instance of
> *UnknownHostException* and apply the same retry policies as for other types
> of recoverable errors.
> The link to the code where I believe *UnknownHostException* would be checked
> for:
> [https://github.com/apache/hadoop/blob/61096793f6368d16a21cde8b1c8f8dce41a4c102/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/oauth2/AzureADAuthenticator.java#L354]
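
For illustration only, the check proposed in the issue description could be sketched roughly as below. This is not the actual AzureADAuthenticator code: the class TokenRetrySketch and its methods causedByUnknownHost, isRecoverableStatus and shouldRetry are hypothetical names, and the status-code rule is simply copied from the expression quoted in the description.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration; not part of hadoop-azure. */
public class TokenRetrySketch {

    /** True if the exception or anything in its cause chain is an UnknownHostException. */
    public static boolean causedByUnknownHost(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof UnknownHostException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /** Status-code rule as quoted in the issue description. */
    public static boolean isRecoverableStatus(int status) {
        return status < 100
            || status == 408
            || (status >= 500 && status != 501 && status != 505);
    }

    /** Assumed combined decision for an IOException raised while fetching a token. */
    public static boolean shouldRetry(IOException e, int status) {
        if (causedByUnknownHost(e)) {
            return true; // treat a DNS lookup failure as transient
        }
        if (e instanceof MalformedURLException || e instanceof FileNotFoundException) {
            return false; // the two exception types the description lists as not retried
        }
        return isRecoverableStatus(status);
    }

    public static void main(String[] args) {
        IOException wrapped = new IOException("token call failed",
            new UnknownHostException("login.microsoftonline.com"));
        System.out.println(shouldRetry(wrapped, -1));                             // true
        System.out.println(shouldRetry(new MalformedURLException("bad url"), -1)); // false
    }
}
```

Under the rule exactly as quoted, a status of -1 falls into the `< 100` branch, so isRecoverableStatus(-1) returns true in this sketch; that appears consistent with the "Retrying getTokenSingleCall" log lines in the comment. Whether an extra cause-chain check is needed in the real code is for the maintainers to decide.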