[ https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557682#comment-17557682 ]

Jason Sleight commented on SPARK-38934:
---------------------------------------

After continuing to see some errors in a few edge cases (even without env variables), I recently noticed that [the default provider list|https://github.com/apache/hadoop/blob/release-3.3.1-RC3/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L595] is:
 # TemporaryAWSCredentialsProvider
 # SimpleAWSCredentialsProvider
 # EnvironmentVariableCredentialsProvider
 # IAMInstanceCredentialsProvider

Thus, in principle, explicitly setting the provider to TemporaryAWSCredentialsProvider should be unnecessary, since it is first in the default list. Oddly, when I leave the provider unspecified, my errors disappear. I /think/ my Spark session is then picking up TemporaryAWSCredentialsProvider from the default chain, but I'm not sure how to verify this, since the Spark UI shows the provider setting as the entire default list.

Anyway, try leaving the provider unset and letting the default resolution pick TemporaryAWSCredentialsProvider.
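
For reference, here is a minimal sketch of what that looks like, reusing the spark session, the perm object, and the path placeholder from the report quoted below; whether the default chain actually resolves to TemporaryAWSCredentialsProvider in this setup is my assumption, not something I've verified:

{code:java}
// Sketch only: same STS temporary credentials as in the report below, but with
// fs.s3a.aws.credentials.provider left unset so the default provider list
// (TemporaryAWSCredentialsProvider first) does the resolution.
val conf = spark.sparkContext.hadoopConfiguration
conf.set("fs.s3a.access.key", perm.credential.accessKeyID)
conf.set("fs.s3a.secret.key", perm.credential.secretAccessKey)
conf.set("fs.s3a.session.token", perm.credential.sessionToken)

// If you prefer to be explicit, setting the provider to the full default list
// (rather than TemporaryAWSCredentialsProvider alone) should be equivalent.
// Assumption: these class names mirror the default order linked above.
// conf.set("fs.s3a.aws.credentials.provider",
//   "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider," +
//   "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider," +
//   "com.amazonaws.auth.EnvironmentVariableCredentialsProvider," +
//   "org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider")

// Same simple Spark action as in the report below.
spark.read.format("parquet").load("s3a://<path>/*").show(1)
{code}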

> Provider TemporaryAWSCredentialsProvider has no credentials
> -----------------------------------------------------------
>
>                 Key: SPARK-38934
>                 URL: https://issues.apache.org/jira/browse/SPARK-38934
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.2.1
>            Reporter: Lily
>            Priority: Major
>
>  
> We are using Jupyter Hub on K8s as a notebook based development environment 
> and Spark on K8s as a backend cluster of Jupyter Hub on K8s with Spark 3.2.1 
> and Hadoop 3.3.1.
> When we run a code like the one below in the Jupyter Hub on K8s,
>  
> {code:java}
> val perm = ... // get AWS temporary credential by AWS STS from AWS assumed role
> // set AWS temporary credential
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", perm.credential.accessKeyID)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", perm.credential.secretAccessKey)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", perm.credential.sessionToken)
> // execute simple Spark action
> spark.read.format("parquet").load("s3a://<path>/*").show(1)
> {code}
>  
>  
> the first few executors left a warning like the one below during the first code 
> execution, but we were able to get the proper result thanks to Spark's task 
> retry mechanism. 
> {code:java}
> 22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) (10.197.5.15 executor 1): java.nio.file.AccessDeniedException: s3a://<path>/<file>.parquet: org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider TemporaryAWSCredentialsProvider has no credentials
>       at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
>       at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
>       at org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
>       at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
>       at scala.collection.immutable.Stream.map(Stream.scala:418)
>       at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
>       at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
>       at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>       at org.apache.spark.scheduler.Task.run(Task.scala:131)
>       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider TemporaryAWSCredentialsProvider has no credentials
>       at org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
>       at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
>       at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
>       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
>       at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
>       at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5445)
>       at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6420)
>       at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6393)
>       at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5430)
>       at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5392)
>       at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5386)
>       at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$7(S3AFileSystem.java:2116)
>       at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:489)
>       at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
>       at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2107)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:1750)
>       at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:62)
>       at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>       ... 3 more
> {code}
> Could you explain why we are seeing this warning and how we can prevent it 
> from happening again?
> Thank you in advance.
>  


