[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-10-24 Thread Lily Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623521#comment-17623521
 ] 

Lily Kim commented on SPARK-38934:
--

Great! *Setting fs.s3a.bucket.probe to 2* resolved our issue! Thank you very 
much!

> Provider TemporaryAWSCredentialsProvider has no credentials
> ---
>
> Key: SPARK-38934
> URL: https://issues.apache.org/jira/browse/SPARK-38934
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.2.1
>Reporter: Lily
>Priority: Major
>
>  
> We are using Jupyter Hub on K8s as a notebook-based development environment 
> and Spark on K8s (Spark 3.2.1, Hadoop 3.3.1) as the backend cluster for 
> Jupyter Hub.
> When we run code like the snippet below in Jupyter Hub on K8s,
>  
> {code:scala}
> val perm = ... // get AWS temporary credentials via AWS STS from an assumed role
> // set AWS temporary credential
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", 
> "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", 
> perm.credential.accessKeyID)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", 
> perm.credential.secretAccessKey)
> spark.sparkContext.hadoopConfiguration.set("fs.s3a.session.token", 
> perm.credential.sessionToken)
> // execute simple Spark action
> spark.read.format("parquet").load("s3a:///*").show(1) {code}
>  
>  
> the first few executors left a warning like the one below on the first code 
> execution, but we were able to get the proper result thanks to Spark's task 
> retry mechanism.
> {code}
> 22/04/18 09:13:50 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2) 
> (10.197.5.15 executor 1): java.nio.file.AccessDeniedException: 
> s3a:///.parquet: 
> org.apache.hadoop.fs.s3a.CredentialInitializationException: Provider 
> TemporaryAWSCredentialsProvider has no credentials
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:206)
>   at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2810)
>   at 
> org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:225)
>   at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
>   at scala.collection.immutable.Stream.map(Stream.scala:418)
>   at 
> org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
>   at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:131)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.s3a.CredentialInitializationException: 
> Provider TemporaryAWSCredentialsProvider has no credentials
>   at 
> org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider.getCredentials(AbstractSessionCredentialsProvider.java:130)
>   at 
> org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:177)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1266)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:842)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:792)
>   at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
>   at 
> 

[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-10-21 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17622378#comment-17622378
 ] 

Steve Loughran commented on SPARK-38934:


Sounds like there is a race condition, which surfaces now that startup probes 
for bucket existence are skipped for performance reasons.

Set fs.s3a.bucket.probe to 2 to reinstate that probe.
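
For reference, a minimal sketch of applying that in the reporter's notebook 
flow (assuming the same {{spark}} session as in the description; the option 
must be set before the S3A filesystem for the bucket is first created):

{code}
// Hedged sketch: reinstate the S3A bucket-existence probe at filesystem
// init (skipped by default for performance, per the comment above).
spark.sparkContext.hadoopConfiguration.set("fs.s3a.bucket.probe", "2")
{code}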


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-08-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585275#comment-17585275
 ] 

Steve Loughran commented on SPARK-38934:


[~graceee318] try explicitly setting the AWS secrets as per-bucket options. If 
the bucket is called "my-bucket", set them in your config via these options:

{code}

spark.hadoop.fs.s3a.bucket.my-bucket.access.key
spark.hadoop.fs.s3a.bucket.my-bucket.secret.key
spark.hadoop.fs.s3a.bucket.my-bucket.session.token

{code}

This will stop any env var propagation from interfering with the values for 
the bucket.
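
A hedged sketch of the same thing set from the notebook, following the 
description's pattern (on {{spark.sparkContext.hadoopConfiguration}} the 
{{spark.hadoop.}} prefix is dropped; "my-bucket" and {{perm}} are placeholders 
from the description):

{code}
// Hedged sketch: per-bucket S3A credentials, so only this bucket's
// values are pinned and other buckets keep the global settings.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.bucket.my-bucket.access.key", perm.credential.accessKeyID)
hc.set("fs.s3a.bucket.my-bucket.secret.key", perm.credential.secretAccessKey)
hc.set("fs.s3a.bucket.my-bucket.session.token", perm.credential.sessionToken)
{code}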


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-08-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585268#comment-17585268
 ] 

Steve Loughran commented on SPARK-38934:


Staring at this some more: there are enough occurrences of this with Spark 
alone that maybe there is a problem, not with the s3a code, but with how Spark 
passes configurations around, including env var adoption.


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-08-01 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573653#comment-17573653
 ] 

Steve Loughran commented on SPARK-38934:


bq. our system set the provider as WebIdentityTokenCredentialsProvider as a 
default, I had to explicitly set as TemporaryAWSCredentialsProvider.

If TemporaryAWSCredentialsProvider had a bug, then it would be a JIRA.



[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-08-01 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573652#comment-17573652
 ] 

Steve Loughran commented on SPARK-38934:


Because it's your deployment setup, not anybody's code.


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-07-29 Thread Lily Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573193#comment-17573193
 ] 

Lily Kim commented on SPARK-38934:
--

Hey, it hasn't been resolved yet. Why did you close it?


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-07-29 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573138#comment-17573138
 ] 

Steve Loughran commented on SPARK-38934:


So that's a config problem, not a bug? Closing.


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-06-26 Thread Lily Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17558967#comment-17558967
 ] 

Lily Kim commented on SPARK-38934:
--

Since our system sets WebIdentityTokenCredentialsProvider as the default 
provider, I had to explicitly set TemporaryAWSCredentialsProvider.

Otherwise, the AWS SDK returns an Access Denied error.
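
An alternative, as a hedged sketch: {{fs.s3a.aws.credentials.provider}} 
accepts a comma-separated list that is tried in order, so the 
temporary-credential provider can be put ahead of the deployment default 
rather than replacing it (the web-identity class name below is an assumption 
about this deployment, and S3A must be able to instantiate that class):

{code}
// Hedged sketch: providers are tried in order; the first one that yields
// valid credentials wins. The second entry is an assumed deployment default.
spark.sparkContext.hadoopConfiguration.set(
  "fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider," +
    "com.amazonaws.auth.WebIdentityTokenCredentialsProvider")
{code}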


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-06-22 Thread Jason Sleight (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557682#comment-17557682
 ] 

Jason Sleight commented on SPARK-38934:
---

After continuing to see some errors in a few edge cases (even without env 
variables), I recently noticed that [the default provider 
list|https://github.com/apache/hadoop/blob/release-3.3.1-RC3/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L595]
 is:
 # TemporaryAWSCredentialsProvider
 # SimpleAWSCredentialsProvider
 # EnvironmentVariableCredentialsProvider
 # IAMInstanceCredentialsProvider

Thus, in principle, explicitly setting the provider to 
TemporaryAWSCredentialsProvider is unnecessary, since it is the first entry in 
the default list. Weirdly, if I just leave the provider unspecified, my errors 
disappear. I _think_ my Spark session is then using 
TemporaryAWSCredentialsProvider from the default list, but I'm not sure how to 
verify this, since the Spark UI shows the provider as the entire default list.

Anyway, try not explicitly setting the provider and letting the default 
resolution pick TemporaryAWSCredentialsProvider, as in the sketch below.
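
For completeness, a minimal sketch of that variant, based on the snippet in 
the description (only the keys and token are set; 
{{fs.s3a.aws.credentials.provider}} is left at its default):

{code}
// Hedged sketch: rely on the default provider chain rather than pinning
// the provider. With a session token set, the first default entry
// (TemporaryAWSCredentialsProvider) should resolve these credentials.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.access.key", perm.credential.accessKeyID)
hc.set("fs.s3a.secret.key", perm.credential.secretAccessKey)
hc.set("fs.s3a.session.token", perm.credential.sessionToken)
{code}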


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-05-23 Thread Lily Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540848#comment-17540848
 ] 

Lily Kim commented on SPARK-38934:
--

Unfortunately, mine only sets the temporary AWS credentials in the Spark 
config (it doesn't set the AWS credential environment variables).

But I appreciate your reply on my ticket.


[jira] [Commented] (SPARK-38934) Provider TemporaryAWSCredentialsProvider has no credentials

2022-05-19 Thread Jason Sleight (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539699#comment-17539699
 ] 

Jason Sleight commented on SPARK-38934:
---

I had a similar issue and found a workaround in HADOOP-18233. It may or may 
not help you.
