[ https://issues.apache.org/jira/browse/HADOOP-18054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464894#comment-17464894 ]
Akira Ajisaka commented on HADOOP-18054:
----------------------------------------

JIRA is not for end-user questions. Please use u...@hadoop.apache.org or consult AWS customer support. https://hadoop.apache.org/mailing_lists.html

> Unable to load AWS credentials from any provider in the chain
> -------------------------------------------------------------
>
>                 Key: HADOOP-18054
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18054
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: auth, fs, fs/s3, security
>    Affects Versions: 3.3.1
>         Environment: From top to bottom:
> Kubernetes version: 1.18.20
> Spark version: 2.4.4
> Kubernetes setup: Pod with a serviceAccountName that binds to an IAM Role using IRSA (EKS feature).
> {code:java}
> apiVersion: v1
> automountServiceAccountToken: true
> kind: ServiceAccount
> metadata:
>   annotations:
>     eks.amazonaws.com/role-arn: arn:aws:iam::999999999999:role/EKSDefaultPolicyFor-Spark
>   name: spark
>   namespace: spark
> {code}
> AWS setup:
> IAM Role with permissions on the S3 bucket.
> Bucket with permissions granted to the IAM Role.
> Code:
> {code:java}
> import json
> import sys
>
> import boto3
> from pyspark.sql import SparkSession, SQLContext
> from pyspark.sql.functions import lit
>
>
> def run_etl():
>     sc = SparkSession.builder.appName("TXD-PYSPARK-ORACLE-SIEBEL-CASOS").getOrCreate()
>     sqlContext = SQLContext(sc)
>     args = sys.argv
>     load_date = args[1]    # e.g. "2019-05-21"
>     output_path = args[2]  # e.g. s3://mybucket/myfolder
>     print(args, "load_date", load_date, "output_path", output_path)
>     sc._jsc.hadoopConfiguration().set(
>         "fs.s3a.aws.credentials.provider",
>         "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
>     )
>     sc._jsc.hadoopConfiguration().set("com.amazonaws.services.s3.enableV4", "true")
>     sc._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>     # sc._jsc.hadoopConfiguration().set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>     sc._jsc.hadoopConfiguration().set("fs.AbstractFileSystem.s3a.impl", "org.apache.hadoop.fs.s3a.S3A")
>     session = boto3.session.Session()
>     client = session.client(service_name='secretsmanager', region_name="us-east-1")
>     get_secret_value_response = client.get_secret_value(
>         SecretId="Siebel_Connection_Info"
>     )
>     secret = get_secret_value_response["SecretString"]
>     secret = json.loads(secret)
>     db_username = secret.get("db_username")
>     db_password = secret.get("db_password")
>     db_host = secret.get("db_host")
>     db_port = secret.get("db_port")
>     db_name = secret.get("db_name")
>     db_url = "jdbc:oracle:thin:@{}:{}/{}".format(db_host, db_port, db_name)
>     jdbc_driver_name = "oracle.jdbc.OracleDriver"
>     dbtable = """(SELECT * FROM SIEBEL.REPORTE_DE_CASOS WHERE JOB_ID IN (SELECT JOB_ID FROM SIEBEL.SERVICE_CONSUMED_STATUS WHERE PUBLISH_INFORMATION_DT BETWEEN TO_DATE('{} 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AND TO_DATE('{} 23:59:59', 'YYYY-MM-DD HH24:MI:SS')))""".format(load_date, load_date)
>     df = sqlContext.read\
>         .format("jdbc")\
>         .option("charset", "utf8")\
>         .option("driver", jdbc_driver_name)\
>         .option("url", db_url)\
>         .option("dbtable", dbtable)\
>         .option("user", db_username)\
>         .option("password", db_password)\
>         .option("oracle.jdbc.timezoneAsRegion", "false")\
>         .load()
>     # Partitioning
>     a_load_date = load_date.split('-')
>     df = df.withColumn("year", lit(a_load_date[0]))
>     df = df.withColumn("month", lit(a_load_date[1]))
>     df = df.withColumn("day", lit(a_load_date[2]))
>     df.write.mode("append").partitionBy(["year", "month", "day"]).csv(output_path, header=True)
>     # It is important to close the connection to avoid problems like the one reported in
>     # https://stackoverflow.com/questions/40830638/cannot-load-main-class-from-jar-file
>     sc.stop()
>
>
> if __name__ == '__main__':
>     run_etl()
> {code}
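> For illustration only (not a confirmed fix), here is a minimal sketch of how the S3A credential provider list could be pointed at the web-identity provider that IRSA relies on, i.e. the provider that reads the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE variables injected into the pod. It assumes the aws-java-sdk-bundle that actually ends up on the driver and executor classpath ships com.amazonaws.auth.WebIdentityTokenCredentialsProvider, and that the hadoop-aws jar loaded at runtime matches that bundle:
> {code:java}
> from pyspark.sql import SparkSession
>
> # Sketch: spark.hadoop.* settings are copied into the Hadoop Configuration
> # before the S3A filesystem is first initialized.
> spark = (
>     SparkSession.builder.appName("TXD-PYSPARK-ORACLE-SIEBEL-CASOS")
>     # Try the web-identity provider first, then fall back to the default chain.
>     .config(
>         "spark.hadoop.fs.s3a.aws.credentials.provider",
>         "com.amazonaws.auth.WebIdentityTokenCredentialsProvider,"
>         "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
>     )
>     .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>     .getOrCreate()
> )
> {code}
> Whether this helps depends on which hadoop-aws and aws-java-sdk versions are really picked up at runtime, which is exactly the kind of environment question better suited to the user mailing list.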
> Logs:
> {code:java}
> + '[' -z s3://mybucket.spark.jobs/siebel-casos-actividades ']'
> + aws s3 cp s3://mybucket.spark.jobs/siebel-casos-actividades /opt/ --recursive --include '*'
> download: s3://mybucket.spark.jobs/siebel-casos-actividades/txd-pyspark-siebel-casos.py to ../../txd-pyspark-siebel-casos.py
> download: s3://mybucket.spark.jobs/siebel-casos-actividades/txd-pyspark-siebel-actividades.py to ../../txd-pyspark-siebel-actividades.py
> download: s3://mybucket.jobs/siebel-casos-actividades/hadoop-aws-3.3.1.jar to ../../hadoop-aws-3.3.1.jar
> download: s3://mybucket.spark.jobs/siebel-casos-actividades/ojdbc8.jar to ../../ojdbc8.jar
> download: s3://mybucket.spark.jobs/siebel-casos-actividades/aws-java-sdk-bundle-1.11.901.jar to ../../aws-java-sdk-bundle-1.11.901.jar
> ++ id -u
> + myuid=0
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 0
> + uidentry=root:x:0:0:root:/root:/bin/ash
> + set -e
> + '[' -z root:x:0:0:root:/root:/bin/ash ']'
> + SPARK_K8S_CMD=driver-py
> + case "$SPARK_K8S_CMD" in
> + shift 1
> + SPARK_CLASSPATH=':/opt/spark/jars/*'
> + env
> + grep SPARK_JAVA_OPT_
> + sort -t_ -k4 -n
> + sed 's/[^=]*=\(.*\)/\1/g'
> + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> + '[' -n '' ']'
> + '[' -n '' ']'
> + PYSPARK_ARGS=
> + '[' -n '2021-12-18 s3a://mybucket.raw/siebel/casos/' ']'
> + PYSPARK_ARGS='2021-12-18 s3a://mybucket.raw/siebel/casos/'
> + R_ARGS=
> + '[' -n '' ']'
> + '[' 3 == 2 ']'
> + '[' 3 == 3 ']'
> ++ python3 -V
> + pyv3='Python 3.6.9'
> + export PYTHON_VERSION=3.6.9
> + PYTHON_VERSION=3.6.9
> + export PYSPARK_PYTHON=python3
> + PYSPARK_PYTHON=python3
> + export PYSPARK_DRIVER_PYTHON=python3
> + PYSPARK_DRIVER_PYTHON=python3
> + case "$SPARK_K8S_CMD" in
> + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@" $PYSPARK_PRIMARY $PYSPARK_ARGS)
> + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=0.0.0.0/0 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner file:/opt/txd-pyspark-siebel-casos.py 2021-12-18 s3a://mybucket.raw/siebel/casos/
> 21/12/21 18:37:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 21/12/21 18:37:45 INFO SparkContext: Running Spark version 2.4.4
> 21/12/21 18:37:45 INFO SparkContext: Submitted application: TXD-PYSPARK-ORACLE-SIEBEL-CASOS
> 21/12/21 18:37:45 INFO SecurityManager: Changing view acls to: root
> 21/12/21 18:37:45 INFO SecurityManager: Changing modify acls to: root
> 21/12/21 18:37:45 INFO SecurityManager: Changing view acls groups to:
> 21/12/21 18:37:45 INFO SecurityManager: Changing modify acls groups to:
> 21/12/21 18:37:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
> 21/12/21 18:37:45 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
> 21/12/21 18:37:45 INFO SparkEnv: Registering MapOutputTracker
> 21/12/21 18:37:45 INFO SparkEnv: Registering BlockManagerMaster
> 21/12/21 18:37:45 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
> 21/12/21 18:37:45 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
> 21/12/21 18:37:45 INFO DiskBlockManager: Created local directory at /var/data/spark-458585a1-50f9-45c6-a4cf-d552c04a97dc/blockmgr-6c240735-3731-487a-a592-5c9a4d687020
> 21/12/21 18:37:45 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
> 21/12/21 18:37:45 INFO SparkEnv: Registering OutputCommitCoordinator
> 21/12/21 18:37:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 21/12/21 18:37:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-siebel-casos-1640111855179-driver-svc.spark.svc:4040
> 21/12/21 18:37:46 INFO SparkContext: Added JAR file:///opt/ojdbc8.jar at spark://spark-siebel-casos-1640111855179-driver-svc.spark.svc:7078/jars/ojdbc8.jar with timestamp 1640111866249
> 21/12/21 18:37:46 INFO SparkContext: Added JAR file:///opt/aws-java-sdk-bundle-1.11.901.jar at spark://spark-siebel-casos-1640111855179-driver-svc.spark.svc:7078/jars/aws-java-sdk-bundle-1.11.901.jar with timestamp 1640111866249
> 21/12/21 18:37:46 INFO SparkContext: Added JAR file:///opt/hadoop-aws-3.3.1.jar at spark://spark-siebel-casos-1640111855179-driver-svc.spark.svc:7078/jars/hadoop-aws-3.3.1.jar with timestamp 1640111866249
> 21/12/21 18:37:46 INFO SparkContext: Added file file:///opt/txd-pyspark-siebel-casos.py at spark://spark-siebel-casos-1640111855179-driver-svc.spark.svc:7078/files/txd-pyspark-siebel-casos.py with timestamp 1640111866266
> 21/12/21 18:37:46 INFO Utils: Copying /opt/txd-pyspark-siebel-casos.py to /var/data/spark-458585a1-50f9-45c6-a4cf-d552c04a97dc/spark-f99cee68-d203-4a2a-8335-9743eeac5350/userFiles-32cbe539-22db-4547-8d22-d98f85354418/txd-pyspark-siebel-casos.py
> 21/12/21 18:37:48 INFO ExecutorPodsAllocator: Going to request 2 executors from Kubernetes.
> 21/12/21 18:37:48 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
> 21/12/21 18:37:48 INFO NettyBlockTransferService: Server created on spark-siebel-casos-1640111855179-driver-svc.spark.svc:7079
> 21/12/21 18:37:48 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 21/12/21 18:37:48 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-siebel-casos-1640111855179-driver-svc.spark.svc, 7079, None)
> 21/12/21 18:37:48 INFO BlockManagerMasterEndpoint: Registering block manager spark-siebel-casos-1640111855179-driver-svc.spark.svc:7079 with 413.9 MB RAM, BlockManagerId(driver, spark-siebel-casos-1640111855179-driver-svc.spark.svc, 7079, None)
> 21/12/21 18:37:48 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-siebel-casos-1640111855179-driver-svc.spark.svc, 7079, None)
> 21/12/21 18:37:48 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-siebel-casos-1640111855179-driver-svc.spark.svc, 7079, None)
> 21/12/21 18:37:53 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.3.170.156:58300) with ID 1
> 21/12/21 18:37:53 INFO BlockManagerMasterEndpoint: Registering block manager 10.3.170.156:34671 with 413.9 MB RAM, BlockManagerId(1, 10.3.170.156, 34671, None)
> 21/12/21 18:37:54 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.3.170.184:52960) with ID 2
> 21/12/21 18:37:54 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
> 21/12/21 18:37:54 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark/work-dir/spark-warehouse').
> 21/12/21 18:37:54 INFO SharedState: Warehouse path is 'file:/opt/spark/work-dir/spark-warehouse'.
> 21/12/21 18:37:54 INFO BlockManagerMasterEndpoint: Registering block manager 10.3.170.184:46293 with 413.9 MB RAM, BlockManagerId(2, 10.3.170.184, 46293, None)
> 21/12/21 18:37:54 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
> ['/opt/txd-pyspark-siebel-casos.py', '2021-12-18', 's3a://mybucket.raw/siebel/casos/'] load_date 2021-12-18 output_path s3a://mybucket.raw/siebel/casos/
> Traceback (most recent call last):
>   File "/opt/txd-pyspark-siebel-casos.py", line 68, in <module>
>     run_etl()
>   File "/opt/txd-pyspark-siebel-casos.py", line 60, in run_etl
>     df.write.mode("append").partitionBy(["year", "month", "day"]).csv(output_path, header=True)
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 931, in csv
>   File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
>   File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o83.csv.
> : com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>     at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>     at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>     at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>     at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
>     at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
>     at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>     at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:664)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>     at py4j.Gateway.invoke(Gateway.java:282)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:238)
>     at java.lang.Thread.run(Thread.java:748)
> 21/12/21 18:38:00 INFO SparkContext: Invoking stop() from shutdown hook
> 21/12/21 18:38:00 INFO SparkUI: Stopped Spark web UI at http://spark-siebel-casos-1640111855179-driver-svc.spark.svc:4040
> 21/12/21 18:38:00 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
> 21/12/21 18:38:00 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
> 21/12/21 18:38:00 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
> 21/12/21 18:38:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 21/12/21 18:38:01 INFO MemoryStore: MemoryStore cleared
> 21/12/21 18:38:01 INFO BlockManager: BlockManager stopped
> 21/12/21 18:38:01 INFO BlockManagerMaster: BlockManagerMaster stopped
> 21/12/21 18:38:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 21/12/21 18:38:01 INFO SparkContext: Successfully stopped SparkContext
> 21/12/21 18:38:01 INFO ShutdownHookManager: Shutdown hook called
> 21/12/21 18:38:01 INFO ShutdownHookManager: Deleting directory /var/data/spark-458585a1-50f9-45c6-a4cf-d552c04a97dc/spark-f99cee68-d203-4a2a-8335-9743eeac5350
> 21/12/21 18:38:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-245a2532-04c0-4309-9f43-00fbca06d435
> 21/12/21 18:38:01 INFO ShutdownHookManager: Deleting directory /var/data/spark-458585a1-50f9-45c6-a4cf-d552c04a97dc/spark-f99cee68-d203-4a2a-8335-9743eeac5350/pyspark-83a9a9c2-0b44-4bcf-8d86-6fb388ba275e
> {code}
> Pod describe:
> {code:java}
> Containers:
>   spark-kubernetes-driver:
>     Container ID:  docker://3606841142e7dc76f3a5b29f7df87da8159a0d0c53897f96444670a04134a2ff
>     Image:         registry.example.com/myorg/ata/spark/spark-k8s/spark-py:2.4.4
>     Image ID:      docker-pullable://registry.example.com/myorg/ata/spark/spark-k8s/spark-py@sha256:744eae637693e0c6f2195ed1e4e2bab9def5b9c7507518c5d4b61b7933c63e10
>     Ports:         7078/TCP, 7079/TCP, 4040/TCP
>     Host Ports:    0/TCP, 0/TCP, 0/TCP
>     Args:
>       driver-py
>       --properties-file
>       /opt/spark/conf/spark.properties
>       --class
>       org.apache.spark.deploy.PythonRunner
>     State:          Terminated
>       Reason:       Error
>       Exit Code:    1
>       Started:      Tue, 21 Dec 2021 12:44:59 -0300
>       Finished:     Tue, 21 Dec 2021 12:45:29 -0300
>     Ready:          False
>     Restart Count:  0
>     Limits:
>       cpu:     1
>       memory:  1433Mi
>     Requests:
>       cpu:     1
>       memory:  1433Mi
>     Environment:
>       JOB_PATH:                      s3://mybucket.spark.jobs/siebel-casos-actividades
>       SPARK_DRIVER_BIND_ADDRESS:     (v1:status.podIP)
>       SPARK_LOCAL_DIRS:              /var/data/spark-9ff6233c-1660-4be4-94b7-2e961412f958
>       PYSPARK_PRIMARY:               file:/opt/txd-pyspark-siebel-casos.py
>       PYSPARK_MAJOR_PYTHON_VERSION:  3
>       PYSPARK_APP_ARGS:              2021-12-18 s3a://mybucket.raw/siebel/casos/
>       PYSPARK_FILES:
>       SPARK_CONF_DIR:                /opt/spark/conf
>       AWS_DEFAULT_REGION:            us-east-1
>       AWS_REGION:                    us-east-1
>       AWS_ROLE_ARN:                  arn:aws:iam::999999999999:role/EKSDefaultPolicyFor-Spark
>       AWS_WEB_IDENTITY_TOKEN_FILE:   /var/run/secrets/eks.amazonaws.com/serviceaccount/token
>     Mounts:
>       /opt/spark/conf from spark-conf-volume (rw)
>       /var/data/spark-9ff6233c-1660-4be4-94b7-2e961412f958 from spark-local-dir-1 (rw)
>       /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
>       /var/run/secrets/kubernetes.io/serviceaccount from spark-token-r6p46 (ro)
> {code}
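> The Environment section above shows that the IRSA variables (AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE) are injected into the driver pod. As a hypothetical sanity check, something along these lines could be run inside the pod to confirm that the token resolves to the expected role; note that it only exercises the Python SDK (boto3), not the Java credential chain that S3A uses:
> {code:java}
> import os
>
> import boto3
>
> # Confirm the IRSA environment is visible inside the container.
> print("AWS_ROLE_ARN =", os.environ.get("AWS_ROLE_ARN"))
> print("AWS_WEB_IDENTITY_TOKEN_FILE =", os.environ.get("AWS_WEB_IDENTITY_TOKEN_FILE"))
>
> # boto3 performs its own web-identity role assumption, so a successful call
> # here only proves the token and role annotation work for the Python SDK.
> print(boto3.client("sts").get_caller_identity())
> {code}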
> Classpath:
> {code:java}
> 21/12/21 18:37:46 INFO SparkContext: Added JAR file:///opt/ojdbc8.jar at spark://spark-siebel-casos-1640111855179-driver-svc.spark.svc:7078/jars/ojdbc8.jar with timestamp 1640111866249
> 21/12/21 18:37:46 INFO SparkContext: Added JAR file:///opt/aws-java-sdk-bundle-1.11.901.jar at spark://spark-siebel-casos-1640111855179-driver-svc.spark.svc:7078/jars/aws-java-sdk-bundle-1.11.901.jar with timestamp 1640111866249
> 21/12/21 18:37:46 INFO SparkContext: Added JAR file:///opt/hadoop-aws-3.3.1.jar at spark://spark-siebel-casos-1640111855179-driver-svc.spark.svc:7078/jars/hadoop-aws-3.3.1.jar with timestamp 1640111866249
> {code}
>            Reporter: Esteban Avendaño
>            Priority: Major
>
> Hello everybody, please help with this issue. I have a job running with Spark on Kubernetes (AWS EKS) and I get this error:
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o83.csv.
> : com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
>     at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>     at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>     at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org