cannon-tp commented on issue #8691: URL: https://github.com/apache/hudi/issues/8691#issuecomment-2219537023
Hey @danfran, I think the way the Hadoop properties are being set in the Spark conf could be the problem. I faced the same issue and resolved it with the following code, which sets all of the properties on the `SparkConf` before the `SparkContext` is created:

```python
import sys

# Standard Glue job imports (some are unused in this minimal snippet
# but are part of the usual Glue job boilerplate)
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf().setAppName("hudi-1")
    # S3A pointed at LocalStack
    .set("spark.hadoop.fs.s3a.endpoint", "http://localstack:4566")
    .set("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .set("spark.hadoop.fs.s3a.multipart.size", "104857600")
    .set("spark.hadoop.fs.s3a.access.key", "test")
    .set("spark.hadoop.fs.s3a.secret.key", "test")
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.hadoop.fs.s3a.path.style.access", "true")
    # Hudi-specific settings
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.jars.packages", "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.15.0,org.apache.hadoop:hadoop-aws:3.3.3")
    .set("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
    .set("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .set("spark.sql.legacy.timeParserPolicy", "LEGACY")
)

sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)
spark = glueContext.spark_session
```
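Why setting these on the `SparkConf` up front works: when the `SparkContext` is created, Spark forwards every configuration key prefixed with `spark.hadoop.` into the Hadoop `Configuration` with the prefix stripped, so the S3A filesystem sees `fs.s3a.endpoint` etc. from the start. A minimal sketch of that prefix-stripping mapping (the helper name is my own for illustration, not a Spark API):

```python
def hadoop_props(spark_conf: dict) -> dict:
    """Illustrative only: mimics how Spark copies each 'spark.hadoop.*'
    key into the Hadoop Configuration with the prefix removed."""
    prefix = "spark.hadoop."
    return {k[len(prefix):]: v for k, v in spark_conf.items() if k.startswith(prefix)}

conf = {
    "spark.hadoop.fs.s3a.endpoint": "http://localstack:4566",
    "spark.hadoop.fs.s3a.path.style.access": "true",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",  # not a Hadoop prop
}

print(hadoop_props(conf))
# {'fs.s3a.endpoint': 'http://localstack:4566', 'fs.s3a.path.style.access': 'true'}
```

Setting the same properties on an already-created session's Hadoop configuration can come too late for components initialized at context startup, which is why doing it on the `SparkConf` first is the reliable path.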