Hello,
I have a local S3 service that is writable and readable using AWS sdk APIs.
I created the spark session and then set the hadoop configurations as
follows -
// Create Spark Session
val spark = SparkSession
.builder()
.master("local[*]")
.appName("S3Loaders")
.config("spark.sql.streaming.checkpointLocation",
"/Users/atekade/checkpoint-s3-loaders/")
.getOrCreate()
// Take spark context from spark session
val sc = spark.sparkContext
// Configure spark context with S3 values
val accessKey = "00cce9eb2c589b1b1b5b"
val secretKey = "flmheKX9Gb1tTlImO6xR++9kvnUByfRKZfI7LJT8"
val endpoint = "http://s3-region1.mycloudianhyperstore.com:80"
spark.sparkContext.hadoopConfiguration.set("spark.hadoop.fs.s3a.impl",
"org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", endpoint)
// spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
// spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)
sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", accessKey)
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", secretKey)
And then trying to write to the s3 as follows -
val query = rawData
.writeStream
.format("csv")
.option("format", "append")
.option("path", "s3a://bucket0/")
.outputMode("append")
.start()
But nothing is actually getting written. Since I am running this from my
local machine, I have an entry for the ip-address and S3 endpoint into the
/etc/hosts file. As you can see this is a streaming dataframe and so can
not write without writeStream API. Can someone help about what am I missing
here? Is there any better way to perform this?
Best,
Aniruddha
-----------
ᐧ