Hello Users,

I am using on-premises object storage and am able to perform operations on
different buckets using the AWS CLI.
However, when I try to use the same path from my Spark code, it
fails. Here are the details -

Added dependencies in build.sbt -

   - hadoop-aws-2.7.4.jar
   - aws-java-sdk-1.7.4.jar
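
For completeness, a minimal sketch of how these are declared in my build.sbt
(only the two S3A-related artifacts shown; other dependencies omitted):

    // build.sbt (sketch) - only the S3A-related dependencies
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-aws"   % "2.7.4",
      "com.amazonaws"     % "aws-java-sdk" % "1.7.4"
    )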

The Spark Hadoop configuration is set up as follows -

spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl",
  "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", ENDPOINT)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")
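
As an alternative (just a sketch, not something I have verified), the same
values could be passed with the spark.hadoop. prefix when the session is
created, so they are in place before any S3A filesystem instance is cached:

    import org.apache.spark.sql.SparkSession

    // sketch: same s3a settings applied at session construction time
    // (ENDPOINT/ACCESS_KEY/SECRET_KEY are placeholders as above,
    //  and the app name is hypothetical)
    val spark = SparkSession.builder()
      .appName("s3a-endpoint-test")
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .config("spark.hadoop.fs.s3a.endpoint", ENDPOINT)
      .config("spark.hadoop.fs.s3a.access.key", ACCESS_KEY)
      .config("spark.hadoop.fs.s3a.secret.key", SECRET_KEY)
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      .getOrCreate()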

Now I try to write data to my custom S3 endpoint as follows -

    val dataStreamWriter: DataStreamWriter[Row] = PM25quality.select(
        dayofmonth(current_date()) as "day",
        month(current_date()) as "month",
        year(current_date()) as "year",
        column("time"),
        column("quality"),
        column("PM25"))
      .writeStream
      .partitionBy("year", "month", "day")
      .format("csv")
      .outputMode("append")
      .option("path", "s3a://test-bucket/")

    val streamingQuery: StreamingQuery = dataStreamWriter.start()
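
As a sanity check (a sketch, not part of the original run), a plain batch
write to the same bucket should fail or succeed independently of anything
streaming-specific:

    // sketch: minimal batch write to the same s3a path to isolate the endpoint issue
    val df = spark.range(10).toDF("id")
    df.write.mode("overwrite").csv("s3a://test-bucket/sanity-check/")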


However, I am getting an error that AmazonHttpClient is not able to execute
the HTTP request, and the bucket name is being prepended to the endpoint
hostname, as if virtual-hosted-style addressing is still being used. It
seems the Hadoop configuration is not being picked up here -


20/05/01 16:51:37 INFO AmazonHttpClient: Unable to execute HTTP request:
test-bucket.s3-region0.cloudian.com
java.net.UnknownHostException: test-bucket.s3-region0.cloudian.com


Is there anything I am missing in the configuration? Even after setting
fs.s3a.path.style.access to true, it does not seem to take effect.
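
One quick check (again just a sketch) is to read the values back from the
running context to confirm they are actually visible to the job:

    // sketch: confirm the effective S3A settings seen by the job
    println(spark.sparkContext.hadoopConfiguration.get("fs.s3a.path.style.access"))
    println(spark.sparkContext.hadoopConfiguration.get("fs.s3a.endpoint"))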

--
Aniruddha
-----------
