Hello Users,

I am using on-premise object storage and can perform operations on different buckets using the aws-cli. However, when I try to use the same path from my Spark code, it fails. Here are the details -
Added dependencies in build.sbt:

  - hadoop-aws-2.7.4.jar
  - aws-java-sdk-1.7.4.jar

Spark Hadoop configuration is set up as:

  spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", ENDPOINT)
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

And now I try to write data to my custom S3 endpoint as follows:

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.functions._
  import org.apache.spark.sql.streaming.{DataStreamWriter, StreamingQuery}

  val dataStreamWriter: DataStreamWriter[Row] = PM25quality.select(
      dayofmonth(current_date()) as "day",
      month(current_date()) as "month",
      year(current_date()) as "year",
      column("time"),
      column("quality"),
      column("PM25"))
    .writeStream
    .partitionBy("year", "month", "day")
    .format("csv")
    .outputMode("append")
    .option("path", "s3a://test-bucket/")

  val streamingQuery: StreamingQuery = dataStreamWriter.start()

However, I am getting an error that AmazonHttpClient is unable to execute the HTTP request, and it is prepending the bucket name to the endpoint host. It seems the Hadoop configuration is not being resolved here:

  20/05/01 16:51:37 INFO AmazonHttpClient: Unable to execute HTTP request: test-bucket.s3-region0.cloudian.com
  java.net.UnknownHostException: test-bucket.s3-region0.cloudian.com

Is there anything I am missing in the configuration? Even after setting path-style access to true, the client still appears to be using virtual-hosted-style addressing.

--
Aniruddha
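P.S. A few extra details in case they help.

This is how the two jars are declared in build.sbt (a sketch; the group IDs are the standard Maven coordinates for these artifacts):

  libraryDependencies ++= Seq(
    "org.apache.hadoop" % "hadoop-aws"   % "2.7.4",
    "com.amazonaws"     % "aws-java-sdk" % "1.7.4"
  )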
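To rule out the streaming sink, this is the kind of minimal batch-only write I can run against the same endpoint (a sketch; ENDPOINT, ACCESS_KEY, SECRET_KEY and the bucket name are the same placeholders as above):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("s3a-smoke-test")
    // Setting fs.s3a.* via the spark.hadoop.* prefix at session build time
    // ensures the values are in place before the S3A filesystem is first created.
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.endpoint", ENDPOINT)
    .config("spark.hadoop.fs.s3a.access.key", ACCESS_KEY)
    .config("spark.hadoop.fs.s3a.secret.key", SECRET_KEY)
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()

  // If this tiny batch write fails with the same UnknownHostException,
  // the problem is in the s3a configuration rather than in the streaming sink.
  spark.range(10).write.format("csv").save("s3a://test-bucket/smoke-test/")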
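And a quick runtime check to confirm which Hadoop version is actually on the classpath and what value the live configuration resolved, since s3a option support differs between Hadoop releases (a sketch run from the same session):

  import org.apache.hadoop.util.VersionInfo

  // Hadoop version Spark is really running with (may differ from the jar in build.sbt).
  println(s"Hadoop version: ${VersionInfo.getVersion}")

  // The value the active Hadoop configuration actually holds for path-style access.
  println(spark.sparkContext.hadoopConfiguration.get("fs.s3a.path.style.access"))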