Re: unsubscribe

2019-07-14 Thread Raj Adyanthaya
unsubscribe



How to use HDFS >3.1.1 with spark 2.3.3 to output parquet files to S3?

2019-07-14 Thread Alexander Czech
As the subject suggests, I want to write Parquet files to S3. I know this
was rather troublesome in the past because S3 has no move operation, so a
rename has to be done as a copy plus delete.
This issue has been discussed before, see:
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-files-to-s3-with-out-temporary-directory-tc28088.html

Now HADOOP-13786 is
fixing this problem in Hadoop 3.1.0 and later. How can I use that with
Spark 2.3.3? I usually orchestrate my cluster on EC2 with flintrock.
Do I just set HDFS to 3.1.1 in the flintrock config and everything "just
works"? Or do I also have to set a committer algorithm like this when I
create my Spark context in PySpark:

.set('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version','some_kind_of_Version')
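
To make the question concrete, here is a minimal sketch of how such a
setting could be assembled before building the Spark context. The helper
name committer_conf is hypothetical and only for illustration. The only
fact assumed beyond the post is that Hadoop's FileOutputCommitter defines
algorithm versions 1 and 2. Whether this property is enough to get the
HADOOP-13786 behaviour on Spark 2.3.3 is exactly the open question.

```python
# Hypothetical helper (not part of Spark or Hadoop): builds the property
# pair for the FileOutputCommitter algorithm version mentioned above.
ALGORITHM_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def committer_conf(version):
    """Return a list of (key, value) pairs suitable for SparkConf.setAll.

    Assumption: FileOutputCommitter accepts algorithm versions 1 and 2;
    any other value is rejected here rather than passed through.
    """
    if version not in (1, 2):
        raise ValueError("algorithm version must be 1 or 2, got %r" % (version,))
    # Spark copies every 'spark.hadoop.*' property into the Hadoop
    # configuration with the 'spark.hadoop.' prefix stripped.
    return [(ALGORITHM_KEY, str(version))]
```

With PySpark installed, the pairs could be applied with
SparkConf().setAll(committer_conf(2)) before the context is created. This
only selects an algorithm of the classic FileOutputCommitter and says
nothing about any further settings the newer S3 committers may need.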

thanks for the help!