Re: S3 DirectParquetOutputCommitter + PartitionBy + SaveMode.Append

Takeshi Yamamuro Sat, 01 Oct 2016 03:54:55 -0700

I got this information from a Hadoop JIRA ticket:
https://issues.apache.org/jira/browse/MAPREDUCE-5485


// maropu

On Sat, Oct 1, 2016 at 7:14 PM, Igor Berman <igor.ber...@gmail.com> wrote:

> Takeshi, why do you say this? How did you check that it's only used from
> 2.7.3 onward?
> We use Spark 2.0, which ships with a Hadoop 2.7.2 dependency, and we use
> this setting.
> We've sort of "verified" that it's used by configuring logging for the file
> output committer.
>
> On 30 September 2016 at 03:12, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> FYI: It seems that
>> `sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")`
>> is only available in hadoop-2.7.3+.
>>
>> // maropu
>>
>>
>> On Thu, Sep 29, 2016 at 9:28 PM, joffe.tal <joffe....@gmail.com> wrote:
>>
>>> You can write to a partition explicitly by appending
>>> "/<col_name>=<partition value>" to the end of the path you are writing
>>> to, and then use overwrite mode.
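
A minimal sketch of the explicit-partition approach described above. The bucket, base path, column name `day`, and the sample rows are all hypothetical, and it assumes the DataFrame holds rows for a single partition value and does not itself contain the partition column:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object ExplicitPartitionWrite {
  // Builds "<base>/<col>=<value>", the directory layout that partitionBy would produce.
  def partitionPath(base: String, col: String, value: String): String =
    s"${base.stripSuffix("/")}/$col=$value"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("explicit-partition-write").getOrCreate()
    import spark.implicits._

    // Hypothetical data and names, for illustration only.
    val df = Seq(("a", 1), ("b", 2)).toDF("id", "amount")
    val basePath = "s3a://my-bucket/events"
    val day = "2016-09-29"

    // Overwrite replaces only the directory of this one partition,
    // because the partition value is encoded in the target path.
    df.write
      .mode(SaveMode.Overwrite)
      .parquet(partitionPath(basePath, "day", day))

    spark.stop()
  }
}
```

Reading the base path back should then expose `day` as a column through partition discovery, since the directory name follows the `<col_name>=<partition value>` convention.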
>>>
>>> BTW in Spark 2.0 you just need to use:
>>>
>>> sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
>>> and use s3a://
>>>
>>> Then you can work with the regular output committer
>>> (DirectParquetOutputCommitter is in fact no longer available in Spark 2.0).
>>>
>>> So if you are planning to upgrade, this could be another motivation.
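
For context, a sketch of how the setting might be combined with `partitionBy` and `SaveMode.Append` in Spark 2.0. The bucket name, column names, and sample rows are hypothetical, and whether the setting actually takes effect depends on the Hadoop version on the classpath, which is the open question in this thread:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CommitterV2Append {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("committer-v2-append").getOrCreate()
    import spark.implicits._

    // Same setting as quoted in the thread, applied through the session's SparkContext.
    spark.sparkContext.hadoopConfiguration
      .set("mapreduce.fileoutputcommitter.algorithm.version", "2")

    // Hypothetical data: one row per partition value of "day".
    val df = Seq(("a", "2016-09-29"), ("b", "2016-09-30")).toDF("id", "day")

    // Append new files under s3a://.../day=<value>/ using the regular output committer.
    df.write
      .partitionBy("day")
      .mode(SaveMode.Append)
      .parquet("s3a://my-bucket/events")

    spark.stop()
  }
}
```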
>>>
>>>
>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
---
Takeshi Yamamuro
