Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

Ryan Blue Mon, 17 Jun 2019 13:58:26 -0700

+1 (non-binding)

On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <[email protected]>
wrote:


> +1
>
> Bests,
> Dongjoon.
>
>
> On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <[email protected]>
> wrote:
>
>> +1 (binding)
>>
>> Thanks
>> Saisai
>>
>> On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <[email protected]> wrote:
>>
>>> +1 (binding)
>>>
>>> I think this is a really important feature for Spark.
>>>
>>> First, there is already a lot of interest in alternative shuffle storage
>>> in the community, from dynamic allocation in Kubernetes to even just
>>> improving stability in standard on-premise use of Spark.  However, people
>>> are often stuck doing this in forks of Spark, and in ways that are not
>>> maintainable (because they copy-paste many Spark internals) or are
>>> incorrect (because they do not correctly handle speculative execution and
>>> stage retries).
>>>
>>> Second, I think the specific proposal strikes the right balance between
>>> flexibility and complexity, which allows incremental improvements.  A lot
>>> of work has already been put into figuring out which pieces are essential
>>> to make alternative shuffle storage implementations feasible.
>>>
>>> Of course, that means it doesn't include everything imaginable; some
>>> things still aren't supported, and some will still choose to use the older
>>> ShuffleManager API to get total control over all of shuffle.  But we know
>>> there is a reasonable set of things which can be implemented behind the
>>> API as the first step, and it can continue to evolve.
>>>
>>> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <[email protected]>
>>> wrote:
>>>
>>>> +1 (non-binding). This API is versatile and flexible enough to handle
>>>> Bloomberg's internal use-cases. The ability for us to vary implementation
>>>> strategies is quite appealing. It is also worth noting the minimal changes
>>>> to Spark core needed to make it work. This is a much-needed addition
>>>> to the Spark shuffle story.
>>>>
>>>> On Fri, Jun 14, 2019 at 9:59 AM bo yang <[email protected]> wrote:
>>>>
>>>>> +1 This is great work, allowing different sort shuffle write/read
>>>>> implementations to be plugged in! Also great to see it retain the
>>>>> current Spark configuration
>>>>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>>>>
>>>>>
>>>>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would like to call a vote for the SPIP for SPARK-25299
>>>>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes
>>>>>> to introduce a pluggable storage API for temporary shuffle data.
>>>>>>
>>>>>>
>>>>>>
>>>>>> You may find the SPIP document here
>>>>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>
>>>>>> .
>>>>>>
>>>>>>
>>>>>>
>>>>>> The discussion thread for the SPIP was conducted here
>>>>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>
>>>>>> .
>>>>>>
>>>>>>
>>>>>>
>>>>>> Please vote on whether or not this proposal is agreeable to you.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Matt Cheah
>>>>>>
>>>>>

-- 
Ryan Blue
Software Engineer
Netflix
