+1 (non-binding)

On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> +1
>
> Bests,
> Dongjoon.
>
> On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> +1 (binding)
>>
>> Thanks
>> Saisai
>>
>> On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <im...@therashids.com> wrote:
>>
>>> +1 (binding)
>>>
>>> I think this is a really important feature for Spark.
>>>
>>> First, there is already a lot of interest in alternative shuffle storage
>>> in the community, from dynamic allocation in Kubernetes to even just
>>> improving stability in standard on-premise use of Spark. However, people
>>> are often stuck doing this in forks of Spark, in ways that are not
>>> maintainable (because they copy-paste many Spark internals) or are
>>> incorrect (because they do not handle speculative execution and stage
>>> retries correctly).
>>>
>>> Second, I think the specific proposal strikes the right balance between
>>> flexibility and complexity, allowing incremental improvements. A lot of
>>> work has been put into this already to try to figure out which pieces are
>>> essential to make alternative shuffle storage implementations feasible.
>>>
>>> Of course, that means it doesn't include everything imaginable; some
>>> things still aren't supported, and some will still choose to use the older
>>> ShuffleManager API to get total control over all of shuffle. But we know
>>> there is a reasonable set of things that can be implemented behind the
>>> API as a first step, and it can continue to evolve.
>>>
>>> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <i...@cornell.edu>
>>> wrote:
>>>
>>>> +1 (non-binding). This API is versatile and flexible enough to handle
>>>> Bloomberg's internal use-cases. The ability for us to vary implementation
>>>> strategies is quite appealing. It is also worth noting the minimal changes
>>>> to Spark core needed to make it work. This is a much-needed addition
>>>> to the Spark shuffle story.
>>>>
>>>> On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyan...@gmail.com> wrote:
>>>>
>>>>> +1 This is great work, allowing different sort shuffle write/read
>>>>> implementations to be plugged in! Also great to see it retain the
>>>>> current Spark configuration
>>>>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>>>>
>>>>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mch...@palantir.com>
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I would like to call a vote for the SPIP for SPARK-25299
>>>>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes
>>>>>> to introduce a pluggable storage API for temporary shuffle data.
>>>>>>
>>>>>> You may find the SPIP document here
>>>>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>
>>>>>> .
>>>>>>
>>>>>> The discussion thread for the SPIP was conducted here
>>>>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>
>>>>>> .
>>>>>>
>>>>>> Please vote on whether or not this proposal is agreeable to you.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Matt Cheah
>>>>>>
>>>>>

--
Ryan Blue
Software Engineer
Netflix