Curious how SPARK-25299 (where file tracking is pushed to Spark drivers, at
least in option-5) interacts with Splash. The shuffle data location in
SPARK-25299 would now have additional "fallback" logic for recovering from
executor loss.

On Thu, Jan 3, 2019 at 6:24 AM Peter Rudenko <petro.rude...@gmail.com>
wrote:

> Hi Matt, I'm a developer of the SparkRDMA shuffle manager:
> https://github.com/Mellanox/SparkRDMA
> Thanks for your effort on improving the Spark Shuffle API. We are very
> interested in participating in this. For now we have several comments:
> 1. I went through these 4 documents:
>
>
> https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit#
>
>
> https://docs.google.com/document/d/1TA-gDw3ophy-gSu2IAW_5IMbRK_8pWBeXJwngN9YB80/edit
>
>
> https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40
>
>
> https://docs.google.com/document/d/1kSpbBB-sDk41LeORm3-Hfr-up98Ozm5wskvB49tUhSs/edit#
> As I understand it, there are 2 discussions: improving the shuffle manager
> API itself (Splash manager) and improving the external shuffle service
>
> <https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.9o9f7nm01fz6>
> 2. We may consider revisiting SPIP: RDMA Accelerated Shuffle Engine
> <https://issues.apache.org/jira/browse/SPARK-22229>, to decide whether to
> support RDMA in the main codebase or at least as a first-class shuffle plugin
> (not many other open source shuffle plugins exist). We actively
> develop it and keep adding new features. RDMA is now available on Azure (
> https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/),
> Alibaba and other cloud providers. For now we support only memory <->
> memory transfer, but RDMA is extensible to NVM and GPU data transfer.
> 3. We have users that are interested in having this feature (
> https://issues.apache.org/jira/browse/SPARK-12196) - we can consider
> adding it to this new API.
>
> Let me know if you need help with review / testing / benchmarking.
> I'll look more at the documents and the PR.
>
> Thanks,
> Peter Rudenko
> Software engineer at Mellanox Technologies.
>
>
> On Wed, Dec 19, 2018 at 20:54 John Zhuge <john.zh...@gmail.com> wrote:
>
>> Matt, appreciate the update!
>>
>> On Wed, Dec 19, 2018 at 10:51 AM Matt Cheah <mch...@palantir.com> wrote:
>>
>>> Hi everyone,
>>>
>>>
>>>
>>> Earlier this year, we proposed SPARK-25299
>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which introduces the idea
>>> of using other storage systems for persisting shuffle files. Since that
>>> time, we have been continuing to work on prototypes for this project. In
>>> the interest of increasing transparency into our work, we have created a
>>> progress report document
>>> <https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit?usp=sharing>
>>> where you may find a summary of the work we have been doing, as well as
>>> links to our prototypes on GitHub. We would ask that anyone who is very
>>> familiar with the inner workings of Spark’s shuffle provide feedback
>>> and comments on our work thus far. We welcome any further discussion in
>>> this space. You may comment in this e-mail thread or by commenting on the
>>> progress report document.
>>>
>>>
>>>
>>> Looking forward to hearing from you. Thanks,
>>>
>>>
>>>
>>> -Matt Cheah
>>>
>>
>>
>> --
>> John
>>
>
