Re: SPARK-25299: Updates As Of December 19, 2018

2019-01-09 Thread Erik Erlandson
Curious how SPARK-25299 (where file tracking is pushed to Spark drivers, at
least in option-5) interacts with Splash. The shuffle data location in
SPARK-25299 would now have additional "fallback" logic for recovering from
executor loss.

On Thu, Jan 3, 2019 at 6:24 AM Peter Rudenko wrote:

> Hi Matt, I'm a developer of the SparkRDMA shuffle manager:
> https://github.com/Mellanox/SparkRDMA
> Thanks for your effort to improve the Spark shuffle API. We are very
> interested in participating in this. For now we have several comments:
> 1. I went through these 4 documents:
>
> https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit#
>
> https://docs.google.com/document/d/1TA-gDw3ophy-gSu2IAW_5IMbRK_8pWBeXJwngN9YB80/edit
>
> https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40
>
> https://docs.google.com/document/d/1kSpbBB-sDk41LeORm3-Hfr-up98Ozm5wskvB49tUhSs/edit#
>
> As I understand it, there are 2 discussions: improving the shuffle manager
> API itself (Splash manager) and improving the external shuffle service.
>
> 2. We may consider revisiting SPIP: RDMA Accelerated Shuffle Engine, and
> whether to support RDMA in the main codebase or at least as a first-class
> shuffle plugin (not many other open source shuffle plugins exist). We
> actively develop it and keep adding new features. RDMA is now available on
> Azure (
> https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/),
> Alibaba and other cloud providers. For now we support only memory <->
> memory transfer, but RDMA is extensible to NVM and GPU data transfer.
> 3. We have users that are interested in having this feature (
> https://issues.apache.org/jira/browse/SPARK-12196) - we can consider
> adding it to this new API.
>
> Let me know if you need help with review / testing / benchmarking.
> I'll take a closer look at the documents and the PR.
>
> Thanks,
> Peter Rudenko
> Software engineer at Mellanox Technologies.
>
>
> On Wed, Dec 19, 2018 at 20:54 John Zhuge wrote:
>
>> Matt, appreciate the update!
>>
>> On Wed, Dec 19, 2018 at 10:51 AM Matt Cheah  wrote:
>>
>>> Hi everyone,
>>>
>>> Earlier this year, we proposed SPARK-25299, the idea of using other
>>> storage systems for persisting shuffle files. Since that time, we have
>>> continued to work on prototypes for this project. In the interest of
>>> increasing transparency into our work, we have created a progress report
>>> document where you may find a summary of the work we have been doing, as
>>> well as links to our prototypes on GitHub. We ask that anyone who is very
>>> familiar with the inner workings of Spark’s shuffle provide feedback and
>>> comments on our work thus far. We welcome any further discussion in this
>>> space. You may comment in this e-mail thread or on the progress report
>>> document.
>>>
>>> Looking forward to hearing from you. Thanks,
>>>
>>> -Matt Cheah
>>>
>>
>>
>> --
>> John
>>
>


Re: SPARK-25299: Updates As Of December 19, 2018

2019-01-03 Thread Peter Rudenko
Hi Matt, I'm a developer of the SparkRDMA shuffle manager:
https://github.com/Mellanox/SparkRDMA
Thanks for your effort to improve the Spark shuffle API. We are very
interested in participating in this. For now we have several comments:
1. I went through these 4 documents:

https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit#

https://docs.google.com/document/d/1TA-gDw3ophy-gSu2IAW_5IMbRK_8pWBeXJwngN9YB80/edit

https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40

https://docs.google.com/document/d/1kSpbBB-sDk41LeORm3-Hfr-up98Ozm5wskvB49tUhSs/edit#

As I understand it, there are 2 discussions: improving the shuffle manager
API itself (Splash manager) and improving the external shuffle service.

2. We may consider revisiting SPIP: RDMA Accelerated Shuffle Engine, and
whether to support RDMA in the main codebase or at least as a first-class
shuffle plugin (not many other open source shuffle plugins exist). We
actively develop it and keep adding new features. RDMA is now available on
Azure (
https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/),
Alibaba and other cloud providers. For now we support only memory <->
memory transfer, but RDMA is extensible to NVM and GPU data transfer.
3. We have users that are interested in having this feature (
https://issues.apache.org/jira/browse/SPARK-12196) - we can consider adding
it to this new API.

Let me know if you need help with review / testing / benchmarking.
I'll take a closer look at the documents and the PR.

Thanks,
Peter Rudenko
Software engineer at Mellanox Technologies.


On Wed, Dec 19, 2018 at 20:54 John Zhuge wrote:

> Matt, appreciate the update!
>
> On Wed, Dec 19, 2018 at 10:51 AM Matt Cheah  wrote:
>
>> Hi everyone,
>>
>> Earlier this year, we proposed SPARK-25299, the idea of using other
>> storage systems for persisting shuffle files. Since that time, we have
>> continued to work on prototypes for this project. In the interest of
>> increasing transparency into our work, we have created a progress report
>> document where you may find a summary of the work we have been doing, as
>> well as links to our prototypes on GitHub. We ask that anyone who is very
>> familiar with the inner workings of Spark’s shuffle provide feedback and
>> comments on our work thus far. We welcome any further discussion in this
>> space. You may comment in this e-mail thread or on the progress report
>> document.
>>
>> Looking forward to hearing from you. Thanks,
>>
>> -Matt Cheah
>>
>
>
> --
> John
>


Re: SPARK-25299: Updates As Of December 19, 2018

2018-12-19 Thread John Zhuge
Matt, appreciate the update!

On Wed, Dec 19, 2018 at 10:51 AM Matt Cheah  wrote:

> Hi everyone,
>
> Earlier this year, we proposed SPARK-25299, the idea of using other
> storage systems for persisting shuffle files. Since that time, we have
> continued to work on prototypes for this project. In the interest of
> increasing transparency into our work, we have created a progress report
> document where you may find a summary of the work we have been doing, as
> well as links to our prototypes on GitHub. We ask that anyone who is very
> familiar with the inner workings of Spark’s shuffle provide feedback and
> comments on our work thus far. We welcome any further discussion in this
> space. You may comment in this e-mail thread or on the progress report
> document.
>
> Looking forward to hearing from you. Thanks,
>
> -Matt Cheah
>


-- 
John


SPARK-25299: Updates As Of December 19, 2018

2018-12-19 Thread Matt Cheah
Hi everyone,

Earlier this year, we proposed SPARK-25299, the idea of using other storage
systems for persisting shuffle files. Since that time, we have continued to
work on prototypes for this project. In the interest of increasing
transparency into our work, we have created a progress report document where
you may find a summary of the work we have been doing, as well as links to
our prototypes on GitHub. We ask that anyone who is very familiar with the
inner workings of Spark’s shuffle provide feedback and comments on our work
thus far. We welcome any further discussion in this space. You may comment
in this e-mail thread or on the progress report document.

Looking forward to hearing from you. Thanks,

-Matt Cheah


