Re: Integrating ML/DL frameworks with Spark

2018-05-23 Thread Xiangrui Meng
* Bryan Cutler <cutl...@gmail.com> *Sent:* Monday, May 14, 2018 11:37:20 PM *To:* Xiangrui Meng *Cc:* Reynold Xin; dev *Subject:* Re: Integrating ML/DL frameworks with Spark. Thanks for starting this discussion, I'd also like to see some improvement

Re: Integrating ML/DL frameworks with Spark

2018-05-20 Thread Felix Cheung
…Integrating ML/DL frameworks with Spark. Thanks for starting this discussion, I'd also like to see some improvements in this area, and I'm glad to hear that the Pandas UDFs / Arrow functionality might be useful. I'm wondering if, from your initial investigations, you found anything lacking from the Arrow

Re: Integrating ML/DL frameworks with Spark

2018-05-17 Thread Daniel Galvez
Hi all, Paul Ogilvie pointed this thread out to me; we overlapped a little at LinkedIn. It’s good to see that this kind of discussion is going on! I have some thoughts on it: - Practically speaking, one of the lowest-hanging fruits is the ability for Spark to

Re: Integrating ML/DL frameworks with Spark

2018-05-15 Thread Bryan Cutler
Thanks for starting this discussion, I'd also like to see some improvements in this area, and I'm glad to hear that the Pandas UDFs / Arrow functionality might be useful. I'm wondering if, from your initial investigations, you found anything lacking from the Arrow format or possible improvements that
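Since the thread leans on the Pandas UDF / Arrow path for data exchange, here is a minimal sketch of what it looks like with the Spark 2.3-era API; the DataFrame, the toy "score" computation, and the column names are made up purely to illustrate how columnar batches reach Python.

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()
df = spark.range(0, 1000).withColumnRenamed("id", "feature")

@pandas_udf("double", PandasUDFType.SCALAR)
def score(feature):
    # `feature` arrives as a pandas Series backed by Arrow record batches,
    # so a Python ML/DL framework can consume whole batches instead of
    # paying per-row serialization costs.
    return (feature * 0.5).astype("float64")

df.withColumn("score", score("feature")).show(5)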

Re: Integrating ML/DL frameworks with Spark

2018-05-09 Thread Xiangrui Meng
Shivaram: Yes, we can call it "gang scheduling" or "barrier synchronization". Spark doesn't support it now. The proposal is to have proper support in Spark's job scheduler, so we can integrate well with MPI-like frameworks. On Tue, May 8, 2018 at 11:17 AM Nan Zhu wrote:
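For readers following along later: this proposal became barrier execution mode in Spark 2.4. Below is a minimal sketch of a gang-scheduled stage with that API; launch_mpi_like_worker is a hypothetical placeholder for an xgboost/mxnet/MPI worker, not a real function.

from pyspark import SparkContext
from pyspark.taskcontext import BarrierTaskContext

def train_partition(iterator):
    ctx = BarrierTaskContext.get()
    # Either all tasks of this stage are scheduled together, or the stage
    # does not start; if any task fails, the whole stage is retried.
    peers = [info.address for info in ctx.getTaskInfos()]
    ctx.barrier()  # global synchronization point across the stage
    # launch_mpi_like_worker(rank=ctx.partitionId(), peers=peers, data=iterator)
    yield (ctx.partitionId(), len(peers))

sc = SparkContext(appName="barrier-sketch")
# 4 partitions -> 4 tasks that must run in a single wave.
print(sc.parallelize(range(4), 4).barrier().mapPartitions(train_partition).collect())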

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Nan Zhu
…how I skipped the last part. On Tue, May 8, 2018 at 11:16 AM, Reynold Xin wrote: Yes, Nan, totally agree. To be on the same page, that's exactly what I wrote, wasn't it? On Tue, May 8, 2018 at 11:14 AM Nan Zhu wrote: besides

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Reynold Xin
Yes, Nan, totally agree. To be on the same page, that's exactly what I wrote, wasn't it? On Tue, May 8, 2018 at 11:14 AM Nan Zhu wrote: besides that, one of the things which is needed by multiple frameworks is to schedule tasks in a single wave, i.e. if some

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Nan Zhu
besides that, one of the things needed by multiple frameworks is to schedule tasks in a single wave, i.e. if a framework like xgboost/mxnet requires 50 parallel workers, Spark should provide the capability to ensure that either all 50 tasks run at once, or we quit the
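To make the "all 50 at once or quit" semantics concrete, here is a deliberately naive sketch; it uses defaultParallelism as a rough proxy for available task slots (an assumption, not an exact count), and the 50-worker figure is simply the example from this message. Barrier mode later moved this check into the scheduler itself.

from pyspark import SparkContext

REQUIRED_WORKERS = 50  # e.g. what an xgboost/mxnet job might ask for

sc = SparkContext(appName="single-wave-check")
available_slots = sc.defaultParallelism  # rough proxy, not an exact slot count
if available_slots < REQUIRED_WORKERS:
    sc.stop()
    raise RuntimeError(
        "Need %d parallel workers but only ~%d slots appear available; "
        "quitting instead of running a partial wave."
        % (REQUIRED_WORKERS, available_slots)
    )
# ...otherwise launch the 50-task training stage here...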

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Reynold Xin
I think that's what Xiangrui was referring to. Instead of retrying a single task, retry the entire stage, and the entire stage of tasks needs to be scheduled all at once. On Tue, May 8, 2018 at 8:53 AM Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote: - Fault tolerance and

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Naveen Swamy
I am a committer on the MXNet project and very interested in working on integrating with Spark. I am wondering how training would proceed in the case of 1) training done on one host with multiple GPUs -- I don't know if Spark's capabilities can be leveraged here 2) distributed training with data

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Shivaram Venkataraman
- Fault tolerance and execution model: Spark assumes fine-grained task recovery, i.e. if something fails, only that task is rerun. This doesn’t match the execution model of distributed ML/DL frameworks that are typically MPI-based, and rerunning a single task would

Re: Integrating ML/DL frameworks with Spark

2018-05-08 Thread Jörn Franke
Hi, you misunderstood me. That is exactly what I wanted to say: Spark should be aware of them, so I agree with you. The point is to also have YARN GPU/FPGA scheduling as an option alongside a potential Spark GPU/FPGA scheduler. For the other proposal - yes, the interfaces are slow, but one has to think

Re: Integrating ML/DL frameworks with Spark

2018-05-07 Thread Reynold Xin
I don't think it's sufficient to have them in YARN (or any other service) without Spark being aware of them. If Spark is not aware of them, then there is no way to really efficiently utilize these accelerators when you run anything that requires non-accelerators (which is almost 100% of the cases in
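For context, Spark-level awareness of accelerators was later expressed through accelerator-aware scheduling (Spark 3.0+). The sketch below shows the general shape of that approach; the resource amounts and the discovery-script path are placeholders, not recommendations.

from pyspark import TaskContext
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-aware-sketch")
    # Ask the cluster manager (e.g. YARN 3.1+) for one GPU per executor,
    # and tell Spark's scheduler that each task needs one GPU.
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    .config("spark.executor.resource.gpu.discoveryScript", "/opt/spark/getGpus.sh")
    .getOrCreate()
)

def which_gpu(index):
    # Inside a task, Spark exposes the addresses of the GPUs it assigned.
    return (index, TaskContext.get().resources()["gpu"].addresses)

print(spark.sparkContext.parallelize(range(2), 2).map(which_gpu).collect())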

Re: Integrating ML/DL frameworks with Spark

2018-05-07 Thread Jörn Franke
Hadoop / YARN 3.1 added GPU scheduling, and 3.2 is planned to add FPGA scheduling, so it might be worth making the last point generic so that not only the Spark scheduler but all supported schedulers can use GPUs. For the other two points, I just wonder if it makes sense to address this in the ml

Re: Integrating ML/DL frameworks with Spark

2018-05-07 Thread Xiangrui Meng
Thanks Reynold for summarizing the offline discussion! I added a few comments inline. -Xiangrui On Mon, May 7, 2018 at 5:37 PM Reynold Xin wrote: Hi all, Xiangrui and I were discussing with a heavy Apache Spark user last week on their experiences integrating machine

Integrating ML/DL frameworks with Spark

2018-05-07 Thread Reynold Xin
Hi all, Xiangrui and I were discussing with a heavy Apache Spark user last week about their experiences integrating machine learning (and deep learning) frameworks with Spark and some of their pain points. A couple of things were obvious, and I wanted to share our learnings with the list. (1) Most