Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-26 Thread Mark Hamstra
Yes, I do expect that the application-level approach outlined in this SPIP will be sufficiently useful to be worth doing despite any concerns about it not being ideal. My concern is not just about this design, however. It feels to me like we are running into limitations of the current Spark

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-26 Thread Imran Rashid
+1 on the updated SPIP. I agree with all of Mark's concerns, that eventually we want some way for users to express per-task constraints -- but I feel like this is still a reasonable step forward. In the meantime, users will either write small Spark applications, which just do the steps which

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xingbo Jiang
+1 on the updated SPIP. On Tue, Mar 26, 2019 at 1:32 PM, Xingbo Jiang wrote: > Hi all, > > Now that we have had a few discussions over the updated SPIP, we have also updated > the SPIP to address new feedback from some committers. IMO the SPIP is > ready for another round of voting now. > On the updated SPIP, we currently

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xingbo Jiang
Hi all, Now that we have had a few discussions over the updated SPIP, we have also updated the SPIP to address new feedback from some committers. IMO the SPIP is ready for another round of voting now. On the updated SPIP, we currently have two +1s (from Tom and Xiangrui); everyone else, please vote again.

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xiangrui Meng
On Mon, Mar 25, 2019 at 8:07 PM Mark Hamstra wrote: > Maybe. > > And I expect that we will end up doing something based on spark.task.cpus > in the short term. I'd just rather that this SPIP not make it look like > this is the way things should ideally be done. I'd prefer that we be quite >

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Mark Hamstra
Maybe. And I expect that we will end up doing something based on spark.task.cpus in the short term. I'd just rather that this SPIP not make it look like this is the way things should ideally be done. I'd prefer that we be quite explicit in recognizing that this approach is a significant

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xiangrui Meng
There are certainly use cases where different stages require different numbers of CPUs or GPUs under an optimal setting. I don't think anyone disagrees that ideally users should be able to do it. We are just dealing with typical engineering trade-offs and seeing how we can break it down into smaller ones.

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Mark Hamstra
I remain unconvinced that a default configuration at the application level makes sense even in that case. There may be some applications where you know a priori that almost all the tasks for all the stages for all the jobs will need some fixed number of gpus; but I think the more common cases will

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xiangrui Meng
Even if we support per-task resource requests in the future, it would still be inconvenient for users to declare the resource requirements for every single task/stage. So there must be some default values defined somewhere for task resource requirements. "spark.task.cpus" and

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Mark Hamstra
Of course there is an issue of the perfect becoming the enemy of the good, so I can understand the impulse to get something done. I am left wanting, however, at least something more of a roadmap to a task-level future than just a vague "we may choose to do something more in the future." At the

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Tom Graves
+1 on the updated SPIP. Tom On Monday, March 18, 2019, 12:56:22 PM CDT, Xingbo Jiang wrote: Hi all, I updated the SPIP doc and stories; I hope it now contains a clear scope of the changes and enough details for an SPIP vote. Please review the updated docs, thanks! Xiangrui Meng

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Tom Graves
While I agree with you that it would be ideal to have task-level resources and do a deeper redesign of the scheduler, I think that can be a separate enhancement, as was discussed earlier in the thread. That feature is useful without GPUs. I do realize that they overlap some, but I think

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Mark Hamstra
I understand the application-level, static, global nature of spark.task.accelerator.gpu.count and its similarity to the existing spark.task.cpus, but to me this feels like extending a weakness of Spark's scheduler, not building on its strengths. That is because I consider binding the number of

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Tom Graves
The proposal here is that all your resources are static and the GPU-per-task config is global per application, meaning you ask for a certain amount of memory, CPUs, and GPUs for every executor up front, just like you do today, and every executor you get is that size. This means that both static or
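As a rough illustration of the application-level, static model described in this message, the sketch below fixes one resource shape for the whole application and flattens it into `--conf` arguments. `spark.executor.cores` and `spark.task.cpus` are existing Spark settings, and `spark.task.accelerator.gpu.count` is the name used in this thread; the executor-level key `spark.executor.accelerator.gpu.count`, the values, and the helper `to_conf_args` are assumptions made up for the example, not something the SPIP confirms.

```python
# Sketch only: a single, static resource shape that applies to every
# executor and every task in the application.
app_conf = {
    "spark.executor.cores": "4",                  # existing Spark setting
    "spark.task.cpus": "1",                       # existing Spark setting
    "spark.executor.accelerator.gpu.count": "2",  # hypothetical key (assumption)
    "spark.task.accelerator.gpu.count": "1",      # name taken from this thread
}


def to_conf_args(conf):
    """Flatten a config mapping into spark-submit style --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


print(" ".join(to_conf_args(app_conf)))
```

Because nothing here varies by stage or task, every task in the application would see the same per-task CPU and GPU counts, which is the limitation debated elsewhere in the thread.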

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-21 Thread Marco Gaido
Thanks for this SPIP. I cannot comment on the docs, but just wanted to highlight one thing. On page 5 of the SPIP, when we talk about DRA, I see: "For instance, if each executor consists 4 CPUs and 2 GPUs, and each task requires 1 CPU and 1GPU, then we shall throw an error on application start
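The arithmetic behind the quoted example can be sketched as follows. The quoted sentence is cut off in this archive, so the exact condition under which the error would be raised is not reproduced here; the code only shows how many tasks fit on such an executor and what is left idle. The function names and dictionary layout are illustrative assumptions, not part of the SPIP.

```python
def concurrent_tasks(executor, per_task):
    """Tasks that fit on one executor: limited by the scarcest resource."""
    return min(executor[name] // amount for name, amount in per_task.items())


def idle_resources(executor, per_task):
    """Resources left unused when the executor runs as many tasks as it can."""
    slots = concurrent_tasks(executor, per_task)
    return {name: executor[name] - slots * per_task.get(name, 0) for name in executor}


# Numbers from the quoted SPIP example: 4 CPUs and 2 GPUs per executor,
# 1 CPU and 1 GPU per task.
executor = {"cpus": 4, "gpus": 2}
per_task = {"cpus": 1, "gpus": 1}

print(concurrent_tasks(executor, per_task))  # 2
print(idle_resources(executor, per_task))    # {'cpus': 2, 'gpus': 0}
```

With these numbers the GPUs cap the executor at two concurrent tasks, so two of its four CPUs can never be used.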

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-20 Thread Xiangrui Meng
Steve, the initial work would focus on GPUs, but we will keep the interfaces general to support other accelerators in the future. This was mentioned in the SPIP and draft design. Imran, you should have comment permission now. Thanks for making a pass! I don't think the proposed 3.0 features

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-20 Thread Imran Rashid
Thanks for sending the updated docs. Can you please give everyone the ability to comment? I have some comments, but overall I think this is a good proposal and addresses my prior concerns. My only real concern is that I notice some mention of "must dos" for Spark 3.0. I don't want to make any

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-19 Thread Jörn Franke
Also on AWS and probably some more cloud providers > On 19.03.2019 at 19:45, Steve Loughran wrote: > > > you might want to look at the work on FPGA resources; again it should just be > a resource available by a scheduler. Key thing is probably just to keep the > docs generic > >

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-19 Thread Steve Loughran
you might want to look at the work on FPGA resources; again it should just be a resource available by a scheduler. Key thing is probably just to keep the docs generic https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingFPGA.html I don't know where you get those FPGAs to play

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-19 Thread Xiangrui Meng
Sean, thanks for your input and making a pass on the updated SPIP! As the next step, how about having a remote meeting to discuss the remaining topics? I started a Doodle poll here. Due to time constraints, I suggest limiting the attendees to committers

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-19 Thread Sean Owen
This looks like a great level of detail. The broad strokes look good to me. I'm happy with just about any story around what to do with Mesos GPU support now, but it might at least deserve a mention: does the existing Mesos config simply become a deprecated alias for the

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-18 Thread Xingbo Jiang
Hi all, I updated the SPIP doc and stories; I hope it now contains a clear scope of the changes and

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Xiangrui Meng
How about letting Xingbo make a major revision to the SPIP doc to make it clear what is proposed? I like Felix's suggestion to switch to the new Heilmeier template, which helps clarify what is proposed and what is not. Then let's review the new SPIP and resume the vote. On Tue, Mar 5, 2019 at

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Imran Rashid
OK, I suppose then we are getting bogged down in what a vote on an SPIP means anyway, which I guess we can set aside for now. With the level of detail in this proposal, I feel like there is a reasonable chance I'd still -1 the design or implementation. And the other thing you're

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-05 Thread Tom Graves
So to me most of the questions here are implementation/design questions. I've had this issue in the past with SPIPs, where I expected to have more high-level design details but was basically told that belongs in the design JIRA follow-on. This makes me think we need to revisit what a SPIP

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Mark Hamstra
I'll try to find some time, but it's really at a premium right now. On Mon, Mar 4, 2019 at 3:17 PM Xiangrui Meng wrote: > > > On Mon, Mar 4, 2019 at 3:10 PM Mark Hamstra > wrote: > >> :) Sorry, that was ambiguous. I was seconding Imran's comment. >> > > Could you also help review Xingbo's

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Mark Hamstra
:) Sorry, that was ambiguous. I was seconding Imran's comment. On Mon, Mar 4, 2019 at 3:09 PM Xiangrui Meng wrote: > > > On Mon, Mar 4, 2019 at 1:56 PM Mark Hamstra > wrote: > >> +1 >> > > Mark, just to be clear, are you +1 on the SPIP or Imran's point? > > >> >> On Mon, Mar 4, 2019 at 12:52

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Xiangrui Meng
On Mon, Mar 4, 2019 at 3:10 PM Mark Hamstra wrote: > :) Sorry, that was ambiguous. I was seconding Imran's comment. > Could you also help review Xingbo's design sketch and help evaluate the cost? > > On Mon, Mar 4, 2019 at 3:09 PM Xiangrui Meng wrote: > >> >> >> On Mon, Mar 4, 2019 at 1:56

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Xiangrui Meng
On Mon, Mar 4, 2019 at 1:56 PM Mark Hamstra wrote: > +1 > Mark, just to be clear, are you +1 on the SPIP or Imran's point? > > On Mon, Mar 4, 2019 at 12:52 PM Imran Rashid wrote: > >> On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng wrote: >> >>> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Mark Hamstra
+1 On Mon, Mar 4, 2019 at 12:52 PM Imran Rashid wrote: > On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng wrote: > >> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung >> wrote: >> >>> IMO upfront allocation is less useful. Specifically too expensive for >>> large jobs. >>> >> >> This is also an

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Imran Rashid
On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng wrote: > On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung > wrote: > >> IMO upfront allocation is less useful. Specifically too expensive for >> large jobs. >> > > This is also an API/design discussion. > I agree with Felix -- this is more than just an

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Sean Owen
It sounds like there's a discussion about the details coming, which is fine and good. That should maybe also have a VOTE. The debate here is then merely about what and when to call things a SPIP, but that's not important. On Mon, Mar 4, 2019 at 10:23 AM Xiangrui Meng wrote: > I think the two

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Xiangrui Meng
On Mon, Mar 4, 2019 at 8:23 AM Xiangrui Meng wrote: > > > On Mon, Mar 4, 2019 at 7:24 AM Sean Owen wrote: > >> To be clear, those goals sound fine to me. I don't think voting on >> those two broad points is meaningful, but, does no harm per se. If you >> mean this is just a check to see if

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Xiangrui Meng
On Mon, Mar 4, 2019 at 7:24 AM Sean Owen wrote: > To be clear, those goals sound fine to me. I don't think voting on > those two broad points is meaningful, but, does no harm per se. If you > mean this is just a check to see if people believe this is broadly > worthwhile, then +1 from me. Yes it

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Sean Owen
To be clear, those goals sound fine to me. I don't think voting on those two broad points is meaningful, but, does no harm per se. If you mean this is just a check to see if people believe this is broadly worthwhile, then +1 from me. Yes it is. That means we'd want to review something more

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-04 Thread Xiangrui Meng
> I think treating SPIPs as this high-level takes away much of the point > of VOTEing on them. I'm not sure that's even

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Felix Cheung
> I think treating SPIPs as this high-level takes away much of the point of VOTEing on them. I'm not sure that's even what Reynold is suggesting elsewhere

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Sean Owen
I think treating SPIPs as this high-level takes away much of the point of VOTEing on them. I'm not sure that's even what Reynold is suggesting elsewhere; we're nowhere near discussing APIs here, just what 'accelerator aware' even generally means. If the scope isn't specified, what are we trying to

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Xiangrui Meng
> I'm for this in general, at least a +0. I do think this has to have a > story for what to do wi

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Felix Cheung
this. On Sunday, March 3, 2019 at 8:15 AM, Sean Owen wrote: > I'm for this in general, at least a +0. I do think this has

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Sean Owen
> +1, a critical feature for AI/DL! > > On Sat, Mar 2, 2019 at 05:14, Weichen Xu > wrote: >> >> +1, nice feature! >> >> O

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
+1, a critical feature for AI/DL! On Sat, Mar 2, 2019 at 05:14, Weichen Xu <weichen...@databricks.com> wrote: +1, nice feature! On Sat, Mar 2, 2019 at 6:11 AM Yinan Li mailto:l

Re: SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
+1 on Mesos, as Sean says. On Friday, March 1, 2019 at 9:19 AM, Andrew Melo wrote: > Hi, > > On Fri, Mar 1, 2019 at 9:48 AM Xingbo Jiang wrote: >> >> H

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Marco Gaido
+1, a critical feature for AI/DL! On Sat, Mar 2, 2019 at 05:14, Weichen Xu <weichen...@databricks.com> wrote: > +1, nice feature! > > On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > >> +1 >> >> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves >> wrote: >> >>> +1 for the SPIP. >>>

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Weichen Xu
+1, nice feature! On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > +1 > > On Fri, Mar 1, 2019 at 12:37 PM Tom Graves > wrote: > >> +1 for the SPIP. >> >> Tom >> >> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang < >> jiangxb1...@gmail.com> wrote: >> >> >> Hi all, >> >> I want to call

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Wenchen Fan
+1 On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > +1 > > On Fri, Mar 1, 2019 at 12:37 PM Tom Graves > wrote: > >> +1 for the SPIP. >> >> Tom >> >> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang < >> jiangxb1...@gmail.com> wrote: >> >> >> Hi all, >> >> I want to call for a vote of

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Yinan Li
+1 On Fri, Mar 1, 2019 at 12:37 PM Tom Graves wrote: > +1 for the SPIP. > > Tom > > On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang < > jiangxb1...@gmail.com> wrote: > > > Hi all, > > I want to call for a vote of SPARK-24615 > . It

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Tom Graves
+1 for the SPIP. Tom On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang wrote: Hi all, I want to call for a vote of SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. The 

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Andrew Melo
Hi, On Fri, Mar 1, 2019 at 9:48 AM Xingbo Jiang wrote: > > Hi Sean, > > To support GPU scheduling with YARN cluster, we have to update the hadoop > version to 3.1.2+. However, if we decide to not upgrade hadoop to beyond that > version for Spark 3.0, then we just have to disable/fallback the

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Sean Owen
Sounds like a good reason to get in Hadoop 3.1 support. I guess my point is that Spark's Mesos GPU integration has already existed for a long while. It doesn't necessarily need to be expanded, but it seems like it must fit into the more general framework here. That might be little or no effort,

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Hi Sean, To support GPU scheduling on a YARN cluster, we have to update the Hadoop version to 3.1.2+. However, if we decide not to upgrade Hadoop to that version or later for Spark 3.0, then we just have to disable (or fall back from) GPU scheduling with YARN; users shall still be able to have that
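The disable-or-fall-back idea in this message could be sketched as a simple version gate. Only the "3.1.2+" threshold comes from the message; the function names, the version-string handling, and the idea of gating on a parsed version are assumptions made for illustration and are not how Spark is known to implement it.

```python
MIN_HADOOP_FOR_YARN_GPU = (3, 1, 2)  # "3.1.2+" as stated in the message


def parse_version(text):
    """Turn '3.1.2' into (3, 1, 2); a suffix such as '-SNAPSHOT' is ignored."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def yarn_gpu_scheduling_enabled(hadoop_version):
    """Hypothetical gate: allow GPU scheduling on YARN only for new enough Hadoop."""
    return parse_version(hadoop_version) >= MIN_HADOOP_FOR_YARN_GPU


for version in ("2.7.4", "3.1.0", "3.1.2", "3.2.0"):
    print(version, yarn_gpu_scheduling_enabled(version))
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating "3.10.0" as older than "3.2.0".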

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Sean Owen
Two late-breaking questions: This basically requires Hadoop 3.1 for YARN support? Mesos support is listed as a non-goal but it already has support for requesting GPUs in Spark. That would be 'harmonized' with this implementation even if it's not extended? On Fri, Mar 1, 2019, 7:48 AM Xingbo

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xiangrui Meng
+1 Btw, as Ryan pointed out last time, +0 doesn't mean "Don't really care." Official definitions here: https://www.apache.org/foundation/voting.html#expressing-votes-1-0-1-and-fractions - +0: 'I don't feel strongly about it, but I'm okay with this.' - -0: 'I won't get in the way,

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Mingjie
+1 mingjie > On Mar 1, 2019, at 10:18 PM, Xingbo Jiang wrote: > > Start with +1 from myself. > > On Fri, Mar 1, 2019 at 10:14 PM, Xingbo Jiang wrote: >> Hi all, >> >> I want to call for a vote of SPARK-24615. It improves Spark by making it >> aware of GPUs exposed by cluster managers, and hence Spark can

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Start with +1 from myself. On Fri, Mar 1, 2019 at 10:14 PM, Xingbo Jiang wrote: > Hi all, > > I want to call for a vote of SPARK-24615. > It improves Spark by > making it aware of GPUs exposed by cluster managers, and hence Spark can > match GPU resources

[VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Hi all, I want to call for a vote of SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. The proposal

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
I think we are aligned on the commitment; I'll start a vote thread for this shortly. On Wed, Feb 27, 2019 at 6:47 AM, Xiangrui Meng wrote: > In case there are issues visiting Google doc, I attached PDF files to the > JIRA. > > On Tue, Feb 26, 2019 at 7:41 AM Xingbo Jiang > wrote: > >> Hi all, >> >> I want

Re: SPIP: Accelerator-aware Scheduling

2019-02-26 Thread Xiangrui Meng
In case there are issues accessing the Google doc, I attached PDF files to the JIRA. On Tue, Feb 26, 2019 at 7:41 AM Xingbo Jiang wrote: > Hi all, > > I want to send a revised SPIP on implementing Accelerator(GPU)-aware > Scheduling. It improves Spark by making it aware of GPUs exposed by cluster >

SPIP: Accelerator-aware Scheduling

2019-02-26 Thread Xingbo Jiang
Hi all, I want to send a revised SPIP on implementing Accelerator(GPU)-aware Scheduling. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. If you have scenarios that need to run workloads (DL/ML/Signal