Once again, I’d have to agree with Sean. Let’s table the meaning of SPIP for another time. I think a few of us are trying to understand what “accelerator resource aware” means. As far as I know, no one is discussing APIs here. But on the Google doc, on JIRA, over email, and off list, I have seen questions that are greatly concerning, like “oh, the scheduler is allocating GPUs, but how does it affect memory?” and many more, so I think finer “high-level” goals should be defined.
________________________________
From: Sean Owen <sro...@gmail.com>
Sent: Sunday, March 3, 2019 5:24 PM
To: Xiangrui Meng
Cc: Felix Cheung; Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I think treating SPIPs as this high-level takes away much of the point of VOTEing on them. I'm not sure that's even what Reynold is suggesting elsewhere; we're nowhere near discussing APIs here, just what 'accelerator aware' even generally means. If the scope isn't specified, what are we trying to bind with a formal VOTE? The worst I can say is that this doesn't mean much, so the outcome of the vote doesn't matter. The general ideas seems fine to me and I support _something_ like this.

I think the subtext concern is that SPIPs become a way to request cover to make a bunch of decisions separately, later. This is, to some extent, how it has to work. A small number of interested parties need to decide the details coherently, not design the whole thing by committee, with occasional check-ins for feedback. There's a balance between that, and using the SPIP as a license to go finish a design and proclaim it later. That's not anyone's bad-faith intention, just the risk of deferring so much.

Mesos support is not a big deal by itself but a fine illustration of the point. That seems like a fine question of scope now, even if the 'how' or some of the 'what' can be decided later. I raised an eyebrow here at the reply that this was already judged out-of-scope: how much are we on the same page about this being a point to consider feedback?

If one wants to VOTE on more details, then this vote just doesn't matter much. Is a future step to VOTE on some more detailed design doc? Then that's what I call a "SPIP" and it's practically just semantics.

On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng <men...@gmail.com> wrote:
>
> Hi Felix,
>
> Just to clarify, we are voting on the SPIP, not the companion scoping doc.
> What is proposed and what we are voting on is to make Spark
> accelerator-aware. The companion scoping doc and the design sketch are to
> help demonstrate that what features could be implemented based on the use
> cases and dev resources the co-authors are aware of. The exact scoping and
> design would require more community involvement, by no means we are
> finalizing it in this vote thread.
>
> I think copying the goals and non-goals from the companion scoping doc to the
> SPIP caused the confusion. As mentioned in the SPIP, we proposed to make two
> major changes at high level:
>
> At cluster manager level, we update or upgrade cluster managers to include
> GPU support. Then we expose user interfaces for Spark to request GPUs from
> them.
> Within Spark, we update its scheduler to understand available GPUs allocated
> to executors, user task requests, and assign GPUs to tasks properly.
>
> We should keep our vote discussion at this level. It doesn't exclude
> Mesos/Windows/TPU/FPGA, nor it commits to support YARN/K8s. Through the
> initial scoping work, we found that we certainly need domain experts to
> discuss the support of each cluster manager and each accelerator type. But
> adding more details on Mesos or FPGA doesn't change the SPIP at high level.
> So we concluded the initial scoping, shared the docs, and started this vote.