Once again, I’d have to agree with Sean.

Let’s table the meaning of SPIP for another time. I think a few of us are
trying to understand what “accelerator resource aware” means. As far as I
know, no one is discussing APIs here. But in the Google doc, on JIRA, over
email, and off list, I have seen greatly concerning questions, like
“oh, the scheduler is allocating GPUs, but how does that affect memory?” and
many more, so I think finer “high level” goals should be defined.

________________________________
From: Sean Owen <sro...@gmail.com>
Sent: Sunday, March 3, 2019 5:24 PM
To: Xiangrui Meng
Cc: Felix Cheung; Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I think treating SPIPs as this high-level takes away much of the point
of VOTEing on them. I'm not sure that's even what Reynold is
suggesting elsewhere; we're nowhere near discussing APIs here, just
what 'accelerator aware' even generally means. If the scope isn't
specified, what are we trying to bind with a formal VOTE? The worst I
can say is that this doesn't mean much, so the outcome of the vote
doesn't matter. The general idea seems fine to me and I support
_something_ like this.

I think the subtext concern is that SPIPs become a way to request
cover to make a bunch of decisions separately, later. This is, to some
extent, how it has to work. A small number of interested parties need
to decide the details coherently, not design the whole thing by
committee, with occasional check-ins for feedback. There's a balance
between that, and using the SPIP as a license to go finish a design
and proclaim it later. That's not anyone's bad-faith intention, just
the risk of deferring so much.

Mesos support is not a big deal by itself but a fine illustration of
the point. That seems like a fine question of scope now, even if the
'how' or some of the 'what' can be decided later. I raised an eyebrow
here at the reply that this was already judged out-of-scope: how much
are we on the same page about this being a point to consider feedback?

If one wants to VOTE on more details, then this vote just doesn't
matter much. Is a future step to VOTE on some more detailed design
doc? Then that's what I call a "SPIP" and it's practically just
semantics.


On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng <men...@gmail.com> wrote:
>
> Hi Felix,
>
> Just to clarify, we are voting on the SPIP, not the companion scoping doc.
> What is proposed and what we are voting on is to make Spark
> accelerator-aware. The companion scoping doc and the design sketch are to
> help demonstrate what features could be implemented based on the use
> cases and dev resources the co-authors are aware of. The exact scoping and
> design would require more community involvement; by no means are we
> finalizing it in this vote thread.
>
> I think copying the goals and non-goals from the companion scoping doc to the 
> SPIP caused the confusion. As mentioned in the SPIP, we proposed to make two 
> major changes at high level:
>
> 1. At cluster manager level, we update or upgrade cluster managers to
>    include GPU support. Then we expose user interfaces for Spark to request
>    GPUs from them.
> 2. Within Spark, we update its scheduler to understand available GPUs
>    allocated to executors, user task requests, and assign GPUs to tasks
>    properly.
>
> We should keep our vote discussion at this level. It doesn't exclude
> Mesos/Windows/TPU/FPGA, nor does it commit to supporting YARN/K8s. Through the
> initial scoping work, we found that we certainly need domain experts to 
> discuss the support of each cluster manager and each accelerator type. But 
> adding more details on Mesos or FPGA doesn't change the SPIP at high level. 
> So we concluded the initial scoping, shared the docs, and started this vote.
