Sean, thanks for your input and for making a pass on the updated SPIP! As the next step, how about having a remote meeting to discuss the remaining topics? I started a doodle poll here <https://doodle.com/poll/33cthyc6f8i8naya>. Due to time constraints, I suggest limiting the attendees to committers and posting the meeting summary to JIRA afterward.

On Tue, Mar 19, 2019 at 10:16 AM Sean Owen <sro...@gmail.com> wrote:
> This looks like a great level of detail. The broad strokes look good to me.
>
> I'm happy with just about any story around what to do with Mesos GPU support now, but it might at least deserve a mention: does the existing Mesos config simply become a deprecated alias for spark.executor.accelerator.gpu.count, with no further support added to Mesos? That seems entirely coherent, and if that's agreeable, it could be worth a line here.
>

I would go with the deprecated alias option. But I would defer the decision to some committer who is willing to shepherd the Mesos sub-project.

> I think it could go into Spark 3 but need not block it. This doesn't say it does, merely says it's desirable to have it ready for 3.0 if possible. That seems like a fine position.
>
> On Mon, Mar 18, 2019 at 1:56 PM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I updated the SPIP doc and stories; I hope they now contain a clear scope of the changes and enough details for the SPIP vote.
> > Please review the updated docs, thanks!
> >
> > On Wed, Mar 6, 2019 at 8:35 AM Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> How about letting Xingbo make a major revision to the SPIP doc to make it clear what is proposed? I like Felix's suggestion to switch to the new Heilmeier template, which helps clarify what is proposed and what is not. Then let's review the new SPIP and resume the vote.
> >>
> >> On Tue, Mar 5, 2019 at 7:54 AM Imran Rashid <im...@therashids.com> wrote:
> >>>
> >>> OK, I suppose then we are getting bogged down in what a vote on an SPIP means anyway, which I guess we can set aside for now. With the level of detail in this proposal, I feel like there is a reasonable chance I'd still -1 the design or implementation.
> >>>
> >>> And the other thing you're implicitly asking the community for is to prioritize this feature for continued review and maintenance.
> >>> There is already work to be done in things like making barrier mode support dynamic allocation (SPARK-24942), bugs in failure handling (e.g. SPARK-25250), and general efficiency of failure handling (e.g. SPARK-25341, SPARK-20178). I'm very concerned about getting spread too thin.
> >>>
> >>> But if this is really just a vote on (1) is better GPU support important for Spark, in some form, in some release? and (2) is it *possible* to do this in a safe way? then I will vote +0.
> >>>
> >>> On Tue, Mar 5, 2019 at 8:25 AM Tom Graves <tgraves...@yahoo.com> wrote:
> >>>>
> >>>> So to me most of the questions here are implementation/design questions. I've had this issue in the past with SPIPs, where I expected to have more high-level design details but was basically told that belongs in the follow-on design JIRA. This makes me think we need to revisit what a SPIP really needs to contain, which should be done in a separate thread. Note that personally I would be for having more high-level details in it.
> >>>> But the way I read our documentation on a SPIP right now, that detail is all optional. Maybe we could argue it's based on what reviewers request, but perhaps we should make the wording of that more of a requirement. Thoughts? We should probably separate that discussion if people want to talk about that.
> >>>>
> >>>> For this SPIP in particular, the reason I +1 it is that it came down to 2 questions:
> >>>>
> >>>> 1) Do I think Spark should support this? My answer is yes. I think this would improve Spark; users have been requesting both better GPU support and support for controlling container requests at a finer granularity for a while. If Spark doesn't support this then users may go to something else, so I think we should support it.
> >>>>
> >>>> 2) Do I think it's possible to design and implement it without causing large instabilities? My opinion here again is yes.
> >>>> I agree with Imran and others that the scheduler piece needs to be looked at very closely, as we have had a lot of issues there, and that is why I was asking for more details in the design JIRA: https://issues.apache.org/jira/browse/SPARK-27005. But I do believe it's possible to do.
> >>>>
> >>>> If others have reservations on similar questions then I think we should resolve them here, or take the discussion of what a SPIP is to a different thread and then come back to this. Thoughts?
> >>>>
> >>>> Note there is already a high-level design for at least the core piece, which is what people seem concerned with, so including it in the SPIP should be straightforward.
> >>>>
> >>>> Tom
> >>>>
> >>>> On Monday, March 4, 2019, 2:52:43 PM CST, Imran Rashid <im...@therashids.com> wrote:
> >>>>
> >>>> On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng <men...@gmail.com> wrote:
> >>>>
> >>>> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung <felixcheun...@hotmail.com> wrote:
> >>>>
> >>>> IMO upfront allocation is less useful. Specifically too expensive for large jobs.
> >>>>
> >>>> This is also an API/design discussion.
> >>>>
> >>>> I agree with Felix -- this is more than just an API question. It has a huge impact on the complexity of what you're proposing. You might be proposing big changes to a core and brittle part of Spark, which is already short of experts.
> >>>>
> >>>> I don't see any value in having a vote on "does feature X sound cool?" We have to evaluate the potential benefit against the risks the feature brings and the continued maintenance cost. We don't need super low-level details, but we have to have a sketch of the design to be able to make that tradeoff.