Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Weichen Xu
+1, nice feature! On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > +1 > > On Fri, Mar 1, 2019 at 12:37 PM Tom Graves > wrote: > >> +1 for the SPIP. >> >> Tom >> >> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang < >> jiangxb1...@gmail.com> wrote: >> >> >> Hi all, >> >> I want to call

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Wenchen Fan
+1 On Sat, Mar 2, 2019 at 6:11 AM Yinan Li wrote: > +1 > > On Fri, Mar 1, 2019 at 12:37 PM Tom Graves > wrote: > >> +1 for the SPIP. >> >> Tom >> >> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang < >> jiangxb1...@gmail.com> wrote: >> >> >> Hi all, >> >> I want to call for a vote of

Re: [VOTE] SPIP: Spark API for Table Metadata

2019-03-01 Thread Wenchen Fan
+1, thanks for making it clear that this SPIP focuses on high-level direction! On Sat, Mar 2, 2019 at 9:35 AM Reynold Xin wrote: > Thanks Ryan. +1. > > > > > On Fri, Mar 01, 2019 at 5:33 PM, Ryan Blue wrote: > >> Actually, I went ahead and removed the confusing section. There is no >> public

Re: [VOTE] SPIP: Spark API for Table Metadata

2019-03-01 Thread Reynold Xin
Thanks Ryan. +1. On Fri, Mar 01, 2019 at 5:33 PM, Ryan Blue < rb...@netflix.com > wrote: > > Actually, I went ahead and removed the confusing section. There is no > public API in the doc now, so that it is clear that it isn't a relevant > part of this vote. > > On Fri, Mar 1, 2019 at 4:58 PM

Re: [VOTE] SPIP: Spark API for Table Metadata

2019-03-01 Thread Ryan Blue
Actually, I went ahead and removed the confusing section. There is no public API in the doc now, so that it is clear that it isn't a relevant part of this vote. On Fri, Mar 1, 2019 at 4:58 PM Ryan Blue wrote: > I moved the public API to the "Implementation Sketch" section. That API is > not an

Re: [VOTE] SPIP: Spark API for Table Metadata

2019-03-01 Thread Ryan Blue
I moved the public API to the "Implementation Sketch" section. That API is not an important part of this, as that section notes. I completely agree that SPIPs should be high-level and that the specifics, like method names, are not hard requirements. The proposal was more of a sketch, but I was

Re: [VOTE] SPIP: Spark API for Table Metadata

2019-03-01 Thread Reynold Xin
Ryan - can you take the public user facing API part out of that SPIP? In general it'd be better to have the SPIPs be higher level, and put the detailed APIs in a separate doc. Alternatively, put them in the SPIP but explicitly vote on the high level stuff and not the detailed APIs.  I don't

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Yinan Li
+1 On Fri, Mar 1, 2019 at 12:37 PM Tom Graves wrote: > +1 for the SPIP. > > Tom > > On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang < > jiangxb1...@gmail.com> wrote: > > > Hi all, > > I want to call for a vote of SPARK-24615. It

Re: [VOTE] SPIP: Spark API for Table Metadata

2019-03-01 Thread Anthony Young-Garner
+1 (non-binding) On Thu, Feb 28, 2019 at 5:54 PM John Zhuge wrote: > +1 (non-binding) > > On Thu, Feb 28, 2019 at 9:11 AM Matt Cheah wrote: > >> +1 (non-binding) >> >> >> >> *From: *Jamison Bennett >> *Date: *Thursday, February 28, 2019 at 8:28 AM >> *To: *Ryan Blue , Spark Dev List > > >>

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Tom Graves
+1 for the SPIP. Tom On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang wrote: Hi all, I want to call for a vote of SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. The 

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Andrew Melo
Hi, On Fri, Mar 1, 2019 at 9:48 AM Xingbo Jiang wrote: > > Hi Sean, > > To support GPU scheduling with YARN cluster, we have to update the hadoop > version to 3.1.2+. However, if we decide to not upgrade hadoop to beyond that > version for Spark 3.0, then we just have to disable/fallback the

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Sean Owen
Sounds like a good reason to get in Hadoop 3.1 support. I guess my point is that Spark's Mesos GPU integration has already existed for a long while. It doesn't necessarily need to be expanded, but, seems like it must fit in to the more general framework here. That might be little or no effort,

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Hi Sean, To support GPU scheduling with YARN cluster, we have to update the hadoop version to 3.1.2+. However, if we decide to not upgrade hadoop to beyond that version for Spark 3.0, then we just have to disable/fallback the GPU scheduling with YARN, users shall still be able to have that

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Sean Owen
Two late breaking questions: This basically requires Hadoop 3.1 for YARN support? Mesos support is listed as a non goal but it already has support for requesting GPUs in Spark. That would be 'harmonized' with this implementation even if it's not extended? On Fri, Mar 1, 2019, 7:48 AM Xingbo

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xiangrui Meng
+1 Btw, as Ryan pointed out last time, +0 doesn't mean "Don't really care." Official definitions here: https://www.apache.org/foundation/voting.html#expressing-votes-1-0-1-and-fractions - +0: 'I don't feel strongly about it, but I'm okay with this.' - -0: 'I won't get in the way,

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Mingjie
+1 mingjie > On Mar 1, 2019, at 10:18 PM, Xingbo Jiang wrote: > > Start with +1 from myself. > > On Fri, Mar 1, 2019 at 10:14 PM, Xingbo Jiang wrote: >> Hi all, >> >> I want to call for a vote of SPARK-24615. It improves Spark by making it >> aware of GPUs exposed by cluster managers, and hence Spark can

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Start with +1 from myself. On Fri, Mar 1, 2019 at 10:14 PM, Xingbo Jiang wrote: > Hi all, > > I want to call for a vote of SPARK-24615. It improves Spark by > making it aware of GPUs exposed by cluster managers, and hence Spark can > match GPU resources

[VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Hi all, I want to call for a vote of SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. The proposal

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
I think we are aligned on the commitment, I'll start a vote thread for this shortly. On Wed, Feb 27, 2019 at 6:47 AM, Xiangrui Meng wrote: > In case there are issues visiting Google doc, I attached PDF files to the > JIRA. > > On Tue, Feb 26, 2019 at 7:41 AM Xingbo Jiang > wrote: > >> Hi all, >> >> I want

Re: CombinePerKey and GroupByKey

2019-03-01 Thread Etienne Chauchot
That's good to know. Thanks, Etienne On Thursday, February 28, 2019 at 10:05 -0800, Reynold Xin wrote: > This should be fine. Dataset.groupByKey is a logical operation, not a > physical one (as in Spark wouldn’t always > materialize all the groups in memory). > On Thu, Feb 28, 2019 at 1:46 AM Etienne