Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xingbo Jiang
+1 on the updated SPIP. On Tue, Mar 26, 2019 at 1:32 PM, Xingbo Jiang wrote: > Hi all, > > Now that we have had a few discussions over the updated SPIP, we have also updated > the SPIP to address new feedback from some committers. IMO the SPIP is > ready for another round of vote now. > On the updated SPIP, we currently

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xingbo Jiang
Hi all, Now that we have had a few discussions over the updated SPIP, we have also updated the SPIP to address new feedback from some committers. IMO the SPIP is ready for another round of vote now. On the updated SPIP, we currently have two +1s (from Tom and Xiangrui); everyone else, please vote again.

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Xiao Li
Thanks, DB! The Hive UDAF fix https://github.com/apache/spark/commit/0cfefa7e864f443cfd76cff8c50617a8afd080fb was merged this weekend. Xiao On Mon, Mar 25, 2019 at 9:46 PM, DB Tsai wrote: > RC9 was just cut. Will send out another thread once the build is finished. > > Sincerely, > > DB Tsai >

Re: [DISCUSS] Spark Columnar Processing

2019-03-25 Thread Wenchen Fan
Do you have some initial perf numbers? It seems fine to me to remain row-based inside Spark with whole-stage-codegen, and convert rows to columnar batches when communicating with external systems. On Mon, Mar 25, 2019 at 1:05 PM Bobby Evans wrote: > This thread is to discuss adding in support

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Reynold Xin
At some point we should celebrate having the largest RC number ever in Spark ... On Mon, Mar 25, 2019 at 9:44 PM, DB Tsai < dbt...@dbtsai.com.invalid > wrote: > RC9 was just cut. Will send out another thread once the build is finished. > > Sincerely, > > DB Tsai >

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread DB Tsai
RC9 was just cut. Will send out another thread once the build is finished. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Mon, Mar 25, 2019 at 5:10 PM Sean Owen wrote: > > That's all merged now. I think

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Reynold Xin
+1 on doing this in 3.0. On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung < felixcheun...@hotmail.com > wrote: > I’m +1 if 3.0 > > *From:* Sean Owen < sro...@gmail.com > > *Sent:* Monday, March 25, 2019 6:48 PM > *To:* Hyukjin Kwon > *Cc:* dev; Bryan Cutler;

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Felix Cheung
I’m +1 if 3.0 From: Sean Owen Sent: Monday, March 25, 2019 6:48 PM To: Hyukjin Kwon Cc: dev; Bryan Cutler; Takuya UESHIN; shane knapp Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276] I don't know a lot about Arrow here, but seems

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xiangrui Meng
On Mon, Mar 25, 2019 at 8:07 PM Mark Hamstra wrote: > Maybe. > > And I expect that we will end up doing something based on spark.task.cpus > in the short term. I'd just rather that this SPIP not make it look like > this is the way things should ideally be done. I'd prefer that we be quite >

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Mark Hamstra
Maybe. And I expect that we will end up doing something based on spark.task.cpus in the short term. I'd just rather that this SPIP not make it look like this is the way things should ideally be done. I'd prefer that we be quite explicit in recognizing that this approach is a significant

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xiangrui Meng
There are certainly use cases where different stages require different numbers of CPUs or GPUs under an optimal setting. I don't think anyone disagrees that ideally users should be able to do it. We are just dealing with typical engineering trade-offs and seeing how we can break it down into smaller ones.
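
The trade-off in this message (one application-level default versus per-stage resource requests) can be illustrated with a minimal sketch. Nothing below is Spark's API: `ResourceRequest`, `resolve_stage_request`, and `max_concurrent_tasks` are hypothetical names invented for illustration, and the stage names are made up. It only shows how a per-stage override could fall back to an application default, and how the request bounds task concurrency on one executor.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ResourceRequest:
    """Resources a single task needs (hypothetical type, not a Spark class)."""
    cpus: int = 1
    gpus: int = 0


def resolve_stage_request(
    app_default: ResourceRequest,
    stage_overrides: Dict[str, ResourceRequest],
    stage_name: str,
) -> ResourceRequest:
    """Return the stage's own request if one was declared, else the app default."""
    return stage_overrides.get(stage_name, app_default)


def max_concurrent_tasks(
    executor_cpus: int, executor_gpus: int, req: ResourceRequest
) -> int:
    """Number of tasks with this request that fit on one executor at once."""
    if req.cpus <= 0:
        raise ValueError("a task must request at least one CPU")
    by_cpu = executor_cpus // req.cpus
    if req.gpus == 0:
        # Tasks that need no GPU are limited by CPUs only.
        return by_cpu
    return min(by_cpu, executor_gpus // req.gpus)


# Assumed example: an ETL stage uses the default, a training stage overrides it.
app_default = ResourceRequest(cpus=1, gpus=0)
overrides = {"train": ResourceRequest(cpus=4, gpus=1)}

etl_req = resolve_stage_request(app_default, overrides, "etl")
train_req = resolve_stage_request(app_default, overrides, "train")
```

With a single application-wide setting, both stages would have to share the larger request, so the ETL stage would run far fewer tasks per executor than it could.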

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Sean Owen
I don't know a lot about Arrow here, but seems reasonable. Is this for Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3 seems right. On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon wrote: > > Hi all, > > We really need to upgrade the minimal version soon. It's actually slowing >

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread shane knapp
thanks for the heads up... i'll test deploy this tomorrow and see what gotchas turn up. we may need to upgrade from python 3.4 to 3.5 IIRC. On Mon, Mar 25, 2019 at 6:16 PM Hyukjin Kwon wrote: > Hi all, > > We really need to upgrade the minimal version soon. It's actually slowing > down the

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Mark Hamstra
I remain unconvinced that a default configuration at the application level makes sense even in that case. There may be some applications where you know a priori that almost all the tasks for all the stages for all the jobs will need some fixed number of gpus; but I think the more common cases will

Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Hyukjin Kwon
Hi all, We really need to upgrade the minimal version soon. It's actually slowing down PySpark development, for instance through the overhead of currently having to test the full matrix of Arrow and Pandas versions. Also, it currently requires adding some weird hacks or ugly code. Some bugs

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Sean Owen
That's all merged now. I think you're clear to start an RC. On Mon, Mar 25, 2019 at 4:06 PM DB Tsai wrote: > > I am going to cut a 2.4.1 rc9 soon tonight. Besides SPARK-26961 > https://github.com/apache/spark/pull/24126 , anything critical that we > have to wait for 2.4.1 release? Thanks! > >

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Sean Owen
This last test failed again, but I claim we've actually seen it pass: https://github.com/apache/spark/pull/24126#issuecomment-476410462 Would anybody else endorse merging it into 2.4 to proceed? I'll kick off one more test for good measure. On Mon, Mar 25, 2019 at 4:33 PM Sean Owen wrote: > >

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xiangrui Meng
Say if we support per-task resource requests in the future, it would be still inconvenient for users to declare the resource requirements for every single task/stage. So there must be some default values defined somewhere for task resource requirements. "spark.task.cpus" and

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Sean Owen
Don't wait on this, but, I was going to slip in a message in the 2.4.1 docs saying that Scala 2.11 support is deprecated, as it will be gone in Spark 3. I'll bang that out right now. Still waiting on a clean test build for that last JIRA, but maybe about to happen. On Mon, Mar 25, 2019 at 4:06 PM

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread DB Tsai
I am going to cut a 2.4.1 rc9 soon tonight. Besides SPARK-26961 https://github.com/apache/spark/pull/24126 , anything critical that we have to wait for 2.4.1 release? Thanks! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID:

[DISCUSS] Spark Columnar Processing

2019-03-25 Thread Bobby Evans
This thread is to discuss adding in support for data frame processing using an in-memory columnar format compatible with Apache Arrow. My main goal in this is to lay the groundwork so we can add in support for GPU accelerated processing of data frames, but this feature has a number of other

Re: Scala 2.11 support removed for Spark 3.0.0

2019-03-25 Thread Darcy Shen
Cool, Scala 2.12 compiles faster than Scala 2.11. But it runs slower than Scala 2.11 by default. We may enable some compiler optimization options. On Mon, 25 Mar 2019 23:53:18 +0800 Sean Owen wrote: I merged https://github.com/apache/spark/pull/23098

Scala 2.11 support removed for Spark 3.0.0

2019-03-25 Thread Sean Owen
I merged https://github.com/apache/spark/pull/23098 . "-Pscala-2.11" won't work anymore in master. I think this shouldn't be a surprise or disruptive as 2.12 is already the default. The change isn't big and I think pretty reliable, but keep an eye out for issues. Shane you are welcome to remove

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Mark Hamstra
Of course there is an issue of the perfect becoming the enemy of the good, so I can understand the impulse to get something done. I am left wanting, however, at least something more of a roadmap to a task-level future than just a vague "we may choose to do something more in the future." At the

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Tom Graves
+1 on the updated SPIP. Tom On Monday, March 18, 2019, 12:56:22 PM CDT, Xingbo Jiang wrote: Hi all, I updated the SPIP doc and stories; I hope it now contains a clear scope of the changes and enough details for the SPIP vote. Please review the updated docs, thanks! Xiangrui Meng