ASF board report for February

2019-02-09 Thread Matei Zaharia
It’s time to submit Spark's quarterly ASF board report on February 13th, so I wanted to run the report by everyone to make sure we’re not missing something. Let me know whether I missed anything: Apache Spark is a fast and general engine for large-scale data processing. It

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-09 Thread John Zhuge
Not me. I am running zulu8, maven, and hadoop-2.7. On Sat, Feb 9, 2019 at 5:42 PM Felix Cheung wrote: > One test in SparkSubmitSuite is consistently failing for me. Anyone seeing > that? > > > -- > *From:* Takeshi Yamamuro > *Sent:* Saturday, February 9, 2019 5:25

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-09 Thread Felix Cheung
One test in SparkSubmitSuite is consistently failing for me. Anyone seeing that? From: Takeshi Yamamuro Sent: Saturday, February 9, 2019 5:25 AM To: Spark dev list Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2) Sorry, but I forgot to check `

Re: Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Shivaram Venkataraman
Those speedups look awesome! Great work Hyukjin! Thanks Shivaram On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon wrote: > > Guys, as continuation of Arrow optimization for R DataFrame to Spark > DataFrame, > > I am trying to make a vectorized gapply[Collect] implementation as an > experiment like

Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-09 Thread Sean Owen
If many people find the current behavior OK, then honestly just don't make this change. It's been there a while and the logs are available for anyone who wants to browse through YARN. While I think the change is fine, I can't see it being worth a flag to toggle between two pretty trivially

Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Hyukjin Kwon
Guys, as a continuation of the Arrow optimization for R DataFrame to Spark DataFrame, I am trying to make a vectorized gapply[Collect] implementation as an experiment, like vectorized Pandas UDFs. It brought an 820%+ performance improvement. See https://github.com/apache/spark/pull/23746 Please come and

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-09 Thread Takeshi Yamamuro
Sorry, but I forgot to check `-Pdocker-integration-tests` for the JDBC integration tests. I ran these tests and confirmed that they passed. On Sat, Feb 9, 2019 at 5:26 PM Herman van Hovell wrote: > I count 2 binding votes :)... > > On Fri, Feb 8, 2019 at 22:36, Felix Cheung wrote > >