Re: Apache Spark 2.2.3 ?

2019-01-02 Thread Felix Cheung
+1 on 2.2.3, of course. From: Dongjoon Hyun Sent: Wednesday, January 2, 2019 12:21 PM To: Saisai Shao Cc: Xiao Li; Felix Cheung; Sean Owen; dev Subject: Re: Apache Spark 2.2.3 ? Thank you for the swift feedback and Happy New Year. :) For the 2.2.3 release next week,

Re: Ask for reviewing on Structured Streaming PRs

2019-01-02 Thread Jungtaek Lim
Spark devs, happy new year! I would like to kindly send a reminder about this, since there has been no review since the thread was started. Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Dec 12, 2018 at 11:12 PM, Vaclav Kosar wrote: > I am also waiting for any finalization of my PR [3]. It seems that SS PRs > are

Re: Apache Spark 2.2.3 ?

2019-01-02 Thread Dongjoon Hyun
Thank you for the swift feedback and Happy New Year. :) For the 2.2.3 release next week, I see two positive opinions (including mine) and don't see any direct objections. Apache Spark has a mature, resourceful, and fast-growing community. One of the important characteristics of a mature community is

Re: Spark-optimized Shuffle (SOS) any update?

2019-01-02 Thread marek-simunek
Hi, thanks for the reply. I finally got time and glanced through the design doc. It seems that it has nothing to do with the paper I mentioned. The paper is trying to solve the problem that the I/O ops required for shuffle grow quadratically with the number of tasks (shuffle files), therefore we

Re: Trigger full GC during executor idle time?

2019-01-02 Thread Mark Hamstra
Without addressing whether the change is beneficial or not, I will note that the logic in the paper and the PR's description is incorrect: "During execution, some executor nodes finish the tasks assigned to them early and wait for the entire stage to complete before more tasks are assigned to

Re: proposal for expanded & consistent timestamp types

2019-01-02 Thread Steve Loughran
OK, I've seen the document now. Probably the best summary of timestamps I've ever seen. Irrespective of what historical stuff has done, the goal should be "make everything consistent enough that cut-and-paste SQL queries over the same data work" and "you shouldn't have to care about

Re: proposal for expanded & consistent timestamp types

2019-01-02 Thread Steve Loughran
On 17 Dec 2018, at 17:44, Zoltan Ivanfi <z...@cloudera.com.INVALID> wrote: Hi, On Sun, Dec 16, 2018 at 4:43 AM Wenchen Fan <cloud0...@gmail.com> wrote: Shall we include Parquet and ORC? If they don't support it, it's hard for general query engines like Spark to support it.