Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-07-05 Thread Dongjoon Hyun
GA -- From: Holden Karau Sent: Monday, June 29, 2020 9:33 AM To: Maxim Gekk Cc: Dongjoon Hyun; dev Subject: Re: Apache Spark 3.1 Feature Expectation (Dec. 2020) Should we also consider the shuffle service refactoring

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-07-05 Thread Felix Cheung
I think pluggable storage in shuffle is essential for k8s GA. From: Holden Karau Sent: Monday, June 29, 2020 9:33 AM To: Maxim Gekk Cc: Dongjoon Hyun; dev Subject: Re: Apache Spark 3.1 Feature Expectation (Dec. 2020) Should we also consider the shuffle service

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-07-01 Thread Gabor Somogyi
Hi Dongjoon, I would add JDBC Kerberos support with keytab: https://issues.apache.org/jira/browse/SPARK-12312 BR, G On Mon, Jun 29, 2020 at 6:07 PM Dongjoon Hyun wrote: Hi, All. After a short celebration of Apache Spark 3.0, I'd like to ask for the community's opinion on Apache Spark 3.1

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-30 Thread Tom Graves
Stage Level Scheduling - https://issues.apache.org/jira/browse/SPARK-27495 Tom On Monday, June 29, 2020, 11:07:18 AM CDT, Dongjoon Hyun wrote: Hi, All. After a short celebration of Apache Spark 3.0, I'd like to ask for the community's opinion on Apache Spark 3.1 feature expectations.

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread wuyi
Could this be a sub-task of https://issues.apache.org/jira/browse/SPARK-25299 (Use remote storage for persisting shuffle data)? It would be good if we could put the whole of SPARK-25299 in Spark 3.1. Holden Karau wrote: Should we also consider the

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Jungtaek Lim
Does this count only "new features" (probably major), or does it also count "improvements"? I'm aware of a couple of improvements which should ideally be included in the next release, but if this counts only major new features then I don't feel they should be listed. On Tue, Jun 30, 2020 at 1:32 AM Holden

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Holden Karau
Should we also consider the shuffle service refactoring to support pluggable storage engines as targeting the 3.1 release? On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk wrote: Hi Dongjoon, I would add: - Filters pushdown to JSON (https://github.com/apache/spark/pull/27366) - Filters

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Maxim Gekk
Hi Dongjoon, I would add:
- Filters pushdown to JSON (https://github.com/apache/spark/pull/27366)
- Filters pushdown to other datasources like Avro
- Support nested attributes of filters pushed down to JSON
Maxim Gekk Software Engineer Databricks, Inc. On Mon, Jun 29, 2020 at 7:07 PM

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread JackyLee
Thank you for putting this forward. Can we include support for the view and partition catalogs in version 3.1? AFAICT, these are great features in DSv2 and Catalog. With these, we can work well with warehouses such as Delta or Hive. https://github.com/apache/spark/pull/28147

Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Dongjoon Hyun
Hi, All. After a short celebration of Apache Spark 3.0, I'd like to ask for the community's opinion on Apache Spark 3.1 feature expectations. First of all, Apache Spark 3.1 is scheduled for December 2020. - https://spark.apache.org/versioning-policy.html I'm expecting the following items: 1.