Should we also consider the shuffle service refactoring to support pluggable storage engines as a target for the 3.1 release?
On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk <maxim.g...@databricks.com> wrote:

> Hi Dongjoon,
>
> I would add:
> - Filter pushdown to JSON (https://github.com/apache/spark/pull/27366)
> - Filter pushdown to other datasources like Avro
> - Support for nested attributes in filters pushed down to JSON
>
> Maxim Gekk
> Software Engineer
> Databricks, Inc.
>
> On Mon, Jun 29, 2020 at 7:07 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Hi, All.
>>
>> After a short celebration of Apache Spark 3.0, I'd like to ask for the
>> community's opinion on Apache Spark 3.1 feature expectations.
>>
>> First of all, Apache Spark 3.1 is scheduled for December 2020.
>> - https://spark.apache.org/versioning-policy.html
>>
>> I'm expecting the following items:
>>
>> 1. Support Scala 2.13
>> 2. Use Apache Hadoop 3.2 by default for better cloud support
>> 3. Declare the Kubernetes scheduler GA
>>    From my perspective, the last major missing piece was dynamic allocation, and
>>    - Dynamic allocation with shuffle tracking already shipped in 3.0.
>>    - Dynamic allocation with worker decommission/data migration is targeting 3.1. (Thanks, Holden)
>> 4. DSv2 stabilization
>>
>> I'm aware of some more features that are currently on the way, but I would
>> love to hear opinions from the main developers and, even more, from the main
>> users who need those features.
>>
>> Thank you in advance. Any comments are welcome.
>>
>> Bests,
>> Dongjoon.
>>
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
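
For reference, regarding the dynamic allocation item above: shuffle tracking in Spark 3.0 lets dynamic allocation run without the external shuffle service. A minimal sketch, assuming a standard SparkSession and illustrative executor bounds (the application name and min/max values are placeholders, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: enable dynamic allocation with shuffle tracking (Spark 3.0+),
// so executors holding shuffle data are tracked instead of relying on
// the external shuffle service. Config keys as documented for Spark 3.0.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-shuffle-tracking")      // placeholder name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")  // illustrative values
  .config("spark.dynamicAllocation.maxExecutors", "10")
  .getOrCreate()
```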