Hi all,

I went over all the finished JIRA tickets targeted to Spark 3.0.0. Below are the notable features and major changes that are ready to test/deliver. Please don't hesitate to add more to the list:

SPARK-11215 <https://issues.apache.org/jira/browse/SPARK-11215> Multiple columns support added to various Transformers: StringIndexer
SPARK-11150 <https://issues.apache.org/jira/browse/SPARK-11150> Implement Dynamic Partition Pruning
SPARK-13677 <https://issues.apache.org/jira/browse/SPARK-13677> Support Tree-Based Feature Transformation
SPARK-16692 <https://issues.apache.org/jira/browse/SPARK-16692> Add MultilabelClassificationEvaluator
SPARK-19591 <https://issues.apache.org/jira/browse/SPARK-19591> Add sample weights to decision trees
SPARK-19712 <https://issues.apache.org/jira/browse/SPARK-19712> Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.
SPARK-19827 <https://issues.apache.org/jira/browse/SPARK-19827> R API for Power Iteration Clustering
SPARK-20286 <https://issues.apache.org/jira/browse/SPARK-20286> Improve logic for timing out executors in dynamic allocation
SPARK-20636 <https://issues.apache.org/jira/browse/SPARK-20636> Eliminate unnecessary shuffle with adjacent Window expressions
SPARK-22148 <https://issues.apache.org/jira/browse/SPARK-22148> Acquire new executors to avoid hang because of blacklisting
SPARK-22796 <https://issues.apache.org/jira/browse/SPARK-22796> Multiple columns support added to various Transformers: PySpark QuantileDiscretizer
SPARK-23128 <https://issues.apache.org/jira/browse/SPARK-23128> A new approach to do adaptive execution in Spark SQL
SPARK-23674 <https://issues.apache.org/jira/browse/SPARK-23674> Add Spark ML Listener for Tracking ML Pipeline Status
SPARK-23710 <https://issues.apache.org/jira/browse/SPARK-23710> Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
SPARK-24333 <https://issues.apache.org/jira/browse/SPARK-24333> Add fit with validation set to Gradient Boosted Trees: Python API
SPARK-24417 <https://issues.apache.org/jira/browse/SPARK-24417> Build and Run Spark on JDK11
SPARK-24615 <https://issues.apache.org/jira/browse/SPARK-24615> Accelerator-aware task scheduling for Spark
SPARK-24920 <https://issues.apache.org/jira/browse/SPARK-24920> Allow sharing Netty's memory pool allocators
SPARK-25250 <https://issues.apache.org/jira/browse/SPARK-25250> Fix race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple times
SPARK-25341 <https://issues.apache.org/jira/browse/SPARK-25341> Support rolling back a shuffle map stage and re-generate the shuffle files
SPARK-25348 <https://issues.apache.org/jira/browse/SPARK-25348> Data source for binary files
SPARK-25603 <https://issues.apache.org/jira/browse/SPARK-25603> Generalize Nested Column Pruning
SPARK-26132 <https://issues.apache.org/jira/browse/SPARK-26132> Remove support for Scala 2.11 in Spark 3.0.0
SPARK-26215 <https://issues.apache.org/jira/browse/SPARK-26215> Define reserved keywords after SQL standard
SPARK-26412 <https://issues.apache.org/jira/browse/SPARK-26412> Allow Pandas UDF to take an iterator of pd.DataFrames
SPARK-26785 <https://issues.apache.org/jira/browse/SPARK-26785> Data source v2 API refactor: streaming write
SPARK-26956 <https://issues.apache.org/jira/browse/SPARK-26956> Remove streaming output mode from data source v2 APIs
SPARK-27064 <https://issues.apache.org/jira/browse/SPARK-27064> Create StreamingWrite at the beginning of streaming execution
SPARK-27119 <https://issues.apache.org/jira/browse/SPARK-27119> Do not infer schema when reading Hive serde table with native data source
SPARK-27225 <https://issues.apache.org/jira/browse/SPARK-27225> Implement join strategy hints
SPARK-27240 <https://issues.apache.org/jira/browse/SPARK-27240> Use pandas DataFrame for struct type argument in Scalar Pandas UDF
SPARK-27338 <https://issues.apache.org/jira/browse/SPARK-27338> Fix deadlock between TaskMemoryManager and UnsafeExternalSorter$SpillableIterator
SPARK-27396 <https://issues.apache.org/jira/browse/SPARK-27396> Public APIs for extended Columnar Processing Support
SPARK-27589 <https://issues.apache.org/jira/browse/SPARK-27589> Re-implement file sources with data source V2 API
SPARK-27677 <https://issues.apache.org/jira/browse/SPARK-27677> Disk-persisted RDD blocks served by shuffle service, and ignored for Dynamic Allocation
SPARK-27699 <https://issues.apache.org/jira/browse/SPARK-27699> Partially push down disjunctive predicates in Parquet/ORC
SPARK-27763 <https://issues.apache.org/jira/browse/SPARK-27763> Port test cases from PostgreSQL to Spark SQL (ongoing)
SPARK-27884 <https://issues.apache.org/jira/browse/SPARK-27884> Deprecate Python 2 support
SPARK-27921 <https://issues.apache.org/jira/browse/SPARK-27921> Convert applicable *.sql tests into UDF integrated test base
SPARK-27963 <https://issues.apache.org/jira/browse/SPARK-27963> Allow dynamic allocation without an external shuffle service
SPARK-28177 <https://issues.apache.org/jira/browse/SPARK-28177> Adjust post shuffle partition number in adaptive execution
SPARK-28372 <https://issues.apache.org/jira/browse/SPARK-28372> Document Spark WEB UI
SPARK-28399 <https://issues.apache.org/jira/browse/SPARK-28399> RobustScaler feature transformer
SPARK-28426 <https://issues.apache.org/jira/browse/SPARK-28426> Metadata Handling in Thrift Server
SPARK-28588 <https://issues.apache.org/jira/browse/SPARK-28588> Build a SQL reference doc (ongoing)
SPARK-28608 <https://issues.apache.org/jira/browse/SPARK-28608> Improve test coverage of ThriftServer
SPARK-28753 <https://issues.apache.org/jira/browse/SPARK-28753> Dynamically reuse subqueries in AQE
SPARK-28855 <https://issues.apache.org/jira/browse/SPARK-28855> Remove outdated Experimental, Evolving annotations
SPARK-25908 <https://issues.apache.org/jira/browse/SPARK-25908> SPARK-28980 <https://issues.apache.org/jira/browse/SPARK-28980> Remove deprecated items since <= 2.2.0

Cheers,
Xingbo