Just as an update: I will send a discussion email about my idea late this week or early next week.

On Thu, Mar 11, 2021 at 7:00 PM, Wenchen Fan <cloud0...@gmail.com> wrote:

> There are many projects going on right now, such as new DS v2 APIs, ANSI
> interval types, join improvement, disaggregated shuffle, etc. I don't
> think it's realistic to do the branch cut in April.
>
> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut the
> branch 3 months earlier. We should make the release process faster and
> cut the branch around June, probably.
>
>
> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li <gatorsm...@gmail.com> wrote:
>
>> Below are some nice-to-have features we can work on in Spark 3.2:
>> Lateral Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>> interval data type, timestamp without time zone, un-nesting arbitrary
>> queries, the returned metrics of DSV2, and error message standardization.
>> Spark 3.2 will be another exciting release, I believe!
>>
>> Go Spark!
>>
>> Xiao
>>
>>
>> On Wed, Mar 10, 2021 at 12:25 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>>> Hi, Xiao.
>>>
>>> This thread started 13 days ago. Since you asked the community about
>>> major features or timelines at that time, could you share your roadmap
>>> or expectations if you have something in mind?
>>>
>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep it
>>> open. It might take 1-2 weeks to collect from the community all the
>>> features we plan to build and ship in 3.2 since we just finished the
>>> 3.1 voting.
>>> > TBH, cutting the branch this April does not look good to me. That
>>> means we only have one month left for feature development of Spark 3.2.
>>> Do we have enough features in the current master branch? If not, are we
>>> able to finish the major features we collected here? Do they have a
>>> timeline or project plan?
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Hi, John.
>>>>
>>>> This thread aims to share your expectations and goals (and maybe work
>>>> progress) for Apache Spark 3.2 because we are making this together. :)
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge <jzh...@apache.org> wrote:
>>>>
>>>>> Hi Dongjoon,
>>>>>
>>>>> Is it possible to get ViewCatalog in? The community has already had
>>>>> fairly detailed discussions.
>>>>>
>>>>> Thanks,
>>>>> John
>>>>>
>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, All.
>>>>>>
>>>>>> Since we have been preparing Apache Spark 3.2.0 in the master branch
>>>>>> since December 2020, March seems to be a good time to share our
>>>>>> thoughts and aspirations for Apache Spark 3.2.
>>>>>>
>>>>>> According to the progress on the Apache Spark 3.1 release, Apache
>>>>>> Spark 3.2 seems to be the last minor release of this year. Given the
>>>>>> timeframe, we might consider the following. (This is a small set.
>>>>>> Please add your thoughts to this limited list.)
>>>>>>
>>>>>> # Languages
>>>>>>
>>>>>> - Scala 2.13 Support: This was expected in 3.1 via SPARK-25075 but
>>>>>> slipped. Currently, we are trying to use Scala 2.13.5 via SPARK-34505
>>>>>> and investigating the publishing issue. Thank you for your
>>>>>> contributions and feedback on this.
>>>>>>
>>>>>> - Java 17 LTS Support: Java 17 LTS will arrive in September 2021.
>>>>>> Like Java 11, we need lots of support from our dependencies. Let's see.
>>>>>>
>>>>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends on
>>>>>> 2021-12-23. So, the deprecation is not required yet, but we had better
>>>>>> prepare for it because we don't have an ETA for Apache Spark 3.3 in
>>>>>> 2022.
>>>>>>
>>>>>> - SparkR CRAN publishing: As we know, it has been discontinued so far.
>>>>>> Resuming it depends on the success of Apache SparkR 3.1.1 CRAN
>>>>>> publishing. If we succeed in reviving it, we can keep publishing.
>>>>>> Otherwise, I believe we had better drop it from the release work item
>>>>>> list officially.
>>>>>>
>>>>>> # Dependencies
>>>>>>
>>>>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 became the default Hadoop profile
>>>>>> in Apache Spark 3.1. Currently, the Spark master branch lives on
>>>>>> Hadoop 3.2.2's shaded clients via SPARK-33212. So far, there is one
>>>>>> ongoing report in a YARN environment. We hope it will be fixed soon in
>>>>>> the Spark 3.2 timeframe so that we can move toward Hadoop 3.3.2.
>>>>>>
>>>>>> - Apache Hive 2.3.9: Spark 3.0 started to use Hive 2.3.7 by default
>>>>>> instead of the old Hive 1.2 fork. Spark 3.1 removed the hive-1.2
>>>>>> profile completely via SPARK-32981 and replaced the generated
>>>>>> hive-service-rpc code with the official dependency via SPARK-32981. We
>>>>>> are steadily improving this area and will consume Hive 2.3.9 if
>>>>>> available.
>>>>>>
>>>>>> - K8s Client 4.13.2: During the K8s GA activity, Spark 3.1 upgraded
>>>>>> the K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2
>>>>>> in order to support K8s model 1.19.
>>>>>>
>>>>>> - Kafka Client 2.8: To bring in the client fixes, Spark 3.1 uses Kafka
>>>>>> Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7 with
>>>>>> Scala 2.12.13, but it was reverted later due to a Scala 2.12.13 issue.
>>>>>> Since KAFKA-12357 fixed the Scala requirement two days ago, Spark 3.2
>>>>>> will hopefully go with Kafka Client 2.8.
>>>>>>
>>>>>> # Some Features
>>>>>>
>>>>>> - Data Source v2: Spark 3.2 will deliver a much richer DSv2 with
>>>>>> Apache Iceberg integration. Especially, we hope the ongoing function
>>>>>> catalog SPIP and upcoming storage partitioned join SPIP can be
>>>>>> delivered as part of Spark 3.2 and become an additional foundation.
>>>>>>
>>>>>> - Columnar Encryption: As of today, the Apache Spark master branch
>>>>>> supports columnar encryption via Apache ORC 1.6, and it's documented
>>>>>> via SPARK-34036. Also, the upcoming Apache Parquet 1.12 has a similar
>>>>>> capability. Hopefully, Apache Spark 3.2 will be the first release to
>>>>>> have this feature officially. Any feedback is welcome.
>>>>>>
>>>>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits for
>>>>>> ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer pool
>>>>>> support for all IO operations, 2) SPARK-33978 makes the ORC datasource
>>>>>> support ZSTD compression, 3) SPARK-34503 sets ZSTD as the default
>>>>>> codec for event log compression, 4) SPARK-34479 aims to support ZSTD
>>>>>> in the Avro data source. Also, the upcoming Parquet 1.12 supports ZSTD
>>>>>> (and supports the JNI buffer pool), too. I'm expecting more benefits.
>>>>>>
>>>>>> - Structured Streaming with RocksDB backend: According to the latest
>>>>>> update, it looks active enough for merging to the master branch in
>>>>>> Spark 3.2.
>>>>>>
>>>>>> Please share your thoughts and let's build a better Apache Spark 3.2
>>>>>> together.
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> John Zhuge