Re: Spark with Hadoop 3.3 distribution release in Download page

2021-01-19 Thread Chao Sun
Hi Gabriel, The distribution won’t be available for download until Spark supports Hadoop 3.3.x. At the moment, Spark cannot use Hadoop 3.3.0 because of various issues. Our best bet is to wait until Hadoop 3.3.1 comes out. Best, Chao On Tue, Jan 19, 2021 at 8:00 PM Gabriel Magno wrote: >

Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-28 Thread Chao Sun
Thanks Huaxin for driving the release! On Fri, Jan 28, 2022 at 5:37 PM Ruifeng Zheng wrote: > It's Great! > Congrats and thanks, huaxin! > > > -- Original Message -- > *From:* "huaxin gao" ; > *Sent:* Saturday, January 29, 2022, 9:07 AM > *To:* "dev";"user"; > *Subject:* [ANNOUNCE] Apache

Re: Spark 3.4.1 and Hive 3.1.3

2023-09-07 Thread Chao Sun
Hi Sanket, Spark 3.4.1 currently only works with Hive 2.3.9, and it would require a lot of work to upgrade the Hive version to 3.x and up. Normally though, you only need the Hive client in Spark to talk to HiveMetastore (HMS) for things like table or partition metadata information. In this case,

Re: Spark 3.3.0/3.2.2: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15

2022-09-01 Thread Chao Sun
Hi Fengyu, Do you still have the Parquet file that caused the error? Could you open a JIRA and attach the file to it? I can take a look. Chao On Thu, Sep 1, 2022 at 4:03 AM FengYu Cao wrote: > > I'm trying to upgrade our spark (3.2.1 now) > > but with spark 3.3.0 and spark 3.2.2, we had error

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Chao Sun
Congrats everyone, and thanks Yuming for driving the release! On Wed, Oct 26, 2022 at 7:37 AM beliefer wrote: > > Congratulations to everyone who has contributed to this release. > > > At 2022-10-26 14:21:36, "Yuming Wang" wrote: > > We are happy to announce the availability of Apache Spark 3.3.1! >

[ANNOUNCE] Apache Spark 3.2.3 released

2022-11-29 Thread Chao Sun
We are happy to announce the availability of Apache Spark 3.2.3! Spark 3.2.3 is a maintenance release containing stability fixes. This release is based on the branch-3.2 maintenance branch of Spark. We strongly recommend all 3.2 users to upgrade to this stable release. To download Spark 3.2.3,

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
one! >> >> Yufei >> >> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: >>> >>> Hi all, >>> >>> We are very happy to announce that Project Comet, a plugin to >>> accelerate Spark query execution via leveraging DataFusion an

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
> wanted to try out some realtime aggregate performance on top of parquet and > spark dataframes. > > Thanks and Regards > Praveen > > > On Wed, Feb 14, 2024 at 9:20 AM Chao Sun wrote: > >> > Out of interest what are the differences in the approach between this >

Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Chao Sun
Hi all, We are very happy to announce that Project Comet, a plugin to accelerate Spark query execution via leveraging DataFusion and Arrow, has now been open sourced under the Apache Arrow umbrella. Please check the project repo https://github.com/apache/arrow-datafusion-comet for more details if

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Chao Sun
*Disclaimer:* The information provided is correct to the best of my > knowledge, sourced from both personal expertise and other resources but of > course cannot be guaranteed. It is essential to note that, as with any > advice, one verified and tested result holds more weight than a thousand