Re: Time for Spark 3.4.0 release?

2023-01-03 Thread Dongjoon Hyun
+1

Thank you!

Dongjoon



Re: Time for Spark 3.4.0 release?

2023-01-03 Thread Rui Wang
+1 to cut the branch starting from a workday!

Great to see this is happening!

Thanks Xinrong!

-Rui



Re: Time for Spark 3.4.0 release?

2023-01-03 Thread 416161...@qq.com
+1, thank you Xinrong for driving this release!

Ruifeng Zheng
ruife...@foxmail.com

Re: Time for Spark 3.4.0 release?

2023-01-03 Thread Yang,Jie(INF)
+1 for me

YangJie




Re: Time for Spark 3.4.0 release?

2023-01-03 Thread Hyukjin Kwon
SGTM +1



Time for Spark 3.4.0 release?

2023-01-03 Thread Xinrong Meng
Hi All,

Shall we cut *branch-3.4* on *January 16th, 2023*? We proposed January 15th
per
https://spark.apache.org/versioning-policy.html, but I would suggest we
postpone one day since January 15th is a Sunday.

I would like to volunteer as the release manager for *Apache Spark 3.4.0*.

Thanks,

Xinrong Meng


Could we reorder the second aggregate node and the expand node when rewriting multiple distinct

2023-01-03 Thread 万昆
Hello,

The Spark SQL rule RewriteDistinctAggregates rewrites multiple distinct
aggregate expressions into two Aggregate nodes and an Expand node. Below is
the example from the rule's class documentation. I wonder whether we could
reorder the second Aggregate node and the Expand node so that the Expand
generates fewer records.

Thanks


Second example: aggregate function without distinct and with filter clauses
(in SQL):

   SELECT
 COUNT(DISTINCT cat1) AS cat1_cnt,
 COUNT(DISTINCT cat2) AS cat2_cnt,
 SUM(value) FILTER (WHERE id > 1) AS total
   FROM
 data
   GROUP BY
 key

This translates to the following (pseudo) logical plan:

 Aggregate(
key = ['key]
functions = [COUNT(DISTINCT 'cat1),
 COUNT(DISTINCT 'cat2),
 sum('value) with FILTER('id > 1)]
output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
   LocalTableScan [...]

This rule rewrites this logical plan to the following (pseudo) logical plan:

 Aggregate(
key = ['key]
functions = [count(if (('gid = 1)) 'cat1 else null),
 count(if (('gid = 2)) 'cat2 else null),
 first(if (('gid = 0)) 'total else null) ignore nulls]
output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
   Aggregate(
  key = ['key, 'cat1, 'cat2, 'gid]
  functions = [sum('value) with FILTER('id > 1)]
  output = ['key, 'cat1, 'cat2, 'gid, 'total])
 Expand(
projections = [('key, null, null, 0, cast('value as bigint), 'id),
   ('key, 'cat1, null, 1, null, null),
   ('key, null, 'cat2, 2, null, null)]
output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
   LocalTableScan [...]

Could we rewrite this logical plan to:

 Aggregate(
key = ['key]
functions = [count(if (('gid = 1)) 'cat1 else null),
 count(if (('gid = 2)) 'cat2 else null),
 first(if (('gid = 0)) 'total else null) ignore nulls]
output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
   Expand(
 projections = [('key, 'total, null, null, 0, cast('value as bigint)),
('key, 'total, 'cat1, null, 1, null),
('key, 'total, null, 'cat2, 2, null)]
 output = ['key, 'total, 'cat1, 'cat2, 'gid, 'value])
  Aggregate(
 key = ['key, 'cat1, 'cat2]
 functions = [sum('value) with FILTER('id > 1)]
 output = ['key, 'cat1, 'cat2, 'total])
   LocalTableScan [...]
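For intuition, here is a plain-Python sketch of how the rule's current
Expand-then-two-Aggregates rewrite evaluates the example query. This is not
Spark code, and the input rows are made-up sample data; it only mirrors the
shape of the plan above (gid 0 carries the filtered SUM input, gid 1 carries
cat1, gid 2 carries cat2):

```python
from collections import defaultdict

# Hypothetical input rows: (key, cat1, cat2, value, id)
rows = [
    ("a", "x", "p", 10, 1),
    ("a", "x", "q", 20, 2),
    ("a", "y", "p", 30, 3),
]

# Expand: each input row becomes three projections tagged with 'gid'.
expanded = []
for key, cat1, cat2, value, id_ in rows:
    expanded.append((key, None, None, 0, value, id_))  # gid 0: SUM input
    expanded.append((key, cat1, None, 1, None, None))  # gid 1: cat1
    expanded.append((key, None, cat2, 2, None, None))  # gid 2: cat2

# First Aggregate: group by (key, cat1, cat2, gid); sum 'value'
# under FILTER (id > 1). gid 1/2 groups deduplicate the categories.
partial = defaultdict(lambda: None)
for key, cat1, cat2, gid, value, id_ in expanded:
    g = (key, cat1, cat2, gid)
    if gid == 0 and id_ is not None and id_ > 1:
        partial[g] = (partial[g] or 0) + value
    else:
        partial.setdefault(g, None)  # keep empty groups (SQL NULL sum)

# Second Aggregate: group by key; the distinct counts fall out of
# counting the now-deduplicated gid=1 / gid=2 groups.
result = {}
for (key, cat1, cat2, gid), total in partial.items():
    r = result.setdefault(key, {"cat1_cnt": 0, "cat2_cnt": 0, "total": None})
    if gid == 1 and cat1 is not None:
        r["cat1_cnt"] += 1
    elif gid == 2 and cat2 is not None:
        r["cat2_cnt"] += 1
    elif gid == 0 and total is not None:
        r["total"] = total

print(result)  # {'a': {'cat1_cnt': 2, 'cat2_cnt': 2, 'total': 50}}
```

The first, wider aggregate sees each distinct (key, cat1) and (key, cat2)
pair exactly once, which is why plain counts in the second aggregate yield
the distinct counts.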