Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Yes, I agree with your whitelist approach to backporting. :)
Thank you for summarizing.

Thanks,
Dongjoon.


On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:

> I think I finally got your point. What you want to keep unchanged is the
> branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
>
> My major concern is whether we should keep merging the feature work or the
> dependency upgrade after the branch cut. To make our release time more
> predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
>
>
>
> On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>
>> That is not totally fine, Xiao. It sounds like you are asking a change of
>> plan without a proper reason.
>>
>> Although we cut the branch Today according our plan, you still can
>> collect the list and make a list of exceptions. I'm not blocking what you
>> want to do.
>>
>> Please let the community start to ramp down as we agreed before.
>>
>> Dongjoon
>>
>>
>>
>> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>>
>>> Please do not get me wrong. If we don't cut a branch, we are allowing
>>> all patches to land Apache Spark 3.3. That is totally fine. After we cut
>>> the branch, we should avoid merging the feature work. In the next three
>>> days, let us collect the actively developed PRs that we want to make an
>>> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>> make sense?
>>>
>>> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>>>
>>>> Xiao. You are working against what you are saying.
>>>> If you don't cut a branch, it means you are allowing all patches to
>>>> land Apache Spark 3.3. No?
>>>>
>>>> > we need to avoid backporting the feature work that are not being well
>>>> > discussed.
>>>>
>>>>
>>>>
>>>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:

> Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> actively following the dev list. I think we should wait 3 more days for
> collecting the PR list before cutting the branch.
>
> BTW, there are very few 3.4-only feature work that will be affected.
>
> Xiao
>
> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>
>> Hi, Max, Chao, Xiao, Holden and all.
>>
>> I have a different idea.
>>
>> Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a 
>> branch-3.3
>> and allow backporting.
>>
>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>> branch together. This situation only becomes worse and worse because 
>> there
>> is no way to block the other patches from landing unintentionally if we
>> don't cut a branch.
>>
>> [SPARK-38335][SQL] Implement parser support for DEFAULT column
>> values
>>
>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>
>> Best,
>> Dongjoon.
>>
>>
>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>>
>>> Cool, thanks for clarifying!
>>>
>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li 
>>> wrote:
>>> >>
>>> >> For the following list:
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >
>>> >
>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>> >
>>> >
>>> >
>>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>>> >>
>>> >> Hi Xiao,
>>> >>
>>> >> For the following list:
>>> >>
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >>
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >>
>>> >> Thanks,
>>> >> Chao
>>> >>
>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> >> >
>>> >> > The following was tested and merged a few minutes ago. So, we
>>> can remove it from the list.
>>> >> >
>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>>> wrote:
>>> >> >>
>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>> days to collect the list of actively 

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
I think I finally got your point. What you want to keep unchanged is the
branch cut date of Spark 3.3. Today, or this Friday? This is not a big
deal.

My major concern is whether we should keep merging feature work or
dependency upgrades after the branch cut. To make our release time more
predictable, I am suggesting that we finalize the exception PR list first,
instead of merging PRs in an ad hoc way. In the past, we spent a lot of
time reverting PRs that were merged after the branch cut. I hope we can
minimize unnecessary arguments in this release. Do you agree, Dongjoon?



On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:

> That is not totally fine, Xiao. It sounds like you are asking a change of
> plan without a proper reason.
>
> Although we cut the branch Today according our plan, you still can collect
> the list and make a list of exceptions. I'm not blocking what you want to
> do.
>
> Please let the community start to ramp down as we agreed before.
>
> Dongjoon
>
>
>
> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>
>> Please do not get me wrong. If we don't cut a branch, we are allowing all
>> patches to land Apache Spark 3.3. That is totally fine. After we cut the
>> branch, we should avoid merging the feature work. In the next three days,
>> let us collect the actively developed PRs that we want to make an exception
>> (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>>
>> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>>
>>> Xiao. You are working against what you are saying.
>>> If you don't cut a branch, it means you are allowing all patches to land
>>> Apache Spark 3.3. No?
>>>
>>> > we need to avoid backporting the feature work that are not being well
>>> discussed.
>>>
>>>
>>>
>>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>>>
>>>> Cutting the branch is simple, but we need to avoid backporting the
>>>> feature work that are not being well discussed. Not all the members are
>>>> actively following the dev list. I think we should wait 3 more days for
>>>> collecting the PR list before cutting the branch.
>>>>
>>>> BTW, there are very few 3.4-only feature work that will be affected.
>>>>
>>>> Xiao
>>>>
>>>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:

> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
>
> [SPARK-38335][SQL] Implement parser support for DEFAULT column
> values
>
> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>
>> Cool, thanks for clarifying!
>>
>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li 
>> wrote:
>> >>
>> >> For the following list:
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >
>> >
>> > If possible, I hope these features can be shipped with Spark 3.3.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>> >>
>> >> Hi Xiao,
>> >>
>> >> For the following list:
>> >>
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >>
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >>
>> >> Thanks,
>> >> Chao
>> >>
>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>> >> >
>> >> > The following was tested and merged a few minutes ago. So, we
>> can remove it from the list.
>> >> >
>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>> wrote:
>> >> >>
>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>> days to collect the list of actively developed PRs that we want to merge 
>> to
>> 3.3 after the branch cut?
>> >> >>
>> >> >> Please do not rush to merge the PRs that are not fully
>> reviewed. We can cut the branch this Friday and continue merging the PRs
>> that have been discussed in this thread. Does that make sense?
>> >> >>
>> >> 

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
That is not totally fine, Xiao. It sounds like you are asking for a change
of plan without a proper reason.

Even if we cut the branch today according to our plan, you can still
collect the list and make a list of exceptions. I'm not blocking what you
want to do.

Please let the community start to ramp down as we agreed before.

Dongjoon



On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:

> Please do not get me wrong. If we don't cut a branch, we are allowing all
> patches to land Apache Spark 3.3. That is totally fine. After we cut the
> branch, we should avoid merging the feature work. In the next three days,
> let us collect the actively developed PRs that we want to make an exception
> (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>
> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>
>> Xiao. You are working against what you are saying.
>> If you don't cut a branch, it means you are allowing all patches to land
>> Apache Spark 3.3. No?
>>
>> > we need to avoid backporting the feature work that are not being well
>> discussed.
>>
>>
>>
>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>>
>>> Cutting the branch is simple, but we need to avoid backporting the
>>> feature work that are not being well discussed. Not all the members are
>>> actively following the dev list. I think we should wait 3 more days for
>>> collecting the PR list before cutting the branch.
>>>
>>> BTW, there are very few 3.4-only feature work that will be affected.
>>>
>>> Xiao
>>>
>>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>>>
 Hi, Max, Chao, Xiao, Holden and all.

 I have a different idea.

 Given the situation and small patch list, I don't think we need to
 postpone the branch cut for those patches. It's easier to cut a branch-3.3
 and allow backporting.

 As of today, we already have an obvious Apache Spark 3.4 patch in the
 branch together. This situation only becomes worse and worse because there
 is no way to block the other patches from landing unintentionally if we
 don't cut a branch.

 [SPARK-38335][SQL] Implement parser support for DEFAULT column
 values

 Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.

 Best,
 Dongjoon.


 On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:

> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >
> >
> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
> >>
> >> Hi Xiao,
> >>
> >> For the following list:
> >>
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >>
> >> Do you mean we should include them, or exclude them from 3.3?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > The following was tested and merged a few minutes ago. So, we can
> remove it from the list.
> >> >
> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
> wrote:
> >> >>
> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
> days to collect the list of actively developed PRs that we want to merge 
> to
> 3.3 after the branch cut?
> >> >>
> >> >> Please do not rush to merge the PRs that are not fully reviewed.
> We can cut the branch this Friday and continue merging the PRs that have
> been discussed in this thread. Does that make sense?
> >> >>
> >> >> Xiao
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
> >> >>>
> >> >>> May I suggest we push out one week (22nd) just to give everyone
> a bit of breathing space? Rushed software development more often results 
> in
> bugs.
> >> >>>
> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
> yikunk...@gmail.com> wrote:
> >> 
> >>  > To make our release time more predictable, let us collect
> the PRs and wait three more days before the branch cut?
> >> 
> >>  For SPIP: Support Customized Kubernetes Schedulers:
> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> 
> >>  Three more days are OK for this from my view.
> >> 

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Please do not get me wrong. If we don't cut a branch, we are allowing all
patches to land in Apache Spark 3.3. That is totally fine. After we cut the
branch, we should avoid merging feature work. In the next three days, let
us collect the actively developed PRs for which we want to make an
exception (i.e., PRs to be merged to 3.3 after the upcoming branch cut).
Does that make sense?

On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:

> Xiao. You are working against what you are saying.
> If you don't cut a branch, it means you are allowing all patches to land
> Apache Spark 3.3. No?
>
> > we need to avoid backporting the feature work that are not being well
> discussed.
>
>
>
> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>
>> Cutting the branch is simple, but we need to avoid backporting the
>> feature work that are not being well discussed. Not all the members are
>> actively following the dev list. I think we should wait 3 more days for
>> collecting the PR list before cutting the branch.
>>
>> BTW, there are very few 3.4-only feature work that will be affected.
>>
>> Xiao
>>
>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>>
>>> Hi, Max, Chao, Xiao, Holden and all.
>>>
>>> I have a different idea.
>>>
>>> Given the situation and small patch list, I don't think we need to
>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>> and allow backporting.
>>>
>>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>>> branch together. This situation only becomes worse and worse because there
>>> is no way to block the other patches from landing unintentionally if we
>>> don't cut a branch.
>>>
>>> [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>>
>>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>>
>>> Best,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>>>
 Cool, thanks for clarifying!

 On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
 >>
 >> For the following list:
 >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
 >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
 vectorized reader
 >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
 >> Do you mean we should include them, or exclude them from 3.3?
 >
 >
 > If possible, I hope these features can be shipped with Spark 3.3.
 >
 >
 >
 > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
 >>
 >> Hi Xiao,
 >>
 >> For the following list:
 >>
 >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
 >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
 vectorized reader
 >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
 >>
 >> Do you mean we should include them, or exclude them from 3.3?
 >>
 >> Thanks,
 >> Chao
 >>
 >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
 dongjoon.h...@gmail.com> wrote:
 >> >
 >> > The following was tested and merged a few minutes ago. So, we can
 remove it from the list.
 >> >
 >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
 >> >
 >> > Thanks,
 >> > Dongjoon.
 >> >
 >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
 wrote:
 >> >>
 >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
 to collect the list of actively developed PRs that we want to merge to 3.3
 after the branch cut?
 >> >>
 >> >> Please do not rush to merge the PRs that are not fully reviewed.
 We can cut the branch this Friday and continue merging the PRs that have
 been discussed in this thread. Does that make sense?
 >> >>
 >> >> Xiao
 >> >>
 >> >>
 >> >>
 >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
 >> >>>
 >> >>> May I suggest we push out one week (22nd) just to give everyone
 a bit of breathing space? Rushed software development more often results in
 bugs.
 >> >>>
 >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
 wrote:
 >> 
 >>  > To make our release time more predictable, let us collect the
 PRs and wait three more days before the branch cut?
 >> 
 >>  For SPIP: Support Customized Kubernetes Schedulers:
 >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
 >> 
 >>  Three more days are OK for this from my view.
 >> 
 >>  Regards,
 >>  Yikun
 >> >>>
 >> >>> --
 >> >>> Twitter: https://twitter.com/holdenkarau
 >> >>> Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9
 >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Xiao. You are working against what you are saying.
If you don't cut a branch, it means you are allowing all patches to land in
Apache Spark 3.3. No?

> we need to avoid backporting the feature work that are not being well
> discussed.



On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:

> Cutting the branch is simple, but we need to avoid backporting the feature
> work that are not being well discussed. Not all the members are actively
> following the dev list. I think we should wait 3 more days for collecting
> the PR list before cutting the branch.
>
> BTW, there are very few 3.4-only feature work that will be affected.
>
> Xiao
>
> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>
>> Hi, Max, Chao, Xiao, Holden and all.
>>
>> I have a different idea.
>>
>> Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>> and allow backporting.
>>
>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>> branch together. This situation only becomes worse and worse because there
>> is no way to block the other patches from landing unintentionally if we
>> don't cut a branch.
>>
>> [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>
>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>
>> Best,
>> Dongjoon.
>>
>>
>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>>
>>> Cool, thanks for clarifying!
>>>
>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
>>> >>
>>> >> For the following list:
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >
>>> >
>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>> >
>>> >
>>> >
>>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>>> >>
>>> >> Hi Xiao,
>>> >>
>>> >> For the following list:
>>> >>
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >>
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >>
>>> >> Thanks,
>>> >> Chao
>>> >>
>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> >> >
>>> >> > The following was tested and merged a few minutes ago. So, we can
>>> remove it from the list.
>>> >> >
>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>>> wrote:
>>> >> >>
>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
>>> to collect the list of actively developed PRs that we want to merge to 3.3
>>> after the branch cut?
>>> >> >>
>>> >> >> Please do not rush to merge the PRs that are not fully reviewed.
>>> We can cut the branch this Friday and continue merging the PRs that have
>>> been discussed in this thread. Does that make sense?
>>> >> >>
>>> >> >> Xiao
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>>> >> >>>
>>> >> >>> May I suggest we push out one week (22nd) just to give everyone a
>>> bit of breathing space? Rushed software development more often results in
>>> bugs.
>>> >> >>>
>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
>>> wrote:
>>> >> 
>>> >>  > To make our release time more predictable, let us collect the
>>> PRs and wait three more days before the branch cut?
>>> >> 
>>> >>  For SPIP: Support Customized Kubernetes Schedulers:
>>> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> 
>>> >>  Three more days are OK for this from my view.
>>> >> 
>>> >>  Regards,
>>> >>  Yikun
>>> >> >>>
>>> >> >>> --
>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>


Re: Data correctness issue with Repartition + FetchFailure

2022-03-15 Thread Jason Xu
Hi Wenchen, thanks for the insight. Agreed, the previous fix for repartition
works for deterministic data. With non-deterministic data, I didn't find an
API to pass DeterministicLevel to the underlying RDD.
Do you plan to continue working on the integration with SQL operators? If
not, I'm available to take a stab at it.

On Mon, Mar 14, 2022 at 7:00 PM Wenchen Fan  wrote:

> We fixed the repartition correctness bug before, by sorting the data
> before doing round-robin partitioning. But the issue is that we need to
> propagate the isDeterministic property through SQL operators.
>
> On Tue, Mar 15, 2022 at 1:50 AM Jason Xu  wrote:
>
>> Hi Reynold, do you suggest removing RoundRobinPartitioning in
>> repartition(numPartitions: Int) API implementation? If that's the direction
>> we're considering, before we have a new implementation, should we suggest
>> users avoid using the repartition(numPartitions: Int) API?
>>
>> On Sat, Mar 12, 2022 at 1:47 PM Reynold Xin  wrote:
>>
>>> This is why RoundRobinPartitioning shouldn't be used ...
>>>
>>>
>>> On Sat, Mar 12, 2022 at 12:08 PM, Jason Xu 
>>> wrote:
>>>
>>>> Hi Spark community,
>>>>
>>>> I reported a data correctness issue in
>>>> https://issues.apache.org/jira/browse/SPARK-38388. In short,
>>>> non-deterministic data + Repartition + FetchFailure could result in
>>>> incorrect data. This is an issue we ran into in production pipelines,
>>>> and I have an example that reproduces the bug in the ticket.
>>>>
>>>> I am reporting it here to bring more attention to it. Could you help
>>>> confirm that it is a bug and worth the effort to investigate and fix?
>>>> Thank you in advance for your help!
>>>>
>>>> Thanks,
>>>> Jason Xu

>>>
>>>
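
The failure mode discussed in this thread can be illustrated with a toy
model. The sketch below is plain Python, not Spark code: round_robin,
map_task, and read_with_fetch_failure are hypothetical helpers invented for
illustration, and the two reducers and the reversed retry order are
assumptions chosen to make the effect visible. It assumes round-robin
partitioning assigns rows by arrival position, so a re-executed map task
that sees its rows in a different order sends them to different partitions.

```python
def round_robin(rows, num_partitions):
    """Assign rows to partitions purely by their arrival position."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def map_task(arrival_order, num_partitions, sort_first):
    """One attempt of a map task; optionally sort locally before partitioning."""
    rows = sorted(arrival_order) if sort_first else list(arrival_order)
    return round_robin(rows, num_partitions)


def read_with_fetch_failure(first_order, retry_order, sort_first):
    """Reducer 0 reads the first attempt's output; reducer 1 hits a fetch
    failure and reads the output of the re-executed map task instead."""
    first = map_task(first_order, 2, sort_first)
    retry = map_task(retry_order, 2, sort_first)
    return sorted(first[0] + retry[1])


rows = list(range(10))
first_order = rows        # order seen by the first attempt
retry_order = rows[::-1]  # assumed: the retry sees the same rows, reversed

without_sort = read_with_fetch_failure(first_order, retry_order, sort_first=False)
with_sort = read_with_fetch_failure(first_order, retry_order, sort_first=True)

print(without_sort)  # [0, 0, 2, 2, 4, 4, 6, 6, 8, 8]: evens duplicated, odds lost
print(with_sort)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: every row exactly once
```

In this model the local sort only helps because both attempts produce the
same set of rows. If the retried task regenerates different values (for
example from a random expression), sorting cannot make the two attempts
agree, which is the non-deterministic case raised above.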


Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Cutting the branch is simple, but we need to avoid backporting feature
work that has not been well discussed. Not all the members are actively
following the dev list. I think we should wait 3 more days to collect
the PR list before cutting the branch.

BTW, very little 3.4-only feature work will be affected.

Xiao

On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:

> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
>
> [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>
> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>
>> Cool, thanks for clarifying!
>>
>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
>> >>
>> >> For the following list:
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
>> reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >
>> >
>> > If possible, I hope these features can be shipped with Spark 3.3.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>> >>
>> >> Hi Xiao,
>> >>
>> >> For the following list:
>> >>
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
>> reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >>
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >>
>> >> Thanks,
>> >> Chao
>> >>
>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun 
>> wrote:
>> >> >
>> >> > The following was tested and merged a few minutes ago. So, we can
>> remove it from the list.
>> >> >
>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>> wrote:
>> >> >>
>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
>> to collect the list of actively developed PRs that we want to merge to 3.3
>> after the branch cut?
>> >> >>
>> >> >> Please do not rush to merge the PRs that are not fully reviewed. We
>> can cut the branch this Friday and continue merging the PRs that have been
>> discussed in this thread. Does that make sense?
>> >> >>
>> >> >> Xiao
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>> >> >>>
>> >> >>> May I suggest we push out one week (22nd) just to give everyone a
>> bit of breathing space? Rushed software development more often results in
>> bugs.
>> >> >>>
>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
>> wrote:
>> >> 
>> >>  > To make our release time more predictable, let us collect the
>> PRs and wait three more days before the branch cut?
>> >> 
>> >>  For SPIP: Support Customized Kubernetes Schedulers:
>> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> 
>> >>  Three more days are OK for this from my view.
>> >> 
>> >>  Regards,
>> >>  Yikun
>> >> >>>
>> >> >>> --
>> >> >>> Twitter: https://twitter.com/holdenkarau
>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Hi, Max, Chao, Xiao, Holden and all.

I have a different idea.

Given the situation and the small patch list, I don't think we need to
postpone the branch cut for those patches. It's easier to cut branch-3.3
and allow backporting.

As of today, we already have an obvious Apache Spark 3.4 patch mixed into
the branch. This situation will only get worse, because there is no way to
block other patches from landing unintentionally if we don't cut a branch.

[SPARK-38335][SQL] Implement parser support for DEFAULT column values

Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.

Best,
Dongjoon.


On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:

> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >
> >
> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
> >>
> >> Hi Xiao,
> >>
> >> For the following list:
> >>
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >>
> >> Do you mean we should include them, or exclude them from 3.3?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun 
> wrote:
> >> >
> >> > The following was tested and merged a few minutes ago. So, we can
> remove it from the list.
> >> >
> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
> >> >>
> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
> >> >>
> >> >> Please do not rush to merge the PRs that are not fully reviewed. We
> can cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
> >> >>
> >> >> Xiao
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
> >> >>>
> >> >>> May I suggest we push out one week (22nd) just to give everyone a
> bit of breathing space? Rushed software development more often results in
> bugs.
> >> >>>
> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
> wrote:
> >> 
> >>  > To make our release time more predictable, let us collect the
> PRs and wait three more days before the branch cut?
> >> 
> >>  For SPIP: Support Customized Kubernetes Schedulers:
> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> 
> >>  Three more days are OK for this from my view.
> >> 
> >>  Regards,
> >>  Yikun
> >> >>>
> >> >>> --
> >> >>> Twitter: https://twitter.com/holdenkarau
> >> >>> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Cool, thanks for clarifying!

On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
>>
>> For the following list:
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> Do you mean we should include them, or exclude them from 3.3?
>
>
> If possible, I hope these features can be shipped with Spark 3.3.
>
>
>
> On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>>
>> Hi Xiao,
>>
>> For the following list:
>>
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>
>> Do you mean we should include them, or exclude them from 3.3?
>>
>> Thanks,
>> Chao
>>
>> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun  
>> wrote:
>> >
>> > The following was tested and merged a few minutes ago. So, we can remove 
>> > it from the list.
>> >
>> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
>> >>
>> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to 
>> >> collect the list of actively developed PRs that we want to merge to 3.3 
>> >> after the branch cut?
>> >>
>> >> Please do not rush to merge the PRs that are not fully reviewed. We can 
>> >> cut the branch this Friday and continue merging the PRs that have been 
>> >> discussed in this thread. Does that make sense?
>> >>
>> >> Xiao
>> >>
>> >>
>> >>
>> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>> >>>
>> >>> May I suggest we push out one week (22nd) just to give everyone a bit of 
>> >>> breathing space? Rushed software development more often results in bugs.
>> >>>
>> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote:
>> >>>>
>> >>>> > To make our release time more predictable, let us collect the PRs and
>> >>>> > wait three more days before the branch cut?
>> >>>>
>> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >>>>
>> >>>> Three more days are OK for this from my view.
>> >>>>
>> >>>> Regards,
>> >>>> Yikun
>> >>>
>> >>> --
>> >>> Twitter: https://twitter.com/holdenkarau
>> >>> Books (Learning Spark, High Performance Spark, etc.): 
>> >>> https://amzn.to/2MaRAG9
>> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
>
> For the following list:
> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> Do you mean we should include them, or exclude them from 3.3?


If possible, I hope these features can be shipped with Spark 3.3.



On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:

> Hi Xiao,
>
> For the following list:
>
> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>
> Do you mean we should include them, or exclude them from 3.3?
>
> Thanks,
> Chao
>
> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun 
> wrote:
> >
> > The following was tested and merged a few minutes ago. So, we can remove
> > it from the list.
> >
> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >
> > Thanks,
> > Dongjoon.
> >
> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
> >>
> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> >> collect the list of actively developed PRs that we want to merge to 3.3
> >> after the branch cut?
> >>
> >> Please do not rush to merge the PRs that are not fully reviewed. We can
> >> cut the branch this Friday and continue merging the PRs that have been
> >> discussed in this thread. Does that make sense?
> >>
> >> Xiao
> >>
> >>
> >>
> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
> >>>
> >>> May I suggest we push out one week (22nd) just to give everyone a bit
> >>> of breathing space? Rushed software development more often results in bugs.
> >>>
> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote:
> >>>>
> >>>> > To make our release time more predictable, let us collect the PRs
> >>>> > and wait three more days before the branch cut?
> >>>>
> >>>> For SPIP: Support Customized Kubernetes Schedulers:
> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >>>>
> >>>> Three more days are OK for this from my view.
> >>>>
> >>>> Regards,
> >>>> Yikun
> >>>
> >>> --
> >>> Twitter: https://twitter.com/holdenkarau
> >>> Books (Learning Spark, High Performance Spark, etc.):
> >>> https://amzn.to/2MaRAG9
> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Hi Xiao,

For the following list:

#35789 [SPARK-32268][SQL] Row-level Runtime Filtering
#34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
#35848 [SPARK-38548][SQL] New SQL function: try_sum

Do you mean we should include them, or exclude them from 3.3?

Thanks,
Chao

On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun  wrote:
>
> The following was tested and merged a few minutes ago. So, we can remove it 
> from the list.
>
> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>
> Thanks,
> Dongjoon.
>
> On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
>>
>> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect 
>> the list of actively developed PRs that we want to merge to 3.3 after the 
>> branch cut?
>>
>> Please do not rush to merge the PRs that are not fully reviewed. We can cut 
>> the branch this Friday and continue merging the PRs that have been discussed 
>> in this thread. Does that make sense?
>>
>> Xiao
>>
>>
>>
>> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>>>
>>> May I suggest we push out one week (22nd) just to give everyone a bit of 
>>> breathing space? Rushed software development more often results in bugs.
>>>
>>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote:
>>>>
>>>> > To make our release time more predictable, let us collect the PRs and
>>>> > wait three more days before the branch cut?
>>>>
>>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>
>>>> Three more days are OK for this from my view.
>>>>
>>>> Regards,
>>>> Yikun
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.): 
>>> https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau




Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
The following was tested and merged a few minutes ago. So, we can remove it
from the list.

#35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1


Thanks,
Dongjoon.

On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:

> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
>
> Please do not rush to merge the PRs that are not fully reviewed. We can
> cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
>
> Xiao
>
>
>
>
> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>
>> May I suggest we push out one week (22nd) just to give everyone a bit of
>> breathing space? Rushed software development more often results in bugs.
>>
>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang  wrote:
>>
>>> > To make our release time more predictable, let us collect the PRs and
>>> > wait three more days before the branch cut?
>>>
>>> For SPIP: Support Customized Kubernetes Schedulers:
>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> 
>>>
>>> Three more days are OK for this from my view.
>>>
>>> Regards,
>>> Yikun
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Let me clarify my above suggestion. Maybe we can wait 3 more days to
collect the list of actively developed PRs that we want to merge to 3.3
after the branch cut?

Please do not rush to merge the PRs that are not fully reviewed. We can cut
the branch this Friday and continue merging the PRs that have been
discussed in this thread. Does that make sense?

Xiao




On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:

> May I suggest we push out one week (22nd) just to give everyone a bit of
> breathing space? Rushed software development more often results in bugs.
>
> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang  wrote:
>
>> > To make our release time more predictable, let us collect the PRs and
>> > wait three more days before the branch cut?
>>
>> For SPIP: Support Customized Kubernetes Schedulers:
>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> 
>>
>> Three more days are OK for this from my view.
>>
>> Regards,
>> Yikun
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Holden Karau
May I suggest we push out one week (22nd) just to give everyone a bit of
breathing space? Rushed software development more often results in bugs.

On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang  wrote:

> > To make our release time more predictable, let us collect the PRs and
> > wait three more days before the branch cut?
>
> For SPIP: Support Customized Kubernetes Schedulers:
> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> 
>
> Three more days are OK for this from my view.
>
> Regards,
> Yikun
>
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Apache Spark 3.3 Release

2022-03-15 Thread Yikun Jiang
> To make our release time more predictable, let us collect the PRs and
> wait three more days before the branch cut?

For SPIP: Support Customized Kubernetes Schedulers:
#35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1


Three more days are OK for this from my view.

Regards,
Yikun