Re: Apache Spark 3.3 Release

Jacky Lee Wed, 16 Mar 2022 07:05:53 -0700

I also have a PR that has been ready to merge for a while. Can we merge it
into 3.3.0?
[SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics
https://github.com/apache/spark/pull/35185


On Wed, Mar 16, 2022 at 9:16 PM Adam Binford <adam...@gmail.com> wrote:

> Also throwing my hat in for two of my PRs that should be ready and just
> need final reviews/approval:
> Removing shuffles from deallocated executors using the shuffle service:
> https://github.com/apache/spark/pull/35085. This has been requested for
> several years across many issues.
> Configurable memory overhead factor:
> https://github.com/apache/spark/pull/35504
>
> Adam
>
> On Wed, Mar 16, 2022 at 8:53 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> +1 to defining an allowlist of features that we want to backport to branch
>> 3.3. I also have a few in mind:
>> complex type support in vectorized parquet reader:
>> https://github.com/apache/spark/pull/34659
>> refine the DS v2 filter API for JDBC v2:
>> https://github.com/apache/spark/pull/35768
>> a few new SQL functions that have been in development for a while:
>> to_char, split_part, percentile_disc, try_sum, etc.
>>
>> On Wed, Mar 16, 2022 at 2:41 PM Maxim Gekk
>> <maxim.g...@databricks.com.invalid> wrote:
>>
>>> Hi All,
>>>
>>> I have created the branch for Spark 3.3:
>>> https://github.com/apache/spark/commits/branch-3.3
>>>
>>> Please backport important fixes to it, and if you have any doubts,
>>> ping me in the PR. Regarding new features, we are still building the
>>> allowlist for branch-3.3.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Yes, I agree with your whitelist approach for backporting. :)
>>>> Thank you for summarizing.
>>>>
>>>> Thanks,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>>> I think I finally got your point. What you want to keep unchanged is
>>>>> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
>>>>> deal.
>>>>>
>>>>> My major concern is whether we should keep merging feature work or
>>>>> dependency upgrades after the branch cut. To make our release time more
>>>>> predictable, I suggest we finalize the exception PR list first, instead
>>>>> of merging them in an ad hoc way. In the past, we spent a lot of time
>>>>> reverting PRs that were merged after the branch cut. I hope we can
>>>>> minimize unnecessary arguments in this release. Do you agree, Dongjoon?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>
>>>>>> That is not totally fine, Xiao. It sounds like you are asking for a
>>>>>> change of plan without a proper reason.
>>>>>>
>>>>>> Although we cut the branch today according to our plan, you can still
>>>>>> collect the PRs and make a list of exceptions. I'm not blocking what you
>>>>>> want to do.
>>>>>>
>>>>>> Please let the community start to ramp down as we agreed before.
>>>>>>
>>>>>> Dongjoon
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>>
>>>>>>> Please do not get me wrong. If we don't cut a branch, we are
>>>>>>> allowing all patches to land in Apache Spark 3.3. That is totally fine.
>>>>>>> After we cut the branch, we should avoid merging feature work. In the
>>>>>>> next three days, let us collect the actively developed PRs for which we
>>>>>>> want to make an exception (i.e., merge to 3.3 after the upcoming branch
>>>>>>> cut). Does that make sense?
>>>>>>>
>>>>>>> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Xiao. You are working against what you are saying.
>>>>>>>> If you don't cut a branch, it means you are allowing all patches to
>>>>>>>> land Apache Spark 3.3. No?
>>>>>>>>
>>>>>>>> > we need to avoid backporting feature work that has not been
>>>>>>>> > well discussed.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <gatorsm...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cutting the branch is simple, but we need to avoid backporting
>>>>>>>>> feature work that has not been well discussed. Not all the members are
>>>>>>>>> actively following the dev list. I think we should wait 3 more days to
>>>>>>>>> collect the PR list before cutting the branch.
>>>>>>>>>
>>>>>>>>> BTW, there is very little 3.4-only feature work that will be
>>>>>>>>> affected.
>>>>>>>>>
>>>>>>>>> Xiao
>>>>>>>>>
>>>>>>>>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, Max, Chao, Xiao, Holden and all.
>>>>>>>>>>
>>>>>>>>>> I have a different idea.
>>>>>>>>>>
>>>>>>>>>> Given the situation and small patch list, I don't think we need
>>>>>>>>>> to postpone the branch cut for those patches. It's easier to cut a
>>>>>>>>>> branch-3.3 and allow backporting.
>>>>>>>>>>
>>>>>>>>>> As of today, we already have an obvious Apache Spark 3.4 patch in
>>>>>>>>>> the branch. This situation will only get worse, because there is no
>>>>>>>>>> way to block other patches from landing unintentionally if we don't
>>>>>>>>>> cut a branch.
>>>>>>>>>>
>>>>>>>>>>     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>>>>>>>>>
>>>>>>>>>> Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Dongjoon.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <sunc...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Cool, thanks for clarifying!
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <gatorsm...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> For the following list:
>>>>>>>>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>>>>>>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>>>>>>>>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>>>>>>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun <sunc...@apache.org> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hi Xiao,
>>>>>>>>>>> >>
>>>>>>>>>>> >> For the following list:
>>>>>>>>>>> >>
>>>>>>>>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>>>>>>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>>>>>>>>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>>>>>>>>> >>
>>>>>>>>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Thanks,
>>>>>>>>>>> >> Chao
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > The following was tested and merged a few minutes ago. So, we can remove it from the list.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Thanks,
>>>>>>>>>>> >> > Dongjoon.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Xiao
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>>>>>>>>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> Three more days are OK for this from my view.
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> Regards,
>>>>>>>>>>> >> >>>> Yikun
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> --
>>>>>>>>>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>> >> >>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>>>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>
>
> --
> Adam Binford
>
