I also have a PR that has been ready to merge for a while. Can we merge it into 3.3.0?
[SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics
https://github.com/apache/spark/pull/35185
On Wed, Mar 16, 2022 at 21:16, Adam Binford <adam...@gmail.com> wrote:
> Also throwing my hat in for two of my PRs that should be ready and just
> need final reviews/approval:
> Removing shuffles from deallocated executors using the shuffle service:
> https://github.com/apache/spark/pull/35085. This has been requested for
> several years across many issues.
> Configurable memory overhead factor:
> https://github.com/apache/spark/pull/35504
>
> Adam
>
> On Wed, Mar 16, 2022 at 8:53 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> +1 to defining an allowlist of features that we want to backport to
>> branch 3.3. I also have a few in mind:
>> complex type support in vectorized parquet reader:
>> https://github.com/apache/spark/pull/34659
>> refine the DS v2 filter API for JDBC v2:
>> https://github.com/apache/spark/pull/35768
>> a few new SQL functions that have been in development for a while:
>> to_char, split_part, percentile_disc, try_sum, etc.
>>
>> On Wed, Mar 16, 2022 at 2:41 PM Maxim Gekk
>> <maxim.g...@databricks.com.invalid> wrote:
>>
>>> Hi All,
>>>
>>> I have created the branch for Spark 3.3:
>>> https://github.com/apache/spark/commits/branch-3.3
>>>
>>> Please backport important fixes to it, and if you have any doubts,
>>> ping me in the PR. Regarding new features, we are still building the
>>> allow list for branch-3.3.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> wrote:
>>>
>>>> Yes, I agree with your whitelist approach for backporting. :)
>>>> Thank you for summarizing.
>>>>
>>>> Thanks,
>>>> Dongjoon.
>>>>
>>>> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>
>>>>> I think I finally got your point. What you want to keep unchanged is
>>>>> the branch cut date of Spark 3.3. Today, or this Friday? This is not
>>>>> a big deal.
>>>>>
>>>>> My major concern is whether we should keep merging the feature work
>>>>> or the dependency upgrade after the branch cut. To make our release
>>>>> time more predictable, I am suggesting we should finalize the
>>>>> exception PR list first, instead of merging them in an ad hoc way.
>>>>> In the past, we spent a lot of time on the revert of the PRs that
>>>>> were merged after the branch cut.
>>>>> I hope we can minimize unnecessary arguments in this release. Do you
>>>>> agree, Dongjoon?
>>>>>
>>>>> On Tue, Mar 15, 2022 at 15:55, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>
>>>>>> That is not totally fine, Xiao. It sounds like you are asking for a
>>>>>> change of plan without a proper reason.
>>>>>>
>>>>>> Although we cut the branch today according to our plan, you can
>>>>>> still collect the list and make a list of exceptions. I'm not
>>>>>> blocking what you want to do.
>>>>>>
>>>>>> Please let the community start to ramp down as we agreed before.
>>>>>>
>>>>>> Dongjoon
>>>>>>
>>>>>> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>>
>>>>>>> Please do not get me wrong. If we don't cut a branch, we are
>>>>>>> allowing all patches to land in Apache Spark 3.3. That is totally
>>>>>>> fine. After we cut the branch, we should avoid merging the feature
>>>>>>> work. In the next three days, let us collect the actively
>>>>>>> developed PRs that we want to make an exception for (i.e., merged
>>>>>>> to 3.3 after the upcoming branch cut). Does that make sense?
>>>>>>>
>>>>>>> On Tue, Mar 15, 2022 at 14:54, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Xiao. You are working against what you are saying.
>>>>>>>> If you don't cut a branch, it means you are allowing all patches
>>>>>>>> to land in Apache Spark 3.3. No?
>>>>>>>>
>>>>>>>> > we need to avoid backporting the feature work that is not being
>>>>>>>> > well discussed.
>>>>>>>>
>>>>>>>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <gatorsm...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cutting the branch is simple, but we need to avoid backporting
>>>>>>>>> the feature work that is not being well discussed. Not all the
>>>>>>>>> members are actively following the dev list. I think we should
>>>>>>>>> wait 3 more days to collect the PR list before cutting the
>>>>>>>>> branch.
>>>>>>>>>
>>>>>>>>> BTW, there is very little 3.4-only feature work that will be
>>>>>>>>> affected.
>>>>>>>>>
>>>>>>>>> Xiao
>>>>>>>>>
>>>>>>>>> On Tue, Mar 15, 2022 at 11:49, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, Max, Chao, Xiao, Holden and all.
>>>>>>>>>>
>>>>>>>>>> I have a different idea.
>>>>>>>>>>
>>>>>>>>>> Given the situation and small patch list, I don't think we need
>>>>>>>>>> to postpone the branch cut for those patches. It's easier to
>>>>>>>>>> cut a branch-3.3 and allow backporting.
>>>>>>>>>>
>>>>>>>>>> As of today, we already have an obvious Apache Spark 3.4 patch
>>>>>>>>>> in the branch together. This situation only becomes worse and
>>>>>>>>>> worse because there is no way to block the other patches from
>>>>>>>>>> landing unintentionally if we don't cut a branch.
>>>>>>>>>>
>>>>>>>>>> [SPARK-38335][SQL] Implement parser support for DEFAULT
>>>>>>>>>> column values
>>>>>>>>>>
>>>>>>>>>> Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Dongjoon.
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <sunc...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Cool, thanks for clarifying!
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <gatorsm...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> For the following list:
>>>>>>>>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>>>>>>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>>>>>>>>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>>>>>>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>>>>>>>>> >
>>>>>>>>>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>>>>>>>>>> >
>>>>>>>>>>> > On Tue, Mar 15, 2022 at 10:06, Chao Sun <sunc...@apache.org> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hi Xiao,
>>>>>>>>>>> >>
>>>>>>>>>>> >> For the following list:
>>>>>>>>>>> >>
>>>>>>>>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>>>>>>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>>>>>>>>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>>>>>>>>> >>
>>>>>>>>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Thanks,
>>>>>>>>>>> >> Chao
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > The following was tested and merged a few minutes ago, so we
>>>>>>>>>>> >> > can remove it from the list.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Thanks,
>>>>>>>>>>> >> > Dongjoon.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>>>>>>>>>> >> >> days to collect the list of actively developed PRs that we
>>>>>>>>>>> >> >> want to merge to 3.3 after the branch cut?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Please do not rush to merge the PRs that are not fully
>>>>>>>>>>> >> >> reviewed. We can cut the branch this Friday and continue
>>>>>>>>>>> >> >> merging the PRs that have been discussed in this thread.
>>>>>>>>>>> >> >> Does that make sense?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Xiao
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> On Tue, Mar 15, 2022 at 09:10, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> May I suggest we push out one week (22nd) just to give
>>>>>>>>>>> >> >>> everyone a bit of breathing space? Rushed software
>>>>>>>>>>> >> >>> development more often results in bugs.
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> > To make our release time more predictable, let us
>>>>>>>>>>> >> >>>> > collect the PRs and wait three more days before the
>>>>>>>>>>> >> >>>> > branch cut?
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>>>>>>>>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> Three more days is OK for this from my point of view.
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> Regards,
>>>>>>>>>>> >> >>>> Yikun
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> --
>>>>>>>>>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>> >> >>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>>>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>
>
> --
> Adam Binford