Re: Apache Spark 3.3 Release

2022-04-29 Thread Maciej
Thanks for the update, Max! Just a small clarification ‒ the following should be moved to RESOLVED: 1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib 2. SPARK-37395: Inline type hint files for files in python/pyspark/ml 3. SPARK-37093: Inline type hints

Re: Apache Spark 3.3 Release

2022-04-28 Thread Maxim Gekk
Hello All, I am going to create the first release candidate of Spark 3.3 at the beginning of next week if there are no objections. Below is the list of allow-listed features and their current status. At the moment, only one feature is still in progress, but it can be postponed to the next release,

Re: Apache Spark 3.3 Release

2022-04-15 Thread Maxim Gekk
Hello All, Current status of features from the allow list for branch-3.3 is: IN PROGRESS: 1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc 2. SPARK-28516: Data Type Formatting Functions: `to_char` 3. SPARK-34079: Improvement CTE table scan IN PROGRESS but

Re: Apache Spark 3.3 Release

2022-04-04 Thread Maxim Gekk
Hello All, Below is the current status of features from the allow list: IN PROGRESS: 1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib 2. SPARK-37395: Inline type hint files for files in python/pyspark/ml 3. SPARK-37093: Inline type hints python/pyspark/streaming

Re: Apache Spark 3.3 Release

2022-03-21 Thread Tom Graves
Maybe I'm misunderstanding what you are saying. According to those dates, the code freeze, by which the majority of features should be merged, is March 15th. So if this list is all features and they are not merged at this point, we should probably discuss whether we want them to go in or if we need to change the

Re: Apache Spark 3.3 Release

2022-03-21 Thread Wenchen Fan
Just checked the release calendar: the planned RC cut date is April. Let's revisit after 2 weeks then? On Mon, Mar 21, 2022 at 2:47 PM Wenchen Fan wrote: > Shall we revisit this list after a week? Ideally, they should be either > merged or rejected for 3.3, so that we can cut

Re: Apache Spark 3.3 Release

2022-03-21 Thread Wenchen Fan
Shall we revisit this list after a week? Ideally, they should be either merged or rejected for 3.3, so that we can cut rc1. We can still discuss them case by case at that time if there are exceptions. On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun wrote: > Thank you for your summarization. > > I

Re: Apache Spark 3.3 Release

2022-03-18 Thread Dongjoon Hyun
Thank you for your summarization. I believe we need to have a discussion in order to evaluate each PR's readiness. BTW, `branch-3.3` is still open for bug fixes including minor dependency changes like the following. (Backported) [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4 Revert

Re: Apache Spark 3.3 Release

2022-03-18 Thread Maxim Gekk
Hi All, Here is the allow list which I built based on your requests in this thread: 1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib 2. SPARK-37395: Inline type hint files for files in python/pyspark/ml 3. SPARK-37093: Inline type hints python/pyspark/streaming

Re: Apache Spark 3.3 Release

2022-03-17 Thread Tom Graves
Is the feature freeze target date March 22nd then? I saw a few dates thrown around and want to confirm what we landed on. I am trying to get the following improvements through review and merged; if there are concerns with either, let me know: - [SPARK-34079][SQL] Merge non-correlated scalar subqueries -

Re: Apache Spark 3.3 Release

2022-03-17 Thread Gengliang Wang
I'd like to add the following new SQL functions in the 3.3 release. These functions are useful when overflow or encoding errors occur: - [SPARK-38548][SQL] New SQL function: try_sum - [SPARK-38589][SQL] New SQL function: try_avg

Re: Apache Spark 3.3 Release

2022-03-16 Thread Andrew Melo
Hello, I've been trying for a bit to get the following two PRs merged and into a release, and I'm having some difficulty moving them forward: https://github.com/apache/spark/pull/34903 - This passes the current python interpreter to spark-env.sh to allow some currently-unavailable customization

Re: Apache Spark 3.3 Release

2022-03-16 Thread Holden Karau
I'd like to add/backport the logging in PR https://github.com/apache/spark/pull/35881 so that when users submit issues with dynamic allocation we can better debug what's going on. On Wed, Mar 16, 2022 at 3:45 PM Chao Sun wrote: > There is one item on our side that we want to backport to 3.3: >

Re: Apache Spark 3.3 Release

2022-03-16 Thread Chao Sun
There is one item on our side that we want to backport to 3.3: - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support (https://github.com/apache/spark/pull/35262) It's already reviewed and approved. On Wed, Mar 16, 2022 at 9:13 AM Tom Graves wrote: > > It looks

Re: Apache Spark 3.3 Release

2022-03-16 Thread Tom Graves
It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT. Can you please update that? Tom On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk wrote: Hi All, I have created the branch for Spark 3.3:

Re: Apache Spark 3.3 Release

2022-03-16 Thread Jacky Lee
I also have a PR that has been ready to merge for a while. Can we merge it in 3.3.0? [SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics https://github.com/apache/spark/pull/35185 On Wed, Mar 16, 2022 at 9:16 PM Adam Binford wrote: > Also throwing my hat in for two of my PRs that should be

Re: Apache Spark 3.3 Release

2022-03-16 Thread Jacky Lee
I also have a PR that has been ready to merge for a while. Can we merge it in 3.3.0? [SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics https://github.com/apache/spark/pull/35185 On Wed, Mar 16, 2022 at 9:33 PM beliefer wrote: > +1 Glad to see we will release 3.3.0. > > > At 2022-03-04

Re: Apache Spark 3.3 Release

2022-03-16 Thread Adam Binford
Also throwing my hat in for two of my PRs that should be ready and just need final reviews/approval: Removing shuffles from deallocated executors using the shuffle service: https://github.com/apache/spark/pull/35085. This has been requested for several years across many issues. Configurable memory

Re: Apache Spark 3.3 Release

2022-03-16 Thread Wenchen Fan
+1 to defining an allowlist of features that we want to backport to branch-3.3. I also have a few in mind: - complex type support in the vectorized Parquet reader: https://github.com/apache/spark/pull/34659 - refine the DS v2 filter API for JDBC v2: https://github.com/apache/spark/pull/35768 - a few new SQL

Re: Apache Spark 3.3 Release

2022-03-16 Thread Maxim Gekk
Hi All, I have created the branch for Spark 3.3: https://github.com/apache/spark/commits/branch-3.3 Please backport important fixes to it, and if you have any doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3. Best regards, Max Gekk On

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Yes, I agree with your whitelist approach for backporting. :) Thank you for summarizing. Thanks, Dongjoon. On Tue, Mar 15, 2022 at 4:20 PM Xiao Li wrote: > I think I finally got your point. What you want to keep unchanged is the > branch cut date of Spark 3.3. Today? or this Friday?

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
I think I finally got your point. What you want to keep unchanged is the branch cut date of Spark 3.3. Today? or this Friday? This is not a big deal. My major concern is whether we should keep merging the feature work or the dependency upgrade after the branch cut. To make our release time more

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
That is not totally fine, Xiao. It sounds like you are asking for a change of plan without a proper reason. Although we cut the branch today according to our plan, you can still collect the list and make a list of exceptions. I'm not blocking what you want to do. Please let the community start to ramp

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Please do not get me wrong. If we don't cut a branch, we are allowing all patches to land in Apache Spark 3.3. That is totally fine. After we cut the branch, we should avoid merging the feature work. In the next three days, let us collect the actively developed PRs that we want to make an exception

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Xiao. You are working against what you are saying. If you don't cut a branch, it means you are allowing all patches to land in Apache Spark 3.3. No? > we need to avoid backporting the feature work that is not being well discussed. On Tue, Mar 15, 2022 at 12:12 PM Xiao Li wrote: > Cutting the

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Cutting the branch is simple, but we need to avoid backporting the feature work that is not being well discussed. Not all the members are actively following the dev list. I think we should wait 3 more days to collect the PR list before cutting the branch. BTW, there are very few 3.4-only

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Hi, Max, Chao, Xiao, Holden and all. I have a different idea. Given the situation and small patch list, I don't think we need to postpone the branch cut for those patches. It's easier to cut a branch-3.3 and allow backporting. As of today, we already have an obvious Apache Spark 3.4 patch in

Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Cool, thanks for clarifying! On Tue, Mar 15, 2022 at 10:11 AM Xiao Li wrote: >> >> For the following list: >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader >> #35848 [SPARK-38548][SQL] New SQL function:

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
> > For the following list: > #35789 [SPARK-32268][SQL] Row-level Runtime Filtering > #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized > reader > #35848 [SPARK-38548][SQL] New SQL function: try_sum > Do you mean we should include them, or exclude them from 3.3? If possible,

Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Hi Xiao, For the following list: #35789 [SPARK-32268][SQL] Row-level Runtime Filtering #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader #35848 [SPARK-38548][SQL] New SQL function: try_sum Do you mean we should include them, or exclude them from 3.3? Thanks, Chao

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
The following was tested and merged a few minutes ago. So, we can remove it from the list. #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1 Thanks, Dongjoon. On Tue, Mar 15, 2022 at 9:48 AM Xiao Li wrote: > Let me clarify my above

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut? Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that

Re: Apache Spark 3.3 Release

2022-03-15 Thread Holden Karau
May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs. On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote: > > To make our release time more predictable, let us collect the PRs and > wait three more days

Re: Apache Spark 3.3 Release

2022-03-15 Thread Yikun Jiang
> To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut? For SPIP: Support Customized Kubernetes Schedulers: #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1 Three more days are

Re: Apache Spark 3.3 Release

2022-03-14 Thread Xiao Li
To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut? Please list all the actively developed feature work we plan to release with Spark 3.3. We should avoid merging any new feature work that is not being discussed in this email thread.

Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
I mainly mean: - [SPARK-35801] Row-level operations in Data Source V2 - [SPARK-37166] Storage Partitioned Join for which the PRs: - https://github.com/apache/spark/pull/35395 - https://github.com/apache/spark/pull/35657 are actively being reviewed. It seems there are ongoing PRs for other

Re: Apache Spark 3.3 Release

2022-03-14 Thread Holden Karau
On Mon, Mar 14, 2022 at 11:53 PM Xiao Li wrote: > Could you please list which features we want to finish before the branch > cut? How long will they take? > > Xiao > > On Mon, Mar 14, 2022 at 1:30 PM Chao Sun wrote: > >> Hi Max, >> >> As there is still some ongoing work for the above listed SPIPs, can we >>

Re: Apache Spark 3.3 Release

2022-03-14 Thread Xiao Li
Could you please list which features we want to finish before the branch cut? How long will they take? Xiao On Mon, Mar 14, 2022 at 1:30 PM Chao Sun wrote: > Hi Max, > > As there is still some ongoing work for the above listed SPIPs, can we > still merge them after the branch cut? > > Thanks, > Chao > > On

Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
Hi Max, As there is still some ongoing work for the above listed SPIPs, can we still merge them after the branch cut? Thanks, Chao On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk wrote: > Hi All, > > Since there are no actual blockers for Spark 3.3.0 and no significant > objections, I am going to

Re: Apache Spark 3.3 Release

2022-03-14 Thread Maxim Gekk
Hi All, Since there are no actual blockers for Spark 3.3.0 and no significant objections, I am going to cut branch-3.3 after 15th March at 00:00 PST. Please let us know if you have any concerns about that. Best regards, Max Gekk On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk wrote: > Hello All, > >

Re: Apache Spark 3.3 Release

2022-03-06 Thread Maciej
Ideally, we should complete these - [SPARK-37093] Inline type hints python/pyspark/streaming - [SPARK-37395] Inline type hint files for files in python/pyspark/ml - [SPARK-37396] Inline type hint files for files in python/pyspark/mllib All tasks have either a PR in progress or someone working on a

Re: Apache Spark 3.3 Release

2022-03-04 Thread Yikun Jiang
@Maxim Thanks for driving the release! > Not sure about SPARK-36057 since the current state. @Igor Costa Thanks for your attention. As Dongjoon said, the basic framework capabilities for custom schedulers are already supported, and we are also planning to mark this as beta in 3.3.0. Of course, we will do more

Re: Apache Spark 3.3 Release

2022-03-04 Thread Dongjoon Hyun
is yes but to confirm again. >> >> On Fri, Mar 4, 2022 at 4:20 AM Dongjoon Hyun >> wrote: >> >>> Thank you, Max, for volunteering for Apache Spark 3.3 release manager. >>> >>> Ya, I'm also +1 for the original plan. >>> >>> D

Re: Apache Spark 3.3 Release

2022-03-04 Thread Igor Costa
confirm again. > > On Fri, Mar 4, 2022 at 4:20 AM Dongjoon Hyun > wrote: > >> Thank you, Max, for volunteering for Apache Spark 3.3 release manager. >> >> Ya, I'm also +1 for the original plan. >> >> Dongjoon >> >> On Thu, Mar 3, 2022 at 10:52 AM Mr

Re: Apache Spark 3.3 Release

2022-03-03 Thread Jungtaek Lim
to gain traction as well as going through actual reviews. My guess is yes but to confirm again. On Fri, Mar 4, 2022 at 4:20 AM Dongjoon Hyun wrote: > Thank you, Max, for volunteering for Apache Spark 3.3 release manager. > > Ya, I'm also +1 for the original plan. > > Dongjoon >

Re: Apache Spark 3.3 Release

2022-03-03 Thread Dongjoon Hyun
Thank you, Max, for volunteering for Apache Spark 3.3 release manager. Ya, I'm also +1 for the original plan. Dongjoon On Thu, Mar 3, 2022 at 10:52 AM Mridul Muralidharan wrote: > > Agree with Sean, code freeze by mid March sounds good. > > Regards, > Mridul > > On Thu,

Re: Apache Spark 3.3 Release

2022-03-03 Thread Mridul Muralidharan
Agree with Sean, code freeze by mid March sounds good. Regards, Mridul On Thu, Mar 3, 2022 at 12:47 PM Sean Owen wrote: > I think it's fine to pursue the existing plan - code freeze in two weeks > and try to close off key remaining issues. Final release pending on how > those go, and testing,

Re: Apache Spark 3.3 Release

2022-03-03 Thread Sean Owen
I think it's fine to pursue the existing plan - code freeze in two weeks and try to close off key remaining issues. Final release pending on how those go, and testing, but fine to get the ball rolling. On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk wrote: > Hello All, > > I would like to bring on

Apache Spark 3.3 Release

2022-03-03 Thread Maxim Gekk
Hello All, I would like to bring up the topic of the new Spark 3.3 release. According to the public schedule at https://spark.apache.org/versioning-policy.html, we planned to start the code freeze and release branch cut on March 15th, 2022. Since this date is coming soon, I would