Re: Apache Spark 3.3 Release

2022-04-29 Thread Maciej
Thanks for the update, Max!

Just a small clarification ‒ the following should be moved to RESOLVED:

1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
3. SPARK-37093: Inline type hints python/pyspark/streaming

On 4/28/22 14:42, Maxim Gekk wrote:
> Hello All,
> 
> I am going to create the first release candidate of Spark 3.3 at the
> beginning of next week if there are no objections. Below is the list
> of features from the allow list and their current status. At the moment,
> only one feature is still in progress, but I guess it can be postponed
> to the next release:
> 
> IN PROGRESS:
> 
>  1. SPARK-28516: Data Type Formatting Functions: `to_char`
> 
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
> 
>  1. SPARK-37650: Tell spark-env.sh the python interpreter
>  2. SPARK-36664: Log time spent waiting for cluster resources
>  3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
>  4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>  5. SPARK-37093: Inline type hints python/pyspark/streaming
> 
> RESOLVED:
> 
>  1. SPARK-32268: Bloom Filter Join
>  2. SPARK-38548: New SQL function: try_sum
>  3. SPARK-38063: Support SQL split_part function
>  4. SPARK-38432: Refactor framework so as JDBC dialect could compile
> filter by self way
>  5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
>  6. SPARK-38194: Make Yarn memory overhead factor configurable
>  7. SPARK-37618: Support cleaning up shuffle blocks from external
> shuffle service
>  8. SPARK-37831: Add task partition id in metrics
>  9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
> DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 10. SPARK-38590: New SQL function: try_to_binary
> 11. SPARK-37377: Refactor V2 Partitioning interface and remove
> deprecated usage of Distribution
> 12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
> sources
> 13. SPARK-34659: Web UI does not correctly get appId
> 14. SPARK-38589: New SQL function: try_avg
> 15. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
> 16. SPARK-34079: Improvement CTE table scan
> 
> 
> Max Gekk
> 
> Software Engineer
> 
> Databricks, Inc.
> 
> 
> 
> On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk wrote:
> 
> Hello All,
> 
> Current status of features from the allow list for branch-3.3 is:
> 
> IN PROGRESS:
> 
>  1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>  2. SPARK-28516: Data Type Formatting Functions: `to_char`
>  3. SPARK-34079: Improvement CTE table scan
> 
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
> 
>  1. SPARK-37650: Tell spark-env.sh the python interpreter
>  2. SPARK-36664: Log time spent waiting for cluster resources
>  3. SPARK-37396: Inline type hint files for files in
> python/pyspark/mllib
>  4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>  5. SPARK-37093: Inline type hints python/pyspark/streaming
> 
> RESOLVED:
> 
>  1. SPARK-32268: Bloom Filter Join
>  2. SPARK-38548: New SQL function: try_sum
>  3. SPARK-38063: Support SQL split_part function
>  4. SPARK-38432: Refactor framework so as JDBC dialect could compile
> filter by self way
>  5. SPARK-34863: Support nested column in Spark Parquet vectorized
> readers
>  6. SPARK-38194: Make Yarn memory overhead factor configurable
>  7. SPARK-37618: Support cleaning up shuffle blocks from external
> shuffle service
>  8. SPARK-37831: Add task partition id in metrics
>  9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
> DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 10. SPARK-38590: New SQL function: try_to_binary
> 11. SPARK-37377: Refactor V2 Partitioning interface and remove
> deprecated usage of Distribution
> 12. SPARK-38085: DataSource V2: Handle DELETE commands for
> group-based sources
> 13. SPARK-34659: Web UI does not correctly get appId
> 14. SPARK-38589: New SQL function: try_avg
> 
> 
> Max Gekk
> 
> Software Engineer
> 
> Databricks, Inc.
> 
> 
> 
> On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk wrote:
> 
> Hello All,
> 
> Below is current status of features from the allow list:
> 
> IN PROGRESS:
> 
>  1. SPARK-37396: Inline type hint files for files in
> python/pyspark/mllib
>  2. SPARK-37395: Inline type hint files for files in
> python/pyspark/ml
>  3. SPARK-37093: Inline type hints python/pyspark/streaming
>  4. SPARK-37377: Refactor V2 Partitioning interface and remove
> deprecated usage of Distribution
>  5. SPARK-38085: DataSource V2: Handle DELETE commands for

Re: Apache Spark 3.3 Release

2022-04-28 Thread Maxim Gekk
Hello All,

I am going to create the first release candidate of Spark 3.3 at the
beginning of next week if there are no objections. Below is the list of
features from the allow list and their current status. At the moment, only
one feature is still in progress, but I guess it can be postponed to the
next release:

IN PROGRESS:

   1. SPARK-28516: Data Type Formatting Functions: `to_char`

IN PROGRESS but won't/couldn't be merged to branch-3.3:

   1. SPARK-37650: Tell spark-env.sh the python interpreter
   2. SPARK-36664: Log time spent waiting for cluster resources
   3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   5. SPARK-37093: Inline type hints python/pyspark/streaming

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   10. SPARK-38590: New SQL function: try_to_binary
   11. SPARK-37377: Refactor V2 Partitioning interface and remove
   deprecated usage of Distribution
   12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   13. SPARK-34659: Web UI does not correctly get appId
   14. SPARK-38589: New SQL function: try_avg
   15. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   16. SPARK-34079: Improvement CTE table scan


Max Gekk

Software Engineer

Databricks, Inc.


On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk 
wrote:

> Hello All,
>
> Current status of features from the allow list for branch-3.3 is:
>
> IN PROGRESS:
>
>1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>2. SPARK-28516: Data Type Formatting Functions: `to_char`
>3. SPARK-34079: Improvement CTE table scan
>
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
>
>1. SPARK-37650: Tell spark-env.sh the python interpreter
>2. SPARK-36664: Log time spent waiting for cluster resources
>3. SPARK-37396: Inline type hint files for files in
>python/pyspark/mllib
>4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>5. SPARK-37093: Inline type hints python/pyspark/streaming
>
> RESOLVED:
>
>1. SPARK-32268: Bloom Filter Join
>2. SPARK-38548: New SQL function: try_sum
>3. SPARK-38063: Support SQL split_part function
>4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>filter by self way
>5. SPARK-34863: Support nested column in Spark Parquet vectorized
>readers
>6. SPARK-38194: Make Yarn memory overhead factor configurable
>7. SPARK-37618: Support cleaning up shuffle blocks from external
>shuffle service
>8. SPARK-37831: Add task partition id in metrics
>9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>10. SPARK-38590: New SQL function: try_to_binary
>11. SPARK-37377: Refactor V2 Partitioning interface and remove
>deprecated usage of Distribution
>12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>sources
>13. SPARK-34659: Web UI does not correctly get appId
>14. SPARK-38589: New SQL function: try_avg
>
>
> Max Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk 
> wrote:
>
>> Hello All,
>>
>> Below is current status of features from the allow list:
>>
>> IN PROGRESS:
>>
>>1. SPARK-37396: Inline type hint files for files in
>>python/pyspark/mllib
>>2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>3. SPARK-37093: Inline type hints python/pyspark/streaming
>>4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>deprecated usage of Distribution
>>5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>sources
>>6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>7. SPARK-28516: Data Type Formatting Functions: `to_char`
>>8. SPARK-36664: Log time spent waiting for cluster resources
>>9. SPARK-34659: Web UI does not correctly get appId
>>10. SPARK-37650: Tell spark-env.sh the python interpreter
>>11. SPARK-38589: New SQL function: try_avg
>>12. SPARK-38590: New SQL function: try_to_binary
>>13. SPARK-34079: Improvement CTE table scan
>>
>> RESOLVED:
>>
>>1. SPARK-32268: Bloom Filter Join
>>2. SPARK-38548: New SQL function: try_sum
>>3. SPARK-38063: Support SQL split_part function
>>4. 

Re: Apache Spark 3.3 Release

2022-04-15 Thread Maxim Gekk
Hello All,

Current status of features from the allow list for branch-3.3 is:

IN PROGRESS:

   1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   2. SPARK-28516: Data Type Formatting Functions: `to_char`
   3. SPARK-34079: Improvement CTE table scan

IN PROGRESS but won't/couldn't be merged to branch-3.3:

   1. SPARK-37650: Tell spark-env.sh the python interpreter
   2. SPARK-36664: Log time spent waiting for cluster resources
   3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   5. SPARK-37093: Inline type hints python/pyspark/streaming

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   10. SPARK-38590: New SQL function: try_to_binary
   11. SPARK-37377: Refactor V2 Partitioning interface and remove
   deprecated usage of Distribution
   12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   13. SPARK-34659: Web UI does not correctly get appId
   14. SPARK-38589: New SQL function: try_avg


Max Gekk

Software Engineer

Databricks, Inc.


On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk  wrote:

> Hello All,
>
> Below is current status of features from the allow list:
>
> IN PROGRESS:
>
>1. SPARK-37396: Inline type hint files for files in
>python/pyspark/mllib
>2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>3. SPARK-37093: Inline type hints python/pyspark/streaming
>4. SPARK-37377: Refactor V2 Partitioning interface and remove
>deprecated usage of Distribution
>5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>sources
>6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>7. SPARK-28516: Data Type Formatting Functions: `to_char`
>8. SPARK-36664: Log time spent waiting for cluster resources
>9. SPARK-34659: Web UI does not correctly get appId
>10. SPARK-37650: Tell spark-env.sh the python interpreter
>11. SPARK-38589: New SQL function: try_avg
>12. SPARK-38590: New SQL function: try_to_binary
>13. SPARK-34079: Improvement CTE table scan
>
> RESOLVED:
>
>1. SPARK-32268: Bloom Filter Join
>2. SPARK-38548: New SQL function: try_sum
>3. SPARK-38063: Support SQL split_part function
>4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>filter by self way
>5. SPARK-34863: Support nested column in Spark Parquet vectorized
>readers
>6. SPARK-38194: Make Yarn memory overhead factor configurable
>7. SPARK-37618: Support cleaning up shuffle blocks from external
>shuffle service
>8. SPARK-37831: Add task partition id in metrics
>9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>
> We need to decide whether we are going to wait a little bit more or close
> the doors.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk 
> wrote:
>
>> Hi All,
>>
>> Here is the allow list which I built based on your requests in this
>> thread:
>>
>>1. SPARK-37396: Inline type hint files for files in
>>python/pyspark/mllib
>>2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>3. SPARK-37093: Inline type hints python/pyspark/streaming
>>4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>deprecated usage of Distribution
>>5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>sources
>>6. SPARK-32268: Bloom Filter Join
>>7. SPARK-38548: New SQL function: try_sum
>>8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>9. SPARK-38063: Support SQL split_part function
>>10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>filter by self way
>>12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>readers
>>13. SPARK-38194: Make Yarn memory overhead factor configurable
>>14. SPARK-37618: Support cleaning up shuffle blocks from external
>>shuffle service
>>15. SPARK-37831: Add task partition id in metrics
>>16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>17. SPARK-36664: Log time spent waiting for 

Re: Apache Spark 3.3 Release

2022-04-04 Thread Maxim Gekk
Hello All,

Below is the current status of features from the allow list:

IN PROGRESS:

   1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   3. SPARK-37093: Inline type hints python/pyspark/streaming
   4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated
   usage of Distribution
   5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   7. SPARK-28516: Data Type Formatting Functions: `to_char`
   8. SPARK-36664: Log time spent waiting for cluster resources
   9. SPARK-34659: Web UI does not correctly get appId
   10. SPARK-37650: Tell spark-env.sh the python interpreter
   11. SPARK-38589: New SQL function: try_avg
   12. SPARK-38590: New SQL function: try_to_binary
   13. SPARK-34079: Improvement CTE table scan

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

We need to decide whether we are going to wait a little bit more or close
the doors.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk 
wrote:

> Hi All,
>
> Here is the allow list which I built based on your requests in this thread:
>
>1. SPARK-37396: Inline type hint files for files in
>python/pyspark/mllib
>2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>3. SPARK-37093: Inline type hints python/pyspark/streaming
>4. SPARK-37377: Refactor V2 Partitioning interface and remove
>deprecated usage of Distribution
>5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>sources
>6. SPARK-32268: Bloom Filter Join
>7. SPARK-38548: New SQL function: try_sum
>8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>9. SPARK-38063: Support SQL split_part function
>10. SPARK-28516: Data Type Formatting Functions: `to_char`
>11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>filter by self way
>12. SPARK-34863: Support nested column in Spark Parquet vectorized
>readers
>13. SPARK-38194: Make Yarn memory overhead factor configurable
>14. SPARK-37618: Support cleaning up shuffle blocks from external
>shuffle service
>15. SPARK-37831: Add task partition id in metrics
>16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>17. SPARK-36664: Log time spent waiting for cluster resources
>18. SPARK-34659: Web UI does not correctly get appId
>19. SPARK-37650: Tell spark-env.sh the python interpreter
>20. SPARK-38589: New SQL function: try_avg
>21. SPARK-38590: New SQL function: try_to_binary
>22. SPARK-34079: Improvement CTE table scan
>
> Best regards,
> Max Gekk
>
>
> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:
>
>> Is the feature freeze target date March 22nd then?  I saw a few dates
>> thrown around want to confirm what we landed on
>>
>> I am trying to get the following improvements finished review and in, if
>> concerns with either, let me know:
>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>> 
>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>> for released executors 
>>
>> Tom
>>
>>
>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>> ltn...@gmail.com> wrote:
>>
>>
>> I'd like to add the following new SQL functions in the 3.3 release. These
>> functions are useful when overflow or encoding errors occur:
>>
>>- [SPARK-38548][SQL] New SQL function: try_sum
>>
>>- [SPARK-38589][SQL] New SQL function: try_avg
>>
>>- [SPARK-38590][SQL] New SQL function: try_to_binary
>>
>>
>> Gengliang
>>
>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo 
>> wrote:
>>
>> Hello,
>>
>> I've been trying for a bit to get the following two PRs merged and
>> into a release, and I'm having some difficulty moving them forward:
>>
>> https://github.com/apache/spark/pull/34903 - This passes the current
>> python interpreter to spark-env.sh to allow some currently-unavailable
>> customization to happen
>> 

Re: Apache Spark 3.3 Release

2022-03-21 Thread Tom Graves
Maybe I'm misunderstanding what you are saying, but according to those dates,
code freeze, by which the majority of features should be merged, is March 15th.
So if this list is all features that are not merged at this point, we should
probably discuss whether we want them to go in or whether we need to change the
dates. Major features going in during the QA period can destabilize things.
Tom
On Monday, March 21, 2022, 01:53:24 AM CDT, Wenchen Fan 
 wrote:  
 
 Just checked the release calendar, the planned RC cut date is April:
Let's revisit after 2 weeks then?
On Mon, Mar 21, 2022 at 2:47 PM Wenchen Fan  wrote:

Shall we revisit this list after a week? Ideally, they should be either merged 
or rejected for 3.3, so that we can cut rc1. We can still discuss them case by 
case at that time if there are exceptions.
On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun  wrote:

Thank you for your summarization.

I believe we need to have a discussion in order to evaluate each PR's readiness.

BTW, `branch-3.3` is still open for bug fixes including minor dependency 
changes like the following.

(Backported)
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

(Upcoming)
[SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
[SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0
Dongjoon.


On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk  wrote:

Hi All,
Here is the allow list which I built based on your requests in this thread:   
   - SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   - SPARK-37395: Inline type hint files for files in python/pyspark/ml
   - SPARK-37093: Inline type hints python/pyspark/streaming
   - SPARK-37377: Refactor V2 Partitioning interface and remove deprecated 
usage of Distribution
   - SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
   - SPARK-32268: Bloom Filter Join
   - SPARK-38548: New SQL function: try_sum
   - SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   - SPARK-38063: Support SQL split_part function
   - SPARK-28516: Data Type Formatting Functions: `to_char`
   - SPARK-38432: Refactor framework so as JDBC dialect could compile filter by 
self way
   - SPARK-34863: Support nested column in Spark Parquet vectorized readers
   - SPARK-38194: Make Yarn memory overhead factor configurable
   - SPARK-37618: Support cleaning up shuffle blocks from external shuffle 
service
   - SPARK-37831: Add task partition id in metrics
   - SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and 
DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   - SPARK-36664: Log time spent waiting for cluster resources
   - SPARK-34659: Web UI does not correctly get appId
   - SPARK-37650: Tell spark-env.sh the python interpreter
   - SPARK-38589: New SQL function: try_avg
   - SPARK-38590: New SQL function: try_to_binary
   - SPARK-34079: Improvement CTE table scan

Best regards,
Max Gekk

On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:

Is the feature freeze target date March 22nd then?  I saw a few dates thrown
around want to confirm what we landed on

I am trying to get the following improvements finished review and in, if
concerns with either, let me know:
- [SPARK-34079][SQL] Merge non-correlated scalar subqueries
- [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for
released executors

Tom

On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang 
 wrote:  
 
 I'd like to add the following new SQL functions in the 3.3 release. These 
functions are useful when overflow or encoding errors occur:   
   - [SPARK-38548][SQL] New SQL function: try_sum    

   - [SPARK-38589][SQL] New SQL function: try_avg   

   - [SPARK-38590][SQL] New SQL function: try_to_binary    
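
To make the motivation concrete, here is a minimal plain-Python sketch of the behaviour these functions are meant to provide, assuming (as the try_ prefix suggests) that they return NULL instead of failing. It is not Spark code: the function names try_sum and try_to_binary_hex are illustrative stand-ins, the 64-bit bounds model a BIGINT column, and only the hex encoding is shown.

```python
# Plain-Python emulation of "return NULL instead of failing" semantics.
# These helpers are hypothetical stand-ins, not the Spark implementations.

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def try_sum(values):
    """Sum 64-bit integers; return None (NULL) if the running total overflows."""
    total = None
    for v in values:
        if v is None:
            continue  # SQL aggregates skip NULL inputs
        total = v if total is None else total + v
        if not INT64_MIN <= total <= INT64_MAX:
            return None  # overflow: yield NULL rather than raising
    return total  # None when there were no non-NULL inputs


def try_to_binary_hex(text):
    """Decode a hex string; return None (NULL) if the input is malformed."""
    try:
        return bytes.fromhex(text)
    except ValueError:
        return None  # encoding error: yield NULL rather than raising


print(try_sum([1, 2, None, 3]))         # 6
print(try_sum([INT64_MAX, 1]))          # None
print(try_to_binary_hex("537061726B"))  # b'Spark'
print(try_to_binary_hex("not hex"))     # None
```

The point of the design is that one bad row or one overflowing group produces a NULL in the result instead of aborting the whole query.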

Gengliang
On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:

Hello,

I've been trying for a bit to get the following two PRs merged and
into a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
python interpreter to spark-env.sh to allow some currently-unavailable
customization to happen
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse proxy-handling code where it does a greedy match for
"proxy" in the URL, and will mistakenly replace the App-ID in the
wrong place.
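
The kind of greedy-match problem described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the path layout, the regular expression, and both helper functions are hypothetical and are not taken from the Spark UI code or from the PR.

```python
import re

# Hypothetical request path: the word "proxy" appears twice.
PATH = "/proxy/app-20220317-0001/static/proxy/helper.js"


def app_id_greedy(path):
    """Greedy '.*' backtracks from the end, so it locks onto the LAST 'proxy/'."""
    m = re.search(r".*proxy/([^/]+)", path)
    return m.group(1) if m else None


def app_id_anchored(path, prefix="/proxy/"):
    """Only accept the prefix at the start of the path, then take one segment."""
    if not path.startswith(prefix):
        return None
    return path[len(prefix):].split("/", 1)[0]


print(app_id_greedy(PATH))    # helper.js  (wrong segment)
print(app_id_anchored(PATH))  # app-20220317-0001
```

Anchoring the match to a known position in the path is one way to avoid picking up a later occurrence of the same word; the actual fix in the PR may differ.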

I'm not exactly sure of how to get attention of PRs that have been
sitting around for a while, but these are really important to our
use-cases, and it would be nice to have them merged in.

Cheers
Andrew

On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
>
> I'd like to add/backport the logging in 
> https://github.com/apache/spark/pull/35881 PR so that when users submit 
> issues with dynamic allocation we can better debug what's going on.
>
> On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
>>
>> There is one item on our side that we want to backport to 3.3:
>> - vectorized 

Re: Apache Spark 3.3 Release

2022-03-21 Thread Wenchen Fan
Just checked the release calendar, the planned RC cut date is April:
[image: image.png]
Let's revisit after 2 weeks then?

On Mon, Mar 21, 2022 at 2:47 PM Wenchen Fan  wrote:

> Shall we revisit this list after a week? Ideally, they should be either
> merged or rejected for 3.3, so that we can cut rc1. We can still discuss
> them case by case at that time if there are exceptions.
>
> On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun 
> wrote:
>
>> Thank you for your summarization.
>>
>> I believe we need to have a discussion in order to evaluate each PR's
>> readiness.
>>
>> BTW, `branch-3.3` is still open for bug fixes including minor dependency
>> changes like the following.
>>
>> (Backported)
>> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
>> Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
>> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
>>
>> (Upcoming)
>> [SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
>> [SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0
>>
>> Dongjoon.
>>
>>
>>
>> On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk 
>> wrote:
>>
>>> Hi All,
>>>
>>> Here is the allow list which I built based on your requests in this
>>> thread:
>>>
>>>1. SPARK-37396: Inline type hint files for files in
>>>python/pyspark/mllib
>>>2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>>3. SPARK-37093: Inline type hints python/pyspark/streaming
>>>4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>>deprecated usage of Distribution
>>>5. SPARK-38085: DataSource V2: Handle DELETE commands for
>>>group-based sources
>>>6. SPARK-32268: Bloom Filter Join
>>>7. SPARK-38548: New SQL function: try_sum
>>>8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>>9. SPARK-38063: Support SQL split_part function
>>>10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>>11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>>filter by self way
>>>12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>>readers
>>>13. SPARK-38194: Make Yarn memory overhead factor configurable
>>>14. SPARK-37618: Support cleaning up shuffle blocks from external
>>>shuffle service
>>>15. SPARK-37831: Add task partition id in metrics
>>>16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>>DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>>17. SPARK-36664: Log time spent waiting for cluster resources
>>>18. SPARK-34659: Web UI does not correctly get appId
>>>19. SPARK-37650: Tell spark-env.sh the python interpreter
>>>20. SPARK-38589: New SQL function: try_avg
>>>21. SPARK-38590: New SQL function: try_to_binary
>>>22. SPARK-34079: Improvement CTE table scan
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:
>>>
>>>> Is the feature freeze target date March 22nd then?  I saw a few dates
>>>> thrown around want to confirm what we landed on
>>>>
>>>> I am trying to get the following improvements finished review and in,
>>>> if concerns with either, let me know:
>>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>>>
>>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>>> for released executors
>>>>
>>>> Tom
>>>>
>>>>
>>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>>> ltn...@gmail.com> wrote:
>>>>
>>>>
>>>> I'd like to add the following new SQL functions in the 3.3 release.
>>>> These functions are useful when overflow or encoding errors occur:
>>>>
>>>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>>>
>>>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>>>
>>>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>>>
>>>>
>>>> Gengliang
>>>>
>>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been trying for a bit to get the following two PRs merged and
>>>> into a release, and I'm having some difficulty moving them forward:
>>>>
>>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>>> python interpreter to spark-env.sh to allow some currently-unavailable
>>>> customization to happen
>>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>>> wrong place.
>>>>
>>>> I'm not exactly sure of how to get attention of PRs that have been
>>>> sitting around for a while, but these are really important to our
>>>> use-cases, and it would be nice to have them merged in.
>>>>
>>>> Cheers
>>>> Andrew
>>>>
>>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau

Re: Apache Spark 3.3 Release

2022-03-21 Thread Wenchen Fan
Shall we revisit this list after a week? Ideally, they should be either
merged or rejected for 3.3, so that we can cut rc1. We can still discuss
them case by case at that time if there are exceptions.

On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun 
wrote:

> Thank you for your summarization.
>
> I believe we need to have a discussion in order to evaluate each PR's
> readiness.
>
> BTW, `branch-3.3` is still open for bug fixes including minor dependency
> changes like the following.
>
> (Backported)
> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
> Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
>
> (Upcoming)
> [SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
> [SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0
>
> Dongjoon.
>
>
>
> On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk 
> wrote:
>
>> Hi All,
>>
>> Here is the allow list which I built based on your requests in this
>> thread:
>>
>>1. SPARK-37396: Inline type hint files for files in
>>python/pyspark/mllib
>>2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>3. SPARK-37093: Inline type hints python/pyspark/streaming
>>4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>deprecated usage of Distribution
>>5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>sources
>>6. SPARK-32268: Bloom Filter Join
>>7. SPARK-38548: New SQL function: try_sum
>>8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>9. SPARK-38063: Support SQL split_part function
>>10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>filter by self way
>>12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>readers
>>13. SPARK-38194: Make Yarn memory overhead factor configurable
>>14. SPARK-37618: Support cleaning up shuffle blocks from external
>>shuffle service
>>15. SPARK-37831: Add task partition id in metrics
>>16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>17. SPARK-36664: Log time spent waiting for cluster resources
>>18. SPARK-34659: Web UI does not correctly get appId
>>19. SPARK-37650: Tell spark-env.sh the python interpreter
>>20. SPARK-38589: New SQL function: try_avg
>>21. SPARK-38590: New SQL function: try_to_binary
>>22. SPARK-34079: Improvement CTE table scan
>>
>> Best regards,
>> Max Gekk
>>
>>
>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:
>>
>>> Is the feature freeze target date March 22nd then?  I saw a few dates
>>> thrown around want to confirm what we landed on
>>>
>>> I am trying to get the following improvements finished review and in, if
>>> concerns with either, let me know:
>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>> 
>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>> for released executors 
>>>
>>> Tom
>>>
>>>
>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>> ltn...@gmail.com> wrote:
>>>
>>>
>>> I'd like to add the following new SQL functions in the 3.3 release.
>>> These functions are useful when overflow or encoding errors occur:
>>>
>>>- [SPARK-38548][SQL] New SQL function: try_sum
>>>
>>>- [SPARK-38589][SQL] New SQL function: try_avg
>>>
>>>- [SPARK-38590][SQL] New SQL function: try_to_binary
>>>
>>>
>>> Gengliang
>>>
>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo 
>>> wrote:
>>>
>>> Hello,
>>>
>>> I've been trying for a bit to get the following two PRs merged and
>>> into a release, and I'm having some difficulty moving them forward:
>>>
>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>> python interpreter to spark-env.sh to allow some currently-unavailable
>>> customization to happen
>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>> wrong place.
>>>
>>> I'm not exactly sure of how to get attention of PRs that have been
>>> sitting around for a while, but these are really important to our
>>> use-cases, and it would be nice to have them merged in.
>>>
>>> Cheers
>>> Andrew
>>>
>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau 
>>> wrote:
>>> >
>>> > I'd like to add/backport the logging in
>>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>>> issues with dynamic allocation we can better debug what's going on.
>>> >
>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
>>> >>
>>> >> There is one item on 

Re: Apache Spark 3.3 Release

2022-03-18 Thread Dongjoon Hyun
Thank you for the summary.

I believe we need to have a discussion in order to evaluate each PR's
readiness.

BTW, `branch-3.3` is still open for bug fixes including minor dependency
changes like the following.

(Backported)
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

(Upcoming)
[SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
[SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0

Dongjoon.



On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk 
wrote:

> Hi All,
>
> Here is the allow list which I built based on your requests in this thread:
>
>1. SPARK-37396: Inline type hint files for files in
>python/pyspark/mllib
>2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>3. SPARK-37093: Inline type hints python/pyspark/streaming
>4. SPARK-37377: Refactor V2 Partitioning interface and remove
>deprecated usage of Distribution
>5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>sources
>6. SPARK-32268: Bloom Filter Join
>7. SPARK-38548: New SQL function: try_sum
>8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>9. SPARK-38063: Support SQL split_part function
>10. SPARK-28516: Data Type Formatting Functions: `to_char`
>11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>filter by self way
>12. SPARK-34863: Support nested column in Spark Parquet vectorized
>readers
>13. SPARK-38194: Make Yarn memory overhead factor configurable
>14. SPARK-37618: Support cleaning up shuffle blocks from external
>shuffle service
>15. SPARK-37831: Add task partition id in metrics
>16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>17. SPARK-36664: Log time spent waiting for cluster resources
>18. SPARK-34659: Web UI does not correctly get appId
>19. SPARK-37650: Tell spark-env.sh the python interpreter
>20. SPARK-38589: New SQL function: try_avg
>21. SPARK-38590: New SQL function: try_to_binary
>22. SPARK-34079: Improvement CTE table scan
>
> Best regards,
> Max Gekk
>
>
> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:
>
>> Is the feature freeze target date March 22nd then?  I saw a few dates
>> thrown around want to confirm what we landed on
>>
>> I am trying to get the following improvements finished review and in, if
>> concerns with either, let me know:
>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>> 
>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>> for released executors 
>>
>> Tom
>>
>>
>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>> ltn...@gmail.com> wrote:
>>
>>
>> I'd like to add the following new SQL functions in the 3.3 release. These
>> functions are useful when overflow or encoding errors occur:
>>
>>- [SPARK-38548][SQL] New SQL function: try_sum
>>
>>- [SPARK-38589][SQL] New SQL function: try_avg
>>
>>- [SPARK-38590][SQL] New SQL function: try_to_binary
>>
>>
>> Gengliang
>>
>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo 
>> wrote:
>>
>> Hello,
>>
>> I've been trying for a bit to get the following two PRs merged and
>> into a release, and I'm having some difficulty moving them forward:
>>
>> https://github.com/apache/spark/pull/34903 - This passes the current
>> python interpreter to spark-env.sh to allow some currently-unavailable
>> customization to happen
>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>> SparkUI reverse proxy-handling code where it does a greedy match for
>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>> wrong place.
>>
>> I'm not exactly sure of how to get attention of PRs that have been
>> sitting around for a while, but these are really important to our
>> use-cases, and it would be nice to have them merged in.
>>
>> Cheers
>> Andrew
>>
>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau 
>> wrote:
>> >
>> > I'd like to add/backport the logging in
>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>> issues with dynamic allocation we can better debug what's going on.
>> >
>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
>> >>
>> >> There is one item on our side that we want to backport to 3.3:
>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>> >>
>> >> It's already reviewed and approved.
>> >>
>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>>  wrote:
>> >> >
>> >> > It looks like the version hasn't been updated on master and still
>> shows 

Re: Apache Spark 3.3 Release

2022-03-18 Thread Maxim Gekk
Hi All,

Here is the allow list which I built based on your requests in this thread:

   1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   3. SPARK-37093: Inline type hints python/pyspark/streaming
   4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated
   usage of Distribution
   5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   6. SPARK-32268: Bloom Filter Join
   7. SPARK-38548: New SQL function: try_sum
   8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   9. SPARK-38063: Support SQL split_part function
   10. SPARK-28516: Data Type Formatting Functions: `to_char`
   11. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   12. SPARK-34863: Support nested column in Spark Parquet vectorized
   readers
   13. SPARK-38194: Make Yarn memory overhead factor configurable
   14. SPARK-37618: Support cleaning up shuffle blocks from external
   shuffle service
   15. SPARK-37831: Add task partition id in metrics
   16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   17. SPARK-36664: Log time spent waiting for cluster resources
   18. SPARK-34659: Web UI does not correctly get appId
   19. SPARK-37650: Tell spark-env.sh the python interpreter
   20. SPARK-38589: New SQL function: try_avg
   21. SPARK-38590: New SQL function: try_to_binary
   22. SPARK-34079: Improvement CTE table scan
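
As an illustration only, an allow list like the one above could be checked mechanically against PR titles. The sketch below is Python; `ALLOW_LIST` holds just a few of the JIRA IDs from the list, and `jira_ids` and `is_allowed` are hypothetical helpers, not part of any Spark release tooling.

```python
import re

# A few of the JIRA IDs from the allow list above (not the full list).
ALLOW_LIST = {"SPARK-32268", "SPARK-38548", "SPARK-38589", "SPARK-38590", "SPARK-37974"}


def jira_ids(pr_title):
    """Return every SPARK-NNNNN ID that appears in square brackets in a PR title."""
    return re.findall(r"\[(SPARK-\d+)\]", pr_title)


def is_allowed(pr_title):
    """True if the title names at least one JIRA ID and all of them are on the allow list."""
    ids = jira_ids(pr_title)
    return bool(ids) and all(i in ALLOW_LIST for i in ids)


print(is_allowed("[SPARK-38548][SQL] New SQL function: try_sum"))  # True
print(is_allowed("[SPARK-38335][SQL] Implement parser support for DEFAULT column values"))  # False
```

A title without any bracketed JIRA ID is rejected rather than waved through, which seems the safer default for a feature freeze.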

Best regards,
Max Gekk


On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:

> Is the feature freeze target date March 22nd then?  I saw a few dates
> thrown around want to confirm what we landed on
>
> I am trying to get the following improvements finished review and in, if
> concerns with either, let me know:
> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
> 
> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for
> released executors 
>
> Tom
>
>
> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
> ltn...@gmail.com> wrote:
>
>
> I'd like to add the following new SQL functions in the 3.3 release. These
> functions are useful when overflow or encoding errors occur:
>
>- [SPARK-38548][SQL] New SQL function: try_sum
>
>- [SPARK-38589][SQL] New SQL function: try_avg
>
>- [SPARK-38590][SQL] New SQL function: try_to_binary
>
>
> Gengliang
>
> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:
>
> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves 
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT, can you please update that.
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> 

Re: Apache Spark 3.3 Release

2022-03-17 Thread Tom Graves
Is the feature freeze target date March 22nd then? I saw a few dates thrown
around and want to confirm what we landed on.

I am trying to get the following improvements through review and merged; if
there are concerns with either, let me know:
- [SPARK-34079][SQL] Merge non-correlated scalar subqueries
- [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for
released executors

Tom

On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang 
 wrote:  
 
 I'd like to add the following new SQL functions in the 3.3 release. These 
functions are useful when overflow or encoding errors occur:   
   - [SPARK-38548][SQL] New SQL function: try_sum    

   - [SPARK-38589][SQL] New SQL function: try_avg   

   - [SPARK-38590][SQL] New SQL function: try_to_binary    

Gengliang
On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:

Hello,

I've been trying for a bit to get the following two PRs merged and
into a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
python interpreter to spark-env.sh to allow some currently-unavailable
customization to happen
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse proxy-handling code where it does a greedy match for
"proxy" in the URL, and will mistakenly replace the App-ID in the
wrong place.

I'm not exactly sure of how to get attention of PRs that have been
sitting around for a while, but these are really important to our
use-cases, and it would be nice to have them merged in.

Cheers
Andrew

On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
>
> I'd like to add/backport the logging in 
> https://github.com/apache/spark/pull/35881 PR so that when users submit 
> issues with dynamic allocation we can better debug what's going on.
>
> On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
>>
>> There is one item on our side that we want to backport to 3.3:
>> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>
>> It's already reviewed and approved.
>>
>> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves  
>> wrote:
>> >
>> > It looks like the version hasn't been updated on master and still shows 
>> > 3.3.0-SNAPSHOT, can you please update that.
>> >
>> > Tom
>> >
>> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk 
>> >  wrote:
>> >
>> >
>> > Hi All,
>> >
>> > I have created the branch for Spark 3.3:
>> > https://github.com/apache/spark/commits/branch-3.3
>> >
>> > Please, backport important fixes to it, and if you have some doubts, ping 
>> > me in the PR. Regarding new features, we are still building the allow list 
>> > for branch-3.3.
>> >
>> > Best regards,
>> > Max Gekk
>> >
>> >
>> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun  
>> > wrote:
>> >
>> > Yes, I agree with you for your whitelist approach for backporting. :)
>> > Thank you for summarizing.
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
>> >
>> > I think I finally got your point. What you want to keep unchanged is the 
>> > branch cut date of Spark 3.3. Today? or this Friday? This is not a big 
>> > deal.
>> >
>> > My major concern is whether we should keep merging the feature work or the 
>> > dependency upgrade after the branch cut. To make our release time more 
>> > predictable, I am suggesting we should finalize the exception PR list 
>> > first, instead of merging them in an ad hoc way. In the past, we spent a 
>> > lot of time on the revert of the PRs that were merged after the branch 
>> > cut. I hope we can minimize unnecessary arguments in this release. Do you 
>> > agree, Dongjoon?
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>> >
>> > That is not totally fine, Xiao. It sounds like you are asking for a change
>> > of plan without a proper reason.
>> >
>> > Although we cut the branch today according to our plan, you can still
>> > collect the list and make a list of exceptions. I'm not blocking what you
>> > want to do.
>> >
>> > Please let the community start to ramp down as we agreed before.
>> >
>> > Dongjoon
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>> >
>> > Please do not get me wrong. If we don't cut a branch, we are allowing all 
>> > patches to land Apache Spark 3.3. That is totally fine. After we cut the 
>> > branch, we should avoid merging the feature work. In the next three days, 
>> > let us collect the actively developed PRs that we want to make an 
>> > exception (i.e., merged to 3.3 after the upcoming branch cut). Does that 
>> > make sense?
>> >
>> > On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>> >
>> > Xiao. You are working against what you are saying.
>> > If you don't cut a branch, it means you are allowing all patches to land 
>> > Apache Spark 3.3. No?
>> >
>> > > we need to avoid backporting the feature work that are not being well 
>> > > discussed.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 

Re: Apache Spark 3.3 Release

2022-03-17 Thread Gengliang Wang
I'd like to add the following new SQL functions in the 3.3 release. These
functions are useful when overflow or encoding errors occur:

   - [SPARK-38548][SQL] New SQL function: try_sum
   
   - [SPARK-38589][SQL] New SQL function: try_avg
   
   - [SPARK-38590][SQL] New SQL function: try_to_binary
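
To make the intent concrete, here is a minimal Python sketch of the "return NULL instead of failing" behaviour these functions are meant to provide. It only models the semantics on plain Python values, assuming 64-bit long sums and hex-encoded input; `try_sum` and `try_to_binary_hex` below are illustrative stand-ins, not Spark's implementation.

```python
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def try_sum(values):
    """Sum 64-bit integers; return None (SQL NULL) on overflow instead of raising."""
    total = 0
    for v in values:
        if v is None:  # SQL aggregates skip NULL inputs
            continue
        total += v
        if not (LONG_MIN <= total <= LONG_MAX):
            return None
    return total


def try_to_binary_hex(text):
    """Decode a hex string; return None (SQL NULL) if the input is not valid hex."""
    try:
        return bytes.fromhex(text)
    except ValueError:
        return None


print(try_sum([1, 2, 3]))               # 6
print(try_sum([LONG_MAX, 1]))           # None
print(try_to_binary_hex("537061726b"))  # b'Spark'
print(try_to_binary_hex("zz"))          # None
```

try_avg would follow the same pattern: a NULL result where the non-try variant raises an error.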
   

Gengliang

On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:

> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves 
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT, can you please update that.
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> >> > Thank you for summarizing.
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
> >> >
> >> > I think I finally got your point. What you want to keep unchanged is
> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
> >> >
> >> > My major concern is whether we should keep merging the feature work
> or the dependency upgrade after the branch cut. To make our release time
> more predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
> >> >
> >> > That is not totally fine, Xiao. It sounds like you are asking for a
> change of plan without a proper reason.
> >> >
> >> > Although we cut the branch today according to our plan, you can still
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
> >> >
> >> > Please let the community start to ramp down as we agreed before.
> >> >
> >> > Dongjoon
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
> >> >
> >> > Please do not get me wrong. If we don't cut a branch, we are allowing
> all patches to land Apache Spark 3.3. That is totally fine. After we cut
> the branch, we should avoid merging the feature work. In the next three
> days, let us collect the actively developed PRs that we want to make an
> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
> make sense?
> >> >
> >> > On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
> >> >
> >> > Xiao. You are working against what you are saying.
> >> > If you don't cut a branch, it means you are allowing all patches to
> land Apache Spark 3.3. No?
> >> >
> >> > > we need to avoid backporting the feature work that are not being
> well discussed.
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li 
> wrote:
> >> >
> >> > Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> 

Re: Apache Spark 3.3 Release

2022-03-16 Thread Andrew Melo
Hello,

I've been trying for a bit to get the following two PRs merged and
into a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
python interpreter to spark-env.sh to allow some currently-unavailable
customization to happen
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse proxy-handling code where it does a greedy match for
"proxy" in the URL, and will mistakenly replace the App-ID in the
wrong place.
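
For anyone unfamiliar with the failure mode, the Python sketch below shows how a greedy match on "proxy" can pick the wrong path segment. The URL shape `/proxy/<app-id>/...` and both helper functions are assumptions made for illustration; this is not the Spark UI code, and it is not the fix in the PR.

```python
import re


def app_id_greedy(path):
    """Greedy '.*' swallows as much as it can, so the LAST '/proxy/' in the path wins."""
    m = re.match(r".*/proxy/([^/]+)", path)
    return m.group(1) if m else None


def app_id_anchored(path):
    """Only a leading '/proxy/<id>' segment is treated as the application ID."""
    m = re.match(r"^/proxy/([^/]+)", path)
    return m.group(1) if m else None


# Hypothetical path where "proxy" also appears later in the URL.
path = "/proxy/app-123/static/proxy/helper.js"

print(app_id_greedy(path))    # helper.js
print(app_id_anchored(path))  # app-123
```

Anchoring the pattern at the start of the path is one way to avoid the ambiguity; a non-greedy `.*?` would be another.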

I'm not exactly sure how to get attention for PRs that have been
sitting around for a while, but these are really important to our
use cases, and it would be nice to have them merged in.

Cheers
Andrew

On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
>
> I'd like to add/backport the logging in 
> https://github.com/apache/spark/pull/35881 PR so that when users submit 
> issues with dynamic allocation we can better debug what's going on.
>
> On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
>>
>> There is one item on our side that we want to backport to 3.3:
>> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>
>> It's already reviewed and approved.
>>
>> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves  
>> wrote:
>> >
>> > It looks like the version hasn't been updated on master and still shows 
>> > 3.3.0-SNAPSHOT, can you please update that.
>> >
>> > Tom
>> >
>> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk 
>> >  wrote:
>> >
>> >
>> > Hi All,
>> >
>> > I have created the branch for Spark 3.3:
>> > https://github.com/apache/spark/commits/branch-3.3
>> >
>> > Please, backport important fixes to it, and if you have some doubts, ping 
>> > me in the PR. Regarding new features, we are still building the allow list 
>> > for branch-3.3.
>> >
>> > Best regards,
>> > Max Gekk
>> >
>> >
>> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun  
>> > wrote:
>> >
>> > Yes, I agree with you for your whitelist approach for backporting. :)
>> > Thank you for summarizing.
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
>> >
>> > I think I finally got your point. What you want to keep unchanged is the 
>> > branch cut date of Spark 3.3. Today? or this Friday? This is not a big 
>> > deal.
>> >
>> > My major concern is whether we should keep merging the feature work or the 
>> > dependency upgrade after the branch cut. To make our release time more 
>> > predictable, I am suggesting we should finalize the exception PR list 
>> > first, instead of merging them in an ad hoc way. In the past, we spent a 
>> > lot of time on the revert of the PRs that were merged after the branch 
>> > cut. I hope we can minimize unnecessary arguments in this release. Do you 
>> > agree, Dongjoon?
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>> >
>> > That is not totally fine, Xiao. It sounds like you are asking for a change
>> > of plan without a proper reason.
>> >
>> > Although we cut the branch today according to our plan, you can still
>> > collect the list and make a list of exceptions. I'm not blocking what you
>> > want to do.
>> >
>> > Please let the community start to ramp down as we agreed before.
>> >
>> > Dongjoon
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>> >
>> > Please do not get me wrong. If we don't cut a branch, we are allowing all 
>> > patches to land Apache Spark 3.3. That is totally fine. After we cut the 
>> > branch, we should avoid merging the feature work. In the next three days, 
>> > let us collect the actively developed PRs that we want to make an 
>> > exception (i.e., merged to 3.3 after the upcoming branch cut). Does that 
>> > make sense?
>> >
>> > On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>> >
>> > Xiao. You are working against what you are saying.
>> > If you don't cut a branch, it means you are allowing all patches to land 
>> > Apache Spark 3.3. No?
>> >
>> > > we need to avoid backporting the feature work that are not being well 
>> > > discussed.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>> >
>> > Cutting the branch is simple, but we need to avoid backporting the feature 
>> > work that are not being well discussed. Not all the members are actively 
>> > following the dev list. I think we should wait 3 more days for collecting 
>> > the PR list before cutting the branch.
>> >
>> > BTW, there are very few 3.4-only feature work that will be affected.
>> >
>> > Xiao
>> >
>> > On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>> >
>> > Hi, Max, Chao, Xiao, Holden and all.
>> >
>> > I have a different idea.
>> >
>> > Given the situation and small patch list, I don't think we need to 
>> > postpone the branch cut for those patches. It's easier to cut a branch-3.3 
>> > and allow backporting.
>> >
>> > As of today, we already have an obvious Apache Spark 3.4 patch in the 
>> > branch together. This 

Re: Apache Spark 3.3 Release

2022-03-16 Thread Holden Karau
I'd like to add/backport the logging in
https://github.com/apache/spark/pull/35881 PR so that when users submit
issues with dynamic allocation we can better debug what's going on.

On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:

> There is one item on our side that we want to backport to 3.3:
> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>
> It's already reviewed and approved.
>
> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves 
> wrote:
> >
> > It looks like the version hasn't been updated on master and still shows
> 3.3.0-SNAPSHOT, can you please update that.
> >
> > Tom
> >
> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >
> >
> > Hi All,
> >
> > I have created the branch for Spark 3.3:
> > https://github.com/apache/spark/commits/branch-3.3
> >
> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >
> > Best regards,
> > Max Gekk
> >
> >
> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun 
> wrote:
> >
> > Yes, I agree with you for your whitelist approach for backporting. :)
> > Thank you for summarizing.
> >
> > Thanks,
> > Dongjoon.
> >
> >
> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
> >
> > I think I finally got your point. What you want to keep unchanged is the
> branch cut date of Spark 3.3. Today? or this Friday? This is not a big deal.
> >
> > My major concern is whether we should keep merging the feature work or
> the dependency upgrade after the branch cut. To make our release time more
> predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
> >
> >
> >
> > On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
> >
> > That is not totally fine, Xiao. It sounds like you are asking for a change
> of plan without a proper reason.
> >
> > Although we cut the branch today according to our plan, you can still
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
> >
> > Please let the community start to ramp down as we agreed before.
> >
> > Dongjoon
> >
> >
> >
> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
> >
> > Please do not get me wrong. If we don't cut a branch, we are allowing
> all patches to land Apache Spark 3.3. That is totally fine. After we cut
> the branch, we should avoid merging the feature work. In the next three
> days, let us collect the actively developed PRs that we want to make an
> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
> make sense?
> >
> > On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
> >
> > Xiao. You are working against what you are saying.
> > If you don't cut a branch, it means you are allowing all patches to land
> Apache Spark 3.3. No?
> >
> > > we need to avoid backporting the feature work that are not being well
> discussed.
> >
> >
> >
> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
> >
> > Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> actively following the dev list. I think we should wait 3 more days for
> collecting the PR list before cutting the branch.
> >
> > BTW, there are very few 3.4-only feature work that will be affected.
> >
> > Xiao
> >
> > On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
> >
> > Hi, Max, Chao, Xiao, Holden and all.
> >
> > I have a different idea.
> >
> > Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
> >
> > As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
> >
> > [SPARK-38335][SQL] Implement parser support for DEFAULT column values
> >
> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
> >
> > Best,
> > Dongjoon.
> >
> >
> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
> >
> > Cool, thanks for clarifying!
> >
> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
> > >>
> > >> For the following list:
> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> > >> Do you mean we should include them, or exclude them from 3.3?
> > >
> > >
> > > If possible, I hope these features can be shipped with Spark 3.3.
> > >
> > >
> > >
> > > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
> > >>
> > >> Hi Xiao,
> > >>
> > 

Re: Apache Spark 3.3 Release

2022-03-16 Thread Chao Sun
There is one item on our side that we want to backport to 3.3:
- vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
Parquet V2 support (https://github.com/apache/spark/pull/35262)

It's already reviewed and approved.
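
For readers who have not looked at these encodings: DELTA_BYTE_ARRAY stores each value as the length of the prefix it shares with the previous value plus the remaining suffix. The Python sketch below shows only that prefix/suffix idea on in-memory lists; it leaves out the binary packing of the lengths and is not the vectorized reader from the PR. The function names are made up for the example.

```python
def delta_byte_array_encode(values):
    """Split each value into (shared-prefix length with previous value, remaining suffix)."""
    prefix_lengths, suffixes = [], []
    prev = b""
    for v in values:
        n = 0
        limit = min(len(prev), len(v))
        while n < limit and prev[n] == v[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(v[n:])
        prev = v
    return prefix_lengths, suffixes


def delta_byte_array_decode(prefix_lengths, suffixes):
    """Rebuild the values by reusing the prefix of the previously decoded value."""
    out = []
    prev = b""
    for n, s in zip(prefix_lengths, suffixes):
        v = prev[:n] + s
        out.append(v)
        prev = v
    return out


values = [b"axis", b"axe", b"axle", b"babble", b"babyhood"]
lengths, suffixes = delta_byte_array_encode(values)
print(lengths)   # [0, 2, 2, 0, 3]
print(suffixes)  # [b'axis', b'e', b'le', b'babble', b'yhood']
print(delta_byte_array_decode(lengths, suffixes) == values)  # True
```

The encoding pays off mostly on sorted or otherwise similar strings, where consecutive values share long prefixes.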

On Wed, Mar 16, 2022 at 9:13 AM Tom Graves  wrote:
>
> It looks like the version hasn't been updated on master and still shows 
> 3.3.0-SNAPSHOT, can you please update that.
>
> Tom
>
> On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk 
>  wrote:
>
>
> Hi All,
>
> I have created the branch for Spark 3.3:
> https://github.com/apache/spark/commits/branch-3.3
>
> Please, backport important fixes to it, and if you have some doubts, ping me 
> in the PR. Regarding new features, we are still building the allow list for 
> branch-3.3.
>
> Best regards,
> Max Gekk
>
>
> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun  wrote:
>
> Yes, I agree with you for your whitelist approach for backporting. :)
> Thank you for summarizing.
>
> Thanks,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
>
> I think I finally got your point. What you want to keep unchanged is the
> branch cut date of Spark 3.3. Today or this Friday? This is not a big deal.
>
> My major concern is whether we should keep merging feature work or
> dependency upgrades after the branch cut. To make our release time more
> predictable, I suggest we finalize the list of exception PRs first,
> instead of merging them in an ad hoc way. In the past, we spent a lot of time
> reverting PRs that were merged after the branch cut. I hope we can
> minimize unnecessary arguments in this release. Do you agree, Dongjoon?
>
>
>
> On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>
> That is not totally fine, Xiao. It sounds like you are asking for a change of
> plan without a proper reason.
>
> Although we cut the branch today according to our plan, you can still collect
> the PRs and make a list of exceptions. I'm not blocking what you want to do.
>
> Please let the community start to ramp down as we agreed before.
>
> Dongjoon
>
>
>
> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>
> Please do not get me wrong. If we don't cut a branch, we are allowing all
> patches to land in Apache Spark 3.3. That is totally fine. After we cut the
> branch, we should avoid merging feature work. In the next three days, let
> us collect the actively developed PRs for which we want to make an exception
> (i.e., PRs to be merged to 3.3 after the upcoming branch cut). Does that make
> sense?
>
> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>
> Xiao. You are working against what you are saying.
> If you don't cut a branch, it means you are allowing all patches to land in
> Apache Spark 3.3. No?
>
> > we need to avoid backporting feature work that has not been well
> > discussed.
>
>
>
> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>
> Cutting the branch is simple, but we need to avoid backporting feature
> work that has not been well discussed. Not all the members are actively
> following the dev list. I think we should wait 3 more days to collect the
> PR list before cutting the branch.
>
> BTW, there is very little 3.4-only feature work that will be affected.
>
> Xiao
>
> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>
> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to postpone 
> the branch cut for those patches. It's easier to cut a branch-3.3 and allow 
> backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch mixed into the
> branch. This situation will only get worse, because there is no way
> to block other patches from landing unintentionally if we don't cut a
> branch.
>
> [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>
> Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>
> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized 
> >> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >
> >
> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
> >>
> >> Hi Xiao,
> >>
> >> For the following list:
> >>
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized 
> >> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >>
> >> Do you mean we should include them, or exclude them from 3.3?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun  
> >> wrote:
> >> >
> >> > The following was tested and merged a few minutes 

Re: Apache Spark 3.3 Release

2022-03-16 Thread Tom Graves
It looks like the version hasn't been updated on master and still shows
3.3.0-SNAPSHOT. Can you please update that?

Tom

On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk wrote:

Hi All,

I have created the branch for Spark 3.3:
https://github.com/apache/spark/commits/branch-3.3

Please backport important fixes to it, and if you have any doubts, ping me in
the PR. Regarding new features, we are still building the allow list for
branch-3.3.

Best regards,
Max Gekk

On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun  wrote:

Yes, I agree with your whitelist approach for backporting. :)
Thank you for summarizing.

Thanks,
Dongjoon.

On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:

I think I finally got your point. What you want to keep unchanged is the branch
cut date of Spark 3.3. Today or this Friday? This is not a big deal.

My major concern is whether we should keep merging feature work or
dependency upgrades after the branch cut. To make our release time more
predictable, I suggest we finalize the list of exception PRs first,
instead of merging them in an ad hoc way. In the past, we spent a lot of time
reverting PRs that were merged after the branch cut. I hope we can
minimize unnecessary arguments in this release. Do you agree, Dongjoon?


On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:

That is not totally fine, Xiao. It sounds like you are asking for a change of
plan without a proper reason.

Although we cut the branch today according to our plan, you can still collect
the PRs and make a list of exceptions. I'm not blocking what you want to do.

Please let the community start to ramp down as we agreed before.

Dongjoon


On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:

Please do not get me wrong. If we don't cut a branch, we are allowing all
patches to land in Apache Spark 3.3. That is totally fine. After we cut the
branch, we should avoid merging feature work. In the next three days, let
us collect the actively developed PRs for which we want to make an exception
(i.e., PRs to be merged to 3.3 after the upcoming branch cut). Does that make
sense?

On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:

Xiao. You are working against what you are saying.
If you don't cut a branch, it means you are allowing all patches to land in
Apache Spark 3.3. No?

> we need to avoid backporting feature work that has not been well
> discussed.


On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:

Cutting the branch is simple, but we need to avoid backporting feature work
that has not been well discussed. Not all the members are actively following
the dev list. I think we should wait 3 more days to collect the PR list
before cutting the branch.

BTW, there is very little 3.4-only feature work that will be affected.

Xiao

On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:

Hi, Max, Chao, Xiao, Holden and all.

I have a different idea.

Given the situation and the small patch list, I don't think we need to postpone
the branch cut for those patches. It's easier to cut a branch-3.3 and allow
backporting.

As of today, we already have an obvious Apache Spark 3.4 patch mixed into the
branch. This situation will only get worse, because there is no way
to block other patches from landing unintentionally if we don't cut a
branch.

    [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.

Best,
Dongjoon.

On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:

Cool, thanks for clarifying!

On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
>>
>> For the following list:
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> Do you mean we should include them, or exclude them from 3.3?
>
>
> If possible, I hope these features can be shipped with Spark 3.3.
>
>
>
> On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>>
>> Hi Xiao,
>>
>> For the following list:
>>
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>
>> Do you mean we should include them, or exclude them from 3.3?
>>
>> Thanks,
>> Chao
>>
>> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun  
>> wrote:
>> >
>> > The following was tested and merged a few minutes ago. So, we can remove 
>> > it from the list.
>> >
>> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
>> >>
>> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to 
>> >> collect the list of actively developed PRs that we want to merge to 3.3 
>> >> after the branch cut?
>> >>
>> >> Please do not rush to merge the PRs that are not fully reviewed. We can 
>> >> cut the branch this Friday and continue merging the PRs that have been 

Re: Apache Spark 3.3 Release

2022-03-16 Thread Jacky Lee
I also have a PR that has been ready to merge for a while. Can we merge it
into 3.3.0?
[SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics
https://github.com/apache/spark/pull/35185

On Wed, Mar 16, 2022 at 9:16 PM Adam Binford wrote:

> Also throwing my hat in for two of my PRs that should be ready and just need
> final reviews/approval:
> Removing shuffles from deallocated executors using the shuffle service:
> https://github.com/apache/spark/pull/35085. This has been asked for for
> several years across many issues.
> Configurable memory overhead factor:
> https://github.com/apache/spark/pull/35504
>
> Adam
>
> On Wed, Mar 16, 2022 at 8:53 AM Wenchen Fan  wrote:
>
>> +1 to define an allowlist of features that we want to backport to branch
>> 3.3. I also have a few in my mind
>> complex type support in vectorized parquet reader:
>> https://github.com/apache/spark/pull/34659
>> refine the DS v2 filter API for JDBC v2:
>> https://github.com/apache/spark/pull/35768
>> a few new SQL functions that have been in development for a while:
>> to_char, split_part, percentile_disc, try_sum, etc.
>>
>> On Wed, Mar 16, 2022 at 2:41 PM Maxim Gekk
>>  wrote:
>>
>>> Hi All,
>>>
>>> I have created the branch for Spark 3.3:
>>> https://github.com/apache/spark/commits/branch-3.3
>>>
>>> Please, backport important fixes to it, and if you have some doubts,
>>> ping me in the PR. Regarding new features, we are still building the allow
>>> list for branch-3.3.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun 
>>> wrote:
>>>
 Yes, I agree with you for your whitelist approach for backporting. :)
 Thank you for summarizing.

 Thanks,
 Dongjoon.


 On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:

> I think I finally got your point. What you want to keep unchanged is
> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
>
> My major concern is whether we should keep merging the feature work or
> the dependency upgrade after the branch cut. To make our release time more
> predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch 
> cut.
> I hope we can minimize unnecessary arguments in this release. Do you 
> agree,
> Dongjoon?
>
>
>
> On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>
>> That is not totally fine, Xiao. It sounds like you are asking a
>> change of plan without a proper reason.
>>
>> Although we cut the branch Today according our plan, you still can
>> collect the list and make a list of exceptions. I'm not blocking what you
>> want to do.
>>
>> Please let the community start to ramp down as we agreed before.
>>
>> Dongjoon
>>
>>
>>
>> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>>
>>> Please do not get me wrong. If we don't cut a branch, we are
>>> allowing all patches to land Apache Spark 3.3. That is totally fine. 
>>> After
>>> we cut the branch, we should avoid merging the feature work. In the next
>>> three days, let us collect the actively developed PRs that we want to 
>>> make
>>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does 
>>> that
>>> make sense?
>>>
>>> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>>>
 Xiao. You are working against what you are saying.
 If you don't cut a branch, it means you are allowing all patches to
 land Apache Spark 3.3. No?

 > we need to avoid backporting the feature work that are not being
 well discussed.



 On Tue, Mar 15, 2022 at 12:12 PM Xiao Li 
 wrote:

> Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members 
> are
> actively following the dev list. I think we should wait 3 more days 
> for
> collecting the PR list before cutting the branch.
>
> BTW, there are very few 3.4-only feature work that will be
> affected.
>
> Xiao
>
> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>
>> Hi, Max, Chao, Xiao, Holden and all.
>>
>> I have a different idea.
>>
>> Given the situation and small patch list, I don't think we need
>> to postpone the branch cut for those patches. It's easier to cut a
>> branch-3.3 and allow backporting.
>>
>> As of today, we already have an obvious Apache Spark 3.4 patch in
>> the branch together. This situation only becomes worse and worse 
>> because
>> there is no way to block the other patches from landing 
>> 

Re: Apache Spark 3.3 Release

2022-03-16 Thread Jacky Lee
I also have a PR that has been ready to merge for a while. Can we merge it
into 3.3.0?
[SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics
https://github.com/apache/spark/pull/35185

On Wed, Mar 16, 2022 at 9:33 PM beliefer wrote:

> +1 Glad to see we will release 3.3.0.
>
>
> At 2022-03-04 02:44:37, "Maxim Gekk" 
> wrote:
>
> Hello All,
>
> I would like to bring up the topic of the new Spark release
> 3.3. According to the public schedule at
> https://spark.apache.org/versioning-policy.html, we planned to start the
> code freeze and release branch cut on March 15th, 2022. Since this date is
> coming soon, I would like to draw your attention to the topic and gather
> any objections that you might have.
>
> Below is the list of ongoing and active SPIPs:
>
> Spark SQL:
> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
> - [SPARK-35801] Row-level operations in Data Source V2
> - [SPARK-37166] Storage Partitioned Join
>
> Spark Core:
> - [SPARK-20624] Add better handling for node shutdown
> - [SPARK-25299] Use remote storage for persisting shuffle data
>
> PySpark:
> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>
> Kubernetes:
> - [SPARK-36057] Support Customized Kubernetes Schedulers
>
> We should probably finish any remaining work for Spark 3.3, then
> switch to QA mode, cut a branch, and keep everything on track. I would
> like to volunteer to help drive this process.
>
> Best regards,
> Max Gekk
>


Re: Apache Spark 3.3 Release

2022-03-16 Thread Adam Binford
Also throwing my hat in for two of my PRs that should be ready and just need
final reviews/approval:
Removing shuffles from deallocated executors using the shuffle service:
https://github.com/apache/spark/pull/35085. This has been asked for for
several years across many issues.
Configurable memory overhead factor:
https://github.com/apache/spark/pull/35504

Adam

On Wed, Mar 16, 2022 at 8:53 AM Wenchen Fan  wrote:

> +1 to define an allowlist of features that we want to backport to branch
> 3.3. I also have a few in my mind
> complex type support in vectorized parquet reader:
> https://github.com/apache/spark/pull/34659
> refine the DS v2 filter API for JDBC v2:
> https://github.com/apache/spark/pull/35768
> a few new SQL functions that have been in development for a while:
> to_char, split_part, percentile_disc, try_sum, etc.
>
> On Wed, Mar 16, 2022 at 2:41 PM Maxim Gekk
>  wrote:
>
>> Hi All,
>>
>> I have created the branch for Spark 3.3:
>> https://github.com/apache/spark/commits/branch-3.3
>>
>> Please, backport important fixes to it, and if you have some doubts, ping
>> me in the PR. Regarding new features, we are still building the allow list
>> for branch-3.3.
>>
>> Best regards,
>> Max Gekk
>>
>>
>> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun 
>> wrote:
>>
>>> Yes, I agree with you for your whitelist approach for backporting. :)
>>> Thank you for summarizing.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
>>>
 I think I finally got your point. What you want to keep unchanged is
 the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
 deal.

 My major concern is whether we should keep merging the feature work or
 the dependency upgrade after the branch cut. To make our release time more
 predictable, I am suggesting we should finalize the exception PR list
 first, instead of merging them in an ad hoc way. In the past, we spent a
 lot of time on the revert of the PRs that were merged after the branch cut.
 I hope we can minimize unnecessary arguments in this release. Do you agree,
 Dongjoon?



 On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:

> That is not totally fine, Xiao. It sounds like you are asking a change
> of plan without a proper reason.
>
> Although we cut the branch Today according our plan, you still can
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
>
> Please let the community start to ramp down as we agreed before.
>
> Dongjoon
>
>
>
> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>
>> Please do not get me wrong. If we don't cut a branch, we are allowing
>> all patches to land Apache Spark 3.3. That is totally fine. After we cut
>> the branch, we should avoid merging the feature work. In the next three
>> days, let us collect the actively developed PRs that we want to make an
>> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>> make sense?
>>
>> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>>
>>> Xiao. You are working against what you are saying.
>>> If you don't cut a branch, it means you are allowing all patches to
>>> land Apache Spark 3.3. No?
>>>
>>> > we need to avoid backporting the feature work that are not being
>>> well discussed.
>>>
>>>
>>>
>>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li 
>>> wrote:
>>>
 Cutting the branch is simple, but we need to avoid backporting the
 feature work that are not being well discussed. Not all the members are
 actively following the dev list. I think we should wait 3 more days for
 collecting the PR list before cutting the branch.

 BTW, there are very few 3.4-only feature work that will be affected.

 Xiao

 On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:

> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a 
> branch-3.3
> and allow backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch in
> the branch together. This situation only becomes worse and worse 
> because
> there is no way to block the other patches from landing 
> unintentionally if
> we don't cut a branch.
>
> [SPARK-38335][SQL] Implement parser support for DEFAULT column
> values
>
> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun 
> wrote:
>
>> Cool, thanks for clarifying!
>>

Re: Apache Spark 3.3 Release

2022-03-16 Thread Wenchen Fan
+1 to defining an allowlist of features that we want to backport to
branch-3.3. I also have a few in mind:
- complex type support in the vectorized Parquet reader:
https://github.com/apache/spark/pull/34659
- refining the DS v2 filter API for JDBC v2:
https://github.com/apache/spark/pull/35768
- a few new SQL functions that have been in development for a while: to_char,
split_part, percentile_disc, try_sum, etc.

On Wed, Mar 16, 2022 at 2:41 PM Maxim Gekk
 wrote:

> Hi All,
>
> I have created the branch for Spark 3.3:
> https://github.com/apache/spark/commits/branch-3.3
>
> Please, backport important fixes to it, and if you have some doubts, ping
> me in the PR. Regarding new features, we are still building the allow list
> for branch-3.3.
>
> Best regards,
> Max Gekk
>
>
> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun 
> wrote:
>
>> Yes, I agree with you for your whitelist approach for backporting. :)
>> Thank you for summarizing.
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
>>
>>> I think I finally got your point. What you want to keep unchanged is the
>>> branch cut date of Spark 3.3. Today? or this Friday? This is not a big
>>> deal.
>>>
>>> My major concern is whether we should keep merging the feature work or
>>> the dependency upgrade after the branch cut. To make our release time more
>>> predictable, I am suggesting we should finalize the exception PR list
>>> first, instead of merging them in an ad hoc way. In the past, we spent a
>>> lot of time on the revert of the PRs that were merged after the branch cut.
>>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>>> Dongjoon?
>>>
>>>
>>>
>>> On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>>>
 That is not totally fine, Xiao. It sounds like you are asking a change
 of plan without a proper reason.

 Although we cut the branch Today according our plan, you still can
 collect the list and make a list of exceptions. I'm not blocking what you
 want to do.

 Please let the community start to ramp down as we agreed before.

 Dongjoon



 On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:

> Please do not get me wrong. If we don't cut a branch, we are allowing
> all patches to land Apache Spark 3.3. That is totally fine. After we cut
> the branch, we should avoid merging the feature work. In the next three
> days, let us collect the actively developed PRs that we want to make an
> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
> make sense?
>
> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>
>> Xiao. You are working against what you are saying.
>> If you don't cut a branch, it means you are allowing all patches to
>> land Apache Spark 3.3. No?
>>
>> > we need to avoid backporting the feature work that are not being
>> well discussed.
>>
>>
>>
>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li 
>> wrote:
>>
>>> Cutting the branch is simple, but we need to avoid backporting the
>>> feature work that are not being well discussed. Not all the members are
>>> actively following the dev list. I think we should wait 3 more days for
>>> collecting the PR list before cutting the branch.
>>>
>>> BTW, there are very few 3.4-only feature work that will be affected.
>>>
>>> Xiao
>>>
>>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>>>
 Hi, Max, Chao, Xiao, Holden and all.

 I have a different idea.

 Given the situation and small patch list, I don't think we need to
 postpone the branch cut for those patches. It's easier to cut a 
 branch-3.3
 and allow backporting.

 As of today, we already have an obvious Apache Spark 3.4 patch in
 the branch together. This situation only becomes worse and worse 
 because
 there is no way to block the other patches from landing 
 unintentionally if
 we don't cut a branch.

 [SPARK-38335][SQL] Implement parser support for DEFAULT column
 values

 Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.

 Best,
 Dongjoon.


 On Tue, Mar 15, 2022 at 10:17 AM Chao Sun 
 wrote:

> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li 
> wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >

Re: Apache Spark 3.3 Release

2022-03-16 Thread Maxim Gekk
Hi All,

I have created the branch for Spark 3.3:
https://github.com/apache/spark/commits/branch-3.3

Please backport important fixes to it, and if you have any doubts, ping
me in the PR. Regarding new features, we are still building the allow list
for branch-3.3.

Best regards,
Max Gekk


On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun 
wrote:

> Yes, I agree with you for your whitelist approach for backporting. :)
> Thank you for summarizing.
>
> Thanks,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
>
>> I think I finally got your point. What you want to keep unchanged is the
>> branch cut date of Spark 3.3. Today? or this Friday? This is not a big
>> deal.
>>
>> My major concern is whether we should keep merging the feature work or
>> the dependency upgrade after the branch cut. To make our release time more
>> predictable, I am suggesting we should finalize the exception PR list
>> first, instead of merging them in an ad hoc way. In the past, we spent a
>> lot of time on the revert of the PRs that were merged after the branch cut.
>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>> Dongjoon?
>>
>>
>>
>> On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>>
>>> That is not totally fine, Xiao. It sounds like you are asking a change
>>> of plan without a proper reason.
>>>
>>> Although we cut the branch Today according our plan, you still can
>>> collect the list and make a list of exceptions. I'm not blocking what you
>>> want to do.
>>>
>>> Please let the community start to ramp down as we agreed before.
>>>
>>> Dongjoon
>>>
>>>
>>>
>>> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>>>
 Please do not get me wrong. If we don't cut a branch, we are allowing
 all patches to land Apache Spark 3.3. That is totally fine. After we cut
 the branch, we should avoid merging the feature work. In the next three
 days, let us collect the actively developed PRs that we want to make an
 exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
 make sense?

 On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:

> Xiao. You are working against what you are saying.
> If you don't cut a branch, it means you are allowing all patches to
> land Apache Spark 3.3. No?
>
> > we need to avoid backporting the feature work that are not being
> well discussed.
>
>
>
> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>
>> Cutting the branch is simple, but we need to avoid backporting the
>> feature work that are not being well discussed. Not all the members are
>> actively following the dev list. I think we should wait 3 more days for
>> collecting the PR list before cutting the branch.
>>
>> BTW, there are very few 3.4-only feature work that will be affected.
>>
>> Xiao
>>
>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>>
>>> Hi, Max, Chao, Xiao, Holden and all.
>>>
>>> I have a different idea.
>>>
>>> Given the situation and small patch list, I don't think we need to
>>> postpone the branch cut for those patches. It's easier to cut a 
>>> branch-3.3
>>> and allow backporting.
>>>
>>> As of today, we already have an obvious Apache Spark 3.4 patch in
>>> the branch together. This situation only becomes worse and worse because
>>> there is no way to block the other patches from landing unintentionally 
>>> if
>>> we don't cut a branch.
>>>
>>> [SPARK-38335][SQL] Implement parser support for DEFAULT column
>>> values
>>>
>>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>>
>>> Best,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun 
>>> wrote:
>>>
 Cool, thanks for clarifying!

 On Tue, Mar 15, 2022 at 10:11 AM Xiao Li 
 wrote:
 >>
 >> For the following list:
 >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
 >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
 vectorized reader
 >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
 >> Do you mean we should include them, or exclude them from 3.3?
 >
 >
 > If possible, I hope these features can be shipped with Spark 3.3.
 >
 >
 >
 > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
 >>
 >> Hi Xiao,
 >>
 >> For the following list:
 >>
 >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
 >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
 vectorized reader
 >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
 >>
 >> Do you mean we should include them, or exclude them from 3.3?
 >>
 >> Thanks,
 >> Chao
 >>
 >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
 dongjoon.h...@gmail.com> 

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Yes, I agree with your whitelist approach for backporting. :)
Thank you for summarizing.

Thanks,
Dongjoon.


On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:

> I think I finally got your point. What you want to keep unchanged is the
> branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
>
> My major concern is whether we should keep merging the feature work or the
> dependency upgrade after the branch cut. To make our release time more
> predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
>
>
>
> On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>
>> That is not totally fine, Xiao. It sounds like you are asking a change of
>> plan without a proper reason.
>>
>> Although we cut the branch Today according our plan, you still can
>> collect the list and make a list of exceptions. I'm not blocking what you
>> want to do.
>>
>> Please let the community start to ramp down as we agreed before.
>>
>> Dongjoon
>>
>>
>>
>> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>>
>>> Please do not get me wrong. If we don't cut a branch, we are allowing
>>> all patches to land Apache Spark 3.3. That is totally fine. After we cut
>>> the branch, we should avoid merging the feature work. In the next three
>>> days, let us collect the actively developed PRs that we want to make an
>>> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>> make sense?
>>>
>>> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>>>
 Xiao. You are working against what you are saying.
 If you don't cut a branch, it means you are allowing all patches to
 land Apache Spark 3.3. No?

 > we need to avoid backporting the feature work that are not being well
 discussed.



 On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:

> Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> actively following the dev list. I think we should wait 3 more days for
> collecting the PR list before cutting the branch.
>
> BTW, there are very few 3.4-only feature work that will be affected.
>
> Xiao
>
> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>
>> Hi, Max, Chao, Xiao, Holden and all.
>>
>> I have a different idea.
>>
>> Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a 
>> branch-3.3
>> and allow backporting.
>>
>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>> branch together. This situation only becomes worse and worse because 
>> there
>> is no way to block the other patches from landing unintentionally if we
>> don't cut a branch.
>>
>> [SPARK-38335][SQL] Implement parser support for DEFAULT column
>> values
>>
>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>
>> Best,
>> Dongjoon.
>>
>>
>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>>
>>> Cool, thanks for clarifying!
>>>
>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li 
>>> wrote:
>>> >>
>>> >> For the following list:
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >
>>> >
>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>> >
>>> >
>>> >
>>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>>> >>
>>> >> Hi Xiao,
>>> >>
>>> >> For the following list:
>>> >>
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >>
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >>
>>> >> Thanks,
>>> >> Chao
>>> >>
>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> >> >
>>> >> > The following was tested and merged a few minutes ago. So, we
>>> can remove it from the list.
>>> >> >
>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>>> wrote:
>>> >> >>
>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>> days to collect the list of actively 

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
I think I finally got your point. What you want to keep unchanged is the
branch cut date of Spark 3.3. Today? or this Friday? This is not a big
deal.

My major concern is whether we should keep merging the feature work or the
dependency upgrade after the branch cut. To make our release time more
predictable, I am suggesting we should finalize the exception PR list
first, instead of merging them in an ad hoc way. In the past, we spent a
lot of time on the revert of the PRs that were merged after the branch cut.
I hope we can minimize unnecessary arguments in this release. Do you agree,
Dongjoon?



On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:

> That is not totally fine, Xiao. It sounds like you are asking for a change
> of plan without a proper reason.
>
> Although we cut the branch today according to our plan, you can still collect
> the list and make a list of exceptions. I'm not blocking what you want to
> do.
>
> Please let the community start to ramp down as we agreed before.
>
> Dongjoon
>
>
>
> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>
>> Please do not get me wrong. If we don't cut a branch, we are allowing all
>> patches to land Apache Spark 3.3. That is totally fine. After we cut the
>> branch, we should avoid merging the feature work. In the next three days,
>> let us collect the actively developed PRs that we want to make an exception
>> (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>>
>> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>>
>>> Xiao. You are working against what you are saying.
>>> If you don't cut a branch, it means you are allowing all patches to land
>>> Apache Spark 3.3. No?
>>>
>>> > we need to avoid backporting the feature work that are not being well
>>> discussed.
>>>
>>>
>>>
>>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>>>
 Cutting the branch is simple, but we need to avoid backporting the
 feature work that are not being well discussed. Not all the members are
 actively following the dev list. I think we should wait 3 more days for
 collecting the PR list before cutting the branch.

 BTW, there are very few 3.4-only feature work that will be affected.

 Xiao

 On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:

> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
>
> [SPARK-38335][SQL] Implement parser support for DEFAULT column
> values
>
> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>
>> Cool, thanks for clarifying!
>>
>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li 
>> wrote:
>> >>
>> >> For the following list:
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >
>> >
>> > If possible, I hope these features can be shipped with Spark 3.3.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>> >>
>> >> Hi Xiao,
>> >>
>> >> For the following list:
>> >>
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >>
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >>
>> >> Thanks,
>> >> Chao
>> >>
>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>> >> >
>> >> > The following was tested and merged a few minutes ago. So, we
>> can remove it from the list.
>> >> >
>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>> wrote:
>> >> >>
>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>> days to collect the list of actively developed PRs that we want to merge 
>> to
>> 3.3 after the branch cut?
>> >> >>
>> >> >> Please do not rush to merge the PRs that are not fully
>> reviewed. We can cut the branch this Friday and continue merging the PRs
>> that have been discussed in this thread. Does that make sense?
>> >> >>
>> >> 

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
That is not totally fine, Xiao. It sounds like you are asking for a change of
plan without a proper reason.

Although we cut the branch today according to our plan, you can still collect
the list and make a list of exceptions. I'm not blocking what you want to
do.

Please let the community start to ramp down as we agreed before.

Dongjoon



On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:

> Please do not get me wrong. If we don't cut a branch, we are allowing all
> patches to land Apache Spark 3.3. That is totally fine. After we cut the
> branch, we should avoid merging the feature work. In the next three days,
> let us collect the actively developed PRs that we want to make an exception
> (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>
> On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>
>> Xiao. You are working against what you are saying.
>> If you don't cut a branch, it means you are allowing all patches to land
>> Apache Spark 3.3. No?
>>
>> > we need to avoid backporting the feature work that are not being well
>> discussed.
>>
>>
>>
>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>>
>>> Cutting the branch is simple, but we need to avoid backporting the
>>> feature work that are not being well discussed. Not all the members are
>>> actively following the dev list. I think we should wait 3 more days for
>>> collecting the PR list before cutting the branch.
>>>
>>> BTW, there are very few 3.4-only feature work that will be affected.
>>>
>>> Xiao
>>>
>>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>>>
 Hi, Max, Chao, Xiao, Holden and all.

 I have a different idea.

 Given the situation and small patch list, I don't think we need to
 postpone the branch cut for those patches. It's easier to cut a branch-3.3
 and allow backporting.

 As of today, we already have an obvious Apache Spark 3.4 patch in the
 branch together. This situation only becomes worse and worse because there
 is no way to block the other patches from landing unintentionally if we
 don't cut a branch.

 [SPARK-38335][SQL] Implement parser support for DEFAULT column
 values

 Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.

 Best,
 Dongjoon.


 On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:

> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >
> >
> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
> >>
> >> Hi Xiao,
> >>
> >> For the following list:
> >>
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >>
> >> Do you mean we should include them, or exclude them from 3.3?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > The following was tested and merged a few minutes ago. So, we can
> remove it from the list.
> >> >
> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
> wrote:
> >> >>
> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
> days to collect the list of actively developed PRs that we want to merge 
> to
> 3.3 after the branch cut?
> >> >>
> >> >> Please do not rush to merge the PRs that are not fully reviewed.
> We can cut the branch this Friday and continue merging the PRs that have
> been discussed in this thread. Does that make sense?
> >> >>
> >> >> Xiao
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
> >> >>>
> >> >>> May I suggest we push out one week (22nd) just to give everyone
> a bit of breathing space? Rushed software development more often results 
> in
> bugs.
> >> >>>
> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
> yikunk...@gmail.com> wrote:
> >> 
> >>  > To make our release time more predictable, let us collect
> the PRs and wait three more days before the branch cut?
> >> 
> >>  For SPIP: Support Customized Kubernetes Schedulers:
> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> 
> >>  Three more days are OK for this from my view.
> >> 

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Please do not get me wrong. If we don't cut a branch, we are allowing all
patches to land in Apache Spark 3.3. That is totally fine. After we cut the
branch, we should avoid merging feature work. In the next three days,
let us collect the actively developed PRs for which we want to make an
exception (i.e., PRs to be merged to 3.3 after the upcoming branch cut). Does
that make sense?

On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:

> Xiao. You are working against what you are saying.
> If you don't cut a branch, it means you are allowing all patches to land
> Apache Spark 3.3. No?
>
> > we need to avoid backporting the feature work that are not being well
> discussed.
>
>
>
> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:
>
>> Cutting the branch is simple, but we need to avoid backporting the
>> feature work that are not being well discussed. Not all the members are
>> actively following the dev list. I think we should wait 3 more days for
>> collecting the PR list before cutting the branch.
>>
>> BTW, there are very few 3.4-only feature work that will be affected.
>>
>> Xiao
>>
>> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>>
>>> Hi, Max, Chao, Xiao, Holden and all.
>>>
>>> I have a different idea.
>>>
>>> Given the situation and small patch list, I don't think we need to
>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>> and allow backporting.
>>>
>>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>>> branch together. This situation only becomes worse and worse because there
>>> is no way to block the other patches from landing unintentionally if we
>>> don't cut a branch.
>>>
>>> [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>>
>>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>>
>>> Best,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>>>
 Cool, thanks for clarifying!

 On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
 >>
 >> For the following list:
 >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
 >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
 vectorized reader
 >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
 >> Do you mean we should include them, or exclude them from 3.3?
 >
 >
 > If possible, I hope these features can be shipped with Spark 3.3.
 >
 >
 >
 > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
 >>
 >> Hi Xiao,
 >>
 >> For the following list:
 >>
 >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
 >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
 vectorized reader
 >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
 >>
 >> Do you mean we should include them, or exclude them from 3.3?
 >>
 >> Thanks,
 >> Chao
 >>
 >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
 dongjoon.h...@gmail.com> wrote:
 >> >
 >> > The following was tested and merged a few minutes ago. So, we can
 remove it from the list.
 >> >
 >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
 >> >
 >> > Thanks,
 >> > Dongjoon.
 >> >
 >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
 wrote:
 >> >>
 >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
 to collect the list of actively developed PRs that we want to merge to 3.3
 after the branch cut?
 >> >>
 >> >> Please do not rush to merge the PRs that are not fully reviewed.
 We can cut the branch this Friday and continue merging the PRs that have
 been discussed in this thread. Does that make sense?
 >> >>
 >> >> Xiao
 >> >>
 >> >>
 >> >>
 >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
 >> >>>
 >> >>> May I suggest we push out one week (22nd) just to give everyone
 a bit of breathing space? Rushed software development more often results in
 bugs.
 >> >>>
 >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
 wrote:
 >> 
 >>  > To make our release time more predictable, let us collect the
 PRs and wait three more days before the branch cut?
 >> 
 >>  For SPIP: Support Customized Kubernetes Schedulers:
 >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
 >> 
 >>  Three more days are OK for this from my view.
 >> 
 >>  Regards,
 >>  Yikun
 >> >>>
 >> >>> --
 >> >>> Twitter: https://twitter.com/holdenkarau
 >> >>> Books (Learning Spark, High Performance Spark, etc.):
 https://amzn.to/2MaRAG9
 >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

>>>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Xiao. You are working against what you are saying.
If you don't cut a branch, it means you are allowing all patches to land
Apache Spark 3.3. No?

> we need to avoid backporting the feature work that are not being well
> discussed.



On Tue, Mar 15, 2022 at 12:12 PM Xiao Li  wrote:

> Cutting the branch is simple, but we need to avoid backporting the feature
> work that are not being well discussed. Not all the members are actively
> following the dev list. I think we should wait 3 more days for collecting
> the PR list before cutting the branch.
>
> BTW, there are very few 3.4-only feature work that will be affected.
>
> Xiao
>
> On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:
>
>> Hi, Max, Chao, Xiao, Holden and all.
>>
>> I have a different idea.
>>
>> Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>> and allow backporting.
>>
>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>> branch together. This situation only becomes worse and worse because there
>> is no way to block the other patches from landing unintentionally if we
>> don't cut a branch.
>>
>> [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>
>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>
>> Best,
>> Dongjoon.
>>
>>
>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>>
>>> Cool, thanks for clarifying!
>>>
>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
>>> >>
>>> >> For the following list:
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >
>>> >
>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>> >
>>> >
>>> >
>>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>>> >>
>>> >> Hi Xiao,
>>> >>
>>> >> For the following list:
>>> >>
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >>
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >>
>>> >> Thanks,
>>> >> Chao
>>> >>
>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> >> >
>>> >> > The following was tested and merged a few minutes ago. So, we can
>>> remove it from the list.
>>> >> >
>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>>> wrote:
>>> >> >>
>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
>>> to collect the list of actively developed PRs that we want to merge to 3.3
>>> after the branch cut?
>>> >> >>
>>> >> >> Please do not rush to merge the PRs that are not fully reviewed.
>>> We can cut the branch this Friday and continue merging the PRs that have
>>> been discussed in this thread. Does that make sense?
>>> >> >>
>>> >> >> Xiao
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>>> >> >>>
>>> >> >>> May I suggest we push out one week (22nd) just to give everyone a
>>> bit of breathing space? Rushed software development more often results in
>>> bugs.
>>> >> >>>
>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
>>> wrote:
>>> >> 
>>> >>  > To make our release time more predictable, let us collect the
>>> PRs and wait three more days before the branch cut?
>>> >> 
>>> >>  For SPIP: Support Customized Kubernetes Schedulers:
>>> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> 
>>> >>  Three more days are OK for this from my view.
>>> >> 
>>> >>  Regards,
>>> >>  Yikun
>>> >> >>>
>>> >> >>> --
>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Cutting the branch is simple, but we need to avoid backporting feature
work that has not been well discussed. Not all the members are actively
following the dev list. I think we should wait 3 more days to collect
the PR list before cutting the branch.

BTW, very little 3.4-only feature work will be affected.

Xiao

On Tue, Mar 15, 2022 at 11:49 AM Dongjoon Hyun wrote:

> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
>
> [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>
> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:
>
>> Cool, thanks for clarifying!
>>
>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
>> >>
>> >> For the following list:
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
>> reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >
>> >
>> > If possible, I hope these features can be shipped with Spark 3.3.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>> >>
>> >> Hi Xiao,
>> >>
>> >> For the following list:
>> >>
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
>> reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >>
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >>
>> >> Thanks,
>> >> Chao
>> >>
>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun 
>> wrote:
>> >> >
>> >> > The following was tested and merged a few minutes ago. So, we can
>> remove it from the list.
>> >> >
>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li 
>> wrote:
>> >> >>
>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
>> to collect the list of actively developed PRs that we want to merge to 3.3
>> after the branch cut?
>> >> >>
>> >> >> Please do not rush to merge the PRs that are not fully reviewed. We
>> can cut the branch this Friday and continue merging the PRs that have been
>> discussed in this thread. Does that make sense?
>> >> >>
>> >> >> Xiao
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>> >> >>>
>> >> >>> May I suggest we push out one week (22nd) just to give everyone a
>> bit of breathing space? Rushed software development more often results in
>> bugs.
>> >> >>>
>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
>> wrote:
>> >> 
>> >>  > To make our release time more predictable, let us collect the
>> PRs and wait three more days before the branch cut?
>> >> 
>> >>  For SPIP: Support Customized Kubernetes Schedulers:
>> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> 
>> >>  Three more days are OK for this from my view.
>> >> 
>> >>  Regards,
>> >>  Yikun
>> >> >>>
>> >> >>> --
>> >> >>> Twitter: https://twitter.com/holdenkarau
>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Hi, Max, Chao, Xiao, Holden and all.

I have a different idea.

Given the situation and small patch list, I don't think we need to postpone
the branch cut for those patches. It's easier to cut a branch-3.3 and allow
backporting.

As of today, we already have an obvious Apache Spark 3.4 patch in the
branch together. This situation only becomes worse and worse because there
is no way to block the other patches from landing unintentionally if we
don't cut a branch.

[SPARK-38335][SQL] Implement parser support for DEFAULT column values

Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.

Best,
Dongjoon.


On Tue, Mar 15, 2022 at 10:17 AM Chao Sun  wrote:

> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >
> >
> > On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
> >>
> >> Hi Xiao,
> >>
> >> For the following list:
> >>
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >>
> >> Do you mean we should include them, or exclude them from 3.3?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun 
> wrote:
> >> >
> >> > The following was tested and merged a few minutes ago. So, we can
> remove it from the list.
> >> >
> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
> >> >>
> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
> >> >>
> >> >> Please do not rush to merge the PRs that are not fully reviewed. We
> can cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
> >> >>
> >> >> Xiao
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
> >> >>>
> >> >>> May I suggest we push out one week (22nd) just to give everyone a
> bit of breathing space? Rushed software development more often results in
> bugs.
> >> >>>
> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
> wrote:
> >> 
> >>  > To make our release time more predictable, let us collect the
> PRs and wait three more days before the branch cut?
> >> 
> >>  For SPIP: Support Customized Kubernetes Schedulers:
> >>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> 
> >>  Three more days are OK for this from my view.
> >> 
> >>  Regards,
> >>  Yikun
> >> >>>
> >> >>> --
> >> >>> Twitter: https://twitter.com/holdenkarau
> >> >>> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Cool, thanks for clarifying!

On Tue, Mar 15, 2022 at 10:11 AM Xiao Li  wrote:
>>
>> For the following list:
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> Do you mean we should include them, or exclude them from 3.3?
>
>
> If possible, I hope these features can be shipped with Spark 3.3.
>
>
>
> On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:
>>
>> Hi Xiao,
>>
>> For the following list:
>>
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>
>> Do you mean we should include them, or exclude them from 3.3?
>>
>> Thanks,
>> Chao
>>
>> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun  
>> wrote:
>> >
>> > The following was tested and merged a few minutes ago. So, we can remove 
>> > it from the list.
>> >
>> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
>> >>
>> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to 
>> >> collect the list of actively developed PRs that we want to merge to 3.3 
>> >> after the branch cut?
>> >>
>> >> Please do not rush to merge the PRs that are not fully reviewed. We can 
>> >> cut the branch this Friday and continue merging the PRs that have been 
>> >> discussed in this thread. Does that make sense?
>> >>
>> >> Xiao
>> >>
>> >>
>> >>
>> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
>> >>>
>> >>> May I suggest we push out one week (22nd) just to give everyone a bit of 
>> >>> breathing space? Rushed software development more often results in bugs.
>> >>>
>> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang  wrote:
>> 
>>  > To make our release time more predictable, let us collect the PRs and 
>>  > wait three more days before the branch cut?
>> 
>>  For SPIP: Support Customized Kubernetes Schedulers:
>>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> 
>>  Three more days are OK for this from my view.
>> 
>>  Regards,
>>  Yikun
>> >>>
>> >>> --
>> >>> Twitter: https://twitter.com/holdenkarau
>> >>> Books (Learning Spark, High Performance Spark, etc.): 
>> >>> https://amzn.to/2MaRAG9
>> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
>
> For the following list:
> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> Do you mean we should include them, or exclude them from 3.3?


If possible, I hope these features can be shipped with Spark 3.3.



On Tue, Mar 15, 2022 at 10:06 AM Chao Sun wrote:

> Hi Xiao,
>
> For the following list:
>
> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>
> Do you mean we should include them, or exclude them from 3.3?
>
> Thanks,
> Chao
>
> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun 
> wrote:
> >
> > The following was tested and merged a few minutes ago. So, we can remove
> it from the list.
> >
> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >
> > Thanks,
> > Dongjoon.
> >
> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
> >>
> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
> >>
> >> Please do not rush to merge the PRs that are not fully reviewed. We can
> cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
> >>
> >> Xiao
> >>
> >>
> >>
> >> On Tue, Mar 15, 2022 at 9:10 AM Holden Karau wrote:
> >>>
> >>> May I suggest we push out one week (22nd) just to give everyone a bit
> of breathing space? Rushed software development more often results in bugs.
> >>>
> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang 
> wrote:
> 
>  > To make our release time more predictable, let us collect the PRs
> and wait three more days before the branch cut?
> 
>  For SPIP: Support Customized Kubernetes Schedulers:
>  #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> 
>  Three more days are OK for this from my view.
> 
>  Regards,
>  Yikun
> >>>
> >>> --
> >>> Twitter: https://twitter.com/holdenkarau
> >>> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Hi Xiao,

For the following list:

#35789 [SPARK-32268][SQL] Row-level Runtime Filtering
#34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
#35848 [SPARK-38548][SQL] New SQL function: try_sum

Do you mean we should include them, or exclude them from 3.3?

Thanks,
Chao

On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun  wrote:
>
> The following was tested and merged a few minutes ago. So, we can remove it 
> from the list.
>
> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>
> Thanks,
> Dongjoon.
>
> On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:
>>
>> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect 
>> the list of actively developed PRs that we want to merge to 3.3 after the 
>> branch cut?
>>
>> Please do not rush to merge the PRs that are not fully reviewed. We can cut 
>> the branch this Friday and continue merging the PRs that have been discussed 
>> in this thread. Does that make sense?
>>
>> Xiao
>>
>>
>>
>> On Tue, Mar 15, 2022 at 09:10, Holden Karau wrote:
>>>
>>> May I suggest we push out one week (22nd) just to give everyone a bit of 
>>> breathing space? Rushed software development more often results in bugs.
>>>
>>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote:
>>>>
>>>> > To make our release time more predictable, let us collect the PRs and
>>>> > wait three more days before the branch cut?
>>>>
>>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>
>>>> Three more days are OK for this from my view.
>>>>
>>>> Regards,
>>>> Yikun
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.): 
>>> https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
The following was tested and merged a few minutes ago. So, we can remove it
from the list.

#35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1


Thanks,
Dongjoon.

On Tue, Mar 15, 2022 at 9:48 AM Xiao Li  wrote:

> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
>
> Please do not rush to merge the PRs that are not fully reviewed. We can
> cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
>
> Xiao
>
>
>
>
> On Tue, Mar 15, 2022 at 09:10, Holden Karau wrote:
>
>> May I suggest we push out one week (22nd) just to give everyone a bit of
>> breathing space? Rushed software development more often results in bugs.
>>
>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang  wrote:
>>
>>> > To make our release time more predictable, let us collect the PRs and
>>> wait three more days before the branch cut?
>>>
>>> For SPIP: Support Customized Kubernetes Schedulers:
>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> 
>>>
>>> Three more days are OK for this from my view.
>>>
>>> Regards,
>>> Yikun
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Let me clarify my above suggestion. Maybe we can wait 3 more days to
collect the list of actively developed PRs that we want to merge to 3.3
after the branch cut?

Please do not rush to merge the PRs that are not fully reviewed. We can cut
the branch this Friday and continue merging the PRs that have been
discussed in this thread. Does that make sense?

Xiao




On Tue, Mar 15, 2022 at 09:10, Holden Karau wrote:

> May I suggest we push out one week (22nd) just to give everyone a bit of
> breathing space? Rushed software development more often results in bugs.
>
> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang  wrote:
>
>> > To make our release time more predictable, let us collect the PRs and
>> wait three more days before the branch cut?
>>
>> For SPIP: Support Customized Kubernetes Schedulers:
>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> 
>>
>> Three more days are OK for this from my view.
>>
>> Regards,
>> Yikun
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>


Re: Apache Spark 3.3 Release

2022-03-15 Thread Holden Karau
May I suggest we push out one week (22nd) just to give everyone a bit of
breathing space? Rushed software development more often results in bugs.

On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang  wrote:

> > To make our release time more predictable, let us collect the PRs and
> wait three more days before the branch cut?
>
> For SPIP: Support Customized Kubernetes Schedulers:
> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> 
>
> Three more days are OK for this from my view.
>
> Regards,
> Yikun
>
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Apache Spark 3.3 Release

2022-03-15 Thread Yikun Jiang
> To make our release time more predictable, let us collect the PRs and
> wait three more days before the branch cut?

For SPIP: Support Customized Kubernetes Schedulers:
#35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1


Three more days is OK for this from my point of view.

Regards,
Yikun


Re: Apache Spark 3.3 Release

2022-03-14 Thread Xiao Li
To make our release time more predictable, let us collect the PRs and wait
three more days before the branch cut?

Please list all the actively developed feature work we plan to release with
Spark 3.3. We should avoid merging any new feature work that is not being
discussed in this email thread. Below is my list:

   - #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
   - #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
   reader
   - #35848 [SPARK-38548][SQL] New SQL function: try_sum




On Mon, Mar 14, 2022 at 21:17, Chao Sun wrote:

> I mainly mean:
>
>   - [SPARK-35801] Row-level operations in Data Source V2
>   - [SPARK-37166] Storage Partitioned Join
>
> For which the PR:
>
> - https://github.com/apache/spark/pull/35395
> - https://github.com/apache/spark/pull/35657
>
> are actively being reviewed. It seems there are ongoing PRs for other
> SPIPs as well but I'm not involved in those so not quite sure whether
> they are intended for 3.3 release.
>
> Chao
>
> On Mon, Mar 14, 2022 at 8:53 PM Xiao Li  wrote:
> >
> > Could you please list which features we want to finish before the branch
> cut? How long will they take?
> >
> > Xiao
> >
> > On Mon, Mar 14, 2022 at 13:30, Chao Sun wrote:
> >>
> >> Hi Max,
> >>
> >> As there are still some ongoing work for the above listed SPIPs, can we
> still merge them after the branch cut?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk 
> >> 
> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> Since there are no actual blockers for Spark 3.3.0 and significant
> objections, I am going to cut branch-3.3 after 15th March at 00:00 PST.
> Please, let us know if you have any concerns about that.
> >>>
> >>> Best regards,
> >>> Max Gekk
> >>>
> >>>
> >>> On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk 
> wrote:
> >>>>
> >>>> Hello All,
> >>>>
> >>>> I would like to bring on the table the theme about the new Spark
> release 3.3. According to the public schedule at
> https://spark.apache.org/versioning-policy.html, we planned to start the
> code freeze and release branch cut on March 15th, 2022. Since this date is
> coming soon, I would like to take your attention on the topic and gather
> objections that you might have.
> >>>>
> >>>> Below is the list of ongoing and active SPIPs:
> >>>>
> >>>> Spark SQL:
> >>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
> >>>> - [SPARK-35801] Row-level operations in Data Source V2
> >>>> - [SPARK-37166] Storage Partitioned Join
> >>>>
> >>>> Spark Core:
> >>>> - [SPARK-20624] Add better handling for node shutdown
> >>>> - [SPARK-25299] Use remote storage for persisting shuffle data
> >>>>
> >>>> PySpark:
> >>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
> >>>>
> >>>> Kubernetes:
> >>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
> >>>>
> >>>> Probably, we should finish if there are any remaining works for Spark
> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
> would like to volunteer to help drive this process.
> >>>>
> >>>> Best regards,
> >>>> Max Gekk
>


Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
I mainly mean:

  - [SPARK-35801] Row-level operations in Data Source V2
  - [SPARK-37166] Storage Partitioned Join

For which the PRs:

- https://github.com/apache/spark/pull/35395
- https://github.com/apache/spark/pull/35657

are actively being reviewed. It seems there are ongoing PRs for other
SPIPs as well, but I'm not involved in those, so I'm not quite sure whether
they are intended for the 3.3 release.

Chao

On Mon, Mar 14, 2022 at 8:53 PM Xiao Li  wrote:
>
> Could you please list which features we want to finish before the branch cut? 
> How long will they take?
>
> Xiao
>
> On Mon, Mar 14, 2022 at 13:30, Chao Sun wrote:
>>
>> Hi Max,
>>
>> As there are still some ongoing work for the above listed SPIPs, can we 
>> still merge them after the branch cut?
>>
>> Thanks,
>> Chao
>>
>> On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk 
>>  wrote:
>>>
>>> Hi All,
>>>
>>> Since there are no actual blockers for Spark 3.3.0 and significant 
>>> objections, I am going to cut branch-3.3 after 15th March at 00:00 PST. 
>>> Please, let us know if you have any concerns about that.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk wrote:
>>>>
>>>> Hello All,
>>>>
>>>> I would like to bring on the table the theme about the new Spark release
>>>> 3.3. According to the public schedule at
>>>> https://spark.apache.org/versioning-policy.html, we planned to start the
>>>> code freeze and release branch cut on March 15th, 2022. Since this date is
>>>> coming soon, I would like to take your attention on the topic and gather
>>>> objections that you might have.
>>>>
>>>> Below is the list of ongoing and active SPIPs:
>>>>
>>>> Spark SQL:
>>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>>> - [SPARK-35801] Row-level operations in Data Source V2
>>>> - [SPARK-37166] Storage Partitioned Join
>>>>
>>>> Spark Core:
>>>> - [SPARK-20624] Add better handling for node shutdown
>>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>>
>>>> PySpark:
>>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>>
>>>> Kubernetes:
>>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>>
>>>> Probably, we should finish if there are any remaining works for Spark 3.3,
>>>> and switch to QA mode, cut a branch and keep everything on track. I would
>>>> like to volunteer to help drive this process.
>>>>
>>>> Best regards,
>>>> Max Gekk




Re: Apache Spark 3.3 Release

2022-03-14 Thread Holden Karau
On Mon, Mar 14, 2022 at 11:53 PM Xiao Li  wrote:

> Could you please list which features we want to finish before the branch
> cut? How long will they take?
>
> Xiao
>
> On Mon, Mar 14, 2022 at 13:30, Chao Sun wrote:
>
>> Hi Max,
>>
>> As there are still some ongoing work for the above listed SPIPs, can we
>> still merge them after the branch cut?
>>
In the past we've allowed merges for actively developed PRs post branch
cut, but it is easier when they don't need to be cherry-picked (e.g., merged pre-cut).

>
>> Thanks,
>> Chao
>>
>> On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk
>>  wrote:
>>
>>> Hi All,
>>>
>>> Since there are no actual blockers for Spark 3.3.0 and significant
>>> objections, I am going to cut branch-3.3 after 15th March at 00:00 PST.
>>> Please, let us know if you have any concerns about that.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk 
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>> I would like to bring on the table the theme about the new Spark
>>>> release 3.3. According to the public schedule at
>>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>>> the code freeze and release branch cut on March 15th, 2022. Since this date
>>>> is coming soon, I would like to take your attention on the topic and gather
>>>> objections that you might have.
>>>>
>>>> Below is the list of ongoing and active SPIPs:
>>>>
>>>> Spark SQL:
>>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>>> - [SPARK-35801] Row-level operations in Data Source V2
>>>> - [SPARK-37166] Storage Partitioned Join
>>>>
>>>> Spark Core:
>>>> - [SPARK-20624] Add better handling for node shutdown
>>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>>
>>>> PySpark:
>>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>>
>>>> Kubernetes:
>>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>>
>>>> Probably, we should finish if there are any remaining works for Spark
>>>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>>>> would like to volunteer to help drive this process.
>>>>
>>>> Best regards,
>>>> Max Gekk

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Apache Spark 3.3 Release

2022-03-14 Thread Xiao Li
Could you please list which features we want to finish before the branch
cut? How long will they take?

Xiao

On Mon, Mar 14, 2022 at 13:30, Chao Sun wrote:

> Hi Max,
>
> As there are still some ongoing work for the above listed SPIPs, can we
> still merge them after the branch cut?
>
> Thanks,
> Chao
>
> On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk
>  wrote:
>
>> Hi All,
>>
>> Since there are no actual blockers for Spark 3.3.0 and significant
>> objections, I am going to cut branch-3.3 after 15th March at 00:00 PST.
>> Please, let us know if you have any concerns about that.
>>
>> Best regards,
>> Max Gekk
>>
>>
>> On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk 
>> wrote:
>>
>>> Hello All,
>>>
>>> I would like to bring on the table the theme about the new Spark release
>>> 3.3. According to the public schedule at
>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>> the code freeze and release branch cut on March 15th, 2022. Since this date
>>> is coming soon, I would like to take your attention on the topic and gather
>>> objections that you might have.
>>>
>>> Below is the list of ongoing and active SPIPs:
>>>
>>> Spark SQL:
>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>> - [SPARK-35801] Row-level operations in Data Source V2
>>> - [SPARK-37166] Storage Partitioned Join
>>>
>>> Spark Core:
>>> - [SPARK-20624] Add better handling for node shutdown
>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>
>>> PySpark:
>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>
>>> Kubernetes:
>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>
>>> Probably, we should finish if there are any remaining works for Spark
>>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>>> would like to volunteer to help drive this process.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>


Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
Hi Max,

As there is still some ongoing work for the above-listed SPIPs, can we
still merge them after the branch cut?

Thanks,
Chao

On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk
 wrote:

> Hi All,
>
> Since there are no actual blockers for Spark 3.3.0 and significant
> objections, I am going to cut branch-3.3 after 15th March at 00:00 PST.
> Please, let us know if you have any concerns about that.
>
> Best regards,
> Max Gekk
>
>
> On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk 
> wrote:
>
>> Hello All,
>>
>> I would like to bring on the table the theme about the new Spark release
>> 3.3. According to the public schedule at
>> https://spark.apache.org/versioning-policy.html, we planned to start the
>> code freeze and release branch cut on March 15th, 2022. Since this date is
>> coming soon, I would like to take your attention on the topic and gather
>> objections that you might have.
>>
>> Below is the list of ongoing and active SPIPs:
>>
>> Spark SQL:
>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>> - [SPARK-35801] Row-level operations in Data Source V2
>> - [SPARK-37166] Storage Partitioned Join
>>
>> Spark Core:
>> - [SPARK-20624] Add better handling for node shutdown
>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>
>> PySpark:
>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>
>> Kubernetes:
>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>
>> Probably, we should finish if there are any remaining works for Spark
>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>> would like to volunteer to help drive this process.
>>
>> Best regards,
>> Max Gekk
>>
>


Re: Apache Spark 3.3 Release

2022-03-14 Thread Maxim Gekk
Hi All,

Since there are no actual blockers for Spark 3.3.0 and no significant
objections, I am going to cut branch-3.3 after 15th March at 00:00 PST.
Please let us know if you have any concerns about that.

Best regards,
Max Gekk


On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk  wrote:

> Hello All,
>
> I would like to bring on the table the theme about the new Spark release
> 3.3. According to the public schedule at
> https://spark.apache.org/versioning-policy.html, we planned to start the
> code freeze and release branch cut on March 15th, 2022. Since this date is
> coming soon, I would like to take your attention on the topic and gather
> objections that you might have.
>
> Below is the list of ongoing and active SPIPs:
>
> Spark SQL:
> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
> - [SPARK-35801] Row-level operations in Data Source V2
> - [SPARK-37166] Storage Partitioned Join
>
> Spark Core:
> - [SPARK-20624] Add better handling for node shutdown
> - [SPARK-25299] Use remote storage for persisting shuffle data
>
> PySpark:
> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>
> Kubernetes:
> - [SPARK-36057] Support Customized Kubernetes Schedulers
>
> Probably, we should finish if there are any remaining works for Spark 3.3,
> and switch to QA mode, cut a branch and keep everything on track. I would
> like to volunteer to help drive this process.
>
> Best regards,
> Max Gekk
>


Re: Apache Spark 3.3 Release

2022-03-06 Thread Maciej
Ideally, we should complete these:

- [SPARK-37093] Inline type hints python/pyspark/streaming
- [SPARK-37395] Inline type hint files for files in python/pyspark/ml
- [SPARK-37396] Inline type hint files for files in python/pyspark/mllib

All tasks either have a PR in progress or someone working on one, so the
limiting factor is our ability to review these.

On 3/3/22 19:44, Maxim Gekk wrote:
> Hello All,
> 
> I would like to bring on the table the theme about the new Spark release
> 3.3. According to the public schedule at
> https://spark.apache.org/versioning-policy.html, we planned to start
> the code freeze and release branch cut on March 15th, 2022. Since this
> date is coming soon, I would like to take your attention on the topic
> and gather objections that you might have.
> 
> Below is the list of ongoing and active SPIPs:
> 
> Spark SQL:
> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
> - [SPARK-35801] Row-level operations in Data Source V2
> - [SPARK-37166] Storage Partitioned Join
> 
> Spark Core:
> - [SPARK-20624] Add better handling for node shutdown
> - [SPARK-25299] Use remote storage for persisting shuffle data
> 
> PySpark:
> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
> 
> Kubernetes:
> - [SPARK-36057] Support Customized Kubernetes Schedulers
> 
> Probably, we should finish if there are any remaining works for Spark
> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
> would like to volunteer to help drive this process.
> 
> Best regards,
> Max Gekk


-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC




Re: Apache Spark 3.3 Release

2022-03-04 Thread Yikun Jiang
@Maxim Thanks for driving the release!

> Not sure about SPARK-36057 since the current state.

@Igor Costa Thanks for your attention. As Dongjoon said, the basic framework
capabilities for custom schedulers are already supported, and we are planning
to mark this as beta in 3.3.0. Of course, we will do more testing to make sure
it is stable, and we welcome more input to keep improving it.

> I don't think that could be a blocker for Apache Spark 3.2.0.

Yep, and v3.3.0, : )

Regards,
Yikun


Re: Apache Spark 3.3 Release

2022-03-04 Thread Dongjoon Hyun
I've reviewed most of the actual code in that area.

That's pretty much an experimental feature still.

I don't think that could be a blocker for Apache Spark 3.2.0.

Dongjoon.



On Fri, Mar 4, 2022 at 12:25 PM Igor Costa  wrote:

> Thanks Maxim,
>
> The code freeze by end of this month would be fine. Not sure about
> SPARK-36057 since the current state.
>
>
>
> Thanks
>
> On Fri, 4 Mar 2022 at 19:27, Jungtaek Lim 
> wrote:
>
>> Thanks Maxim for volunteering to drive the release! I support the plan
>> (March 15th) to perform a release branch cut.
>>
>> Btw, would we be open for modification of critical/blocker issues after
>> the release branch cut? I have a blocker JIRA ticket and the PR is open for
>> reviewing, but need some time to gain traction as well as going through
>> actual reviews. My guess is yes but to confirm again.
>>
>> On Fri, Mar 4, 2022 at 4:20 AM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you, Max, for volunteering for Apache Spark 3.3 release manager.
>>>
>>> Ya, I'm also +1 for the original plan.
>>>
>>> Dongjoon
>>>
>>> On Thu, Mar 3, 2022 at 10:52 AM Mridul Muralidharan 
>>> wrote:
>>>
>>>>
>>>> Agree with Sean, code freeze by mid March sounds good.
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>> On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:
>>>>
>>>>> I think it's fine to pursue the existing plan - code freeze in two
>>>>> weeks and try to close off key remaining issues. Final release pending on
>>>>> how those go, and testing, but fine to get the ball rolling.
>>>>>
>>>>> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>>>>>  wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> I would like to bring on the table the theme about the new Spark
>>>>>> release 3.3. According to the public schedule at
>>>>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>>>>> the code freeze and release branch cut on March 15th, 2022. Since this
>>>>>> date is coming soon, I would like to take your attention on the topic
>>>>>> and gather objections that you might have.
>>>>>>
>>>>>> Below is the list of ongoing and active SPIPs:
>>>>>>
>>>>>> Spark SQL:
>>>>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>>>>> - [SPARK-35801] Row-level operations in Data Source V2
>>>>>> - [SPARK-37166] Storage Partitioned Join
>>>>>>
>>>>>> Spark Core:
>>>>>> - [SPARK-20624] Add better handling for node shutdown
>>>>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>>>>
>>>>>> PySpark:
>>>>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>>>>
>>>>>> Kubernetes:
>>>>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>>>>
>>>>>> Probably, we should finish if there are any remaining works for Spark
>>>>>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>>>>>> would like to volunteer to help drive this process.
>>>>>>
>>>>>> Best regards,
>>>>>> Max Gekk
>>>>>>
>>>>> --
> Sent from Gmail Mobile
>


Re: Apache Spark 3.3 Release

2022-03-04 Thread Igor Costa
Thanks Maxim,

A code freeze by the end of this month would be fine. I'm not sure about
SPARK-36057, given its current state.



Thanks

On Fri, 4 Mar 2022 at 19:27, Jungtaek Lim 
wrote:

> Thanks Maxim for volunteering to drive the release! I support the plan
> (March 15th) to perform a release branch cut.
>
> Btw, would we be open for modification of critical/blocker issues after
> the release branch cut? I have a blocker JIRA ticket and the PR is open for
> reviewing, but need some time to gain traction as well as going through
> actual reviews. My guess is yes but to confirm again.
>
> On Fri, Mar 4, 2022 at 4:20 AM Dongjoon Hyun 
> wrote:
>
>> Thank you, Max, for volunteering for Apache Spark 3.3 release manager.
>>
>> Ya, I'm also +1 for the original plan.
>>
>> Dongjoon
>>
>> On Thu, Mar 3, 2022 at 10:52 AM Mridul Muralidharan 
>> wrote:
>>
>>>
>>> Agree with Sean, code freeze by mid March sounds good.
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:
>>>
>>>> I think it's fine to pursue the existing plan - code freeze in two
>>>> weeks and try to close off key remaining issues. Final release pending on
>>>> how those go, and testing, but fine to get the ball rolling.
>>>>
>>>> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>>>>  wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I would like to bring on the table the theme about the new Spark
>>>>> release 3.3. According to the public schedule at
>>>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>>>> the code freeze and release branch cut on March 15th, 2022. Since this
>>>>> date is coming soon, I would like to take your attention on the topic
>>>>> and gather objections that you might have.
>>>>>
>>>>> Below is the list of ongoing and active SPIPs:
>>>>>
>>>>> Spark SQL:
>>>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>>>> - [SPARK-35801] Row-level operations in Data Source V2
>>>>> - [SPARK-37166] Storage Partitioned Join
>>>>>
>>>>> Spark Core:
>>>>> - [SPARK-20624] Add better handling for node shutdown
>>>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>>>
>>>>> PySpark:
>>>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>>>
>>>>> Kubernetes:
>>>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>>>
>>>>> Probably, we should finish if there are any remaining works for Spark
>>>>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>>>>> would like to volunteer to help drive this process.
>>>>>
>>>>> Best regards,
>>>>> Max Gekk
>>>>>
--
Sent from Gmail Mobile


Re: Apache Spark 3.3 Release

2022-03-03 Thread Jungtaek Lim
Thanks Maxim for volunteering to drive the release! I support the plan
(March 15th) to perform a release branch cut.

Btw, would we be open to fixes for critical/blocker issues after the
release branch cut? I have a blocker JIRA ticket and the PR is open for
review, but it needs some time to gain traction and to go through
actual reviews. My guess is yes, but I would like to confirm.

On Fri, Mar 4, 2022 at 4:20 AM Dongjoon Hyun 
wrote:

> Thank you, Max, for volunteering for Apache Spark 3.3 release manager.
>
> Ya, I'm also +1 for the original plan.
>
> Dongjoon
>
> On Thu, Mar 3, 2022 at 10:52 AM Mridul Muralidharan 
> wrote:
>
>>
>> Agree with Sean, code freeze by mid March sounds good.
>>
>> Regards,
>> Mridul
>>
>> On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:
>>
>>> I think it's fine to pursue the existing plan - code freeze in two weeks
>>> and try to close off key remaining issues. Final release pending on how
>>> those go, and testing, but fine to get the ball rolling.
>>>
>>> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>>>  wrote:
>>>
>>>> Hello All,
>>>>
>>>> I would like to bring on the table the theme about the new Spark
>>>> release 3.3. According to the public schedule at
>>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>>> the code freeze and release branch cut on March 15th, 2022. Since this date
>>>> is coming soon, I would like to take your attention on the topic and gather
>>>> objections that you might have.
>>>>
>>>> Below is the list of ongoing and active SPIPs:
>>>>
>>>> Spark SQL:
>>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>>> - [SPARK-35801] Row-level operations in Data Source V2
>>>> - [SPARK-37166] Storage Partitioned Join
>>>>
>>>> Spark Core:
>>>> - [SPARK-20624] Add better handling for node shutdown
>>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>>
>>>> PySpark:
>>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>>
>>>> Kubernetes:
>>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>>
>>>> Probably, we should finish if there are any remaining works for Spark
>>>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>>>> would like to volunteer to help drive this process.
>>>>
>>>> Best regards,
>>>> Max Gekk
>>>>
>>>


Re: Apache Spark 3.3 Release

2022-03-03 Thread Dongjoon Hyun
Thank you, Max, for volunteering for Apache Spark 3.3 release manager.

Ya, I'm also +1 for the original plan.

Dongjoon

On Thu, Mar 3, 2022 at 10:52 AM Mridul Muralidharan 
wrote:

>
> Agree with Sean, code freeze by mid March sounds good.
>
> Regards,
> Mridul
>
> On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:
>
>> I think it's fine to pursue the existing plan - code freeze in two weeks
>> and try to close off key remaining issues. Final release pending on how
>> those go, and testing, but fine to get the ball rolling.
>>
>> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>>  wrote:
>>
>>> Hello All,
>>>
>>> I would like to bring on the table the theme about the new Spark release
>>> 3.3. According to the public schedule at
>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>> the code freeze and release branch cut on March 15th, 2022. Since this date
>>> is coming soon, I would like to take your attention on the topic and gather
>>> objections that you might have.
>>>
>>> Below is the list of ongoing and active SPIPs:
>>>
>>> Spark SQL:
>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>> - [SPARK-35801] Row-level operations in Data Source V2
>>> - [SPARK-37166] Storage Partitioned Join
>>>
>>> Spark Core:
>>> - [SPARK-20624] Add better handling for node shutdown
>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>
>>> PySpark:
>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>
>>> Kubernetes:
>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>
>>> Probably, we should finish if there are any remaining works for Spark
>>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>>> would like to volunteer to help drive this process.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>


Re: Apache Spark 3.3 Release

2022-03-03 Thread Mridul Muralidharan
Agree with Sean, code freeze by mid March sounds good.

Regards,
Mridul

On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:

> I think it's fine to pursue the existing plan - code freeze in two weeks
> and try to close off key remaining issues. Final release pending on how
> those go, and testing, but fine to get the ball rolling.
>
> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>  wrote:
>
>> Hello All,
>>
>> I would like to bring on the table the theme about the new Spark release
>> 3.3. According to the public schedule at
>> https://spark.apache.org/versioning-policy.html, we planned to start the
>> code freeze and release branch cut on March 15th, 2022. Since this date is
>> coming soon, I would like to take your attention on the topic and gather
>> objections that you might have.
>>
>> Below is the list of ongoing and active SPIPs:
>>
>> Spark SQL:
>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>> - [SPARK-35801] Row-level operations in Data Source V2
>> - [SPARK-37166] Storage Partitioned Join
>>
>> Spark Core:
>> - [SPARK-20624] Add better handling for node shutdown
>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>
>> PySpark:
>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>
>> Kubernetes:
>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>
>> Probably, we should finish if there are any remaining works for Spark
>> 3.3, and switch to QA mode, cut a branch and keep everything on track. I
>> would like to volunteer to help drive this process.
>>
>> Best regards,
>> Max Gekk
>>
>


Re: Apache Spark 3.3 Release

2022-03-03 Thread Sean Owen
I think it's fine to pursue the existing plan - code freeze in two weeks
and try to close off key remaining issues. Final release pending on how
those go, and testing, but fine to get the ball rolling.

On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
 wrote:

> Hello All,
>
> I would like to bring up the topic of the new Spark 3.3 release. According
> to the public schedule at
> https://spark.apache.org/versioning-policy.html, we planned to start the
> code freeze and cut the release branch on March 15th, 2022. Since this date
> is coming soon, I would like to draw your attention to the topic and gather
> any objections that you might have.
>
> Below is the list of ongoing and active SPIPs:
>
> Spark SQL:
> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
> - [SPARK-35801] Row-level operations in Data Source V2
> - [SPARK-37166] Storage Partitioned Join
>
> Spark Core:
> - [SPARK-20624] Add better handling for node shutdown
> - [SPARK-25299] Use remote storage for persisting shuffle data
>
> PySpark:
> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>
> Kubernetes:
> - [SPARK-36057] Support Customized Kubernetes Schedulers
>
> We should probably finish any remaining work for Spark 3.3, then switch
> to QA mode, cut a branch, and keep everything on track. I would like to
> volunteer to help drive this process.
>
> Best regards,
> Max Gekk
>


Apache Spark 3.3 Release

2022-03-03 Thread Maxim Gekk
Hello All,

I would like to bring up the topic of the new Spark 3.3 release. According
to the public schedule at
https://spark.apache.org/versioning-policy.html, we planned to start the
code freeze and cut the release branch on March 15th, 2022. Since this date
is coming soon, I would like to draw your attention to the topic and gather
any objections that you might have.

Below is the list of ongoing and active SPIPs:

Spark SQL:
- [SPARK-31357] DataSourceV2: Catalog API for view metadata
- [SPARK-35801] Row-level operations in Data Source V2
- [SPARK-37166] Storage Partitioned Join

Spark Core:
- [SPARK-20624] Add better handling for node shutdown
- [SPARK-25299] Use remote storage for persisting shuffle data

PySpark:
- [SPARK-26413] RDD Arrow Support in Spark Core and PySpark

Kubernetes:
- [SPARK-36057] Support Customized Kubernetes Schedulers

We should probably finish any remaining work for Spark 3.3, then switch
to QA mode, cut a branch, and keep everything on track. I would like to
volunteer to help drive this process.

Best regards,
Max Gekk