Re: Apache Spark 3.3 Release

2022-03-17 Thread Maxim Gekk
Hi All,

Here is the allow list which I built based on your requests in this thread:

   1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   3. SPARK-37093: Inline type hints python/pyspark/streaming
   4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated
   usage of Distribution
   5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   6. SPARK-32268: Bloom Filter Join
   7. SPARK-38548: New SQL function: try_sum
   8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   9. SPARK-38063: Support SQL split_part function
   10. SPARK-28516: Data Type Formatting Functions: `to_char`
   11. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   12. SPARK-34863: Support nested column in Spark Parquet vectorized
   readers
   13. SPARK-38194: Make Yarn memory overhead factor configurable
   14. SPARK-37618: Support cleaning up shuffle blocks from external
   shuffle service
   15. SPARK-37831: Add task partition id in metrics
   16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   17. SPARK-36664: Log time spent waiting for cluster resources
   18. SPARK-34659: Web UI does not correctly get appId
   19. SPARK-37650: Tell spark-env.sh the python interpreter
   20. SPARK-38589: New SQL function: try_avg
   21. SPARK-38590: New SQL function: try_to_binary
   22. SPARK-34079: Improvement CTE table scan

Best regards,
Max Gekk


On Thu, Mar 17, 2022 at 4:59 PM Tom Graves  wrote:

> Is the feature freeze target date March 22nd then?  I saw a few dates
> thrown around want to confirm what we landed on
>
> I am trying to get the following improvements finished review and in, if
> concerns with either, let me know:
> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
> 
> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for
> released executors 
>
> Tom
>
>
> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
> ltn...@gmail.com> wrote:
>
>
> I'd like to add the following new SQL functions in the 3.3 release. These
> functions are useful when overflow or encoding errors occur:
>
> - [SPARK-38548][SQL] New SQL function: try_sum
> - [SPARK-38589][SQL] New SQL function: try_avg
> - [SPARK-38590][SQL] New SQL function: try_to_binary
>
>
> Gengliang
>
> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:
>
> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves 
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT, can you please update that.
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> >

Re: bazel and external/

2022-03-17 Thread Jungtaek Lim
The Avro reader is technically a connector. We eventually started calling data
source implementations "connectors" as well; the package name in catalyst
reflects that.

I'm not sure Docker fits under the name "external". It probably deserves a
top-level directory now, since we are starting to release an official Docker
image, and that image does not seem to be an experimental one.

Apart from Docker, all modules in the external directory are "sort of"
connectors. The Ganglia metric sink is an exception, but it is still a kind of
connector for Dropwizard.
(It might be interesting to see how many users are still using the kinesis-asl
and ganglia-lgpl modules. We have had almost no updates for DStream for
several years.)

If we agree on my proposal for Docker, the rest is effectively going to be a
rename. I don't have a strong opinion; I just want to keep the external
directory from becoming (or remaining) a miscellaneous one.

On Fri, Mar 18, 2022 at 10:04 AM Sean Owen  wrote:

> I sympathize, but might be less change to just rename the dir. There is
> more in there like the avro reader; it's kind of miscellaneous. I think we
> might want fewer rather than more top level dirs.
>
> On Thu, Mar 17, 2022 at 7:33 PM Jungtaek Lim 
> wrote:
>
>> We seem to just focus on how to avoid the conflict with the name
>> "external" used in bazel. Since we consider the possibility of renaming,
>> why not revisit the modules "external" contains?
>>
>> Looks like kinds of the modules external directory contains are 1) Docker
>> 2) Connectors 3) Sink on Dropwizard metrics (only ganglia here, and it
>> seems to be just that Ganglia is LGPL)
>>
>> Would it make sense if each kind deserves a top directory? We can
>> probably give better generalized names, and as a side-effect we will no
>> longer have "external".
>>
>> On Fri, Mar 18, 2022 at 5:45 AM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you for posting this, Alkis.
>>>
>>> Before the question (1) and (2), I'm curious if the Apache Spark
>>> community has other downstreams using Bazel.
>>>
>>> To All. If there are some Bazel users with Apache Spark code, could you
>>> share your practice? If you are using renaming, what is your renamed
>>> directory name?
>>>
>>> Dongjoon.
>>>
>>>
>>> On Thu, Mar 17, 2022 at 11:56 AM Alkis Evlogimenos
>>>  wrote:
>>>
>>>> AFAIK there is not. `external` has been baked in bazel since the
>>>> beginning and there is no plan from bazel devs to attempt to fix this.
>>>>
>>>> On Thu, Mar 17, 2022 at 7:52 PM Sean Owen wrote:

> Just checking - there is no way to tell bazel to look somewhere else
> for whatever 'external' means to it?
> It's a kinda big ugly change but it's not a functional change. If
> anything it might break some downstream builds that rely on the current
> structure too. But such is life for developers? I don't have a strong
> reason we can't.
>
> On Thu, Mar 17, 2022 at 1:47 PM Alkis Evlogimenos
>  wrote:
>
>> Hi Spark devs.
>>
>> The Apache Spark repo has a top level external/ directory. This is a
>> reserved name for the bazel build system and it causes all sorts of
>> problems: some can be worked around and some cannot (for some details on
>> one that cannot see
>> https://github.com/hedronvision/bazel-compile-commands-extractor/issues/30
>> ).
>>
>> Some forks of Apache Spark use bazel as a build system. It would be
>> nice if we can make this change in Apache Spark without resorting to
>> complex renames/merges whenever changes are pulled from upstream.
>>
>> As such, I proposed renaming the external/ directory to something else
>> [SPARK-38569]. I also sent a tentative [PR-35874] that renames
>> external/ to vendor/.
>>
>> My questions to you are:
>> 1. Are there any objections to renaming external to X?
>> 2. Is vendor a good new name for external?
>>
>> Cheers,
>>
>


Re: bazel and external/

2022-03-17 Thread Sean Owen
I sympathize, but it might be less change to just rename the dir. There is
more in there, like the avro reader; it's kind of miscellaneous. I think we
might want fewer rather than more top level dirs.

On Thu, Mar 17, 2022 at 7:33 PM Jungtaek Lim wrote:

> We seem to just focus on how to avoid the conflict with the name
> "external" used in bazel. Since we consider the possibility of renaming,
> why not revisit the modules "external" contains?
>
> Looks like kinds of the modules external directory contains are 1) Docker
> 2) Connectors 3) Sink on Dropwizard metrics (only ganglia here, and it
> seems to be just that Ganglia is LGPL)
>
> Would it make sense if each kind deserves a top directory? We can probably
> give better generalized names, and as a side-effect we will no longer have
> "external".
>
> On Fri, Mar 18, 2022 at 5:45 AM Dongjoon Hyun 
> wrote:
>
>> Thank you for posting this, Alkis.
>>
>> Before the question (1) and (2), I'm curious if the Apache Spark
>> community has other downstreams using Bazel.
>>
>> To All. If there are some Bazel users with Apache Spark code, could you
>> share your practice? If you are using renaming, what is your renamed
>> directory name?
>>
>> Dongjoon.
>>
>>
>> On Thu, Mar 17, 2022 at 11:56 AM Alkis Evlogimenos
>>  wrote:
>>
>>> AFAIK there is not. `external` has been baked in bazel since the
>>> beginning and there is no plan from bazel devs to attempt to fix this.
>>>
>>> On Thu, Mar 17, 2022 at 7:52 PM Sean Owen  wrote:
>>>
>>>> Just checking - there is no way to tell bazel to look somewhere else
>>>> for whatever 'external' means to it?
>>>> It's a kinda big ugly change but it's not a functional change. If
>>>> anything it might break some downstream builds that rely on the current
>>>> structure too. But such is life for developers? I don't have a strong
>>>> reason we can't.
>>>>
>>>> On Thu, Mar 17, 2022 at 1:47 PM Alkis Evlogimenos wrote:

> Hi Spark devs.
>
> The Apache Spark repo has a top level external/ directory. This is a
> reserved name for the bazel build system and it causes all sorts of
> problems: some can be worked around and some cannot (for some details on
> one that cannot see
> https://github.com/hedronvision/bazel-compile-commands-extractor/issues/30
> ).
>
> Some forks of Apache Spark use bazel as a build system. It would be
> nice if we can make this change in Apache Spark without resorting to
> complex renames/merges whenever changes are pulled from upstream.
>
> As such, I proposed renaming the external/ directory to something else
> [SPARK-38569]. I also sent a tentative [PR-35874] that renames
> external/ to vendor/.
>
> My questions to you are:
> 1. Are there any objections to renaming external to X?
> 2. Is vendor a good new name for external?
>
> Cheers,
>



Re: bazel and external/

2022-03-17 Thread Jungtaek Lim
We seem to be focusing only on how to avoid the conflict with the name
"external" used in bazel. Since we are considering a rename anyway, why not
revisit the modules "external" contains?

It looks like the external directory contains three kinds of modules: 1) Docker,
2) connectors, and 3) a sink for Dropwizard metrics (only Ganglia here, and it
seems to be there just because Ganglia is LGPL).

Would it make sense to give each kind its own top-level directory? We can
probably give them better generalized names, and as a side effect we will no
longer have "external".

On Fri, Mar 18, 2022 at 5:45 AM Dongjoon Hyun wrote:

> Thank you for posting this, Alkis.
>
> Before the question (1) and (2), I'm curious if the Apache Spark community
> has other downstreams using Bazel.
>
> To All. If there are some Bazel users with Apache Spark code, could you
> share your practice? If you are using renaming, what is your renamed
> directory name?
>
> Dongjoon.
>
>
> On Thu, Mar 17, 2022 at 11:56 AM Alkis Evlogimenos
>  wrote:
>
>> AFAIK there is not. `external` has been baked in bazel since the
>> beginning and there is no plan from bazel devs to attempt to fix this.
>>
>> On Thu, Mar 17, 2022 at 7:52 PM Sean Owen  wrote:
>>
>>> Just checking - there is no way to tell bazel to look somewhere else for
>>> whatever 'external' means to it?
>>> It's a kinda big ugly change but it's not a functional change. If
>>> anything it might break some downstream builds that rely on the current
>>> structure too. But such is life for developers? I don't have a strong
>>> reason we can't.
>>>
>>> On Thu, Mar 17, 2022 at 1:47 PM Alkis Evlogimenos
>>>  wrote:
>>>
>>>> Hi Spark devs.
>>>>
>>>> The Apache Spark repo has a top level external/ directory. This is a
>>>> reserved name for the bazel build system and it causes all sorts of
>>>> problems: some can be worked around and some cannot (for details on
>>>> one that cannot, see
>>>> https://github.com/hedronvision/bazel-compile-commands-extractor/issues/30
>>>> ).
>>>>
>>>> Some forks of Apache Spark use bazel as a build system. It would be
>>>> nice if we can make this change in Apache Spark without resorting to
>>>> complex renames/merges whenever changes are pulled from upstream.
>>>>
>>>> As such, I proposed renaming the external/ directory to something else
>>>> [SPARK-38569]. I also sent a tentative [PR-35874] that renames
>>>> external/ to vendor/.
>>>>
>>>> My questions to you are:
>>>> 1. Are there any objections to renaming external to X?
>>>> 2. Is vendor a good new name for external?
>>>>
>>>> Cheers,

>>>


Re: bazel and external/

2022-03-17 Thread Dongjoon Hyun
Thank you for posting this, Alkis.

Before getting to questions (1) and (2), I'm curious whether the Apache Spark
community has other downstreams using Bazel.

To all: if there are Bazel users working with Apache Spark code, could you
share your practice? If you rename the directory, what name do you use?

Dongjoon.


On Thu, Mar 17, 2022 at 11:56 AM Alkis Evlogimenos wrote:

> AFAIK there is not. `external` has been baked in bazel since the beginning
> and there is no plan from bazel devs to attempt to fix this.
>
> On Thu, Mar 17, 2022 at 7:52 PM Sean Owen  wrote:
>
>> Just checking - there is no way to tell bazel to look somewhere else for
>> whatever 'external' means to it?
>> It's a kinda big ugly change but it's not a functional change. If
>> anything it might break some downstream builds that rely on the current
>> structure too. But such is life for developers? I don't have a strong
>> reason we can't.
>>
>> On Thu, Mar 17, 2022 at 1:47 PM Alkis Evlogimenos
>>  wrote:
>>
>>> Hi Spark devs.
>>>
>>> The Apache Spark repo has a top level external/ directory. This is a
>>> reserved name for the bazel build system and it causes all sorts of
>>> problems: some can be worked around and some cannot (for some details on
>>> one that cannot see
>>> https://github.com/hedronvision/bazel-compile-commands-extractor/issues/30
>>> ).
>>>
>>> Some forks of Apache Spark use bazel as a build system. It would be nice
>>> if we can make this change in Apache Spark without resorting to
>>> complex renames/merges whenever changes are pulled from upstream.
>>>
>>> As such, I proposed renaming the external/ directory to something else
>>> [SPARK-38569]. I also sent a tentative [PR-35874] that renames
>>> external/ to vendor/.
>>>
>>> My questions to you are:
>>> 1. Are there any objections to renaming external to X?
>>> 2. Is vendor a good new name for external?
>>>
>>> Cheers,
>>>
>>


Re: bazel and external/

2022-03-17 Thread Alkis Evlogimenos
AFAIK there is not. `external` has been baked in bazel since the beginning
and there is no plan from bazel devs to attempt to fix this.

On Thu, Mar 17, 2022 at 7:52 PM Sean Owen  wrote:

> Just checking - there is no way to tell bazel to look somewhere else for
> whatever 'external' means to it?
> It's a kinda big ugly change but it's not a functional change. If anything
> it might break some downstream builds that rely on the current structure
> too. But such is life for developers? I don't have a strong reason we can't.
>
> On Thu, Mar 17, 2022 at 1:47 PM Alkis Evlogimenos
>  wrote:
>
>> Hi Spark devs.
>>
>> The Apache Spark repo has a top level external/ directory. This is a
>> reserved name for the bazel build system and it causes all sorts of
>> problems: some can be worked around and some cannot (for some details on
>> one that cannot see
>> https://github.com/hedronvision/bazel-compile-commands-extractor/issues/30
>> ).
>>
>> Some forks of Apache Spark use bazel as a build system. It would be nice
>> if we can make this change in Apache Spark without resorting to
>> complex renames/merges whenever changes are pulled from upstream.
>>
>> As such, I proposed renaming the external/ directory to something else
>> [SPARK-38569]. I also sent a tentative [PR-35874] that renames
>> external/ to vendor/.
>>
>> My questions to you are:
>> 1. Are there any objections to renaming external to X?
>> 2. Is vendor a good new name for external?
>>
>> Cheers,
>>
>


Re: bazel and external/

2022-03-17 Thread Sean Owen
Just checking - there is no way to tell bazel to look somewhere else for
whatever 'external' means to it?
It's a kinda big ugly change but it's not a functional change. If anything
it might break some downstream builds that rely on the current structure
too. But such is life for developers? I don't have a strong reason we can't.

On Thu, Mar 17, 2022 at 1:47 PM Alkis Evlogimenos wrote:

> Hi Spark devs.
>
> The Apache Spark repo has a top level external/ directory. This is a
> reserved name for the bazel build system and it causes all sorts of
> problems: some can be worked around and some cannot (for some details on
> one that cannot see
> https://github.com/hedronvision/bazel-compile-commands-extractor/issues/30
> ).
>
> Some forks of Apache Spark use bazel as a build system. It would be nice
> if we can make this change in Apache Spark without resorting to
> complex renames/merges whenever changes are pulled from upstream.
>
> As such, I proposed renaming the external/ directory to something else
> [SPARK-38569]. I also sent a tentative [PR-35874] that renames
> external/ to vendor/.
>
> My questions to you are:
> 1. Are there any objections to renaming external to X?
> 2. Is vendor a good new name for external?
>
> Cheers,
>


bazel and external/

2022-03-17 Thread Alkis Evlogimenos
Hi Spark devs.

The Apache Spark repo has a top level external/ directory. This is a
reserved name for the bazel build system and it causes all sorts of
problems: some can be worked around and some cannot (for details on one
that cannot, see
https://github.com/hedronvision/bazel-compile-commands-extractor/issues/30).
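
For forks that want to check a checkout for this clash before wiring up
bazel, a minimal Python sketch follows. It assumes the only reserved top-level
name of interest is `external` (the one named in this thread); the function
name and the reserved set are illustrative and are not part of bazel.

```python
import os
import tempfile
from typing import List

# Assumption: 'external' is the only reserved name we check for, per this thread.
RESERVED_TOP_LEVEL = {"external"}


def reserved_dirs(repo_root: str) -> List[str]:
    """Return the top-level directories in repo_root whose names are reserved."""
    return sorted(
        entry.name
        for entry in os.scandir(repo_root)
        if entry.is_dir() and entry.name in RESERVED_TOP_LEVEL
    )


# Usage on a throwaway layout that loosely mimics the Spark repo.
with tempfile.TemporaryDirectory() as root:
    for name in ("core", "sql", "external"):
        os.mkdir(os.path.join(root, name))
    print(reserved_dirs(root))  # ['external']
```

After a rename such as external/ to vendor/, the same check would return an
empty list.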

Some forks of Apache Spark use bazel as a build system. It would be nice if
we can make this change in Apache Spark without resorting to
complex renames/merges whenever changes are pulled from upstream.

As such, I proposed renaming the external/ directory to something else
[SPARK-38569]. I also sent a tentative [PR-35874] that renames external/
to vendor/.

My questions to you are:
1. Are there any objections to renaming external to X?
2. Is vendor a good new name for external?

Cheers,


Re: Apache Spark 3.3 Release

2022-03-17 Thread Tom Graves
Is the feature freeze target date March 22nd then? I saw a few dates thrown
around and want to confirm what we landed on.

I am trying to get the following improvements through review and merged; if
there are concerns with either, let me know:
- [SPARK-34079][SQL] Merge non-correlated scalar subqueries
- [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for
  released executors

Tom

On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang wrote:
 
I'd like to add the following new SQL functions in the 3.3 release. These
functions are useful when overflow or encoding errors occur:

   - [SPARK-38548][SQL] New SQL function: try_sum
   - [SPARK-38589][SQL] New SQL function: try_avg
   - [SPARK-38590][SQL] New SQL function: try_to_binary

Gengliang

On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo wrote:

Hello,

I've been trying for a bit to get the following two PRs merged and
into a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
python interpreter to spark-env.sh to allow some currently-unavailable
customization to happen
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse proxy-handling code where it does a greedy match for
"proxy" in the URL, and will mistakenly replace the App-ID in the
wrong place.
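
To illustrate the class of bug described above, here is a hypothetical Python
sketch. It is not the actual SparkUI code. It uses a first-occurrence search
for "proxy" as a stand-in for the greedy match mentioned above: when the front
end's own base path also contains "proxy", the wrong segment is treated as the
App-ID. The paths, the function names, and the assumption that the base path
is known up front are all made up for illustration.

```python
def app_id_naive(path: str) -> str:
    """Take the segment after the FIRST 'proxy' segment (the buggy variant)."""
    segments = path.strip("/").split("/")
    return segments[segments.index("proxy") + 1]


def app_id_anchored(path: str, base: str) -> str:
    """Strip the known front-end base path first, then expect 'proxy/<app-id>'."""
    if not path.startswith(base):
        raise ValueError(f"{path!r} is not under base {base!r}")
    segments = path[len(base):].strip("/").split("/")
    if len(segments) < 2 or segments[0] != "proxy":
        raise ValueError(f"no proxy/<app-id> segment after base in {path!r}")
    return segments[1]


# Hypothetical URL: the front end itself is served under /user/alice/proxy/4040.
path = "/user/alice/proxy/4040/proxy/app-20220317-0001/jobs/"
base = "/user/alice/proxy/4040"
print(app_id_naive(path))           # 4040 (a segment of the base path)
print(app_id_anchored(path, base))  # app-20220317-0001
```

Anchoring on a known prefix is only one way to avoid the mismatch; the PR
itself may take a different approach.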

I'm not exactly sure of how to get attention of PRs that have been
sitting around for a while, but these are really important to our
use-cases, and it would be nice to have them merged in.

Cheers
Andrew

On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
>
> I'd like to add/backport the logging in 
> https://github.com/apache/spark/pull/35881 PR so that when users submit 
> issues with dynamic allocation we can better debug what's going on.
>
> On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
>>
>> There is one item on our side that we want to backport to 3.3:
>> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>
>> It's already reviewed and approved.
>>
>> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves  
>> wrote:
>> >
>> > It looks like the version hasn't been updated on master and still shows 
>> > 3.3.0-SNAPSHOT, can you please update that.
>> >
>> > Tom
>> >
>> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk 
>> >  wrote:
>> >
>> >
>> > Hi All,
>> >
>> > I have created the branch for Spark 3.3:
>> > https://github.com/apache/spark/commits/branch-3.3
>> >
>> > Please, backport important fixes to it, and if you have some doubts, ping 
>> > me in the PR. Regarding new features, we are still building the allow list 
>> > for branch-3.3.
>> >
>> > Best regards,
>> > Max Gekk
>> >
>> >
>> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun  
>> > wrote:
>> >
>> > Yes, I agree with you for your whitelist approach for backporting. :)
>> > Thank you for summarizing.
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
>> >
>> > I think I finally got your point. What you want to keep unchanged is the 
>> > branch cut date of Spark 3.3. Today? or this Friday? This is not a big 
>> > deal.
>> >
>> > My major concern is whether we should keep merging the feature work or the 
>> > dependency upgrade after the branch cut. To make our release time more 
>> > predictable, I am suggesting we should finalize the exception PR list 
>> > first, instead of merging them in an ad hoc way. In the past, we spent a 
>> > lot of time on the revert of the PRs that were merged after the branch 
>> > cut. I hope we can minimize unnecessary arguments in this release. Do you 
>> > agree, Dongjoon?
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
>> >
>> > That is not totally fine, Xiao. It sounds like you are asking a change of 
>> > plan without a proper reason.
>> >
>> > Although we cut the branch Today according our plan, you still can collect 
>> > the list and make a list of exceptions. I'm not blocking what you want to 
>> > do.
>> >
>> > Please let the community start to ramp down as we agreed before.
>> >
>> > Dongjoon
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
>> >
>> > Please do not get me wrong. If we don't cut a branch, we are allowing all 
>> > patches to land Apache Spark 3.3. That is totally fine. After we cut the 
>> > branch, we should avoid merging the feature work. In the next three days, 
>> > let us collect the actively developed PRs that we want to make an 
>> > exception (i.e., merged to 3.3 after the upcoming branch cut). Does that 
>> > make sense?
>> >
>> > On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
>> >
>> > Xiao. You are working against what you are saying.
>> > If you don't cut a branch, it means you are allowing all patches to land 
>> > Apache Spark 3.3. No?
>> >
>> > > we need to avoid backporting the feature work that are not being well 
>> > > discussed.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 1

Re: Apache Spark 3.3 Release

2022-03-17 Thread Gengliang Wang
I'd like to add the following new SQL functions in the 3.3 release. These
functions are useful when overflow or encoding errors occur:

   - [SPARK-38548][SQL] New SQL function: try_sum
   
   - [SPARK-38589][SQL] New SQL function: try_avg
   
   - [SPARK-38590][SQL] New SQL function: try_to_binary
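
As a rough illustration of the semantics I have in mind, here is a pure-Python
sketch (not Spark code): a strict sum fails on overflow, while the try_
variant returns NULL (None here) instead. The helper names are hypothetical,
the 64-bit bound assumes a BIGINT-like column, and hex decoding stands in for
whatever encodings try_to_binary ends up accepting.

```python
from typing import Iterable, Optional

# Assumption: the column is a 64-bit signed integer (BIGINT-like).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def strict_sum(values: Iterable[Optional[int]]) -> Optional[int]:
    """Sum that raises on 64-bit overflow; None (NULL) inputs are skipped."""
    total: Optional[int] = None
    for v in values:
        if v is None:
            continue
        total = v if total is None else total + v
        if not INT64_MIN <= total <= INT64_MAX:
            raise OverflowError("long overflow")
    return total


def try_sum_sketch(values: Iterable[Optional[int]]) -> Optional[int]:
    """Same sum, but overflow yields None (NULL) instead of an error."""
    try:
        return strict_sum(values)
    except OverflowError:
        return None


def try_to_binary_sketch(text: str) -> Optional[bytes]:
    """Hex decoding that yields None on malformed input instead of raising."""
    try:
        return bytes.fromhex(text)
    except ValueError:
        return None


print(try_sum_sketch([1, 2, None, 3]))   # 6
print(try_sum_sketch([INT64_MAX, 1]))    # None
print(try_to_binary_sketch("zz"))        # None
```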
   

Gengliang

On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo  wrote:

> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau  wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun  wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves 
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT, can you please update that.
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> >> > Thank you for summarizing.
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li  wrote:
> >> >
> >> > I think I finally got your point. What you want to keep unchanged is
> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
> >> >
> >> > My major concern is whether we should keep merging the feature work
> or the dependency upgrade after the branch cut. To make our release time
> more predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:55 PM Dongjoon Hyun wrote:
> >> >
> >> > That is not totally fine, Xiao. It sounds like you are asking a
> change of plan without a proper reason.
> >> >
> >> > Although we cut the branch Today according our plan, you still can
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
> >> >
> >> > Please let the community start to ramp down as we agreed before.
> >> >
> >> > Dongjoon
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li  wrote:
> >> >
> >> > Please do not get me wrong. If we don't cut a branch, we are allowing
> all patches to land Apache Spark 3.3. That is totally fine. After we cut
> the branch, we should avoid merging the feature work. In the next three
> days, let us collect the actively developed PRs that we want to make an
> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
> make sense?
> >> >
> >> > On Tue, Mar 15, 2022 at 2:54 PM Dongjoon Hyun wrote:
> >> >
> >> > Xiao. You are working against what you are saying.
> >> > If you don't cut a branch, it means you are allowing all patches to
> land Apache Spark 3.3. No?
> >> >
> >> > > we need to avoid backporting the feature work that are not being
> well discussed.
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li 
> wrote:
> >> >
> >> > Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> active