Spark on Kubernetes focused workshops

2022-05-19 Thread Agarwal, Janak


Team, is there a meetup or workshop focused on Spark on Kubernetes?
If not, is there any interest in creating a once-a-month sync to exchange 
notes and best practices?

Thanks,
Janak


Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-19 Thread L. C. Hsieh
+1. Thanks Hyukjin.

On Thu, May 19, 2022 at 10:14 AM Bryan Cutler  wrote:
>
> +1, sounds good
>
> On Wed, May 18, 2022 at 9:16 PM Dongjoon Hyun  wrote:
>>
>> +1
>>
>> Thank you for the suggestion, Hyukjin.
>>
>> Dongjoon.
>>
>> On Wed, May 18, 2022 at 11:08 AM Bjørn Jørgensen  
>> wrote:
>>>
>>> +1
>>> But can we have the PR title and the PR label be the same, PS?
>>>
>>> On Wed, 18 May 2022 at 18:57, Xinrong Meng wrote:

 Great!

 It saves us from always specifying "Pandas API on Spark" in PR titles.

 Thanks!


 Xinrong Meng

 Software Engineer

 Databricks



 On Tue, May 17, 2022 at 1:08 AM Maciej  wrote:
>
> Sounds good!
>
> +1
>
> On 5/17/22 06:08, Yikun Jiang wrote:
> > It's a pretty good idea, +1.
> >
> > To be clear in Github:
> >
> > - For each PR title: [SPARK-XXX][PYTHON][PS] The pandas-on-Spark PR title
> > (*still keep [PYTHON]*; [PS] is newly added)
> >
> > - For the PR label: keep `PYTHON` and `CORE`; `PANDAS API ON SPARK` is
> > newly added
> > https://github.com/apache/spark/pull/36574
> > 
> >
> > Right?
> >
> > Regards,
> > Yikun
> >
> >
> > On Tue, May 17, 2022 at 11:26 AM Hyukjin Kwon wrote:
> >
> > Hi all,
> >
> > What if we introduced a component in JIRA, "Pandas API on Spark",
> > and used "PS" (pandas-on-Spark) in PR titles? We already use "ps" in
> > many places, as in: import pyspark.pandas as ps.
> > This is similar to "Structured Streaming" in JIRA, and "SS" in PR 
> > title.
> >
> > I think it'd be easier to track the changes here with that.
> > Currently it's a bit difficult to identify it from pure PySpark 
> > changes.
> >
>
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>>>
>>>
>>>
>>> --
>>> Bjørn Jørgensen
>>> Vestre Aspehaug 4, 6010 Ålesund
>>> Norge
>>>
>>> +47 480 94 297

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
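For anyone skimming this thread later: the title convention being proposed can be checked mechanically. The snippet below is a hypothetical sketch (the helper and regex are not part of Spark's tooling); it assumes titles follow the [SPARK-XXXXX][COMPONENT]... prefix format discussed above, and that the `ps` shorthand refers to the conventional `import pyspark.pandas as ps` alias.

```python
import re

# Hypothetical helper (not part of Spark's tooling) illustrating the PR-title
# convention discussed in this thread: [SPARK-<ticket>][PYTHON][PS] <summary>.
TITLE_RE = re.compile(r"^\[SPARK-\d+\](\[[A-Z ]+\])+ .+")

def has_ps_component(title: str) -> bool:
    # A title carries the pandas-on-Spark component if it is well-formed
    # and includes the [PS] tag among its bracketed components.
    return bool(TITLE_RE.match(title)) and "[PS]" in title

print(has_ps_component("[SPARK-39123][PYTHON][PS] Fix ps.DataFrame repr"))  # True
print(has_ps_component("[SPARK-39123][PYTHON] Pure PySpark change"))        # False
```

This mirrors the existing "Structured Streaming"/"SS" convention mentioned by Hyukjin: the JIRA component is the long name, the bracketed tag is the short one.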



Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-19 Thread Bryan Cutler
+1, sounds good

On Wed, May 18, 2022 at 9:16 PM Dongjoon Hyun 
wrote:

> +1
>
> Thank you for the suggestion, Hyukjin.
>
> Dongjoon.
>
> On Wed, May 18, 2022 at 11:08 AM Bjørn Jørgensen 
> wrote:
>
>> +1
>> But can we have the PR title and the PR label be the same, PS?
>>
>> On Wed, 18 May 2022 at 18:57, Xinrong Meng wrote:
>>
>>> Great!
>>>
>>> It saves us from always specifying "Pandas API on Spark" in PR titles.
>>>
>>> Thanks!
>>>
>>>
>>> Xinrong Meng
>>>
>>> Software Engineer
>>>
>>> Databricks
>>>
>>>
>>> On Tue, May 17, 2022 at 1:08 AM Maciej  wrote:
>>>
 Sounds good!

 +1

 On 5/17/22 06:08, Yikun Jiang wrote:
 > It's a pretty good idea, +1.
 >
 > To be clear in Github:
 >
 > - For each PR title: [SPARK-XXX][PYTHON][PS] The pandas-on-Spark PR title
 > (*still keep [PYTHON]*; [PS] is newly added)
 >
 > - For the PR label: keep `PYTHON` and `CORE`; `PANDAS API ON SPARK` is
 > newly added
 > https://github.com/apache/spark/pull/36574
 > 
 >
 > Right?
 >
 > Regards,
 > Yikun
 >
 >
 > On Tue, May 17, 2022 at 11:26 AM Hyukjin Kwon wrote:
 >
 > Hi all,
 >
 > What if we introduced a component in JIRA, "Pandas API on Spark",
 > and used "PS" (pandas-on-Spark) in PR titles? We already use "ps" in
 > many places, as in: import pyspark.pandas as ps.
 > This is similar to "Structured Streaming" in JIRA, and "SS" in PR
 title.
 >
 > I think it'd be easier to track the changes here with that.
 > Currently it's a bit difficult to identify it from pure PySpark
 changes.
 >


 --
 Best regards,
 Maciej Szymkiewicz

 Web: https://zero323.net
 PGP: A30CEF0C31A501EC

>>>
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>>
>


Re: Re: Unable to create view due to up-cast error when migrating from Hive to Spark

2022-05-19 Thread beliefer
Thank you for the reply !




At 2022-05-18 20:27:27, "Wenchen Fan"  wrote:

A view is essentially a SQL query. It's fragile to share views between Spark 
and Hive because different systems have different SQL dialects. They may 
interpret the view SQL query differently and introduce unexpected behaviors.


In this case, Spark returns a decimal type for gender * 0.3 - 0.1 but Hive 
returns a double type. The view schema was determined by Hive at creation 
time, so it does not match what Spark infers from the view's SQL query when 
Spark reads the view. We need to re-create this view using Spark. Actually, 
I think we need to do the same for every Hive view we want to use in Spark.


On Wed, May 18, 2022 at 7:03 PM beliefer  wrote:


During the migration from Hive to Spark, we hit a problem with the SQL used 
to create views in Hive: SQL that legally creates a view in Hive raises an 
error when executed in Spark SQL.

The SQL is as follows:

CREATE VIEW test_db.my_view AS
SELECT
  CASE
    WHEN age > 12 THEN gender * 0.3 - 0.1
  END AS TT,
  gender,
  age,
  careers,
  education
FROM
  test_db.my_table;

The error message is as follows:

Cannot up cast TT from decimal(13, 1) to double.

The type path of the target object is:



You can either add an explicit cast to the input data or choose a higher 
precision type of the field in the target object



How should we solve this problem?
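(For the archive: following the hint in the error message, one workaround is to re-create the view from Spark with an explicit cast, so the stored view schema matches what Spark infers. A minimal, hedged sketch, reusing the table and column names from the SQL above:)

```python
# Hedged sketch of the workaround suggested by the error message: re-create
# the view with an explicit CAST so the stored schema (double) matches the
# expression type Spark would otherwise infer as decimal.
# Table/column names are taken from the thread's example.
FIXED_VIEW_SQL = """
CREATE OR REPLACE VIEW test_db.my_view AS
SELECT
  CASE
    WHEN age > 12 THEN CAST(gender * 0.3 - 0.1 AS DOUBLE)
  END AS TT,
  gender,
  age,
  careers,
  education
FROM test_db.my_table
"""

# With an active SparkSession named `spark`, this would be executed as:
#   spark.sql(FIXED_VIEW_SQL)
```

(Alternatively, as Wenchen suggests above, simply re-creating the original view from Spark, letting Spark record the decimal schema, also works if Hive no longer needs to read it.)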




 

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Emil Ejbyfeldt

Hi,

When testing Spark 3.3.0 on our production Spark workload, we noticed 
that https://issues.apache.org/jira/browse/SPARK-38681 is actually a 
regression from 3.2 (I did not know this at the time of creating the 
ticket). It seems the bug was introduced in 
https://github.com/apache/spark/pull/33205


I already have a PR here that fixes the issue:
https://github.com/apache/spark/pull/36004

Since this is a breaking regression for us, I think it might be for other 
people as well.


Best,
Emil

On 16/05/2022 14:43, Maxim Gekk wrote:
Please vote on releasing the following candidate as 
Apache Spark version 3.3.0.


The vote is open until 11:59pm Pacific time May 19th and passes if a 
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.


[ ] +1 Release this package as Apache Spark 3.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/ 



The tag to be voted on is v3.3.0-rc2 (commit 
c8c657b922ac8fd8dcf9553113e11a80079db059):
https://github.com/apache/spark/tree/v3.3.0-rc2 



The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/ 



Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS 



The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1403 



The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/ 



The list of bug fixes going into 3.3.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12350369 



This release is using the release script of the tag v3.3.0-rc2.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.3.0?
===
The current list of open tickets targeted at 3.3.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target 
Version/s" = 3.3.0


Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.

Maxim Gekk

Software Engineer

Databricks, Inc.






Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Kent Yao
Thanks for the quick fix, Gengliang.

BR,
Kent

Gengliang Wang wrote on Thu, May 19, 2022 at 18:25:
>
> Hi Kent and Wenchen,
>
> Thanks for reporting. I just created 
> https://github.com/apache/spark/pull/36609 to fix the issue.
>
> Gengliang
>
> On Thu, May 19, 2022 at 5:40 PM Wenchen Fan  wrote:
>>
>> I think it should have been fixed by
>> https://github.com/apache/spark/commit/0fdb6757946e2a0991256a3b73c0c09d6e764eed
>> Maybe the fix is incomplete...
>>
>> On Thu, May 19, 2022 at 2:16 PM Kent Yao  wrote:
>>>
>>> Thanks, Maxim.
>>>
>>> Leave my -1 for this release candidate.
>>>
>>> Unfortunately, I don't know which PR fixed this.
>>> Does anyone happen to know?
>>>
>>> BR,
>>> Kent Yao
>>>
>>> Maxim Gekk wrote on Thu, May 19, 2022 at 13:42:
>>> >
>>> > Hi Kent,
>>> >
>>> > > Shall we backport the fix from the master to 3.3 too?
>>> >
>>> > Yes, we shall.
>>> >
>>> > Maxim Gekk
>>> >
>>> > Software Engineer
>>> >
>>> > Databricks, Inc.
>>> >
>>> >
>>> >
>>> > On Thu, May 19, 2022 at 6:44 AM Kent Yao  wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I verified the simple case below with the binary release, and it looks
>>> >> like a bug to me.
>>> >>
>>> >> bin/spark-sql -e "select date '2018-11-17' > 1"
>>> >>
>>> >> Error in query: Invalid call to toAttribute on unresolved object;
>>> >> 'Project [unresolvedalias((2018-11-17 > 1), None)]
>>> >> +- OneRowRelation
>>> >>
>>> >> Both 3.2 releases and the master branch work fine with correct errors
>>> >> -  'due to data type mismatch'.
>>> >>
>>> >> Shall we backport the fix from the master to 3.3 too?
>>> >>
>>> >> Bests
>>> >>
>>> >> Kent Yao
>>> >>
>>> >>
>>> >> Yuming Wang wrote on Wed, May 18, 2022 at 19:04:
>>> >> >
>>> >> > -1. There is a regression: https://github.com/apache/spark/pull/36595
>>> >> >
>>> >> > On Wed, May 18, 2022 at 4:11 PM Martin Grigorov  
>>> >> > wrote:
>>> >> >>
>>> >> >> Hi,
>>> >> >>
>>> >> >> [X] +1 Release this package as Apache Spark 3.3.0
>>> >> >>
>>> >> >> Tested:
>>> >> >> - make local distribution from sources (with 
>>> >> >> ./dev/make-distribution.sh --tgz --name with-volcano 
>>> >> >> -Pkubernetes,volcano,hadoop-3)
>>> >> >> - create a Docker image (with JDK 11)
>>> >> >> - run Pi example on
>>> >> >> -- local
>>> >> >> -- Kubernetes with default scheduler
>>> >> >> -- Kubernetes with Volcano scheduler
>>> >> >>
>>> >> >> On both x86_64 and aarch64 !
>>> >> >>
>>> >> >> Regards,
>>> >> >> Martin
>>> >> >>
>>> >> >>
>>> >> >> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk 
>>> >> >>  wrote:
>>> >> >>>
>>> >> >>> Please vote on releasing the following candidate as Apache Spark 
>>> >> >>> version 3.3.0.
>>> >> >>>
>>> >> >>> The vote is open until 11:59pm Pacific time May 19th and passes if a 
>>> >> >>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> >> >>>
>>> >> >>> [ ] +1 Release this package as Apache Spark 3.3.0
>>> >> >>> [ ] -1 Do not release this package because ...
>>> >> >>>
>>> >> >>> To learn more about Apache Spark, please see http://spark.apache.org/
>>> >> >>>
>>> >> >>> The tag to be voted on is v3.3.0-rc2 (commit 
>>> >> >>> c8c657b922ac8fd8dcf9553113e11a80079db059):
>>> >> >>> https://github.com/apache/spark/tree/v3.3.0-rc2
>>> >> >>>
>>> >> >>> The release files, including signatures, digests, etc. can be found 
>>> >> >>> at:
>>> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>>> >> >>>
>>> >> >>> Signatures used for Spark RCs can be found in this file:
>>> >> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >> >>>
>>> >> >>> The staging repository for this release can be found at:
>>> >> >>> https://repository.apache.org/content/repositories/orgapachespark-1403
>>> >> >>>
>>> >> >>> The documentation corresponding to this release can be found at:
>>> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>>> >> >>>
>>> >> >>> The list of bug fixes going into 3.3.0 can be found at the following 
>>> >> >>> URL:
>>> >> >>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>> >> >>>
>>> >> >>> This release is using the release script of the tag v3.3.0-rc2.
>>> >> >>>
>>> >> >>>
>>> >> >>> FAQ
>>> >> >>>
>>> >> >>> =
>>> >> >>> How can I help test this release?
>>> >> >>> =
>>> >> >>> If you are a Spark user, you can help us test this release by taking
>>> >> >>> an existing Spark workload and running on this release candidate, 
>>> >> >>> then
>>> >> >>> reporting any regressions.
>>> >> >>>
>>> >> >>> If you're working in PySpark, you can set up a virtual env and install
>>> >> >>> the current RC and see if anything important breaks; in Java/Scala,
>>> >> >>> you can add the staging repository to your project's resolvers and test
>>> >> >>> with the RC (make sure to clean up the artifact cache before/after so
>>> >> >>> you don't end up building with an out-of-date RC going forward).
>>> >> >>>
>>> >> >>> ===
>>> >> >>> What should happen to JIRA 

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Gengliang Wang
Hi Kent and Wenchen,

Thanks for reporting. I just created
https://github.com/apache/spark/pull/36609 to fix the issue.

Gengliang

On Thu, May 19, 2022 at 5:40 PM Wenchen Fan  wrote:

> I think it should have been fixed by
> https://github.com/apache/spark/commit/0fdb6757946e2a0991256a3b73c0c09d6e764eed
> Maybe the fix is incomplete...
>
> On Thu, May 19, 2022 at 2:16 PM Kent Yao  wrote:
>
>> Thanks, Maxim.
>>
>> Leave my -1 for this release candidate.
>>
>> Unfortunately, I don't know which PR fixed this.
>> Does anyone happen to know?
>>
>> BR,
>> Kent Yao
>>
>> Maxim Gekk wrote on Thu, May 19, 2022 at 13:42:
>> >
>> > Hi Kent,
>> >
>> > > Shall we backport the fix from the master to 3.3 too?
>> >
>> > Yes, we shall.
>> >
>> > Maxim Gekk
>> >
>> > Software Engineer
>> >
>> > Databricks, Inc.
>> >
>> >
>> >
>> > On Thu, May 19, 2022 at 6:44 AM Kent Yao  wrote:
>> >>
>> >> Hi,
>> >>
>> >> I verified the simple case below with the binary release, and it looks
>> >> like a bug to me.
>> >>
>> >> bin/spark-sql -e "select date '2018-11-17' > 1"
>> >>
>> >> Error in query: Invalid call to toAttribute on unresolved object;
>> >> 'Project [unresolvedalias((2018-11-17 > 1), None)]
>> >> +- OneRowRelation
>> >>
>> >> Both 3.2 releases and the master branch work fine with correct errors
>> >> -  'due to data type mismatch'.
>> >>
>> >> Shall we backport the fix from the master to 3.3 too?
>> >>
>> >> Bests
>> >>
>> >> Kent Yao
>> >>
>> >>
>> >> Yuming Wang wrote on Wed, May 18, 2022 at 19:04:
>> >> >
>> >> > -1. There is a regression:
>> https://github.com/apache/spark/pull/36595
>> >> >
>> >> > On Wed, May 18, 2022 at 4:11 PM Martin Grigorov <
>> mgrigo...@apache.org> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> [X] +1 Release this package as Apache Spark 3.3.0
>> >> >>
>> >> >> Tested:
>> >> >> - make local distribution from sources (with
>> ./dev/make-distribution.sh --tgz --name with-volcano
>> -Pkubernetes,volcano,hadoop-3)
>> >> >> - create a Docker image (with JDK 11)
>> >> >> - run Pi example on
>> >> >> -- local
>> >> >> -- Kubernetes with default scheduler
>> >> >> -- Kubernetes with Volcano scheduler
>> >> >>
>> >> >> On both x86_64 and aarch64 !
>> >> >>
>> >> >> Regards,
>> >> >> Martin
>> >> >>
>> >> >>
>> >> >> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk <
>> maxim.g...@databricks.com.invalid> wrote:
>> >> >>>
>> >> >>> Please vote on releasing the following candidate as Apache Spark
>> version 3.3.0.
>> >> >>>
>> >> >>> The vote is open until 11:59pm Pacific time May 19th and passes if
>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >> >>>
>> >> >>> [ ] +1 Release this package as Apache Spark 3.3.0
>> >> >>> [ ] -1 Do not release this package because ...
>> >> >>>
>> >> >>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>> >> >>>
>> >> >>> The tag to be voted on is v3.3.0-rc2 (commit
>> c8c657b922ac8fd8dcf9553113e11a80079db059):
>> >> >>> https://github.com/apache/spark/tree/v3.3.0-rc2
>> >> >>>
>> >> >>> The release files, including signatures, digests, etc. can be
>> found at:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>> >> >>>
>> >> >>> Signatures used for Spark RCs can be found in this file:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >> >>>
>> >> >>> The staging repository for this release can be found at:
>> >> >>>
>> https://repository.apache.org/content/repositories/orgapachespark-1403
>> >> >>>
>> >> >>> The documentation corresponding to this release can be found at:
>> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>> >> >>>
>> >> >>> The list of bug fixes going into 3.3.0 can be found at the
>> following URL:
>> >> >>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>> >> >>>
>> >> >>> This release is using the release script of the tag v3.3.0-rc2.
>> >> >>>
>> >> >>>
>> >> >>> FAQ
>> >> >>>
>> >> >>> =
>> >> >>> How can I help test this release?
>> >> >>> =
>> >> >>> If you are a Spark user, you can help us test this release by
>> taking
>> >> >>> an existing Spark workload and running on this release candidate,
>> then
>> >> >>> reporting any regressions.
>> >> >>>
>> >> >>> If you're working in PySpark, you can set up a virtual env and install
>> >> >>> the current RC and see if anything important breaks; in Java/Scala,
>> >> >>> you can add the staging repository to your project's resolvers and test
>> >> >>> with the RC (make sure to clean up the artifact cache before/after so
>> >> >>> you don't end up building with an out-of-date RC going forward).
>> >> >>>
>> >> >>> ===
>> >> >>> What should happen to JIRA tickets still targeting 3.3.0?
>> >> >>> ===
>> >> >>> The current list of open tickets targeted at 3.3.0 can be found at:
>> >> >>> https://issues.apache.org/jira/projects/SPARK and search for
>> "Target Version/s" = 3.3.0

Final reminder: ApacheCon North America call for presentations closing soon

2022-05-19 Thread Rich Bowen
[Note: You're receiving this because you are subscribed to one or more
Apache Software Foundation project mailing lists.]

This is your final reminder that the Call for Presentations for
ApacheCon North America 2022 will close at 00:01 GMT on Monday, May
23rd, 2022. Please don't wait! Get your talk proposals in now!

Details here: https://apachecon.com/acna2022/cfp.html

--Rich, for the ApacheCon Planners






Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Wenchen Fan
I think it should have been fixed by
https://github.com/apache/spark/commit/0fdb6757946e2a0991256a3b73c0c09d6e764eed
Maybe the fix is incomplete...

On Thu, May 19, 2022 at 2:16 PM Kent Yao  wrote:

> Thanks, Maxim.
>
> Leave my -1 for this release candidate.
>
> Unfortunately, I don't know which PR fixed this.
> Does anyone happen to know?
>
> BR,
> Kent Yao
>
> Maxim Gekk wrote on Thu, May 19, 2022 at 13:42:
> >
> > Hi Kent,
> >
> > > Shall we backport the fix from the master to 3.3 too?
> >
> > Yes, we shall.
> >
> > Maxim Gekk
> >
> > Software Engineer
> >
> > Databricks, Inc.
> >
> >
> >
> > On Thu, May 19, 2022 at 6:44 AM Kent Yao  wrote:
> >>
> >> Hi,
> >>
> >> I verified the simple case below with the binary release, and it looks
> >> like a bug to me.
> >>
> >> bin/spark-sql -e "select date '2018-11-17' > 1"
> >>
> >> Error in query: Invalid call to toAttribute on unresolved object;
> >> 'Project [unresolvedalias((2018-11-17 > 1), None)]
> >> +- OneRowRelation
> >>
> >> Both 3.2 releases and the master branch work fine with correct errors
> >> -  'due to data type mismatch'.
> >>
> >> Shall we backport the fix from the master to 3.3 too?
> >>
> >> Bests
> >>
> >> Kent Yao
> >>
> >>
> >> Yuming Wang wrote on Wed, May 18, 2022 at 19:04:
> >> >
> >> > -1. There is a regression: https://github.com/apache/spark/pull/36595
> >> >
> >> > On Wed, May 18, 2022 at 4:11 PM Martin Grigorov 
> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> [X] +1 Release this package as Apache Spark 3.3.0
> >> >>
> >> >> Tested:
> >> >> - make local distribution from sources (with
> ./dev/make-distribution.sh --tgz --name with-volcano
> -Pkubernetes,volcano,hadoop-3)
> >> >> - create a Docker image (with JDK 11)
> >> >> - run Pi example on
> >> >> -- local
> >> >> -- Kubernetes with default scheduler
> >> >> -- Kubernetes with Volcano scheduler
> >> >>
> >> >> On both x86_64 and aarch64 !
> >> >>
> >> >> Regards,
> >> >> Martin
> >> >>
> >> >>
> >> >> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk <
> maxim.g...@databricks.com.invalid> wrote:
> >> >>>
> >> >>> Please vote on releasing the following candidate as Apache Spark
> version 3.3.0.
> >> >>>
> >> >>> The vote is open until 11:59pm Pacific time May 19th and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >> >>>
> >> >>> [ ] +1 Release this package as Apache Spark 3.3.0
> >> >>> [ ] -1 Do not release this package because ...
> >> >>>
> >> >>> To learn more about Apache Spark, please see
> http://spark.apache.org/
> >> >>>
> >> >>> The tag to be voted on is v3.3.0-rc2 (commit
> c8c657b922ac8fd8dcf9553113e11a80079db059):
> >> >>> https://github.com/apache/spark/tree/v3.3.0-rc2
> >> >>>
> >> >>> The release files, including signatures, digests, etc. can be found
> at:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
> >> >>>
> >> >>> Signatures used for Spark RCs can be found in this file:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >>>
> >> >>> The staging repository for this release can be found at:
> >> >>>
> https://repository.apache.org/content/repositories/orgapachespark-1403
> >> >>>
> >> >>> The documentation corresponding to this release can be found at:
> >> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
> >> >>>
> >> >>> The list of bug fixes going into 3.3.0 can be found at the
> following URL:
> >> >>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
> >> >>>
> >> >>> This release is using the release script of the tag v3.3.0-rc2.
> >> >>>
> >> >>>
> >> >>> FAQ
> >> >>>
> >> >>> =
> >> >>> How can I help test this release?
> >> >>> =
> >> >>> If you are a Spark user, you can help us test this release by taking
> >> >>> an existing Spark workload and running on this release candidate,
> then
> >> >>> reporting any regressions.
> >> >>>
> >> >>> If you're working in PySpark, you can set up a virtual env and install
> >> >>> the current RC and see if anything important breaks; in Java/Scala,
> >> >>> you can add the staging repository to your project's resolvers and test
> >> >>> with the RC (make sure to clean up the artifact cache before/after so
> >> >>> you don't end up building with an out-of-date RC going forward).
> >> >>>
> >> >>> ===
> >> >>> What should happen to JIRA tickets still targeting 3.3.0?
> >> >>> ===
> >> >>> The current list of open tickets targeted at 3.3.0 can be found at:
> >> >>> https://issues.apache.org/jira/projects/SPARK and search for
> "Target Version/s" = 3.3.0
> >> >>>
> >> >>> Committers should look at those and triage. Extremely important bug
> >> >>> fixes, documentation, and API tweaks that impact compatibility
> should
> >> >>> be worked on immediately. Everything else please retarget to an
> >> >>> appropriate release.
> >> >>>
> >> >>> ==
> >> >>> But my bug isn't fixed?
> >> >>> ==

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Kent Yao
Thanks, Maxim.

Leave my -1 for this release candidate.

Unfortunately, I don't know which PR fixed this.
Does anyone happen to know?

BR,
Kent Yao

Maxim Gekk wrote on Thu, May 19, 2022 at 13:42:
>
> Hi Kent,
>
> > Shall we backport the fix from the master to 3.3 too?
>
> Yes, we shall.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
>
> On Thu, May 19, 2022 at 6:44 AM Kent Yao  wrote:
>>
>> Hi,
>>
>> I verified the simple case below with the binary release, and it looks
>> like a bug to me.
>>
>> bin/spark-sql -e "select date '2018-11-17' > 1"
>>
>> Error in query: Invalid call to toAttribute on unresolved object;
>> 'Project [unresolvedalias((2018-11-17 > 1), None)]
>> +- OneRowRelation
>>
>> Both 3.2 releases and the master branch work fine with correct errors
>> -  'due to data type mismatch'.
>>
>> Shall we backport the fix from the master to 3.3 too?
>>
>> Bests
>>
>> Kent Yao
>>
>>
>> Yuming Wang wrote on Wed, May 18, 2022 at 19:04:
>> >
>> > -1. There is a regression: https://github.com/apache/spark/pull/36595
>> >
>> > On Wed, May 18, 2022 at 4:11 PM Martin Grigorov  
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> [X] +1 Release this package as Apache Spark 3.3.0
>> >>
>> >> Tested:
>> >> - make local distribution from sources (with ./dev/make-distribution.sh 
>> >> --tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
>> >> - create a Docker image (with JDK 11)
>> >> - run Pi example on
>> >> -- local
>> >> -- Kubernetes with default scheduler
>> >> -- Kubernetes with Volcano scheduler
>> >>
>> >> On both x86_64 and aarch64 !
>> >>
>> >> Regards,
>> >> Martin
>> >>
>> >>
>> >> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk 
>> >>  wrote:
>> >>>
>> >>> Please vote on releasing the following candidate as Apache Spark version 
>> >>> 3.3.0.
>> >>>
>> >>> The vote is open until 11:59pm Pacific time May 19th and passes if a 
>> >>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 3.3.0
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>> To learn more about Apache Spark, please see http://spark.apache.org/
>> >>>
>> >>> The tag to be voted on is v3.3.0-rc2 (commit 
>> >>> c8c657b922ac8fd8dcf9553113e11a80079db059):
>> >>> https://github.com/apache/spark/tree/v3.3.0-rc2
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>> >>>
>> >>> Signatures used for Spark RCs can be found in this file:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>> https://repository.apache.org/content/repositories/orgapachespark-1403
>> >>>
>> >>> The documentation corresponding to this release can be found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>> >>>
>> >>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>> >>>
>> >>> This release is using the release script of the tag v3.3.0-rc2.
>> >>>
>> >>>
>> >>> FAQ
>> >>>
>> >>> =
>> >>> How can I help test this release?
>> >>> =
>> >>> If you are a Spark user, you can help us test this release by taking
>> >>> an existing Spark workload and running on this release candidate, then
>> >>> reporting any regressions.
>> >>>
>> >>> If you're working in PySpark, you can set up a virtual env and install
>> >>> the current RC and see if anything important breaks; in Java/Scala,
>> >>> you can add the staging repository to your project's resolvers and test
>> >>> with the RC (make sure to clean up the artifact cache before/after so
>> >>> you don't end up building with an out-of-date RC going forward).
>> >>>
>> >>> ===
>> >>> What should happen to JIRA tickets still targeting 3.3.0?
>> >>> ===
>> >>> The current list of open tickets targeted at 3.3.0 can be found at:
>> >>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> >>> Version/s" = 3.3.0
>> >>>
>> >>> Committers should look at those and triage. Extremely important bug
>> >>> fixes, documentation, and API tweaks that impact compatibility should
>> >>> be worked on immediately. Everything else please retarget to an
>> >>> appropriate release.
>> >>>
>> >>> ==
>> >>> But my bug isn't fixed?
>> >>> ==
>> >>> In order to make timely releases, we will typically not hold the
>> >>> release unless the bug in question is a regression from the previous
>> >>> release. That being said, if there is something which is a regression
>> >>> that has not been correctly targeted please ping me or a committer to
>> >>> help target the issue.
>> >>>
>> >>> Maxim Gekk
>> >>>
>> >>> Software Engineer
>> >>>
>> >>> Databricks, Inc.
>>
>>