Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-09 Thread Maxim Gekk
Hi All,

Results of voting for Spark 3.3.0 RC5 are:

+1:
Sean Owen (*)
Dongjoon Hyun (*)
Yuming Wang
Yikun Jiang
Martin Grigorov
Thomas Graves (*)
Gengliang Wang
L. C. Hsieh (*)
Cheng Su
Chris Nauroth
Cheng Pan

0:
Hyukjin Kwon (*)

-1:
Jungtaek Lim
Jerry Peng
Prashant Singh
Huaxin Gao

I consider the vote *failed*. I will prepare RC6 as soon as the issues
mentioned in the thread are resolved.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Wed, Jun 8, 2022 at 9:18 PM huaxin gao  wrote:

> I agree with Prashant, -1 from me too because this may break Iceberg
> usage.
>
> Thanks,
> Huaxin
>
> On Wed, Jun 8, 2022 at 10:07 AM Prashant Singh 
> wrote:
>
>> -1 from my side as well, found this today.
>>
>> While testing Apache Iceberg with 3.3, I found this bug: for a table with
>> partitions containing null values, we get an NPE on partition discovery,
>> whereas earlier we used to get `DEFAULT_PARTITION_NAME`.
>>
>> Please look into https://issues.apache.org/jira/browse/SPARK-39417 for
>> more details.
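>>
>> For illustration, a minimal sketch (hypothetical path and column names)
>> of the kind of scenario described above, assuming Hive-style partition
>> discovery maps a null partition value to `__HIVE_DEFAULT_PARTITION__`:
>>
>>   // e.g. in spark-shell: write a table partitioned by a column with a null
>>   Seq((1, "a"), (2, null)).toDF("id", "p")
>>     .write.partitionBy("p").parquet("/tmp/null_part_tbl")
>>   // reading back triggers partition discovery; per the report, null
>>   // partition values hit an NPE in 3.3 (see SPARK-39417 for the exact
>>   // Iceberg scenario)
>>   spark.read.parquet("/tmp/null_part_tbl").show()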
>>
>> Regards,
>> Prashant Singh
>>
>> On Wed, Jun 8, 2022 at 10:27 PM Jerry Peng 
>> wrote:
>>
>>>
>>>
>>> I agree with Jungtaek, -1 from me because of the recently introduced
>>> issue of the Kafka source throwing an error with an incorrect error
>>> message. This may mislead users and cause unnecessary confusion.
>>>
>>> On Wed, Jun 8, 2022 at 12:04 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Apologies for the late participation.

 I'm sorry, but -1 (non-binding) from me.

 Unfortunately, I found a major user-facing issue which seriously hurts UX
 for Kafka data source usage.

 In some cases, the Kafka data source can throw IllegalStateException when
 failOnDataLoss=true; that condition is bound to the state of the Kafka
 topic (not a Spark issue). With the recent change in Spark,
 IllegalStateException is now treated as an "internal error", and Spark
 gives incorrect guidance to end users, telling them that Spark has a bug
 and encouraging them to file a JIRA ticket, which is simply wrong.

 Previously, the Kafka data source provided an error message with the
 context of why it failed and how to work around it. I feel this is a
 serious regression in UX.

 Please look into https://issues.apache.org/jira/browse/SPARK-39412 for
 more details.
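
 For context, a minimal sketch of the kind of streaming query affected
 (the bootstrap servers and topic name below are placeholders);
 failOnDataLoss defaults to true, and the failure triggers when previously
 read offsets are no longer available on the broker, e.g. aged out by
 retention:

   // Structured Streaming Kafka source (spark-shell / Scala)
   val df = spark.readStream
     .format("kafka")
     .option("kafka.bootstrap.servers", "localhost:9092")
     .option("subscribe", "events")
     // per the report above, the IllegalStateException thrown on data loss
     // now surfaces as an "internal error" asking users to file a JIRA
     .option("failOnDataLoss", "true")
     .load()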


 On Wed, Jun 8, 2022 at 3:40 PM Hyukjin Kwon 
 wrote:

> Okay. Thankfully the binary release is fine per
> https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-build.sh#L268
> The source package (and GitHub tag) has 3.3.0.dev0, and the binary
> package has 3.3.0. Technically this is not a blocker now because the PyPI
> upload can still be made correctly.
> I lowered the priority to critical. I switch my -1 to 0.
>
> On Wed, 8 Jun 2022 at 15:17, Hyukjin Kwon  wrote:
>
>> Arrrgh  .. I am very sorry that I found this problem late.
>> RC5 does not have the correct version of PySpark; see
>> https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19
>> I think the release script was broken because the version now has
>> 'str' type; see
>> https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-tag.sh#L88
>> I filed a JIRA at https://issues.apache.org/jira/browse/SPARK-39411
>>
>> -1 from me
>>
>>
>>
>> On Wed, 8 Jun 2022 at 13:16, Cheng Pan  wrote:
>>
>>> +1 (non-binding)
>>>
>>> * Verified SPARK-39313 has been addressed [1]
>>> * Passed integration test w/ Apache Kyuubi (Incubating)[2]
>>>
>>> [1]
>>> https://github.com/housepower/spark-clickhouse-connector/pull/123
>>> [2] https://github.com/apache/incubator-kyuubi/pull/2817
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>> On Wed, Jun 8, 2022 at 7:04 AM Chris Nauroth 
>>> wrote:
>>> >
>>> > +1 (non-binding)
>>> >
>>> > * Verified all checksums.
>>> > * Verified all signatures.
>>> > * Built from source, with multiple profiles, to full success, for
>>> Java 11 and Scala 2.13:
>>> > * build/mvn -Phadoop-3 -Phadoop-cloud -Phive-thriftserver
>>> -Pkubernetes -Pscala-2.13 -Psparkr -Pyarn -DskipTests clean package
>>> > * Tests passed.
>>> > * Ran several examples successfully:
>>> > * bin/spark-submit --class org.apache.spark.examples.SparkPi
>>> examples/jars/spark-examples_2.12-3.3.0.jar
>>> > * bin/spark-submit --class
>>> org.apache.spark.examples.sql.hive.SparkHiveExample
>>> examples/jars/spark-examples_2.12-3.3.0.jar
>>> > * bin/spark-submit
>>> examples/src/main/python/streaming/network_wordcount.py localhost 
>>> > * Tested some of the issues that blocked prior release candidates:
>>> > * bin/spark-sql -e 'SELECT (SELECT IF(x, 1, 0)) AS a FROM
>>> (SELECT true) t(x) UNION SELECT 1 AS a;'
>>

[VOTE] Release Spark 3.3.0 (RC6)

2022-06-09 Thread Maxim Gekk
Please vote on releasing the following candidate as
Apache Spark version 3.3.0.

The vote is open until 11:59pm Pacific time June 14th and passes if a
majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.3.0-rc6 (commit
f74867bddfbcdd4d08076db36851e88b15e66556):
https://github.com/apache/spark/tree/v3.3.0-rc6

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1407

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/

The list of bug fixes going into 3.3.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12350369

This release is using the release script of the tag v3.3.0-rc6.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. For Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
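
As a sketch for the Java/Scala case with sbt (the staging repository URL is
the one listed above), adding something like the following to build.sbt
should make the RC artifacts resolvable:

  // resolve Spark 3.3.0 RC6 artifacts from the Apache staging repository
  resolvers += "Apache Spark 3.3.0 RC6 staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1407/"

  // the RC is published under the final version number
  libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0"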

===
What should happen to JIRA tickets still targeting 3.3.0?
===
The current list of open tickets targeted at 3.3.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.3.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something that is a regression
and has not been correctly targeted, please ping me or a committer to
help target the issue.

Maxim Gekk

Software Engineer

Databricks, Inc.