Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread huaxin gao
I agree with Prashant, -1 from me too because this may break Iceberg usage. Thanks, Huaxin On Wed, Jun 8, 2022 at 10:07 AM Prashant Singh wrote: > -1 from my side as well, found this today. > > While testing Apache iceberg with 3.3 found this bug where a table with > partitions with null

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread huaxin gao
Thanks Dongjoon for opening a jira to track this issue. I agree this is a flaky test. I have seen the flakiness in our internal tests. I also agree this is a non-blocker because the feature is disabled by default. I will try to take a look to see if I can find the root cause. Thanks, Huaxin On

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Prashant Singh
-1 from my side as well; I found this today. While testing Apache Iceberg with 3.3, I found a bug: for a table whose partitions contain null values, we get an NPE on partition discovery, whereas earlier we used to get `DEFAULT_PARTITION_NAME`. Please look into: https://issues.apache.org/jira/browse/SPARK-39417

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Jerry Peng
I agree with Jungtaek, -1 from me because of the issue of Kafka source throwing an error with an incorrect error message that was introduced recently. This may mislead users and cause unnecessary confusion. On Wed, Jun 8, 2022 at 12:04 AM Jungtaek Lim wrote: > Apologize for late

Root group membership

2022-06-08 Thread Rodrigo
Hi Everyone, My Security team has raised concerns about the requirement for root group membership for Spark running on Kubernetes. Does anyone know the reasons for that requirement, how insecure it is, and any alternatives if at all? Thanks, Rodrigo

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Jungtaek Lim
Apologies for the late participation. I'm sorry, but -1 (non-binding) from me. Unfortunately, I found a major user-facing issue that seriously hurts the UX of the Kafka data source. In some cases, Kafka data source can throw IllegalStateException for the case of failOnDataLoss=true which condition is

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Hyukjin Kwon
Okay. Thankfully the binary release is fine per https://github.com/apache/spark/blob/v3.3.0-rc5/dev/create-release/release-build.sh#L268 . The source package (and GitHub tag) has 3.3.0.dev0, and the binary package has 3.3.0. Technically this is not a blocker now because PyPI upload will be able to

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-08 Thread Hyukjin Kwon
Arrrgh .. I am very sorry that I found this problem late. RC 5 does not have the correct version of PySpark, see https://github.com/apache/spark/blob/v3.3.0-rc5/python/pyspark/version.py#L19 I think the release script was broken because the version now has 'str' type, see

[External] subscribe request

2022-06-08 Thread Zongsi Zhang