Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-12 Thread Chris Nauroth
+1 (non-binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, with multiple profiles, to full success:
* build/mvn -Phadoop-3.2 -Phadoop-cloud -Phive-2.3 -Phive-thriftserver
-Pkubernetes -Pscala-2.12 -Psparkr -Pyarn -DskipTests clean package
* Tests passed.
* Ran several examples successfully:
* bin/spark-submit --class org.apache.spark.examples.SparkPi
examples/jars/spark-examples_2.12-3.2.4.jar
* bin/spark-submit --class
org.apache.spark.examples.sql.hive.SparkHiveExample
examples/jars/spark-examples_2.12-3.2.4.jar
* bin/spark-submit
examples/src/main/python/streaming/network_wordcount.py localhost 
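For anyone repeating the first two checks, here is a minimal sketch; the artifact name is a stand-in (real RC files live under https://dist.apache.org/repos/dist/dev/spark/), and the published .sha512 format may need minor massaging before `sha512sum -c` accepts it.

```shell
# Sketch of checksum/signature verification, assuming GNU coreutils and GnuPG.
# A throwaway file stands in for the real .tgz so this runs anywhere.
printf 'placeholder artifact' > spark-example-bin.tgz
sha512sum spark-example-bin.tgz > spark-example-bin.tgz.sha512

# Checksum check: recompute the digest and compare with the published one.
sha512sum -c spark-example-bin.tgz.sha512

# Signature check (commented out: needs the real .asc and the project KEYS file):
#   gpg --import KEYS
#   gpg --verify spark-example-bin.tgz.asc spark-example-bin.tgz
```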

Thank you, Dongjoon!

Chris Nauroth


On Wed, Apr 12, 2023 at 3:49 AM Shaoyun Chen  wrote:

> +1 (non-binding)
>
> On 2023/04/12 04:36:59 Jungtaek Lim wrote:
> > +1 (non-binding)
> >
> > Thanks for driving the release!
> >
> > On Wed, Apr 12, 2023 at 3:41 AM Xinrong Meng 
> > wrote:
> >
> > > +1 non-binding
> > >
> > > Thank you Dongjoon!
> > >
> > > Wenchen Fan wrote on Monday, April 10, 2023 at 11:32 PM:
> > >
> > >> +1
> > >>
> > >> On Tue, Apr 11, 2023 at 10:09 AM Hyukjin Kwon 
> > >> wrote:
> > >>
> > >>> +1
> > >>>
> > >>> On Tue, 11 Apr 2023 at 11:04, Ruifeng Zheng 
> > >>> wrote:
> > >>>
> > >>>> +1 (non-binding)
> > >>>>
> > >>>> Thank you for driving this release!
> > >>>>
> > >>>> --
> > >>>> Ruifeng Zheng
> > >>>> ruife...@foxmail.com
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> -- Original --
> > >>>> *From:* "Yuming Wang" ;
> > >>>> *Date:* Tue, Apr 11, 2023 09:56 AM
> > >>>> *To:* "Mridul Muralidharan";
> > >>>> *Cc:* "huaxin gao"; "Chao Sun" <sunc...@apache.org>; "yangjie01";
> > >>>> "Dongjoon Hyun" <dongj...@apache.org>; "Sean Owen";
> > >>>> "dev@spark.apache.org";
> > >>>> *Subject:* Re: [VOTE] Release Apache Spark 3.2.4 (RC1)
> > >>>>
> > >>>> +1.
> > >>>>
> > >>>> On Tue, Apr 11, 2023 at 12:17 AM Mridul Muralidharan <
> mri...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> +1
> > >>>>>
> > >>>>> Signatures, digests, etc check out fine.
> > >>>>> Checked out tag and build/tested with -Phive -Pyarn -Pmesos
> > >>>>> -Pkubernetes
> > >>>>>
> > >>>>> Regards,
> > >>>>> Mridul
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Apr 10, 2023 at 10:34 AM huaxin gao <
> huaxin.ga...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> +1
> > >>>>>>
> > >>>>>> On Mon, Apr 10, 2023 at 8:17 AM Chao Sun 
> wrote:
> > >>>>>>
> > >>>>>>> +1 (non-binding)
> > >>>>>>>
> > >>>>>>> On Mon, Apr 10, 2023 at 7:07 AM yangjie01 
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> +1 (non-binding)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> *From:* Sean Owen 
> > >>>>>>>> *Date:* Monday, April 10, 2023, 21:19
> > >>>>>>>> *To:* Dongjoon Hyun 
> > >>>>>>>> *Cc:* "dev@spark.apache.org" 
> > >>>>>>>> *Subject:* Re: [VOTE] Release Apache Spark 3.2.4 (RC1)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-12 Thread Chris Nauroth
+1 (non-binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, with multiple profiles, to full success:
* build/mvn -Phadoop-cloud -Phive-thriftserver -Pkubernetes -Psparkr
-Pyarn -DskipTests clean package
* Tests passed.
* Ran several examples successfully:
* bin/spark-submit --class org.apache.spark.examples.SparkPi
examples/jars/spark-examples_2.13-3.4.0.jar
* bin/spark-submit --class
org.apache.spark.examples.sql.hive.SparkHiveExample
examples/jars/spark-examples_2.13-3.4.0.jar
* bin/spark-submit
examples/src/main/python/streaming/network_wordcount.py localhost 

Chris Nauroth


On Tue, Apr 11, 2023 at 10:36 PM beliefer  wrote:

> +1
>
>
> At 2023-04-08 07:29:46, "Xinrong Meng"  wrote:
>
> Please vote on releasing the following candidate(RC7) as Apache Spark
> version 3.4.0.
>
> The vote is open until 11:59pm Pacific time *April 12th* and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.4.0-rc7 (commit
> 87a5442f7ed96b11051d8a9333476d080054e5a0):
> https://github.com/apache/spark/tree/v3.4.0-rc7
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1441
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/
>
> The list of bug fixes going into 3.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>
> This release is using the release script of the tag v3.4.0-rc7.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
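As an illustration of the PySpark part of this advice, a virtual env smoke test might look like the sketch below; the install line is commented out because the tarball URL differs for each RC (the pattern shown is a placeholder, not a link from this thread).

```shell
# Sketch: isolated environment for smoke-testing a PySpark RC.
python3 -m venv /tmp/spark-rc-venv
. /tmp/spark-rc-venv/bin/activate
# pip install "https://dist.apache.org/repos/dist/dev/spark/<rc-dir>/pyspark-<version>.tar.gz"
# python -c "import pyspark; print(pyspark.__version__)"
python -c "import sys; print(sys.prefix)"  # prints the venv path, confirming isolation
deactivate
```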
>
> ===
> What should happen to JIRA tickets still targeting 3.4.0?
> ===
> The current list of open tickets targeted at 3.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Thanks,
> Xinrong Meng
>
>


Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Chris Nauroth
+1 (non-binding)

Gengliang, thank you for the SPIP.

Chris Nauroth


On Wed, Nov 16, 2022 at 4:27 AM Maciej  wrote:

> +1
>
> On 11/16/22 13:19, Yuming Wang wrote:
> > +1, non-binding
> >
> > On Wed, Nov 16, 2022 at 8:12 PM Yang,Jie(INF) <yangji...@baidu.com> wrote:
> >
> > +1, non-binding
> >
> >
> > Yang Jie
> >
> >
> > *From:* Mridul Muralidharan <mri...@gmail.com>
> > *Date:* Wednesday, November 16, 2022, 17:35
> > *To:* Kent Yao <y...@apache.org>
> > *Cc:* Gengliang Wang <ltn...@gmail.com>, dev <dev@spark.apache.org>
> > *Subject:* Re: [VOTE][SPIP] Better Spark UI scalability and Driver
> > stability for large applications
> >
> >
> >
> > +1
> >
> >
> > Would be great to see history server performance improvements and
> > lower resource utilization at driver !
> >
> >
> > Regards,
> >
> > Mridul 
> >
> >
> > On Wed, Nov 16, 2022 at 2:38 AM Kent Yao <y...@apache.org> wrote:
> >
> > +1, non-binding
> >
> > Gengliang Wang <ltn...@gmail.com> wrote on
> > Wednesday, November 16, 2022 at 16:36:
> > >
> > > Hi all,
> > >
> > > I’d like to start a vote for SPIP: "Better Spark UI
> scalability and Driver stability for large applications"
> > >
> > > The goal of the SPIP is to improve the Driver's stability by
> supporting storage of Spark's UI data in RocksDB. Furthermore, to speed up
> the read and write operations on RocksDB, it introduces a new Protobuf
> serializer.
> > >
> > > Please also refer to the following:
> > >
> > > Previous discussion in the dev mailing list: [DISCUSS] SPIP:
> Better Spark UI scalability and Driver stability for large applications
> > > Design Doc: Better Spark UI scalability and Driver stability
> for large applications
> > > JIRA: SPARK-41053
> > >
> > >
> > > Please vote on the SPIP for the next 72 hours:
> > >
> > > [ ] +1: Accept the proposal as an official SPIP
> > > [ ] +0
> > > [ ] -1: I don’t think this is a good idea because …
> > >
> > > Kind Regards,
> > > Gengliang
> >
> >
> >
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
>


Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-16 Thread Chris Nauroth
+1 (non-binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, with multiple profiles, to full success, for Java 11
and Scala 2.12:
* build/mvn -Phadoop-3.2 -Phadoop-cloud -Phive-2.3 -Phive-thriftserver
-Pkubernetes -Pscala-2.12 -Psparkr -Pyarn -DskipTests clean package
* Tests passed.
* Ran several examples successfully:
* bin/spark-submit --class org.apache.spark.examples.SparkPi
examples/jars/spark-examples_2.12-3.2.3.jar
* bin/spark-submit --class
org.apache.spark.examples.sql.hive.SparkHiveExample
examples/jars/spark-examples_2.12-3.2.3.jar
* bin/spark-submit
examples/src/main/python/streaming/network_wordcount.py localhost 

Chao, thank you for preparing the release.

Chris Nauroth


On Wed, Nov 16, 2022 at 5:22 AM Yuming Wang  wrote:

> +1
>
> On Wed, Nov 16, 2022 at 2:28 PM Yang,Jie(INF)  wrote:
>
>> I switched Scala 2.13 to Scala 2.12 today. The test is still in progress
>> and it has not been hung.
>>
>>
>>
>> Yang Jie
>>
>>
>>
>> *From:* Dongjoon Hyun 
>> *Date:* Wednesday, November 16, 2022, 01:17
>> *To:* "Yang,Jie(INF)" 
>> *Cc:* huaxin gao , "L. C. Hsieh" <
>> vii...@gmail.com>, Chao Sun , dev <
>> dev@spark.apache.org>
>> *Subject:* Re: [VOTE] Release Spark 3.2.3 (RC1)
>>
>>
>>
>> Did you hit that in Scala 2.12, too?
>>
>>
>>
>> Dongjoon.
>>
>>
>>
>> On Tue, Nov 15, 2022 at 4:36 AM Yang,Jie(INF) 
>> wrote:
>>
>> Hi, all
>>
>>
>>
>> I test v3.2.3 with following command:
>>
>>
>>
>> ```
>>
>> dev/change-scala-version.sh 2.13
>>
>> build/mvn clean install -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn
>> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
>> -Pscala-2.13 -fn
>>
>> ```
>>
>>
>>
>> The testing environment is:
>>
>>
>>
>> OS: CentOS 6u3 Final
>>
>> Java: zulu 11.0.17
>>
>> Python: 3.9.7
>>
>> Scala: 2.13
>>
>>
>>
>> The above test command has been executed twice, and all times hang in the
>> following stack:
>>
>>
>>
>> ```
>>
>> "ScalaTest-main-running-JoinSuite" #1 prio=5 os_prio=0 cpu=312870.06ms
>> elapsed=1552.65s tid=0x7f2ddc02d000 nid=0x7132 waiting on condition
>> [0x7f2de3929000]
>>
>>java.lang.Thread.State: WAITING (parking)
>>
>>at jdk.internal.misc.Unsafe.park(java.base@11.0.17/Native Method)
>>
>>- parking to wait for  <0x000790d00050> (a
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>
>>at java.util.concurrent.locks.LockSupport.park(java.base@11.0.17
>> /LockSupport.java:194)
>>
>>at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.17
>> /AbstractQueuedSynchronizer.java:2081)
>>
>>at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.17
>> /LinkedBlockingQueue.java:433)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:275)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec$$Lambda$9429/0x000802269840.apply(Unknown
>> Source)
>>
>>at
>> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:228)
>>
>>- locked <0x000790d00208> (a java.lang.Object)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:370)
>>
>>at
>> org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.doExecute(AdaptiveSparkPlanExec.scala:355)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan$$Lambda$8573/0x000801f99c40.apply(Unknown
>> Source)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
>>
>>at
>> org.apache.spark.sql.execution.SparkPlan$$Lambda$8574/0x000801f9a040.apply(Unknown
>> Source)
>>
>>at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>
>>at
>> or

Re: Missing data in spark output

2022-10-21 Thread Chris Nauroth
Some users have observed issues like what you're describing related to the
job commit algorithm, which is controlled by configuration
property spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.
Hadoop's default value for this setting is 2. You can find a description of
the algorithms in Hadoop's configuration documentation:

https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Algorithm version 2 is faster, because the final task output file renames
can be issued in parallel by individual tasks. Unfortunately, there have
been reports of it causing side effects like what you described, especially
if there are a lot of task attempt retries or speculative execution
(configuration property spark.speculation set to true instead of the
default false). You could try switching to algorithm version 1. The
drawback is that it's slower, because the final output renames are executed
single-threaded at the end of the job. The performance impact is more
noticeable for jobs with many tasks, and the effect is amplified when using
cloud storage as opposed to HDFS running in the same network.
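For reference, the algorithm can be pinned per job without touching cluster-wide config; in the fragment below the jar and class names are placeholders, not something from this thread.

```shell
# Config fragment (placeholder job): pin v1 commit algorithm, disable speculation.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 \
  --conf spark.speculation=false \
  --class com.example.MyJob \
  my-job.jar
```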

If you are using speculative execution, then you could also potentially try
turning that off.

Chris Nauroth


On Wed, Oct 19, 2022 at 8:18 AM Martin Andersson 
wrote:

> Is your spark job batch or streaming?
> --
> *From:* Sandeep Vinayak 
> *Sent:* Tuesday, October 18, 2022 19:48
> *To:* dev@spark.apache.org 
> *Subject:* Missing data in spark output
>
>
> EXTERNAL SENDER. Do not click links or open attachments unless you
> recognize the sender and know the content is safe. DO NOT provide your
> username or password.
>
> Hello Everyone,
>
> We are recently observing an intermittent data loss in the spark with
> output to GCS (google cloud storage). When there are missing rows, they are
> accompanied by duplicate rows. The re-run of the job doesn't have any
> duplicate or missing rows. Since it's hard to debug, we are first trying to
> understand the potential theoretical root cause of this issue, can this be
> a GCS specific issue where GCS might not be handling the consistencies
> well? Any tips will be super helpful.
>
> Thanks,
>
>


Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Chris Nauroth
+1 (non-binding)

I repeated all checks I described for RC5:

https://lists.apache.org/thread/ksoxmozgz7q728mnxl6c2z7ncmo87vls

Maxim, thank you for your dedication on these release candidates.

Chris Nauroth


On Mon, Jun 13, 2022 at 3:21 PM Mridul Muralidharan 
wrote:

>
> +1
>
> Signatures, digests, etc check out fine.
> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>
> The test "SPARK-33084: Add jar support Ivy URI in SQL" in
> sql.SQLQuerySuite fails; but other than that, rest looks good.
>
> Regards,
> Mridul
>
>
>
> On Mon, Jun 13, 2022 at 4:25 PM Tom Graves 
> wrote:
>
>> +1
>>
>> Tom
>>
>> On Thursday, June 9, 2022, 11:27:50 PM CDT, Maxim Gekk
>>  wrote:
>>
>>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.3.0.
>>
>> The vote is open until 11:59pm Pacific time June 14th and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.3.0-rc6 (commit
>> f74867bddfbcdd4d08076db36851e88b15e66556):
>> https://github.com/apache/spark/tree/v3.3.0-rc6
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1407
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/
>>
>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> This release is using the release script of the tag v3.3.0-rc6.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.3.0?
>> ===
>> The current list of open tickets targeted at 3.3.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.3.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>


Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-07 Thread Chris Nauroth
+1 (non-binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, with multiple profiles, to full success, for Java 11
and Scala 2.13:
* build/mvn -Phadoop-3 -Phadoop-cloud -Phive-thriftserver -Pkubernetes
-Pscala-2.13 -Psparkr -Pyarn -DskipTests clean package
* Tests passed.
* Ran several examples successfully:
* bin/spark-submit --class org.apache.spark.examples.SparkPi
examples/jars/spark-examples_2.12-3.3.0.jar
* bin/spark-submit --class
org.apache.spark.examples.sql.hive.SparkHiveExample
examples/jars/spark-examples_2.12-3.3.0.jar
* bin/spark-submit
examples/src/main/python/streaming/network_wordcount.py localhost 
* Tested some of the issues that blocked prior release candidates:
* bin/spark-sql -e 'SELECT (SELECT IF(x, 1, 0)) AS a FROM (SELECT true)
t(x) UNION SELECT 1 AS a;'
* bin/spark-sql -e "select date '2018-11-17' > 1"
* SPARK-39293 ArrayAggregate fix

Chris Nauroth


On Tue, Jun 7, 2022 at 1:30 PM Cheng Su  wrote:

> +1 (non-binding). Built and ran some internal test for Spark SQL.
>
>
>
> Thanks,
>
> Cheng Su
>
>
>
> *From: *L. C. Hsieh 
> *Date: *Tuesday, June 7, 2022 at 1:23 PM
> *To: *dev 
> *Subject: *Re: [VOTE] Release Spark 3.3.0 (RC5)
>
> +1
>
> Liang-Chi
>
> On Tue, Jun 7, 2022 at 1:03 PM Gengliang Wang  wrote:
> >
> > +1 (non-binding)
> >
> > Gengliang
> >
> > On Tue, Jun 7, 2022 at 12:24 PM Thomas Graves 
> wrote:
> >>
> >> +1
> >>
> >> Tom Graves
> >>
> >> On Sat, Jun 4, 2022 at 9:50 AM Maxim Gekk
> >>  wrote:
> >> >
> >> > Please vote on releasing the following candidate as Apache Spark
> version 3.3.0.
> >> >
> >> > The vote is open until 11:59pm Pacific time June 8th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 3.3.0
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see http://spark.apache.org/
> >> >
> >> > The tag to be voted on is v3.3.0-rc5 (commit
> 7cf29705272ab8e8c70e8885a3664ad8ae3cd5e9):
> >> > https://github.com/apache/spark/tree/v3.3.0-rc5
> >> >
> >> > The release files, including signatures, digests, etc. can be found
> at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-bin/
> >> >
> >> > Signatures used for Spark RCs can be found in this file:
> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >
> >> > The staging repository for this release can be found at:
> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1406
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc5-docs/
> >> >
> >> > The list of bug fixes going into 3.3.0 can be found at the following
> URL:
> >> > https://issues.apache.org/jira/projects/SPARK/versions/12350369
> >> >
> >> > This release is using the release script of the tag v3.3.0-rc5.
> >> >
> >> >
> >> > FAQ
> >> >
> >> > =
> >> > How can I help test this release?
> >> > =
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload and running on this release candidate, then
> >> > reporting any regressions.
> >> >
> >> > If you're working in PySpark you can set up a virtual env and install
> >> > the current RC and see if anything important breaks, in the Java/Scala
> >> > you can add the staging repository to your project's resolvers and test
> >> > with the RC (make sure to clean up the artifact cache before/after so
> >> > you don't end up building with an out of date RC going forward).
> >> >
> >> > ===
> >> > What should happen to JIRA tickets still targeting 3.3.0?
> >> > ===
> >> > The current list of open tickets targeted at 3.3.0 can be found at:
> >> > https://issues.apache.org/jira/projects/SPARK  and search for
> "Target Version/s" = 3.3.0
> >> >
> >> > Committers should look at those and triage. Extremel

Re: [VOTE] Release Spark 3.3.0 (RC3)

2022-05-26 Thread Chris Nauroth
+1 (non-binding)

* Verified all checksums.
* Verified all signatures.
* Built from source, with multiple profiles, to full success, for Java 11
and Scala 2.13:
* build/mvn -Phadoop-3 -Phadoop-cloud -Phive-thriftserver -Pkubernetes
-Pscala-2.13 -Psparkr -Pyarn -DskipTests clean package
* Almost all unit tests passed. (Some tests related to LevelDB and RocksDB
failed in JNI initialization. If others aren't seeing this, then I probably
just need to work out an environment issue.)
* Ran several examples successfully:
* bin/spark-submit --class org.apache.spark.examples.SparkPi
examples/jars/spark-examples_2.12-3.3.0.jar
* bin/spark-submit --class
org.apache.spark.examples.sql.hive.SparkHiveExample
examples/jars/spark-examples_2.12-3.3.0.jar
* bin/spark-submit
examples/src/main/python/streaming/network_wordcount.py localhost 
* Tested some of the prior issues that blocked RC2:
* bin/spark-sql -e 'SELECT (SELECT IF(x, 1, 0)) AS a FROM (SELECT true)
t(x) UNION SELECT 1 AS a;'
* bin/spark-sql -e "select date '2018-11-17' > 1"

Chris Nauroth


On Wed, May 25, 2022 at 8:00 AM Sean Owen  wrote:

> +1 works for me as usual, with Java 8 + Scala 2.12, Java 11 + Scala 2.13.
>
> On Tue, May 24, 2022 at 12:14 PM Maxim Gekk
>  wrote:
>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.3.0.
>>
>> The vote is open until 11:59pm Pacific time May 27th and passes if a
>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.3.0-rc3 (commit
>> a7259279d07b302a51456adb13dc1e41a6fd06ed):
>> https://github.com/apache/spark/tree/v3.3.0-rc3
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1404
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc3-docs/
>>
>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> This release is using the release script of the tag v3.3.0-rc3.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.3.0?
>> ===
>> The current list of open tickets targeted at 3.3.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.3.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>


Re: CVE-2021-38296: Apache Spark Key Negotiation Vulnerability - 2.4 Backport?

2022-04-14 Thread Chris Nauroth
Thanks for the quick reply, Sean!

Chris Nauroth


On Thu, Apr 14, 2022 at 10:15 AM Sean Owen  wrote:

> It does affect 2.4.x, yes. 2.4.x was EOL a while ago, so there wouldn't be
> a new release of 2.4.x in any event. It's recommended to update instead, at
> least to 3.1.3.
>
> On Thu, Apr 14, 2022 at 12:07 PM Chris Nauroth 
> wrote:
>
>> A fix for CVE-2021-38296 was committed and released in Apache Spark
>> 3.1.3. I'm curious, is the issue relevant to the 2.4 version line, and if
>> so, are there any plans for a backport?
>>
>> https://lists.apache.org/thread/70x8fw2gx3g9ty7yk0f2f1dlpqml2smd
>>
>> Chris Nauroth
>>
>


CVE-2021-38296: Apache Spark Key Negotiation Vulnerability - 2.4 Backport?

2022-04-14 Thread Chris Nauroth
A fix for CVE-2021-38296 was committed and released in Apache Spark 3.1.3.
I'm curious, is the issue relevant to the 2.4 version line, and if so, are
there any plans for a backport?

https://lists.apache.org/thread/70x8fw2gx3g9ty7yk0f2f1dlpqml2smd

Chris Nauroth