Hi All,

Considering there are no notable release blockers now, I am going to cut RC2 for Apache Spark 3.4.0.
On Fri, Feb 24, 2023 at 10:44 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Yes, we should fix that. I will take a look.
>
> On Thu, 23 Feb 2023 at 07:32, Jonathan Kelly <jonathaka...@gmail.com> wrote:
>
>> Thanks! I was wondering about that ClientE2ETestSuite failure today, so
>> I'm glad to know that it's also being experienced by others.
>>
>> On a similar note, I am experiencing the following error when running the
>> Python tests with Python 3.7:
>>
>> + ./python/run-tests --python-executables=python3
>> Running PySpark tests. Output is in /home/ec2-user/spark/python/unit-tests.log
>> Will test against the following Python executables: ['python3']
>> Will test the following Python modules: ['pyspark-connect', 'pyspark-core',
>> 'pyspark-errors', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas',
>> 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
>> python3 python_implementation is CPython
>> python3 version is: Python 3.7.16
>> Starting test(python3): pyspark.ml.tests.test_feature (temp output:
>> /home/ec2-user/spark/python/target/8ca9ab1a-05cc-4845-bf89-30d9001510bc/python3__pyspark.ml.tests.test_feature__kg6sseie.log)
>> Starting test(python3): pyspark.ml.tests.test_base (temp output:
>> /home/ec2-user/spark/python/target/f2264f3b-6b26-4e61-9452-8d6ddd7eb002/python3__pyspark.ml.tests.test_base__0902zf9_.log)
>> Starting test(python3): pyspark.ml.tests.test_algorithms (temp output:
>> /home/ec2-user/spark/python/target/d1dc4e07-e58c-4c03-abe5-09d8fab22e6a/python3__pyspark.ml.tests.test_algorithms__lh3wb2u8.log)
>> Starting test(python3): pyspark.ml.tests.test_evaluation (temp output:
>> /home/ec2-user/spark/python/target/3f42dc79-c945-4cf2-a1eb-83e72b40a9ee/python3__pyspark.ml.tests.test_evaluation__89idc7fa.log)
>> Finished test(python3): pyspark.ml.tests.test_base (16s)
>> Starting test(python3): pyspark.ml.tests.test_functions (temp output:
>> /home/ec2-user/spark/python/target/5a3b90f0-216b-4edd-9d15-6619d3e03300/python3__pyspark.ml.tests.test_functions__g5u1290s.log)
>> Traceback (most recent call last):
>>   File "/usr/lib64/python3.7/runpy.py", line 193, in _run_module_as_main
>>     "__main__", mod_spec)
>>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
>>     exec(code, run_globals)
>>   File "/home/ec2-user/spark/python/pyspark/ml/tests/test_functions.py", line 21, in <module>
>>     from pyspark.ml.functions import predict_batch_udf
>>   File "/home/ec2-user/spark/python/pyspark/ml/functions.py", line 38, in <module>
>>     from typing import Any, Callable, Iterator, List, Mapping, Protocol, TYPE_CHECKING, Tuple, Union
>> ImportError: cannot import name 'Protocol' from 'typing' (/usr/lib64/python3.7/typing.py)
>> Had test failures in pyspark.ml.tests.test_functions with python3; see logs.
>>
>> I know we should move on to a newer version of Python, but isn't Python 3.7
>> still officially supported?
>>
>> Thank you,
>> Jonathan Kelly
>>
>> On Wed, Feb 22, 2023 at 1:47 PM Herman van Hovell
>> <her...@databricks.com.invalid> wrote:
>>
>>> Hi All,
>>>
>>> Thanks for testing the 3.4.0 RC! I apologize for the Maven test failures
>>> in the Spark Connect Scala client. We will try to get those sorted as
>>> soon as possible.
>>>
>>> This is an artifact of having multiple build systems while only running
>>> CI for one of them (SBT). That, however, is a debate for another day :)...
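[Editorial note: the ImportError in the log above occurs because `typing.Protocol` only exists on Python 3.8+ (PEP 544); on 3.7 it is only available from the `typing_extensions` backport. A minimal sketch of the usual guarded import is below; the `SupportsPredict`/`Model` names are illustrative, not the actual patch that went into pyspark.ml.functions.]

```python
import sys

# typing.Protocol was added in Python 3.8 (PEP 544); on 3.7 it only
# exists in the typing_extensions backport, hence the ImportError above.
if sys.version_info >= (3, 8):
    from typing import Protocol, runtime_checkable
else:
    from typing_extensions import Protocol, runtime_checkable

@runtime_checkable
class SupportsPredict(Protocol):
    """Anything with a predict() method, e.g. an ML model (illustrative)."""
    def predict(self, batch): ...

class Model:
    def predict(self, batch):
        return batch

# Structural check: Model never subclasses SupportsPredict, yet matches it.
assert isinstance(Model(), SupportsPredict)
```

With the guarded import, the same source runs on both 3.7 (given `typing_extensions` is installed) and newer interpreters.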
>>>
>>> Cheers,
>>> Herman
>>>
>>> On Wed, Feb 22, 2023 at 5:32 PM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>>
>>>> ./build/mvn clean package
>>>>
>>>> I'm using Ubuntu rolling, Python 3.11, and OpenJDK 17.
>>>>
>>>> CompatibilitySuite:
>>>> - compatibility MiMa tests *** FAILED ***
>>>>   java.lang.AssertionError: assertion failed: Failed to find the jar
>>>>   inside folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target
>>>>   at scala.Predef$.assert(Predef.scala:223)
>>>>   at org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67)
>>>>   at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57)
>>>>   at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53)
>>>>   at org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$1(CompatibilitySuite.scala:69)
>>>>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>>>>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>>>>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>>>>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>>>>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>>>>   ...
>>>> - compatibility API tests: Dataset *** FAILED ***
>>>>   java.lang.AssertionError: assertion failed: Failed to find the jar
>>>>   inside folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target
>>>>   at scala.Predef$.assert(Predef.scala:223)
>>>>   at org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67)
>>>>   at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57)
>>>>   at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53)
>>>>   at org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$7(CompatibilitySuite.scala:110)
>>>>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>>>>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>>>>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>>>>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>>>>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>>>>   ...
>>>> SparkConnectClientSuite:
>>>> - Placeholder test: Create SparkConnectClient
>>>> - Test connection
>>>> - Test connection string
>>>> - Check URI: sc://host, isCorrect: true
>>>> - Check URI: sc://localhost/, isCorrect: true
>>>> - Check URI: sc://localhost:1234/, isCorrect: true
>>>> - Check URI: sc://localhost/;, isCorrect: true
>>>> - Check URI: sc://host:123, isCorrect: true
>>>> - Check URI: sc://host:123/;user_id=a94, isCorrect: true
>>>> - Check URI: scc://host:12, isCorrect: false
>>>> - Check URI: http://host, isCorrect: false
>>>> - Check URI: sc:/host:1234/path, isCorrect: false
>>>> - Check URI: sc://host/path, isCorrect: false
>>>> - Check URI: sc://host/;parm1;param2, isCorrect: false
>>>> - Check URI: sc://host:123;user_id=a94, isCorrect: false
>>>> - Check URI: sc:///user_id=123, isCorrect: false
>>>> - Check URI: sc://host:-4, isCorrect: false
>>>> - Check URI: sc://:123/, isCorrect: false
>>>> - Non user-id parameters throw unsupported errors
>>>> DatasetSuite:
>>>> - limit
>>>> - select
>>>> - filter
>>>> - write
>>>> UserDefinedFunctionSuite:
>>>> - udf and encoder serialization
>>>> Run completed in 21 seconds, 944 milliseconds.
>>>> Total number of tests run: 389
>>>> Suites: completed 10, aborted 0
>>>> Tests: succeeded 386, failed 3, canceled 0, ignored 0, pending 0
>>>> *** 3 TESTS FAILED ***
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Reactor Summary for Spark Project Parent POM 3.4.0:
>>>> [INFO]
>>>> [INFO] Spark Project Parent POM ........................... SUCCESS [ 47.096 s]
>>>> [INFO] Spark Project Tags ................................. SUCCESS [ 14.759 s]
>>>> [INFO] Spark Project Sketch ............................... SUCCESS [ 21.628 s]
>>>> [INFO] Spark Project Local DB ............................. SUCCESS [ 20.311 s]
>>>> [INFO] Spark Project Networking ........................... SUCCESS [01:07 min]
>>>> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 15.921 s]
>>>> [INFO] Spark Project Unsafe ............................... SUCCESS [ 16.020 s]
>>>> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.873 s]
>>>> [INFO] Spark Project Core ................................. SUCCESS [37:10 min]
>>>> [INFO] Spark Project ML Local Library ..................... SUCCESS [ 40.841 s]
>>>> [INFO] Spark Project GraphX ............................... SUCCESS [02:39 min]
>>>> [INFO] Spark Project Streaming ............................ SUCCESS [05:53 min]
>>>> [INFO] Spark Project Catalyst ............................. SUCCESS [11:22 min]
>>>> [INFO] Spark Project SQL .................................. SUCCESS [02:27 h]
>>>> [INFO] Spark Project ML Library ........................... SUCCESS [22:45 min]
>>>> [INFO] Spark Project Tools ................................ SUCCESS [  7.263 s]
>>>> [INFO] Spark Project Hive ................................. SUCCESS [01:21 h]
>>>> [INFO] Spark Project REPL ................................. SUCCESS [02:07 min]
>>>> [INFO] Spark Project Assembly ............................. SUCCESS [ 11.704 s]
>>>> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 26.748 s]
>>>> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:44 min]
>>>> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [33:27 min]
>>>> [INFO] Spark Project Examples ............................. SUCCESS [01:17 min]
>>>> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 27.292 s]
>>>> [INFO] Spark Avro ......................................... SUCCESS [02:18 min]
>>>> [INFO] Spark Project Connect Common ....................... SUCCESS [ 43.728 s]
>>>> [INFO] Spark Project Connect Server ....................... SUCCESS [01:19 min]
>>>> [INFO] Spark Project Connect Client ....................... FAILURE [ 53.524 s]
>>>> [INFO] Spark Protobuf ..................................... SKIPPED
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] BUILD FAILURE
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Total time: 05:58 h
>>>> [INFO] Finished at: 2023-02-22T22:28:38+01:00
>>>> [INFO] ------------------------------------------------------------------------
>>>> [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test
>>>> (test) on project spark-connect-client-jvm_2.12: There are test failures -> [Help 1]
>>>>
>>>> On Wed, Feb 22, 2023 at 21:41, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>
>>>>> Signatures, digests, etc. check out fine - thanks for updating them!
>>>>> Checked out the tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes.
>>>>>
>>>>> The test ClientE2ETestSuite."simple udf" failed [1] in the "Connect Client"
>>>>> module ... yet to test the "Spark Protobuf" module due to the failure.
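[Editorial note: the `Check URI` cases in Bjørn's test output above encode the Spark Connect connection-string grammar, roughly `sc://host[:port][/[;key=value]*]`. The sketch below re-derives those rules from the listed cases; `is_valid_connect_uri` is a made-up name, not a Spark API, and the real client throws an unsupported-parameter error for non-`user_id` keys rather than returning False.]

```python
import re

# Rules implied by the test cases: scheme must be exactly "sc", a host is
# mandatory, the port (if any) must be a non-negative integer, the only
# allowed "path" is "/" or "/;key=value[;key=value]*", and the only
# parameter accepted here is user_id.
_URI_RE = re.compile(r"^sc://(?P<host>[^/:;]+)(?::(?P<port>\d+))?(?P<rest>/.*)?$")

def is_valid_connect_uri(uri: str) -> bool:
    m = _URI_RE.match(uri)
    if not m:
        return False  # wrong scheme, missing host, bad port, etc.
    rest = m.group("rest") or ""
    if rest in ("", "/"):
        return True
    if not rest.startswith("/;"):
        return False  # a real path like sc://host/path is rejected
    params = [p for p in rest[2:].split(";") if p]
    return all("=" in p and p.split("=", 1)[0] == "user_id" for p in params)

assert is_valid_connect_uri("sc://host:123/;user_id=a94")
assert not is_valid_connect_uri("sc://host/path")
```

Run against all sixteen `Check URI` cases above, this sketch reproduces the listed true/false outcomes.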
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> [1]
>>>>>
>>>>> - simple udf *** FAILED ***
>>>>>   io.grpc.StatusRuntimeException: INTERNAL: org.apache.spark.sql.ClientE2ETestSuite
>>>>>   at io.grpc.Status.asRuntimeException(Status.java:535)
>>>>>   at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>>>>>   at org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:50)
>>>>>   at org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:95)
>>>>>   at org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:112)
>>>>>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2037)
>>>>>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2267)
>>>>>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2036)
>>>>>   at org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$5(ClientE2ETestSuite.scala:65)
>>>>>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>>>>>   ...
>>>>>
>>>>> On Wed, Feb 22, 2023 at 2:07 AM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>
>>>>>> Thanks, Xinrong!
>>>>>> The signature verifications are fine now ... will continue with testing
>>>>>> the release.
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>> On Wed, Feb 22, 2023 at 1:27 AM Xinrong Meng <xinrong.apa...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mridul,
>>>>>>>
>>>>>>> Would you please try that again? It should work now.
>>>>>>>
>>>>>>> On Wed, Feb 22, 2023 at 2:04 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Xinrong,
>>>>>>>>
>>>>>>>> Was it signed with the same key as the one present in KEYS [1]?
>>>>>>>> I am seeing errors from gpg when validating. For example:
>>>>>>>>
>>>>>>>> $ gpg --verify pyspark-3.4.0.tar.gz.asc
>>>>>>>> gpg: assuming signed data in 'pyspark-3.4.0.tar.gz'
>>>>>>>> gpg: Signature made Tue 21 Feb 2023 05:56:05 AM CST
>>>>>>>> gpg:                using RSA key CC68B3D16FE33A766705160BA7E57908C7A4E1B1
>>>>>>>> gpg:                issuer "xinr...@apache.org"
>>>>>>>> gpg: Can't check signature: No public key
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Mridul
>>>>>>>>
>>>>>>>> [1] https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>
>>>>>>>> On Tue, Feb 21, 2023 at 10:36 PM Xinrong Meng <xinrong.apa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>> version 3.4.0.
>>>>>>>>>
>>>>>>>>> The vote is open until 11:59 pm Pacific time *February 27th* and
>>>>>>>>> passes if a majority of +1 PMC votes are cast, with a minimum of
>>>>>>>>> 3 +1 votes.
>>>>>>>>>
>>>>>>>>> [ ] +1 Release this package as Apache Spark 3.4.0
>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>
>>>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>>>
>>>>>>>>> The tag to be voted on is *v3.4.0-rc1* (commit
>>>>>>>>> e2484f626bb338274665a49078b528365ea18c3b):
>>>>>>>>> https://github.com/apache/spark/tree/v3.4.0-rc1
>>>>>>>>>
>>>>>>>>> The release files, including signatures, digests, etc., can be found at:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-bin/
>>>>>>>>>
>>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>>
>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1435
>>>>>>>>>
>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc1-docs/
>>>>>>>>>
>>>>>>>>> The list of bug fixes going into 3.4.0 can be found at the following URL:
>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12351465
>>>>>>>>>
>>>>>>>>> This release is using the release script of the tag v3.4.0-rc1.
>>>>>>>>>
>>>>>>>>> FAQ
>>>>>>>>>
>>>>>>>>> =========================
>>>>>>>>> How can I help test this release?
>>>>>>>>> =========================
>>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>>>>> then reporting any regressions.
>>>>>>>>>
>>>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>>>>> you can add the staging repository to your project's resolvers and
>>>>>>>>> test with the RC (make sure to clean up the artifact cache before and
>>>>>>>>> after so you don't end up building with an out-of-date RC going
>>>>>>>>> forward).
>>>>>>>>>
>>>>>>>>> ===========================================
>>>>>>>>> What should happen to JIRA tickets still targeting 3.4.0?
>>>>>>>>> ===========================================
>>>>>>>>> The current list of open tickets targeted at 3.4.0 can be found at:
>>>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>>>> "Target Version/s" = 3.4.0
>>>>>>>>>
>>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>>>> be worked on immediately. Everything else, please retarget to an
>>>>>>>>> appropriate release.
>>>>>>>>>
>>>>>>>>> ==================
>>>>>>>>> But my bug isn't fixed?
>>>>>>>>> ==================
>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>>>> release. That said, if there is a regression that has not been
>>>>>>>>> correctly targeted, please ping me or a committer to help target the
>>>>>>>>> issue.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Xinrong Meng
>>>>
>>>> --
>>>> Bjørn Jørgensen
>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>> Norge
>>>>
>>>> +47 480 94 297
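[Editorial note: the "Can't check signature: No public key" error Mridul hit earlier in the thread means gpg had no public key for the signer in its local keyring; the fix is to import the KEYS file before verifying. A hedged sketch follows: the first two commented commands use the URLs and file names from the thread, and the runnable part below them is a purely illustrative round trip with a throwaway key in a temporary keyring, so it touches neither the network nor your real keyring.]

```shell
# For the actual RC artifacts (URLs/names from the thread):
#   curl -fsSL https://dist.apache.org/repos/dist/dev/spark/KEYS | gpg --import
#   gpg --verify pyspark-3.4.0.tar.gz.asc pyspark-3.4.0.tar.gz

# Self-contained demonstration of the same mechanics with a throwaway key:
export GNUPGHOME="$(mktemp -d)"           # temporary keyring, discarded after
echo "release artifact" > artifact.tar.gz
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "rm-test@example.org" default default never
gpg --batch --pinentry-mode loopback --passphrase '' \
    --detach-sign --armor --output artifact.tar.gz.asc artifact.tar.gz
# Verification succeeds only because the public key is in this keyring;
# remove it (or use a fresh GNUPGHOME) and you get "No public key" again.
gpg --verify artifact.tar.gz.asc artifact.tar.gz
```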