Yeah, let's get that fix in, but it seems to be a minor test-only issue, so it should not block the release.
On Fri, Feb 16, 2024, 9:30 AM yangjie01 <yangji...@baidu.com> wrote:

> Very sorry. When I was fixing SPARK-45242
> (https://github.com/apache/spark/pull/43594), I noticed that both the
> `Affects Version` and `Fix Version` of SPARK-45242 were 4.0, and I didn't
> realize that it had also been merged into branch-3.5, so I didn't advocate
> for SPARK-45357 to be backported to branch-3.5.
>
> As far as I know, the condition that triggers this test failure is: when
> testing the `connect` module with Maven, if `sparkTestRelation` in
> `SparkConnectProtoSuite` is not the first `DataFrame` to be initialized,
> then the `id` of `sparkTestRelation` will no longer be 0. So I think this
> is indeed related to the order in which Maven executes the test cases in
> the `connect` module.
>
> I have submitted a backport PR to branch-3.5
> (https://github.com/apache/spark/pull/45141), and if necessary, we can
> merge it to fix this test issue.
>
> Jie Yang
>
> From: Jungtaek Lim <kabhwan.opensou...@gmail.com>
> Date: Friday, February 16, 2024, 22:15
> To: Sean Owen <sro...@gmail.com>, Rui Wang <amaliu...@apache.org>
> Cc: dev <dev@spark.apache.org>
> Subject: Re: [VOTE] Release Apache Spark 3.5.1 (RC2)
>
> I traced back the relevant changes and got a sense of what happened.
>
> Yangjie figured out the issue. It's a tricky one according to his
> comments: the test depends on the order in which the test suites are
> executed. He said it does not fail under sbt, hence the CI build couldn't
> catch it.
>
> He fixed it, but we missed that the offending commit had also been ported
> back to 3.5, so the fix wasn't ported back to 3.5.
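To make the order-dependence concrete, here is a minimal, hypothetical Scala sketch (not Spark's actual code; `IdGen`, `Relation`, and `freshRelation` are made-up names for illustration): when a test asserts plan equality against a hard-coded id of 0, the assertion only holds if the test's relation happens to be the very first one to draw from a process-global counter.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-ins (NOT Spark's real classes): a process-global id
// counter and a relation that captures the next id when it is created.
object IdGen {
  private val counter = new AtomicLong(0)
  def next(): Long = counter.getAndIncrement()
}

final case class Relation(name: String, id: Long)

object OrderDemo {
  def freshRelation(name: String): Relation = Relation(name, IdGen.next())

  def main(args: Array[String]): Unit = {
    // If this is the first relation created in the JVM, its id is 0, and an
    // expectation hard-coded against id 0 passes.
    val first = freshRelation("sparkTestRelation")
    println(s"first id = ${first.id}")

    // But if other suites ran first (as can happen under Maven's suite
    // ordering) and allocated ids, the "same" relation gets a different id,
    // and the hard-coded expectation of 0 fails.
    val later = freshRelation("sparkTestRelation")
    println(s"later id = ${later.id}") // no longer 0
  }
}
```

This is why the failure shows up only under certain runners: the test outcome depends on global mutable state shared across suites, not on the test body itself.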
> Surprisingly, I can't reproduce it locally, even with Maven. In my
> attempt to reproduce, SparkConnectProtoSuite was executed third:
> SparkConnectStreamingQueryCacheSuite and ExecuteEventsManagerSuite ran
> first, then SparkConnectProtoSuite. Maybe it is very specific to the
> environment, not just Maven? My env: MBP with M1 Pro chip, macOS 14.3.1,
> OpenJDK 17.0.9. I used build/mvn (Maven 3.8.8).
>
> I'm not 100% sure this is something we should fail the release for, as
> it's test-only and sounds very environment-dependent, but I'll respect
> your call on the vote.
>
> Btw, it looks like Rui also made a relevant fix (not to fix the failing
> test but to fix other issues), but it also wasn't ported back to 3.5.
> @Rui Wang <amaliu...@apache.org> Do you think this is a regression issue
> and warrants a new RC?
>
> On Fri, Feb 16, 2024 at 11:38 AM Sean Owen <sro...@gmail.com> wrote:
>
> Is anyone seeing this Spark Connect test failure? Then again, I have some
> weird issue with this env that always fails 1 or 2 tests that nobody else
> can replicate.
>
> - Test observe *** FAILED ***
>   == FAIL: Plans do not match ===
>   !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 0
>    CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 44
>   +- LocalRelation <empty>, [id#0, name#0]
>   +- LocalRelation <empty>, [id#0, name#0]
>   (PlanTest.scala:179)
>
> On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
> DISCLAIMER: The RC for Apache Spark 3.5.1 starts with RC2, as I belatedly
> figured out a doc generation issue after tagging RC1.
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.1.
> The vote is open until February 18th 9AM (PST) and passes if a majority
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.1-rc2 (commit
> fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
> https://github.com/apache/spark/tree/v3.5.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1452/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-docs/
>
> The list of bug fixes going into 3.5.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353495
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install the
> current RC via "pip install
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz"
> and see if anything important breaks.
>
> In Java/Scala, you can add the staging repository to your project's
> resolvers and test with the RC (make sure to clean up the artifact cache
> before/after so you don't end up building with an out-of-date RC going
> forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 3.5.1?
> ===========================================
>
> The current list of open tickets targeted at 3.5.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for
> "Target Version/s" = 3.5.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should be
> worked on immediately. Everything else please retarget to an appropriate
> release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from the previous release.
> That being said, if there is something that is a regression and has not
> been correctly targeted, please ping me or a committer to help target the
> issue.
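For the Java/Scala testing step described in the FAQ above, one possible way to point an sbt build at the staging repository from this thread is a sketch like the following (the resolver name and the `spark-sql` dependency are illustrative choices, not an official recipe; the repository URL is the one listed in the vote email):

```scala
// build.sbt -- resolve the RC artifacts from the staging repository listed
// in the vote email, then depend on the 3.5.1 artifacts under test.
ThisBuild / resolvers +=
  "Apache Spark 3.5.1 RC2 staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1452/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"
```

After testing, clearing the local artifact cache (e.g. the relevant entries under `~/.ivy2` or `~/.m2`) avoids accidentally building against the stale RC once the final release is out.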