If we’re OK waiting for it, I’d like to get https://github.com/apache/spark/pull/31298 in as well (it’s not a regression but it is a bug fix).

On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:

> It looks like a cool one but it's a pretty big one and affects the plans
> considerably ... maybe it's best to avoid adding it into 3.1.1 in
> particular during the RC period if this isn't a clear regression that
> affects many users.
>
> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth <peter.t...@gmail.com> wrote:
>
>> Hey,
>>
>> Sorry for chiming in a bit late, but I would like to suggest my PR (
>> https://github.com/apache/spark/pull/28885) for review and inclusion
>> into 3.1.1.
>>
>> Currently, invalid reuse reference nodes appear in many queries, causing
>> performance issues and incorrect explain plans. Now that
>> https://github.com/apache/spark/pull/31243 got merged, these invalid
>> references can be easily found in many of our golden files on master:
>> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
>> But the issue isn't specific to master (3.2); it has been there
>> since 3.0, when Dynamic Partition Pruning was added.
>> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
>> TPCDS q23b) it is causing a performance regression from 2.4 to 3.x.
>>
>> Thanks,
>> Peter
>>
>> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Guys, I plan to make an RC as soon as we have no visible issues. I have
>>> merged fixes for a few correctness issues. These are still pending:
>>> - https://github.com/apache/spark/pull/31319 waiting for a review (I
>>> will review it soon too).
>>> - https://github.com/apache/spark/pull/31336
>>> - I know Max is investigating the perf regression, which hopefully
>>> will be fixed soon.
>>>
>>> Are there any more blockers or correctness issues? Please ping me or
>>> say so here.
>>> I would like to avoid making an RC when there are clearly some issues to
>>> be fixed.
>>> If you're investigating something suspicious, that's fine too. It's
>>> better to make sure we're safe instead of rushing an RC without finishing
>>> the investigation.
>>>
>>> Thanks all.
>>>
>>>
>>> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>>>> we're almost there.
>>>>
>>>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan, <cloud0...@gmail.com> wrote:
>>>>
>>>>> BTW, there is a correctness bug being fixed at
>>>>> https://github.com/apache/spark/pull/30788 . It's not a regression,
>>>>> but the fix is very simple and it would be better to start the next RC
>>>>> after merging that fix.
>>>>>
>>>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <maxim.g...@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> Also I am investigating a performance regression in some TPC-DS
>>>>>> queries (q88 for instance) that is caused by a recent commit in 3.1,
>>>>>> highly likely in the period from 19th November, 2020 to
>>>>>> 18th December, 2020.
>>>>>>
>>>>>> Maxim Gekk
>>>>>>
>>>>>> Software Engineer
>>>>>>
>>>>>> Databricks, Inc.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> -1 as I just found a regression in 3.1. A self-join query works well
>>>>>>> in 3.0 but fails in 3.1. It's being fixed at
>>>>>>> https://github.com/apache/spark/pull/31287
>>>>>>>
>>>>>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>>>>>> <tgraves...@yahoo.com.invalid> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> built from tarball, verified sha and regular CI and tests all pass.
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon <
>>>>>>>> gurwls...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>> version 3.1.1.
>>>>>>>>
>>>>>>>> The vote is open until January 22nd 4PM PST and passes if a
>>>>>>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>>
>>>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>
>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>> http://spark.apache.org/
>>>>>>>>
>>>>>>>> The tag to be voted on is v3.1.1-rc1 (commit
>>>>>>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>>>>>>
>>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>>> at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>>>>>>
>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>
>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>
>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>>>>>>
>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>>>>>>
>>>>>>>> The list of bug fixes going into 3.1.1 can be found at the
>>>>>>>> following URL:
>>>>>>>> https://s.apache.org/41kf2
>>>>>>>>
>>>>>>>> This release is using the release script of the tag v3.1.1-rc1.
>>>>>>>>
>>>>>>>> FAQ
>>>>>>>>
>>>>>>>> ===================
>>>>>>>> What happened to 3.1.0?
>>>>>>>> ===================
>>>>>>>>
>>>>>>>> There was a technical issue during Apache Spark 3.1.0 preparation,
>>>>>>>> and it was discussed and decided to skip 3.1.0.
>>>>>>>> Please see
>>>>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>>>>>>> for more details.
>>>>>>>>
>>>>>>>> =========================
>>>>>>>> How can I help test this release?
>>>>>>>> =========================
>>>>>>>>
>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>>> an existing Spark workload and running on this release candidate,
>>>>>>>> then reporting any regressions.
>>>>>>>>
>>>>>>>> If you're working in PySpark you can set up a virtual env and
>>>>>>>> install the current RC via "pip install
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz"
>>>>>>>> and see if anything important breaks.
>>>>>>>> In Java/Scala, you can add the staging repository to your
>>>>>>>> project's resolvers and test
>>>>>>>> with the RC (make sure to clean up the artifact cache before/after
>>>>>>>> so you don't end up building with an out of date RC going forward).
>>>>>>>>
>>>>>>>> ===========================================
>>>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>>>> ===========================================
>>>>>>>>
>>>>>>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>>> "Target Version/s" = 3.1.1
>>>>>>>>
>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>> fixes, documentation, and API tweaks that impact compatibility
>>>>>>>> should be worked on immediately. Everything else please retarget
>>>>>>>> to an appropriate release.
>>>>>>>>
>>>>>>>> ==================
>>>>>>>> But my bug isn't fixed?
>>>>>>>> ==================
>>>>>>>>
>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>>> release. That being said, if there is something which is a
>>>>>>>> regression that has not been correctly targeted please ping me or
>>>>>>>> a committer to help target the issue.
>>>>>>>>

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
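
For anyone testing the RC as the FAQ in the quoted vote email suggests, here is a minimal sketch of the "verified sha" step Tom mentions, written in Python. It assumes you have already downloaded an artifact and copied its published SHA-512 value by hand. The helper names `sha512_hex` and `digest_matches` are hypothetical and not part of any Spark tooling, and nothing is assumed about the layout of the published digest file.

```python
import hashlib
from pathlib import Path


def sha512_hex(path, chunk_size=1 << 20):
    """Return the SHA-512 digest of the file at `path` as lowercase hex."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as handle:
        # Read in chunks so large release tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(path, expected_hex):
    """Compare a file's SHA-512 with an expected hex string.

    Whitespace and letter case in `expected_hex` are ignored, since a
    hand-copied digest may be split into groups or upper-cased.
    """
    normalized = "".join(expected_hex.split()).lower()
    return sha512_hex(path) == normalized
```

A call would look like `digest_matches("pyspark-3.1.1.tar.gz", expected)`, where `expected` is the hex string you copied. This only checks integrity against the digest; checking the signature against the KEYS file is a separate step.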