Hi,

Please check whether the fix for the following regression should be included:
https://github.com/apache/spark/pull/31352
Thanks,
Terry

On Tue, Jan 26, 2021 at 7:54 AM Holden Karau <hol...@pigscanfly.ca> wrote:

> If we're OK waiting for it, I’d like to get
> https://github.com/apache/spark/pull/31298 in as well (it’s not a
> regression but it is a bug fix).
>
> On Tue, Jan 26, 2021 at 6:38 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
>> It looks like a cool one but it's a pretty big one and affects the plans
>> considerably ... maybe it's best to avoid adding it into 3.1.1, in
>> particular during the RC period, if this isn't a clear regression that
>> affects many users.
>>
>> On Tue, Jan 26, 2021 at 11:23 PM, Peter Toth <peter.t...@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> Sorry for chiming in a bit late, but I would like to suggest my PR
>>> (https://github.com/apache/spark/pull/28885) for review and inclusion
>>> into 3.1.1.
>>>
>>> Currently, invalid reuse reference nodes appear in many queries, causing
>>> performance issues and incorrect explain plans. Now that
>>> https://github.com/apache/spark/pull/31243 got merged, these invalid
>>> references can be easily found in many of our golden files on master:
>>> https://github.com/apache/spark/pull/28885#issuecomment-767530441.
>>> But the issue isn't specific to master (3.2); it has actually been there
>>> since 3.0, when Dynamic Partition Pruning was added.
>>> So it is not a regression from 3.0 to 3.1.1, but in some cases (like
>>> TPCDS q23b) it is causing a performance regression from 2.4 to 3.x.
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Tue, Jan 26, 2021 at 6:30 AM Hyukjin Kwon <gurwls...@gmail.com>
>>> wrote:
>>>
>>>> Guys, I plan to make an RC as soon as we have no visible issues. I have
>>>> merged fixes for a few correctness issues. The remaining items look like:
>>>> - https://github.com/apache/spark/pull/31319 waiting for a review (I
>>>> will review it soon too).
>>>> - https://github.com/apache/spark/pull/31336
>>>> - I know Max's investigating the perf regression one which hopefully
>>>> will be fixed soon.
>>>>
>>>> Are there any more blockers or correctness issues?
>>>> Please ping me or say so here.
>>>> I would like to avoid making an RC when there are clearly some issues
>>>> to be fixed.
>>>> If you're investigating something suspicious, that's fine too. It's
>>>> better to make sure we're safe instead of rushing an RC without finishing
>>>> the investigation.
>>>>
>>>> Thanks all.
>>>>
>>>>
>>>> On Fri, Jan 22, 2021 at 6:19 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>>
>>>>> Sure, thanks guys. I'll start another RC after the fixes. Looks like
>>>>> we're almost there.
>>>>>
>>>>> On Fri, 22 Jan 2021, 17:47 Wenchen Fan, <cloud0...@gmail.com> wrote:
>>>>>
>>>>>> BTW, there is a correctness bug being fixed at
>>>>>> https://github.com/apache/spark/pull/30788 . It's not a regression,
>>>>>> but the fix is very simple and it would be better to start the next RC
>>>>>> after merging that fix.
>>>>>>
>>>>>> On Fri, Jan 22, 2021 at 3:54 PM Maxim Gekk <maxim.g...@databricks.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Also I am investigating a performance regression in some TPC-DS
>>>>>>> queries (q88 for instance) that is caused by a recent commit in 3.1,
>>>>>>> very likely in the period from 19th November, 2020 to 18th December,
>>>>>>> 2020.
>>>>>>>
>>>>>>> Maxim Gekk
>>>>>>>
>>>>>>> Software Engineer
>>>>>>>
>>>>>>> Databricks, Inc.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 22, 2021 at 10:45 AM Wenchen Fan <cloud0...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> -1 as I just found a regression in 3.1. A self-join query works
>>>>>>>> well in 3.0 but fails in 3.1. It's being fixed at
>>>>>>>> https://github.com/apache/spark/pull/31287
>>>>>>>>
>>>>>>>> On Fri, Jan 22, 2021 at 4:34 AM Tom Graves
>>>>>>>> <tgraves...@yahoo.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Built from tarball, verified sha, and regular CI and tests all pass.
>>>>>>>>>
>>>>>>>>> Tom
>>>>>>>>>
>>>>>>>>> On Monday, January 18, 2021, 06:06:42 AM CST, Hyukjin Kwon
>>>>>>>>> <gurwls...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>> version 3.1.1.
>>>>>>>>>
>>>>>>>>> The vote is open until January 22nd 4PM PST and passes if a
>>>>>>>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>>>
>>>>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>
>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>> http://spark.apache.org/
>>>>>>>>>
>>>>>>>>> The tag to be voted on is v3.1.1-rc1 (commit
>>>>>>>>> 53fe365edb948d0e05a5ccb62f349cd9fcb4bb5d):
>>>>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc1
>>>>>>>>>
>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>>>> found at:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/
>>>>>>>>>
>>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>>
>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1364
>>>>>>>>>
>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-docs/
>>>>>>>>>
>>>>>>>>> The list of bug fixes going into 3.1.1 can be found at the
>>>>>>>>> following URL:
>>>>>>>>> https://s.apache.org/41kf2
>>>>>>>>>
>>>>>>>>> This release is using the release script of the tag v3.1.1-rc1.
>>>>>>>>>
>>>>>>>>> FAQ
>>>>>>>>>
>>>>>>>>> ===================
>>>>>>>>> What happened to 3.1.0?
>>>>>>>>> ===================
>>>>>>>>>
>>>>>>>>> There was a technical issue during Apache Spark 3.1.0 preparation,
>>>>>>>>> and it was discussed and decided to skip 3.1.0.
>>>>>>>>> Please see
>>>>>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>>>>>>>> for more details.
>>>>>>>>>
>>>>>>>>> =========================
>>>>>>>>> How can I help test this release?
>>>>>>>>> =========================
>>>>>>>>>
>>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>>>>> then reporting any regressions.
>>>>>>>>>
>>>>>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>>>>>> the current RC via "pip install
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc1-bin/pyspark-3.1.1.tar.gz"
>>>>>>>>> and see if anything important breaks.
>>>>>>>>> In Java/Scala, you can add the staging repository to your
>>>>>>>>> project's resolvers and test with the RC (make sure to clean up the
>>>>>>>>> artifact cache before/after so you don't end up building with an
>>>>>>>>> out-of-date RC going forward).
>>>>>>>>>
>>>>>>>>> ===========================================
>>>>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>>>>> ===========================================
>>>>>>>>>
>>>>>>>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>>>> "Target Version/s" = 3.1.1
>>>>>>>>>
>>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>>>>> appropriate release.
>>>>>>>>>
>>>>>>>>> ==================
>>>>>>>>> But my bug isn't fixed?
>>>>>>>>> ==================
>>>>>>>>>
>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>>>> release. That being said, if there is something which is a regression
>>>>>>>>> that has not been correctly targeted please ping me or a committer to
>>>>>>>>> help target the issue.
>>>>>>>>>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
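
For anyone repeating the "verified sha" step mentioned in Tom's vote above, a minimal Python sketch of that check might look like the following. It assumes the artifact and its published digest have already been downloaded from the RC directory and that the digest is SHA-512; the function names and the whitespace/case normalisation are hypothetical and not part of the Spark release tooling.

```python
import hashlib
import hmac


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large tarballs are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file's SHA-512 with a published hex digest string.

    Assumption: the published digest may be wrapped across lines or
    upper-cased, so whitespace is removed and case is ignored.
    """
    expected = "".join(published_hex.split()).lower()
    return hmac.compare_digest(sha512_of(path), expected)
```

A digest match only shows the download is intact; checking the signature against the KEYS file linked in the vote email is a separate step.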