I confirmed that Q17 and Q39a/b return matching results between Spark 3.0 and
3.1 after enabling spark.sql.legacy.statisticalAggregate, so the result
changes are expected. For more details, see the PR:
https://github.com/apache/spark/pull/29983/ Also, the result of Q18 is
affected by overflow checking in Spark; that behavior exists in all the
releases. We will continue to improve ANSI mode and address these issues in
upcoming releases.
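
For anyone who wants to reproduce the comparison, here is a minimal
sketch (the config key is the one from the PR above; the session setup
itself is just illustrative):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("legacy-stat-agg-check")
    // Restore the Spark 3.0 behavior of statistical aggregates such as
    // STDDEV and VAR_SAMP, which returned NaN instead of null on
    // divide-by-zero.
    .config("spark.sql.legacy.statisticalAggregate", "true")
    .getOrCreate()

  // The same flag can also be toggled at runtime:
  spark.sql("SET spark.sql.legacy.statisticalAggregate=true")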

Thus, I change my vote from -1 to +1.

As Ismaël suggested, we can add some GitHub Actions jobs to validate the
TPC-DS and TPC-H results on small-scale datasets.
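
As a rough sketch of what such a check could do (this is not an existing
suite; the environment variables, paths, and golden-file layout below are
all assumptions), a CI job could run each query against a tiny dataset
and diff the output against checked-in golden files:

  import java.nio.file.{Files, Paths}
  import scala.collection.JavaConverters._
  import org.apache.spark.sql.SparkSession

  object TpcdsGoldenFileCheck {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("tpcds-golden-check")
        .master("local[*]")
        .getOrCreate()

      // Assumes the small-scale TPC-DS tables were generated beforehand
      // and registered as views, and that the two env vars point at the
      // query files and their expected outputs.
      val queryDir = Paths.get(sys.env("TPCDS_QUERY_DIR"))
      val goldenDir = Paths.get(sys.env("TPCDS_GOLDEN_DIR"))

      val failures = Files.list(queryDir).iterator().asScala.flatMap { q =>
        val sqlText = new String(Files.readAllBytes(q))
        // Sort rows so the comparison is deterministic across runs.
        val actual = spark.sql(sqlText).collect()
          .map(_.toString).sorted.mkString("\n")
        val expected =
          new String(Files.readAllBytes(goldenDir.resolve(q.getFileName)))
        if (actual.trim != expected.trim) Some(q.getFileName.toString)
        else None
      }.toList

      if (failures.nonEmpty) {
        sys.error(s"Queries with mismatched output: ${failures.mkString(", ")}")
      }
      spark.stop()
    }
  }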

Cheers,

Xiao



Ismaël Mejía <ieme...@gmail.com> wrote on Thu, Feb 25, 2021 at 12:16 PM:

> Since the TPC-DS performance tests are one of the main validation sources
> for regressions on Spark releases, maybe it is time to automate validation
> of the query outputs to catch correctness issues early (it would also be
> nice to validate performance regressions, but correctness >>>
> performance).
>
> This has been a long-standing open issue [1] that is probably worth
> addressing, and it seems that automating this via GitHub Actions could be
> relatively straightforward.
>
> [1] https://github.com/databricks/spark-sql-perf/issues/184
>
>
> On Wed, Feb 24, 2021 at 8:15 PM Reynold Xin <r...@databricks.com> wrote:
>
>> +1 Correctness issues are serious!
>>
>>
>> On Wed, Feb 24, 2021 at 11:08 AM, Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>
>>> That is indeed cause for concern.
>>> +1 on extending the voting deadline until we finish investigating
>>> this.
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Wed, Feb 24, 2021 at 12:55 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>
>>>> -1 Could we extend the voting deadline?
>>>>
>>>> A few TPC-DS queries (q17, q18, q39a, q39b) are returning different
>>>> results between Spark 3.0 and Spark 3.1. We need a few more days to
>>>> understand whether these changes are expected.
>>>>
>>>> Xiao
>>>>
>>>>
>>>> Mridul Muralidharan <mri...@gmail.com> wrote on Wed, Feb 24, 2021 at 10:41 AM:
>>>>
>>>>>
>>>>> Sounds good, thanks for clarifying Hyukjin !
>>>>> +1 on release.
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>>
>>>>> On Wed, Feb 24, 2021 at 2:46 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I remember HiveExternalCatalogVersionsSuite was flaky for a while;
>>>>>> that was fixed in
>>>>>> https://github.com/apache/spark/commit/0d5d248bdc4cdc71627162a3d20c42ad19f24ef4
>>>>>> KafkaDelegationTokenSuite is also flaky
>>>>>> (https://issues.apache.org/jira/browse/SPARK-31250).
>>>>>>
>>>>>> On Wed, Feb 24, 2021 at 5:19 PM, Mridul Muralidharan <mri...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Signatures, digests, etc. check out fine.
>>>>>>> Checked out the tag and built/tested with -Pyarn -Phadoop-2.7 -Phive
>>>>>>> -Phive-thriftserver -Pmesos -Pkubernetes
>>>>>>>
>>>>>>> I keep getting test failures with:
>>>>>>> * org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
>>>>>>> * org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite
>>>>>>> (Note: I removed the $HOME/.m2 and $HOME/.ivy2 paths before building.)
>>>>>>>
>>>>>>> Removing these suites gets the build through, though. Does anyone
>>>>>>> have suggestions on how to fix this? I did not face this with RC1.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mridul
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 22, 2021 at 12:57 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>> version 3.1.1.
>>>>>>>>
>>>>>>>> The vote is open until February 24th 11PM PST and passes if a
>>>>>>>> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>>
>>>>>>>> [ ] +1 Release this package as Apache Spark 3.1.1
>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>
>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>> http://spark.apache.org/
>>>>>>>>
>>>>>>>> The tag to be voted on is v3.1.1-rc3 (commit
>>>>>>>> 1d550c4e90275ab418b9161925049239227f3dc9):
>>>>>>>> https://github.com/apache/spark/tree/v3.1.1-rc3
>>>>>>>>
>>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>>> at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/
>>>>>>>>
>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>
>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>
>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1367
>>>>>>>>
>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-docs/
>>>>>>>>
>>>>>>>> The list of bug fixes going into 3.1.1 can be found at the
>>>>>>>> following URL:
>>>>>>>> https://s.apache.org/41kf2
>>>>>>>>
>>>>>>>> This release is using the release script of the tag v3.1.1-rc3.
>>>>>>>>
>>>>>>>> FAQ
>>>>>>>>
>>>>>>>> ===================
>>>>>>>> What happened to 3.1.0?
>>>>>>>> ===================
>>>>>>>>
>>>>>>>> There was a technical issue during Apache Spark 3.1.0 preparation,
>>>>>>>> and it was discussed and decided to skip 3.1.0.
>>>>>>>> Please see
>>>>>>>> https://spark.apache.org/news/next-official-release-spark-3.1.1.html
>>>>>>>> for more details.
>>>>>>>>
>>>>>>>> =========================
>>>>>>>> How can I help test this release?
>>>>>>>> =========================
>>>>>>>>
>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>>>> reporting any regressions.
>>>>>>>>
>>>>>>>> If you're working in PySpark, you can set up a virtual env and
>>>>>>>> install the current RC via "pip install
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.1.1-rc3-bin/pyspark-3.1.1.tar.gz"
>>>>>>>> and see if anything important breaks.
>>>>>>>> In Java/Scala, you can add the staging repository to your project's
>>>>>>>> resolvers and test with the RC (make sure to clean up the artifact
>>>>>>>> cache before/after so you don't end up building with an out-of-date
>>>>>>>> RC going forward).
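>>>>>>>>
>>>>>>>> For example, a minimal build.sbt sketch for resolving the RC from
>>>>>>>> staging (the resolver URL is the staging repository above; the
>>>>>>>> spark-sql module is just an illustrative choice):
>>>>>>>>
>>>>>>>>   // Point sbt at the RC3 staging repository so the 3.1.1
>>>>>>>>   // artifacts resolve before they reach Maven Central.
>>>>>>>>   resolvers += "Apache Spark 3.1.1 RC3 staging" at
>>>>>>>>     "https://repository.apache.org/content/repositories/orgapachespark-1367/"
>>>>>>>>
>>>>>>>>   libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.1"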
>>>>>>>>
>>>>>>>> ===========================================
>>>>>>>> What should happen to JIRA tickets still targeting 3.1.1?
>>>>>>>> ===========================================
>>>>>>>>
>>>>>>>> The current list of open tickets targeted at 3.1.1 can be found at:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>>> "Target Version/s" = 3.1.1
>>>>>>>>
>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>> fixes, documentation, and API tweaks that impact compatibility
>>>>>>>> should be worked on immediately. Everything else, please retarget
>>>>>>>> to an appropriate release.
>>>>>>>>
>>>>>>>> ==================
>>>>>>>> But my bug isn't fixed?
>>>>>>>> ==================
>>>>>>>>
>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>> release unless the bug in question is a regression from the
>>>>>>>> previous release. That being said, if there is a regression that
>>>>>>>> has not been correctly targeted, please ping me or a committer to
>>>>>>>> help target the issue.
>>>>>>>>
>>>>>>>
>>
