[ https://issues.apache.org/jira/browse/SPARK-35327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro updated SPARK-35327:
-------------------------------------
Description:
This ticket aims at filtering out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row order differed from the order in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were sorted only by `cnt`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20

Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same. The only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22

So, I think it's okay to test only `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).

was:
This ticket aims at merging similar v1.4 (`resources/tpcds`) and v2.7 (`resources/tpcds-v2.7.0`) TPCDS queries. It copies 13 query files (q6, q11, q12, q20, q24, q34, q47, q57, q64, q74, q75, q78, q98) from `resources/tpcds-v2.7.0` to `resources/tpcds`, and then removes those files from `resources/tpcds-v2.7.0`.

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row order differed from the order in the golden files.
For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were sorted only by `cnt`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20

Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same. The only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22

So, I think it's okay to use only `tpcds-v2.7.0/q6.sql` for stable testing in this case.

> Filters out the TPC-DS queries that can cause flaky test results
> ----------------------------------------------------------------
>
>                 Key: SPARK-35327
>                 URL: https://issues.apache.org/jira/browse/SPARK-35327
>             Project: Spark
>          Issue Type: Test
>          Components: SQL, Tests
>    Affects Versions: 3.0.2, 3.1.1, 3.2.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
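The flakiness comes from ties. With `ORDER BY cnt` alone, SQL leaves the relative order of rows that share the same `cnt` unspecified, so it can differ from run to run. Adding `a.ca_state` as a second sort key removes the ambiguity, assuming (as in the TPC-DS q6 text) that `a.ca_state` is the grouping key, so each state appears at most once. The sketch below is a plain Python illustration, not Spark code. The rows are made-up stand-ins for q6 output, the two "runs" are hypothetical arrival orders, and Python's stable `sorted` stands in for an engine whose tie order simply follows however the rows happen to arrive.

```python
# Made-up (state, cnt) rows standing in for q6 output; not real TPC-DS results.
ROWS = [("TX", 12), ("CA", 12), ("NY", 10), ("GA", 15)]


def order_by_cnt(rows):
    # Mimics `ORDER BY cnt`: rows with equal cnt keep their arrival order
    # (Python's sort is stable), so the result depends on the input order.
    return sorted(rows, key=lambda r: r[1])


def order_by_cnt_and_state(rows):
    # Mimics `ORDER BY cnt, a.ca_state`: the state breaks ties, so the
    # result no longer depends on the input order.
    return sorted(rows, key=lambda r: (r[1], r[0]))


# Two hypothetical arrival orders, e.g. from different task completion orders.
run_a = list(ROWS)
run_b = list(reversed(ROWS))

print(order_by_cnt(run_a))  # [('NY', 10), ('TX', 12), ('CA', 12), ('GA', 15)]
print(order_by_cnt(run_b))  # [('NY', 10), ('CA', 12), ('TX', 12), ('GA', 15)]
print(order_by_cnt_and_state(run_a) == order_by_cnt_and_state(run_b))  # True
```

Comparing either `order_by_cnt` result against a fixed golden file would pass for one run and fail for the other, which matches the nondeterministic failure described in the ticket.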