[jira] [Updated] (SPARK-35327) Filters out the TPC-DS queries that can cause flaky test results

Takeshi Yamamuro (Jira) Thu, 06 May 2021 17:52:08 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-35327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Takeshi Yamamuro updated SPARK-35327:
-------------------------------------
    Description: 
This ticket aims at filtering out TPCDS v1.4 q6 and q75 in 
`TPCDSQueryTestSuite`.
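
As a rough illustration of what "filtering out" means here, the sketch below drops excluded query names from a candidate list before running them. It is not the actual `TPCDSQueryTestSuite` code: `filter_queries`, `candidate_queries`, and `flaky_queries` are hypothetical names, and the candidate list is a made-up subset.

```python
def filter_queries(all_queries, excluded):
    """Return the queries to run, keeping order and dropping excluded names."""
    excluded = set(excluded)
    return [q for q in all_queries if q not in excluded]


# Hypothetical subset of v1.4 query names, for illustration only.
candidate_queries = ["q5", "q6", "q7", "q74", "q75", "q76"]

# The two v1.4 queries this ticket proposes to skip.
flaky_queries = {"q6", "q75"}

queries_to_run = filter_queries(candidate_queries, flaky_queries)
print(queries_to_run)
```

An explicit exclusion set keeps the skipped queries visible in one place, so they are easy to re-enable later.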

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row 
order differed from that in the golden files. For example, the failure 
in the GA job, 
https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, 
happened because the `tpcds/q6.sql` query output rows were sorted only by `cnt`:

https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; the 
only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
So, I think it's okay just to test `tpcds-v2.7.0/q6.sql` in this case (q75 has 
the same issue).
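
To make the ordering problem concrete, here is a minimal sketch in plain Python (not Spark) of why sorting only by `cnt` is unstable when rows tie on `cnt`, and why adding the state as a second sort key fixes it. The rows are made-up data, the helper names are hypothetical, and the shuffle only stands in for rows arriving in a different order on each run.

```python
import random

# Made-up (ca_state, cnt) rows; three of them tie on cnt.
rows = [("CA", 10), ("NY", 10), ("TX", 10), ("WA", 12)]


def sort_by_cnt_only(rs):
    # Mirrors sorting by `cnt` alone: tied rows keep their arrival order.
    return sorted(rs, key=lambda r: r[1])


def sort_by_cnt_and_state(rs):
    # Mirrors sorting by `cnt`, then state: ties are broken by the state.
    return sorted(rs, key=lambda r: (r[1], r[0]))


def distinct_orders(sort_fn, rs, trials=200, seed=0):
    """Collect the distinct outputs seen when the input order varies."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        shuffled = list(rs)
        rng.shuffle(shuffled)  # stand-in for a different arrival order per run
        seen.add(tuple(sort_fn(shuffled)))
    return seen


cnt_only_orders = distinct_orders(sort_by_cnt_only, rows)
cnt_and_state_orders = distinct_orders(sort_by_cnt_and_state, rows)

print(len(cnt_only_orders) > 1)    # several valid outputs exist
print(len(cnt_and_state_orders))   # exactly one output
```

Every output of the `cnt`-only sort is a correct answer to the query, but only one of them can match a golden file line by line.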

  was:
This ticket aims at merging similar 
v1.4 (`resources/tpcds`) and v2.7 (`resources/tpcds-v2.7.0`) TPCDS queries; it copies 
13 query files (q6,q11,q12,q20,q24,q34,q47,q57,q64,q74,q75,q78,q98) 
from `resources/tpcds-v2.7.0` to `resources/tpcds`, and then removes the files in 
`resources/tpcds-v2.7.0`.

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row 
order differed from that in the golden files. For example, the failure 
in the GA job, 
https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, 
happened because the `tpcds/q6.sql` query output rows were sorted only by `cnt`:

https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; the 
only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
So, I think it's okay just to use `tpcds-v2.7.0/q6.sql` for stable testing in 
this case.



> Filters out the TPC-DS queries that can cause flaky test results
> ----------------------------------------------------------------
>
>                 Key: SPARK-35327
>                 URL: https://issues.apache.org/jira/browse/SPARK-35327
>             Project: Spark
>          Issue Type: Test
>          Components: SQL, Tests
>    Affects Versions: 3.0.2, 3.1.1, 3.2.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This ticket aims at filtering out TPCDS v1.4 q6 and q75 in 
> `TPCDSQueryTestSuite`.
> I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row 
> order differed from that in the golden files. For example, the 
> failure in the GA job, 
> https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true,
> happened because the `tpcds/q6.sql` query output rows were sorted only by 
> `cnt`:
> https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
> Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; 
> the only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and 
> `a.ca_state`:
> https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
> So, I think it's okay just to test `tpcds-v2.7.0/q6.sql` in this case (q75 
> has the same issue).



