melgenek commented on issue #4495: URL: https://github.com/apache/arrow-datafusion/issues/4495#issuecomment-1407497719
I migrated `decimal.rs` in https://github.com/apache/arrow-datafusion/pull/5086. I took the tests as-is and converted them into `.slt`. The problem is that the ordering in these tests is not defined. For example, [this test](https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/test_files/decimal.slt#L91-L96) doesn't have an `order by` clause:

```
query RRI?R
select * from decimal_simple where c1 > c5;
----
0.00002 0.000000000002 3 false 0.000019
0.00003 0.000000000003 5 true 0.000011
0.00005 0.000000000005 8 false 0.000033
```

This test passes in CI now, and it passed before when it lived in `decimal.rs`. The only reason seems to be that the DataFusion implementation hasn't yet changed enough to alter the order of the results. @jackwener wasn't so lucky with his union tests, which eventually failed on the master branch (https://github.com/apache/arrow-datafusion/pull/5095).

I'd like to make the decimal tests deterministic, and probably some other tests that don't have explicit ordering too. What is the best way to do this? I see that DataFusion uses both `rowsort` and `order by`. [DuckDB states](https://duckdb.org/dev/sqllogictest/result_verification#result-sorting) that it prefers an explicit `order by`. But [CockroachDB](https://github.com/cockroachdb/cockroach/search?q=rowsort) and [RisingWave](https://github.com/risingwavelabs/risingwave/search?p=1&q=rowsort), for example, use `rowsort` quite extensively.

Are there any guidelines for using or not using `rowsort` and `order by` in DataFusion?
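The sketch below shows what the two options could look like for the query above. It is illustrative, not a proposed final form. It assumes the standard sqllogictest `rowsort` sort mode, which tells the runner to sort the result rows before comparing them with the expected block. The expected rows are kept in sorted order here, which is the safe choice. The second option relies on `c1` being distinct in these three rows. If ties were possible, more columns would have to be added to the `order by` to make the order total.

```
# Option 1 (sketch): let the test runner sort the rows before comparing
query RRI?R rowsort
select * from decimal_simple where c1 > c5;
----
0.00002 0.000000000002 3 false 0.000019
0.00003 0.000000000003 5 true 0.000011
0.00005 0.000000000005 8 false 0.000033

# Option 2 (sketch): make the order part of the query itself
# (assumes c1 is distinct in the result; otherwise add tie-breaking columns)
query RRI?R
select * from decimal_simple where c1 > c5 order by c1;
----
0.00002 0.000000000002 3 false 0.000019
0.00003 0.000000000003 5 true 0.000011
0.00005 0.000000000005 8 false 0.000033
```

One trade-off worth weighing: `rowsort` leaves the query, and therefore the plan under test, unchanged. Adding `order by` introduces a sort into the plan, so the test also exercises the sort operator rather than only the filter.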
