When I group by a column in DataFusion SQL, the order of the results changes 
from one execution to the next. For example, "select country from data group by country" 
against 
https://github.com/Teradata/kylo/blob/master/samples/sample-data/csv/userdata3.csv
might return "Moldova" first one time and "Sweden" first the next time I 
execute it.

It appears that this is known and acknowledged behavior (it is mentioned at 
https://issues.apache.org/jira/browse/ARROW-5680), but is there a good reason for 
it (e.g., performance, simplicity, or random hash seeding)? I understand why it 
makes sense not to impose a particular ordering unnecessarily, but is there any 
reason the results are not consistent between two identical SQL statements 
executed against the same datafusion::execution::context::ExecutionContext?
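
To make the "random hash seeding" guess concrete, here is a minimal sketch in plain Rust. It does not use DataFusion at all, and whether DataFusion's aggregation actually works this way is only my assumption. The function names distinct_unordered and distinct_sorted are made up for illustration. The standard library's HashSet uses a randomly seeded hasher, so collecting distinct values through it gives an unspecified order, while sorting afterwards (the analogue of adding ORDER BY) gives a reproducible one.

```rust
use std::collections::HashSet;

/// Distinct values in whatever order the hash set yields them.
/// std's HashSet uses a randomly seeded hasher (RandomState), so this
/// order is unspecified and can differ between sets and between runs.
fn distinct_unordered(rows: &[&str]) -> Vec<String> {
    let set: HashSet<&str> = rows.iter().copied().collect();
    set.into_iter().map(String::from).collect()
}

/// The same distinct values, sorted so the result is reproducible
/// (roughly what adding ORDER BY to the query would do).
fn distinct_sorted(rows: &[&str]) -> Vec<String> {
    let mut out = distinct_unordered(rows);
    out.sort();
    out
}
```

Calling distinct_unordered on the same input twice may or may not return the values in the same order; distinct_sorted always returns them in the same order.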
