Are DataFrame rows ordered without an explicit ordering clause?

Nicholas Chammas Mon, 18 Sep 2023 08:48:03 -0700

I’ve always considered DataFrames to be logically equivalent to SQL tables or 
queries.
In SQL, the result order of any query is implementation-dependent without an 
explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10 
times in a row and get 10 different orderings.

I thought the same applied to DataFrames, but the docstring for the recently 
added method DataFrame.offset 
<https://github.com/apache/spark/pull/40873/files#diff-4ff57282598a3b9721b8d6f8c2fea23a62e4bc3c0f1aa5444527549d1daa38baR1293-R1301>
implies otherwise: its example assumes the rows come back in a particular order.

The docstring's example will work fine in practice, of course. But if DataFrames 
are technically unordered without an explicit ordering clause, then in theory a 
future implementation change could make “Bob” the “first” row in the 
DataFrame rather than “Tom”. That would make the example incorrect.
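To make the concern concrete, here is a toy model in plain Python. It is not Spark's API: the rows, ages, and the helpers offset() and possible_results() are hypothetical. It treats every permutation of an unordered table as a legal physical order and checks what skipping one row returns in each case, with and without an explicit sort.

```python
from itertools import permutations

# Hypothetical sample data: (age, name) rows whose stored order carries no meaning.
ROWS = [(14, "Tom"), (23, "Alice"), (16, "Bob")]


def offset(rows, n, order_key=None):
    """Skip the first n rows; if order_key is given, sort first (like ORDER BY)."""
    if order_key is not None:
        rows = sorted(rows, key=order_key)
    return rows[n:]


def possible_results(rows, n, order_key=None):
    """Collect offset() results over every physical order an engine could produce."""
    results = set()
    for perm in permutations(rows):
        results.add(tuple(offset(list(perm), n, order_key)))
    return results


# Without an ordering, each of the 3! orders is a legal answer, so skipping
# one row can drop Tom, Alice, or Bob depending on which comes "first".
unordered = possible_results(ROWS, 1)
# With an explicit ordering on age, every physical order gives the same answer.
ordered = possible_results(ROWS, 1, order_key=lambda row: row[0])

print(len(unordered))  # → 6
print(ordered)         # → {((16, 'Bob'), (23, 'Alice'))}
```

By analogy, and assuming a PySpark version that provides DataFrame.offset, the docstring example would become order-independent by sorting first, e.g. `df.orderBy("age").offset(1)`.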

Is that not the case?

Nick
