LOL,
Hindsight is a very good thing, and one often learns these lessons through
experience. Once you have been told off because strict ordering was not
maintained, the lesson will never be forgotten!
HTH
Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom
Now, if you are ruthless it'd make sense to randomise the order of results
if someone left out the order by, to stop complacency.
Like that time Sun changed the order in which methods were returned from a
Class.listMethods() call, and everyone's JUnit test cases failed if they had
assumed that ordering.
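
A rough sketch of both ideas in plain Python, in case it helps: shuffle the rows whenever the caller did not ask for an order, and compare results in tests without assuming one. Note that `fetch_rows` and `same_rows` are made-up helper names for illustration, not Spark or JUnit APIs, and the sample rows are invented.

```python
import random
from collections import Counter


def fetch_rows(rows, order_by=None, rng=None):
    """Hypothetical result-fetching helper (not a Spark API).

    With order_by, rows come back sorted by that key. Without it, the rows
    are deliberately shuffled so callers cannot lean on an accidental order.
    """
    result = list(rows)
    if order_by is not None:
        return sorted(result, key=order_by)
    (rng or random).shuffle(result)
    return result


def same_rows(actual, expected):
    """Order-insensitive comparison: equal as multisets of rows."""
    return Counter(actual) == Counter(expected)


rows = [(3, "c"), (1, "a"), (2, "b")]  # invented sample data
unordered = fetch_rows(rows)                         # order is arbitrary
ordered = fetch_rows(rows, order_by=lambda r: r[0])  # order is defined
```

A test that only checks `same_rows(unordered, rows)` keeps passing however the rows happen to come back, which is the property the broken test cases were missing.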
AFAIK, the order is unspecified, whether it's SQL without a specified ORDER BY
clause or a DataFrame without sort. The behavior is consistent between them.
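
To make the SQL side concrete, here is a small sketch using SQLite from Python's standard library as a stand-in (it is not Spark, and the table `t` with its columns is invented for the example). The point is only about what the caller may rely on: without ORDER BY, just the set of rows; with ORDER BY, the sequence too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],  # invented sample data
)

# No ORDER BY: only the set of rows is defined, so any check on this
# result should ignore the order in which the rows happen to arrive.
unordered = conn.execute("SELECT id, name FROM t").fetchall()

# Explicit ORDER BY: the sequence is now part of the result.
ordered = conn.execute("SELECT id, name FROM t ORDER BY id").fetchall()

conn.close()
```

Comparing `sorted(unordered)` against the expected rows is safe; comparing `unordered` directly against a list is relying on an implementation detail.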
At 2023-09-18 23:47:40, "Nicholas Chammas" wrote:
I’ve always considered DataFrames to be logically equivalent to SQL tables or
queries.
In
These are good points. In traditional RDBMSs, SQL query results without an
explicit *ORDER BY* clause may vary in order due to optimization,
especially when no clustered index is defined. In contrast, systems like
Hive and Spark SQL, which are based on distributed file storage, do not
rely on
Hi Nicholas,
Your point
"In SQL, the result order of any query is implementation-dependent without
an explicit ORDER BY clause. Technically, you could run `SELECT * FROM
table;` 10 times in a row and get 10 different orderings."
Yes, I concur; my understanding is the same.
In SQL, the result
It should be the same as SQL. Otherwise it takes away a lot of potential future
optimization opportunities.
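
One way to picture the optimization argument, as a toy sketch only: if partitions are processed in parallel and results are gathered as each one finishes, the engine never has to wait for a slow partition just to preserve order. The code below uses Python threads rather than Spark, and `process_partition` and the partition data are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_partition(partition):
    """Stand-in for per-partition work: double every value."""
    return [x * 2 for x in partition]


partitions = [[1, 2], [3, 4], [5, 6]]  # invented sample data

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(process_partition, p) for p in partitions]
    # Results are gathered in completion order, which is not fixed
    # from one run to the next.
    collected = [row for f in as_completed(futures) for row in f.result()]

# Only an explicit sort makes the final sequence deterministic.
ordered = sorted(collected)
```

The contents of `collected` are always the same values; only their sequence may differ between runs, and that freedom is what a guaranteed ordering would take away.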
On Mon, Sep 18 2023 at 8:47 AM, Nicholas Chammas < nicholas.cham...@gmail.com >
wrote:
>
> I’ve always considered DataFrames to be logically equivalent to SQL tables
> or queries.
>
>
I think it's the same, and always has been: yes, you don't have a
guaranteed ordering unless an operation produces a specific ordering. That
could be the result of an ORDER BY, yes; I believe you would also be
guaranteed that reading input files results in data in the order they appear
in the file, etc. 1:1
I’ve always considered DataFrames to be logically equivalent to SQL tables or
queries.
In SQL, the result order of any query is implementation-dependent without an
explicit ORDER BY clause. Technically, you could run `SELECT * FROM table;` 10
times in a row and get 10 different orderings.
I