I am not sure if I've missed something obvious, but as far as I can tell
the DataFrame API doesn't provide clearly defined ordering rules, apart
from NaN handling. Methods like DataFrame.sort and sql.functions like
min / max provide only a general description. The discrepancy between
functions.max (min) and GroupedData.max, where the latter supports only
numeric types, makes the current situation even more confusing. With a
growing number of orderable types, I believe the documentation should
clearly define the ordering rules, including:

- NULL behavior
- collation
- behavior on complex types (structs, arrays)

While this information can be extracted from the source, it is not easily
accessible, and without an explicit specification it is not clear whether
the current behavior is contractual. It can also be confusing if a user
expects the order to depend on the current locale (as in R).
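
To make the ambiguity concrete, here is a minimal plain-Python sketch. It
is not Spark code and makes no claim about what Spark actually does. The
sample data is hypothetical, and str.casefold is only my stand-in for a
locale-aware collation. It shows that the same input has several
defensible orderings until collation and NULL placement are specified:

```python
# Plain-Python illustration only; not Spark code, and no claim about
# Spark's actual behavior.
words = ["b", "A", "a", "B"]  # hypothetical sample data

# Ordering by Unicode code point: uppercase ASCII sorts before lowercase.
by_code_point = sorted(words)

# Case-insensitive ordering, used here as a crude stand-in for a
# locale-aware collation; ties keep their input order (stable sort).
by_casefold = sorted(words, key=str.casefold)

# NULL placement is a separate choice: missing values first or last.
# The second tuple element replaces None with 0 so None is never compared
# against an int.
values = [3, None, 1]  # hypothetical sample data
nulls_first = sorted(
    values, key=lambda v: (v is not None, v if v is not None else 0)
)
nulls_last = sorted(
    values, key=lambda v: (v is None, v if v is not None else 0)
)

print(by_code_point)  # ['A', 'B', 'a', 'b']
print(by_casefold)    # ['A', 'a', 'b', 'B']
print(nulls_first)    # [None, 1, 3]
print(nulls_last)     # [1, 3, None]
```

Each of these results is reasonable on its own, which is why I think the
documentation should state which one users can rely on.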

Best,
Maciej
