I am not sure if I've missed something obvious, but as far as I can tell the DataFrame API doesn't provide clearly defined ordering rules, apart from NaN handling. Methods like DataFrame.sort and sql.functions like min / max provide only a general description. The discrepancy between functions.max (min) and GroupedData.max, where the latter supports only numeric types, makes the current situation even more confusing.

With a growing number of orderable types, I believe the documentation should clearly define ordering rules, including:

- NULL behavior
- collation
- behavior on complex types (structs, arrays)

While this information can be extracted from the source, it is not easily accessible, and without an explicit specification it is not clear whether the current behavior is contractual. It can also be confusing if a user expects an order that depends on the current locale (R).

Best,
Maciej