I think InterpretedOrdering is internal, so documenting the rules there would not help the public docs. We should definitely add a small section to the guide.
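
To make the idea concrete, here is a rough sketch of the kind of rule such a section could spell out. It is plain Python, not Spark code, and `ordering_key` is a hypothetical helper invented for illustration. It places NaN after every other numeric value in ascending order, as the guide's NaN semantics section describes; placing NULL first is an assumption here, since that is exactly the sort of behavior this thread says is not yet documented.

```python
import math

def ordering_key(value):
    """Hypothetical sort key: NULL (None) first, NaN last, other values in between.

    This only illustrates a possible documented rule; it is not how Spark
    implements ordering internally.
    """
    if value is None:
        # Assumption: NULL sorts before everything in ascending order.
        return (0, 0.0)
    if isinstance(value, float) and math.isnan(value):
        # NaN is treated as larger than any other numeric value.
        return (2, 0.0)
    return (1, value)

values = [3.0, float("nan"), None, -1.5, 0.0]
ordered = sorted(values, key=ordering_key)
print(ordered)
```

Writing the rule down this explicitly is the point: a reader can tell at a glance where NULL and NaN land, without digging through the source.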

On Fri, Feb 19, 2016 at 4:09 PM Maciej Szymkiewicz <mszymkiew...@gmail.com> wrote:

> I am not sure. Spark SQL, DataFrames and Datasets Guide already has a
> section about NaN semantics. This could be a good place to add at least
> some basic description.
>
> For the rest InterpretedOrdering could be a good choice.
>
> On 02/19/2016 12:35 AM, Reynold Xin wrote:
>
> You are correct and we should document that.
>
> Any suggestions on where we should document this? In DoubleType and
> FloatType?
>
> On Tuesday, February 16, 2016, Maciej Szymkiewicz <mszymkiew...@gmail.com>
> wrote:
>
>> I am not sure if I've missed something obvious, but as far as I can tell
>> the DataFrame API doesn't provide clearly defined ordering rules, apart
>> from NaN handling. Methods like DataFrame.sort or sql.functions like min /
>> max provide only a general description. The discrepancy between
>> functions.max (min) and GroupedData.max, where the latter supports only
>> numeric types, makes the current situation even more confusing. With a
>> growing number of orderable types, I believe the documentation should
>> clearly define ordering rules, including:
>>
>> - NULL behavior
>> - collation
>> - behavior on complex types (structs, arrays)
>>
>> While this information can be extracted from the source, it is not easily
>> accessible, and without an explicit specification it is not clear whether
>> the current behavior is contractual. It can also be confusing if a user
>> expects an order that depends on the current locale (R).
>>
>> Best,
>> Maciej
>>
>>
>