Re: DataFrame API and Ordering

Reynold Xin Sun, 21 Feb 2016 22:37:40 -0800

I think InterpretedOrdering is internal, so documenting it there wouldn't
help the public docs. We should definitely add a small section to the guide.




On Fri, Feb 19, 2016 at 4:09 PM Maciej Szymkiewicz <[email protected]>
wrote:

> I am not sure. The Spark SQL, DataFrames and Datasets Guide already has a
> section about NaN semantics. This could be a good place to add at least
> a basic description.
>
> For the rest, InterpretedOrdering could be a good choice.
>
> On 02/19/2016 12:35 AM, Reynold Xin wrote:
>
> You are correct and we should document that.
>
> Any suggestions on where we should document this? In DoubleType and
> FloatType?
>
> On Tuesday, February 16, 2016, Maciej Szymkiewicz <[email protected]>
> wrote:
>
>> I am not sure if I've missed something obvious, but as far as I can tell
>> the DataFrame API doesn't provide clearly defined ordering rules apart
>> from NaN handling. Methods like DataFrame.sort and sql.functions like
>> min / max provide only a general description. The discrepancy between
>> functions.max (min) and GroupedData.max, where the latter supports only
>> numeric types, makes the current situation even more confusing. With a
>> growing number of orderable types, I believe the documentation should
>> clearly define ordering rules, including:
>>
>> - NULL behavior
>> - collation
>> - behavior on complex types (structs, arrays)
>>
>> While this information can be extracted from the source, it is not easily
>> accessible, and without an explicit specification it is not clear whether
>> the current behavior is contractual. It can also be confusing if a user
>> expects an order that depends on the current locale (as in R).
>>
>> Best,
>> Maciej
>>
>>
>
