Yes, it was done on purpose to match the behavior of Hive
(https://issues.apache.org/jira/browse/SPARK-10865).

And I believe Hive returns `Long`s because they adopted the definition used
in MySQL (https://issues.apache.org/jira/browse/HIVE-615).
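
For reference, the clamping itself can be reproduced without Spark. The snippet below is only a minimal sketch in plain Scala, assuming the cast behaves like an ordinary JVM double-to-long conversion; the object and value names are illustrative and are not taken from the Spark code base.

```scala
// Plain Scala, no Spark required. All names here are illustrative only.
object CeilDemo {
  // The value from the example in this thread; larger than Long.MaxValue.
  val original: Double = 9.223372036854786E20

  // math.ceil returns a Double, so the magnitude is preserved.
  val ceilAsDouble: Double = math.ceil(original)

  // Converting a Double to a Long on the JVM saturates at Long.MaxValue
  // when the value is too large to be represented.
  val ceilAsLong: Long = ceilAsDouble.toLong

  def main(args: Array[String]): Unit = {
    println(ceilAsDouble == original)     // true
    println(ceilAsLong == Long.MaxValue)  // true
  }
}
```

So the value is intact after math.ceil and only changes at the conversion to Long, which is consistent with what both Hive and Spark print.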

On Fri, May 19, 2017 at 10:51 AM, Anton Okolnychyi <
anton.okolnyc...@gmail.com> wrote:

> Hi Dongjoon,
>
> yeah, it seems to be the same. So, was it done on purpose to match the
> behavior of Hive?
>
> Best regards,
> Anton
>
> 2017-05-19 16:39 GMT+02:00 Dong Joon Hyun <dh...@hortonworks.com>:
>
>> Hi, Anton.
>>
>>
>>
>> It’s the same result with Hive, isn’t it?
>>
>>
>>
>> hive> select 9.223372036854786E20, ceil(9.223372036854786E20);
>>
>> OK
>>
>> _c0      _c1
>>
>> 9.223372036854786E20         9223372036854775807
>>
>> Time taken: 2.041 seconds, Fetched: 1 row(s)
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>> *From: *Anton Okolnychyi <anton.okolnyc...@gmail.com>
>> *Date: *Friday, May 19, 2017 at 7:26 AM
>> *To: *"dev@spark.apache.org" <dev@spark.apache.org>
>> *Subject: *[Spark SQL] ceil and floor functions on doubles
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I am wondering why the results of the ceil and floor functions on doubles are
>> internally cast to longs. This causes a loss of information, since doubles can
>> hold values larger than Long.MaxValue.
>>
>>
>>
>> Consider the following example:
>>
>>
>>
>> // 9.223372036854786E20 is greater than Long.MaxValue
>>
>> val df = sc.parallelize(Array(("col", 9.223372036854786E20))).toDF()
>>
>> df.createOrReplaceTempView("tbl")
>>
>> spark.sql("select _2 AS original_value, ceil(_2) as ceil_result from tbl").show()
>>
>>
>>
>> +--------------------+-------------------+
>> |      original_value|        ceil_result|
>> +--------------------+-------------------+
>> |9.223372036854786E20|9223372036854775807|
>> +--------------------+-------------------+
>>
>>
>>
>> So, the original double value is clamped to 9223372036854775807, which is
>> Long.MaxValue.
>>
>> I think that it would be better to return 9.223372036854786E20 as it was
>> (and as it is actually returned by math.ceil before the cast to long). If
>> it is a problem, then I can fix this.
>>
>>
>>
>> Best regards,
>>
>> Anton
>>
>
>
