[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935860#comment-13935860
 ] 

Eric Hanson commented on HIVE-6664:
-----------------------------------

In general, sum/avg/variance aggregates whose sum calculation involves floating 
point arithmetic can return different answers depending on execution order. 
This is due to the nature of floating point arithmetic, where it is easy to 
show examples where (a + b) + c <> a + (b + c). So it is probably not critical 
that row mode and vector mode produce results that agree to the last decimal 
place. However, the change here is simple enough, and it improves compatibility 
without any serious performance drawbacks, so I think this is fine.
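
As a minimal illustration of the non-associativity mentioned above (not taken 
from the patch; the helper names and the values are chosen only for the 
example), the following Python sketch adds the same three doubles in two 
different groupings:

```python
# Hypothetical helpers: same operands, different grouping of the additions.
def left_grouped(a, b, c):
    return (a + b) + c


def right_grouped(a, b, c):
    return a + (b + c)


# Values picked so that 1.0 is below the spacing of doubles near 1e16.
a, b, c = 1e16, -1e16, 1.0

print(left_grouped(a, b, c))   # 1.0: a and b cancel exactly, then c is added
print(right_grouped(a, b, c))  # 0.0: c is lost when rounded into b first
```

Any aggregate that sums doubles in a different order (for example per batch 
rather than per row) can hit the same effect, usually in the last few digits.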

> Vectorized variance computation differs from row mode computation.
> ------------------------------------------------------------------
>
>                 Key: HIVE-6664
>                 URL: https://issues.apache.org/jira/browse/HIVE-6664
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-6664.1.patch
>
>
> The following query shows the difference:
> select var_samp(ss_sales_price), var_pop(ss_sales_price), 
> stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales;
> The reason for the difference is that, when computing variance, row mode 
> converts the decimal value to double upfront to calculate the sum of values, 
> whereas vector mode performs the local aggregate sum as decimal and converts 
> to double only at flush.
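
The two summation strategies described in the issue can be sketched roughly as 
follows. This is an illustrative Python model only, not Hive's actual code: the 
function names and the sample prices are assumptions, and Python's Decimal 
stands in for Hive's decimal type.

```python
from decimal import Decimal


def sum_convert_upfront(values):
    """Model of row mode: convert each decimal to double, then add doubles."""
    total = 0.0
    for v in values:
        total += float(v)  # rounding error can accumulate on every addition
    return total


def sum_convert_at_flush(values):
    """Model of vector mode: add exactly as decimal, convert once at the end."""
    total = Decimal(0)
    for v in values:
        total += v  # exact decimal addition for these inputs
    return float(total)  # single rounding step at "flush"


# Made-up sample prices, used only to show that the results can diverge.
prices = [Decimal("0.1"), Decimal("0.2"), Decimal("0.3")]

print(sum_convert_upfront(prices))   # 0.6000000000000001
print(sum_convert_at_flush(prices))  # 0.6
```

Because variance is derived from such sums, a difference in the last digit of 
the sum can carry through to the final result.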



--
This message was sent by Atlassian JIRA
(v6.2#6252)
