[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935860#comment-13935860 ]
Eric Hanson commented on HIVE-6664:
-----------------------------------

In general, sum/avg/variance aggregate results that involve floating point arithmetic in the sum calculation will return different answers depending on execution order. This is due to the nature of floating point arithmetic, where it is easy to show examples where (a + b) + c <> a + (b + c). So it is probably not critical that row mode and vector mode produce results that agree to the last decimal place. However, the change here is simple enough, and it improves compatibility without any serious performance drawbacks, so I think this is fine.

> Vectorized variance computation differs from row mode computation.
> ------------------------------------------------------------------
>
>                 Key: HIVE-6664
>                 URL: https://issues.apache.org/jira/browse/HIVE-6664
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-6664.1.patch
>
>
> The following query can show the difference:
> select var_samp(ss_sales_price), var_pop(ss_sales_price),
> stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales.
> The reason for the difference is that row mode converts the decimal value to
> double upfront to calculate the sum of values when computing variance. But the
> vector mode performs the local aggregate sum as decimal and converts it into
> double only at flush.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
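
For illustration only, here is a minimal Java sketch of the two effects discussed above: floating point addition is not associative, and summing decimals after converting each one to double can differ from summing them exactly and converting once at the end. This is not Hive code. The class name SumOrderDemo, the methods sumConvertFirst and sumConvertLast, and the sample values are all hypothetical, chosen only to mirror the row-mode and vector-mode behavior described in the issue.

```java
import java.math.BigDecimal;

public class SumOrderDemo {

    // Hypothetical stand-in for the row-mode behavior described in the issue:
    // each decimal is converted to double first, then accumulated as double.
    static double sumConvertFirst(BigDecimal[] values) {
        double sum = 0.0;
        for (BigDecimal v : values) {
            sum += v.doubleValue();
        }
        return sum;
    }

    // Hypothetical stand-in for the vector-mode behavior described in the issue:
    // values are accumulated exactly as decimals and converted to double once.
    static double sumConvertLast(BigDecimal[] values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal v : values) {
            sum = sum.add(v);
        }
        return sum.doubleValue();
    }

    public static void main(String[] args) {
        // Non-associativity of double addition.
        double a = 0.1, b = 0.2, c = 0.3;
        System.out.println((a + b) + c == a + (b + c)); // prints false

        // Same inputs, two accumulation strategies, different last digits.
        BigDecimal[] prices = {
            new BigDecimal("0.1"), new BigDecimal("0.2"), new BigDecimal("0.3")
        };
        System.out.println(sumConvertFirst(prices)); // prints 0.6000000000000001
        System.out.println(sumConvertLast(prices));  // prints 0.6
    }
}
```

The two strategies differ only in the last digits, which is why agreement to the last decimal place between the two modes is a compatibility nicety rather than a correctness requirement.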