[ 
https://issues.apache.org/jira/browse/HIVE-25653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444137#comment-17444137
 ] 

Stamatis Zampetakis commented on HIVE-25653:
--------------------------------------------

Hi [~sankarh] , [~ashish-kumar-sharma] , I am not completely sure if this is a 
bug or it can be considered normal behavior.

Moreover, the summary and the scenario do not seem to be inline. The summary 
mentions floating point data types so I assume we talk about aproximate numeric 
types and the example contains a table of integers which is exact numeric.

The following scenario with approximate numerics in Postgres gives back a 
result which resembles the result we get in Hive.
{code:sql}
create table tbl_double ( col1 double precision);
insert into tbl_double values 
('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72');
select STDDEV_SAMP(col1) AS STDDEV_6M , STDDEV(col1) as STDDEV 
,STDDEV_POP(col1) as STDDEV_POP from tbl_double;
{code}

{noformat}
      stddev_6m       |        stddev        |      stddev_pop       
----------------------+----------------------+-----------------------
 5.42317860890711e-13 | 5.42317860890711e-13 | 5.020887760782747e-13
{noformat}

I think that trying the same scenario in different databases will give 
different results and it depends on various things. The result type of the UDFs 
which the SQL standard defines as implementation specific approximate numeric 
and the implicit conversion and rules holding in each database.

> Incorrect results returned by STDDEV, STDDEV_SAMP, STDDEV_POP for floating 
> point data types.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25653
>                 URL: https://issues.apache.org/jira/browse/HIVE-25653
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>    Affects Versions: 3.1.0, 3.1.2
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Description
> *Script*- 
> create table test ( col1 int );
> insert into values 
> ('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72'),('10230.72');
> select STDDEV_SAMP(col1) AS STDDEV_6M , STDDEV(col1) as STDDEV 
> ,STDDEV_POP(col1) as STDDEV_POP from test;
> *Result*- 
> STDDDEV_SAMP                            STDDEV                      
> STDDEV_POP 
> 5.940794514955821E-13     5.42317860890711E-13         5.42317860890711E-13
> *Expected*- 
> STDDDEV_SAMP                            STDDEV                      
> STDDEV_POP 
> 0                                                           0                 
>                0



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to