Standard Deviation in Hive 2 is still incorrect

2016-04-04 Thread Mich Talebzadeh
Hi,

I reported back in April 2015 that what Hive calls Standard Deviation
Function  STDDEV is a pointer to STDDEV_POP. This is incorrect and has not
been rectified in Hive 2

Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I
did tests with Spark 1.6 as well and Spark correctly points STTDEV to
STDDEV_SAMP.

The following query was used

SELECT

SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
AS MYSTDDEV,
STDDEV(amount_sold) AS STDDEV,
STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
STDDEV_POP(amount_sold) AS STDDEV_POP
fromsales;

The following is from running the above query on Hive where STDDEV -->
 STDDEV_POP which is incorrect

++-++-+--+
|  mystddev  |   stddev|stddev_samp |
stddev_pop  |
++-++-+--+
| 260.7270919450411  | 260.72704617040444  | 260.7270722861465  |
260.72704617040444
|
++-++-+--+

The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is
correct

++-++-+--+
|  mystddev  |   stddev|stddev_samp |
stddev_pop  |
++-++-+--+
| 260.7270919450411  | 260.7270722861637   | 260.7270722861637  |
260.72704617042166  |
++-++-+--+

Hopefully The Hive one will be corrected.

Thanks


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


Re: Standard Deviation in Hive 2 is still incorrect

2016-04-19 Thread Alan Gates
Have you filed a JIRA ticket for this?  If not, please do so we can track it 
and fix it.  Patches are welcomed as well. :)

Alan.

> On Apr 4, 2016, at 15:27, Mich Talebzadeh  wrote:
> 
> 
> Hi,
> 
> I reported back in April 2015 that what Hive calls Standard Deviation 
> Function  STDDEV is a pointer to STDDEV_POP. This is incorrect and has not 
> been rectified in Hive 2
> 
> Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also I did 
> tests with Spark 1.6 as well and Spark correctly points STTDEV to STDDEV_SAMP.
> 
> The following query was used
> 
> SELECT
> 
> SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
>  AS MYSTDDEV,
> STDDEV(amount_sold) AS STDDEV,
> STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
> STDDEV_POP(amount_sold) AS STDDEV_POP
> fromsales;
> 
> The following is from running the above query on Hive where STDDEV -->  
> STDDEV_POP which is incorrect
> 
> 
> ++-++-+--+
> |  mystddev  |   stddev|stddev_samp | 
> stddev_pop  |
> ++-++-+--+
> | 260.7270919450411  | 260.72704617040444  | 260.7270722861465  | 
> 260.72704617040444  |
> ++-++-+--+
> 
> The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is correct
> 
> ++-++-+--+
> |  mystddev  |   stddev|stddev_samp | 
> stddev_pop  |
> ++-++-+--+
> | 260.7270919450411  | 260.7270722861637   | 260.7270722861637  | 
> 260.72704617042166  |
> ++-++-+--+
> 
> Hopefully The Hive one will be corrected.
> 
> Thanks
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  



Re: Standard Deviation in Hive 2 is still incorrect

2016-04-19 Thread Mich Talebzadeh
Will do thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 19 April 2016 at 23:33, Alan Gates  wrote:

> Have you filed a JIRA ticket for this?  If not, please do so we can track
> it and fix it.  Patches are welcomed as well. :)
>
> Alan.
>
> > On Apr 4, 2016, at 15:27, Mich Talebzadeh 
> wrote:
> >
> >
> > Hi,
> >
> > I reported back in April 2015 that what Hive calls Standard Deviation
> Function  STDDEV is a pointer to STDDEV_POP. This is incorrect and has not
> been rectified in Hive 2
> >
> > Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also
> I did tests with Spark 1.6 as well and Spark correctly points STTDEV to
> STDDEV_SAMP.
> >
> > The following query was used
> >
> > SELECT
> >
>  
> SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
> AS MYSTDDEV,
> > STDDEV(amount_sold) AS STDDEV,
> > STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
> > STDDEV_POP(amount_sold) AS STDDEV_POP
> > fromsales;
> >
> > The following is from running the above query on Hive where STDDEV -->
> STDDEV_POP which is incorrect
> >
> >
> >
> ++-++-+--+
> > |  mystddev  |   stddev|stddev_samp |
>  stddev_pop  |
> >
> ++-++-+--+
> > | 260.7270919450411  | 260.72704617040444  | 260.7270722861465  |
> 260.72704617040444  |
> >
> ++-++-+--+
> >
> > The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is
> correct
> >
> >
> ++-++-+--+
> > |  mystddev  |   stddev|stddev_samp |
>  stddev_pop  |
> >
> ++-++-+--+
> > | 260.7270919450411  | 260.7270722861637   | 260.7270722861637  |
> 260.72704617042166  |
> >
> ++-++-+--+
> >
> > Hopefully The Hive one will be corrected.
> >
> > Thanks
> >
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
> >
>
>


Re: Standard Deviation in Hive 2 is still incorrect

2016-04-21 Thread Mich Talebzadeh
HIVE-13574 

Created and assigned to myself

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 20 April 2016 at 06:54, Mich Talebzadeh 
wrote:

> Will do thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 19 April 2016 at 23:33, Alan Gates  wrote:
>
>> Have you filed a JIRA ticket for this?  If not, please do so we can track
>> it and fix it.  Patches are welcomed as well. :)
>>
>> Alan.
>>
>> > On Apr 4, 2016, at 15:27, Mich Talebzadeh 
>> wrote:
>> >
>> >
>> > Hi,
>> >
>> > I reported back in April 2015 that what Hive calls Standard Deviation
>> Function  STDDEV is a pointer to STDDEV_POP. This is incorrect and has not
>> been rectified in Hive 2
>> >
>> > Both Oracle and Sybase point STDDEV to STDDEV_SAMP not STDDEV_POP. Also
>> I did tests with Spark 1.6 as well and Spark correctly points STTDEV to
>> STDDEV_SAMP.
>> >
>> > The following query was used
>> >
>> > SELECT
>> >
>>  
>> SQRT((SUM(POWER(AMOUNT_SOLD,2))-(COUNT(1)*POWER(AVG(AMOUNT_SOLD),2)))/(COUNT(1)-1))
>> AS MYSTDDEV,
>> > STDDEV(amount_sold) AS STDDEV,
>> > STDDEV_SAMP(amount_sold) AS STDDEV_SAMP,
>> > STDDEV_POP(amount_sold) AS STDDEV_POP
>> > fromsales;
>> >
>> > The following is from running the above query on Hive where STDDEV -->
>> STDDEV_POP which is incorrect
>> >
>> >
>> >
>> ++-++-+--+
>> > |  mystddev  |   stddev|stddev_samp |
>>  stddev_pop  |
>> >
>> ++-++-+--+
>> > | 260.7270919450411  | 260.72704617040444  | 260.7270722861465  |
>> 260.72704617040444  |
>> >
>> ++-++-+--+
>> >
>> > The following is from Spark-sql where STDDEV -->  STDDEV_SAMP which is
>> correct
>> >
>> >
>> ++-++-+--+
>> > |  mystddev  |   stddev|stddev_samp |
>>  stddev_pop  |
>> >
>> ++-++-+--+
>> > | 260.7270919450411  | 260.7270722861637   | 260.7270722861637  |
>> 260.72704617042166  |
>> >
>> ++-++-+--+
>> >
>> > Hopefully The Hive one will be corrected.
>> >
>> > Thanks
>> >
>> >
>> > Dr Mich Talebzadeh
>> >
>> > LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> >
>> > http://talebzadehmich.wordpress.com
>> >
>>
>>
>