Re: ORDER BY clause in Hive

2015-03-31 Thread Lefty Leverenz
Thanks for the update, Mich.  And thanks to you & Gopal for prompting me to
get the documentation done.

-- Lefty

On Tue, Mar 31, 2015 at 5:20 AM, Mich Talebzadeh 
wrote:

> Thanks Lefty for the information provided.
>
>
>
> My version of hive is 014.0
>
>
>
> hive --version
>
> Hive 0.14.0
>
>
>
> Which should support the configuration parameter at the session level à
> set hive.groupby.orderby.position.alias=true
>
>
>
>
>
> set hive.groupby.orderby.position.alias=true;
>
> SELECT
>
>   rs.Customer_ID
>
> , rs.Number_of_orders
>
> , *rs.Total_customer_amount*
>
> , rs.Average_order
>
> , rs.Standard_deviation
>
> FROM
>
> (
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> stddev_samp(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < stddev_samp(amount_sold)
>
> ) rs
>
> ORDER BY
>
>   -- Total_customer_amount DESC
>
> *  3 DESC*
>
> *;*
>
>
>
>
> +-+--+---+-++--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-+--+---+-++--+
>
> | 11407.0 | 248  | 103412.655|
> 416.9865322580643   | 623.4797510518939  |
>
> | 10747.0 | 256  | 99578.087 |
> 388.9769140624999   | 601.9383117167412  |
>
> | 42167.0 | 266  | 98585.962 |
> 370.6239097744362   | 592.0790992800527  |
>
> | 4974.0  | 235  | 98006.162 |
> 417.0474893617022   | 625.670115050053   |
>
> | 12783.0 | 240  | 97573.546 |
> 406.556458332   | 591.6785002882084  |
>
> | 6395.0  | 268  | 97010.478 |
> 361.97940298507456  | 577.991447849281   |
>
> | 2994.0  | 227  | 94862.616 |
> 417.89696035242315  | 624.5379298449825  |
>
> | 429.0   | 231  | 94819.416 |
> 410.4736363636366   | 615.0384039014772  |
>
> | 1743.0  | 238  | 94786.123 |
> 398.2610504201678   | 582.2684502048478  |
>
>
> +-+--+---+-++--+
>
> 9 rows selected (212.535 seconds)
>
>
>
> Indeed this is the correct result ordering by column posirion
>
>
>
> At the Hive server level I added the folowing properties to
> $HIVE_HOME/conf/ hive-site.xml
>
>
>
>   
>
> hive.groupby.orderby.position.alias
>
> true
>
> Eenables using Column Position Alias in GROUP BY and
> ORDER BY clauses of queries.
>
>   
>
>
>
> And ran the above query without session level setting and it worked
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Lefty Leverenz [mailto:leftylever...@gmail.com]
> *Sent:* 31 March 2015 07:18
> *To:* user@hive.apache.org
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>:
>  Give a warning when grouping or ordering by a constant column.
>
>
>
> Thanks Gopal.
>
>
> -- Lefty
>
>
&

RE: ORDER BY clause in Hive

2015-03-31 Thread Mich Talebzadeh
Thanks Lefty for the information provided.

 

My version of hive is 014.0

 

hive --version

Hive 0.14.0

 

Which should support the configuration parameter at the session level à set 
hive.groupby.orderby.position.alias=true

 

 

set hive.groupby.orderby.position.alias=true;

SELECT

  rs.Customer_ID

, rs.Number_of_orders

, rs.Total_customer_amount

, rs.Average_order

, rs.Standard_deviation

FROM

(

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

stddev_samp(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < stddev_samp(amount_sold)

) rs

ORDER BY

  -- Total_customer_amount DESC

  3 DESC

;

 

+-+--+---+-++--+

| rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |  
rs.average_order   | rs.standard_deviation  |

+-+--+---+-++--+

| 11407.0 | 248  | 103412.655| 
416.9865322580643   | 623.4797510518939  |

| 10747.0 | 256  | 99578.087 | 
388.9769140624999   | 601.9383117167412  |

| 42167.0 | 266  | 98585.962 | 
370.6239097744362   | 592.0790992800527  |

| 4974.0  | 235  | 98006.162 | 
417.0474893617022   | 625.670115050053   |

| 12783.0 | 240  | 97573.546 | 
406.556458332   | 591.6785002882084  |

| 6395.0  | 268  | 97010.478 | 
361.97940298507456  | 577.991447849281   |

| 2994.0  | 227  | 94862.616 | 
417.89696035242315  | 624.5379298449825  |

| 429.0   | 231  | 94819.416 | 
410.4736363636366   | 615.0384039014772  |

| 1743.0  | 238  | 94786.123 | 
398.2610504201678   | 582.2684502048478  |

+-+--+---+-++--+

9 rows selected (212.535 seconds)

 

Indeed this is the correct result ordering by column posirion

 

At the Hive server level I added the folowing properties to $HIVE_HOME/conf/ 
hive-site.xml

 

  

hive.groupby.orderby.position.alias

true

Eenables using Column Position Alias in GROUP BY and ORDER BY 
clauses of queries.

  

 

And ran the above query without session level setting and it worked

 

Regards,

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: Lefty Leverenz [mailto:leftylever...@gmail.com] 
Sent: 31 March 2015 07:18
To: user@hive.apache.org
Subject: Re: ORDER BY clause in Hive

 

I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160> :  
Give a warning when grouping or ordering by a constant column.

 

Thanks Gopal.




-- Lefty

 

On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz  wrote:

-- Forwarded message --

From: Lefty Leverenz 
Date: Tue, Mar 31, 2015 at 1:47 AM
Subject: Re: ORDER BY clause in Hive

To: Mich Talebzadeh 



Hive as I see it does not support ORDER BY Column position. It only supports 
ORDER BY Column name.

 

That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you can set 
the configuration parameter 
<https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
  hive.groupby.orderby.position.alias to true, and then you can use column 
positions in ORDER BY.

 

Here's the new documentation:

*
<https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
 hive.groupby.orderby.position.alias
*   Order By 
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-Sy

Re: ORDER BY clause in Hive

2015-03-30 Thread Lefty Leverenz
I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>:
 Give a warning when grouping or ordering by a constant column.

Thanks Gopal.

-- Lefty

On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz 
wrote:

> -- Forwarded message --
> From: Lefty Leverenz 
> Date: Tue, Mar 31, 2015 at 1:47 AM
> Subject: Re: ORDER BY clause in Hive
> To: Mich Talebzadeh 
>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
>> supports ORDER BY *Column name*.
>>
>
> That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you
> can set the configuration parameter
> <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
>  *hive.groupby.orderby.position.**alias* to true, and then you can use
> column positions in ORDER BY.
>
> Here's the new documentation:
>
>- hive.groupby.orderby.position.alias
>
> <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
>- Order By
>
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>
>- Group By
>
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
>
>
> -- Lefty
>
> On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh 
> wrote:
>
>> Gents,
>>
>>
>>
>> Hive as I see it does not support ORDER BY *Column position*. It only
>> supports ORDER BY *Column name*.
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>> *From:* Gopal Vijayaraghavan [mailto:go...@hortonworks.com] *On Behalf
>> Of *Gopal Vijayaraghavan
>> *Sent:* 30 March 2015 05:26
>> *To:* user@hive.apache.org
>> *Cc:* Lefty Leverenz
>>
>> *Subject:* Re: ORDER BY clause in Hive
>>
>>
>>
>> Hi Lefty,
>>
>>
>>
>> Couldn’t find the documentation for what
>> hive.groupby.orderby.position.alias=true does.
>>
>>
>>
>> I suspect that might be what Mich was looking for (though I tend to write
>> the column names explicitly).
>>
>>
>>
>> Cheers,
>>
>> Gopal
>>
>>
>>
>> *From: *Lefty Leverenz 
>> *Reply-To: *"user@hive.apache.org" 
>> *Date: *Sunday, March 29, 2015 at 8:32 PM
>> *To: *"user@hive.apache.org" 
>> *Subject: *Re: ORDER BY clause in Hive
>>
>>
>>
>> I added information about this in the Order By
>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>> section of the wiki.  Thanks, Mich and Gopal!
>>
>>
>> -- Lefty
>>
>>
>>
>> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh 
>> wrote:
>>
>> Hi Lefty, Gopal,
>>
>>
>>
>> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result
>> set column three” which standard SQL as evident from Oracle and Sybase does.
>>
>>
>>
>> So I made it an ORDER BY from the result set EXPLICITELY as shown below
>> and it worked OK
>>
>>
>>
>> SELECT
>>
>>   rs.Customer_ID
>>
>> , rs.Number_of_orders
>>
>> , rs.Total_customer_amount
>>
>> , rs.Average_order
>>
>> , rs.Standard_deviation
>>
>> FROM
>>
>> (
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> 

Fwd: ORDER BY clause in Hive

2015-03-30 Thread Lefty Leverenz
-- Forwarded message --
From: Lefty Leverenz 
Date: Tue, Mar 31, 2015 at 1:47 AM
Subject: Re: ORDER BY clause in Hive
To: Mich Talebzadeh 


Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>

That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you can set
the configuration parameter
<https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
 *hive.groupby.orderby.position.**alias* to true, and then you can use
column positions in ORDER BY.

Here's the new documentation:

   - hive.groupby.orderby.position.alias
   
<https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
   - Order By
   
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>

   - Group By
   
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>


-- Lefty

On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh 
wrote:

> Gents,
>
>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
>
>
> Thanks
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Gopal Vijayaraghavan [mailto:go...@hortonworks.com] *On Behalf Of
> *Gopal Vijayaraghavan
> *Sent:* 30 March 2015 05:26
> *To:* user@hive.apache.org
> *Cc:* Lefty Leverenz
>
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> Hi Lefty,
>
>
>
> Couldn’t find the documentation for what
> hive.groupby.orderby.position.alias=true does.
>
>
>
> I suspect that might be what Mich was looking for (though I tend to write
> the column names explicitly).
>
>
>
> Cheers,
>
> Gopal
>
>
>
> *From: *Lefty Leverenz 
> *Reply-To: *"user@hive.apache.org" 
> *Date: *Sunday, March 29, 2015 at 8:32 PM
> *To: *"user@hive.apache.org" 
> *Subject: *Re: ORDER BY clause in Hive
>
>
>
> I added information about this in the Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
> section of the wiki.  Thanks, Mich and Gopal!
>
>
> -- Lefty
>
>
>
> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh 
> wrote:
>
> Hi Lefty, Gopal,
>
>
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set
> column three” which standard SQL as evident from Oracle and Sybase does.
>
>
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below
> and it worked OK
>
>
>
> SELECT
>
>   rs.Customer_ID
>
> , rs.Number_of_orders
>
> , rs.Total_customer_amount
>
> , rs.Average_order
>
> , rs.Standard_deviation
>
> FROM
>
> (
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ) rs
>
> ORDER BY
>
>   *rs.Total_customer_amount*
>
> ;
>
>
>
>
>
>
> +-+--+---+-++--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-+--+---+-++--+
>
> | 1743.0  | 238  | 94786.123 |
> 398.26105042

Fwd: ORDER BY clause in Hive

2015-03-30 Thread Lefty Leverenz
Oops, failed to send this to user@hive.
-- Lefty

-- Forwarded message --
From: Lefty Leverenz 
Date: Mon, Mar 30, 2015 at 12:42 AM
Subject: Re: ORDER BY clause in Hive
To: Gopal Vijayaraghavan 


Oho!  Good point, Gopal.

hive.groupby.orderby.position.alias isn't in the wiki yet, but I can put it
there (and revise the ORDER BY and GROUP BY docs).  It was introduced in
release 0.11.0 by HIVE-581
<https://issues.apache.org/jira/browse/HIVE-581> with
a default of false.

Description from HiveConf.java:  "Whether to enable using Column Position
Alias in Group By or Order By."

But shouldn't Mich have gotten an error message?

-- Lefty

On Mon, Mar 30, 2015 at 12:25 AM, Gopal Vijayaraghavan 
wrote:

> Hi Lefty,
>
> Couldn’t find the documentation for what
> hive.groupby.orderby.position.alias=true does.
>
> I suspect that might be what Mich was looking for (though I tend to write
> the column names explicitly).
>
> Cheers,
> Gopal
>
> From: Lefty Leverenz 
> Reply-To: "user@hive.apache.org" 
> Date: Sunday, March 29, 2015 at 8:32 PM
> To: "user@hive.apache.org" 
> Subject: Re: ORDER BY clause in Hive
>
> I added information about this in the Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
> section of the wiki.  Thanks, Mich and Gopal!
>
> -- Lefty
>
> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh 
> wrote:
>
>> Hi Lefty, Gopal,
>>
>>
>>
>> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result
>> set column three” which standard SQL as evident from Oracle and Sybase does.
>>
>>
>>
>> So I made it an ORDER BY from the result set EXPLICITELY as shown below
>> and it worked OK
>>
>>
>>
>> SELECT
>>
>>   rs.Customer_ID
>>
>> , rs.Number_of_orders
>>
>> , rs.Total_customer_amount
>>
>> , rs.Average_order
>>
>> , rs.Standard_deviation
>>
>> FROM
>>
>> (
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> AVG(amount_sold) AS Average_order,
>>
>> STDDEV(amount_sold) AS Standard_deviation
>>
>> FROM sales
>>
>> GROUP BY cust_id
>>
>> HAVING SUM(amount_sold) > 94000
>>
>> AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ) rs
>>
>> ORDER BY
>>
>>   *rs.Total_customer_amount*
>>
>> ;
>>
>>
>>
>>
>>
>>
>> +-+--+---+-++--+
>>
>> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
>> rs.average_order   | rs.standard_deviation  |
>>
>>
>> +-+--+---+-++--+
>>
>> | 1743.0  | 238  | 94786.123 |
>> 398.2610504201678   | 581.0439095219863  |
>>
>> | 429.0   | 231  | 94819.416 |
>> 410.4736363636366   | 613.7057080691426  |
>>
>> | 2994.0  | 227  | 94862.616 |
>> 417.89696035242315  | 623.1607772763742  |
>>
>> | 6395.0  | 268  | 97010.478 |
>> 361.97940298507456  | 576.9120977984521  |
>>
>> | 12783.0 | 240  | 97573.546 |
>> 406.556458332   | 590.4445500393804  |
>>
>> | 4974.0  | 235  | 98006.162 |
>> 417.0474893617022   | 624.337482834059   |
>>
>> | 42167.0 | 266  | 98585.962 |
>> 370.6239097744362   | 590.965120684093   |
>>
>> | 10747.0 | 256  | 99578.087 |
>> 388.9769140624999   | 600.7615005975689  |
>>
>> | 11407.0 | 248  | 103412.655|
>> 416.9865322580643   | 622.221465710723   |
>>
>>
>> +-+--+---+-++--+
>>
>> 9 rows selected (209.699 seconds)
>>
>>
>>
>>
>>
>> Regards,
>>
>>
>>
>> Mich Talebzadeh
>

RE: ORDER BY clause in Hive

2015-03-30 Thread Mich Talebzadeh
Gents,

 

Hive as I see it does not support ORDER BY Column position. It only supports
ORDER BY Column name.

 

Thanks

 

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal
Vijayaraghavan
Sent: 30 March 2015 05:26
To: user@hive.apache.org
Cc: Lefty Leverenz
Subject: Re: ORDER BY clause in Hive

 

Hi Lefty,

 

Couldn't find the documentation for what
hive.groupby.orderby.position.alias=true does.

 

I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).

 

Cheers,

Gopal

 

From: Lefty Leverenz 
Reply-To: "user@hive.apache.org" 
Date: Sunday, March 29, 2015 at 8:32 PM
To: "user@hive.apache.org" 
Subject: Re: ORDER BY clause in Hive

 

I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy>  section of the wiki.  Thanks, Mich and
Gopal!




-- Lefty

 

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh 
wrote:

Hi Lefty, Gopal,

 

It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set
column three" which standard SQL as evident from Oracle and Sybase does.

 

So I made it an ORDER BY from the result set EXPLICITELY as shown below and
it worked OK

 

SELECT

  rs.Customer_ID

, rs.Number_of_orders

, rs.Total_customer_amount

, rs.Average_order

, rs.Standard_deviation

FROM

(

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

) rs

ORDER BY

  rs.Total_customer_amount

;

 

 

+-+--+---+--
---++--+

| rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
rs.average_order   | rs.standard_deviation  |

+-+--+---+--
---++--+

| 1743.0  | 238  | 94786.123 |
398.2610504201678   | 581.0439095219863  |

| 429.0   | 231  | 94819.416 |
410.4736363636366   | 613.7057080691426  |

| 2994.0  | 227  | 94862.616 |
417.89696035242315  | 623.1607772763742  |

| 6395.0  | 268  | 97010.478 |
361.97940298507456  | 576.9120977984521  |

| 12783.0 | 240  | 97573.546 |
406.556458332   | 590.4445500393804  |

| 4974.0  | 235  | 98006.162 |
417.0474893617022   | 624.337482834059   |

| 42167.0 | 266  | 98585.962 |
370.6239097744362   | 590.965120684093   |

| 10747.0 | 256  | 99578.087 |
388.9769140624999   | 600.7615005975689  |

| 11407.0 | 248  | 103412.655|
416.9865322580643   | 622.221465710723   |

+-+--+---+--
---++--+

9 rows selected (209.699 seconds)

 

 

Regards,

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Mich Talebzad

Re: ORDER BY clause in Hive

2015-03-29 Thread Gopal Vijayaraghavan
Hi Lefty,

Couldn¹t find the documentation for what
hive.groupby.orderby.position.alias=true does.

I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).

Cheers,
Gopal

From:  Lefty Leverenz 
Reply-To:  "user@hive.apache.org" 
Date:  Sunday, March 29, 2015 at 8:32 PM
To:  "user@hive.apache.org" 
Subject:  Re: ORDER BY clause in Hive

I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy>  section of the wiki.  Thanks, Mich and
Gopal!

-- Lefty

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh 
wrote:
> Hi Lefty, Gopal,
>  
> It appears that ORDER BY 3 is not interpreted as ORDERR BY ³the result set
> column three² which standard SQL as evident from Oracle and Sybase does.
>  
> So I made it an ORDER BY from the result set EXPLICITELY as shown below and it
> worked OK
>  
> SELECT
>   rs.Customer_ID
> , rs.Number_of_orders
> , rs.Total_customer_amount
> , rs.Average_order
> , rs.Standard_deviation
> FROM
> (
> SELECT cust_id AS Customer_ID,
> COUNT(amount_sold) AS Number_of_orders,
> SUM(amount_sold) AS Total_customer_amount,
> AVG(amount_sold) AS Average_order,
> STDDEV(amount_sold) AS Standard_deviation
> FROM sales
> GROUP BY cust_id
> HAVING SUM(amount_sold) > 94000
> AND AVG(amount_sold) < STDDEV(amount_sold)
> ) rs
> ORDER BY
>  rs.Total_customer_amount
> ;
>  
>  
> +-+--+---+
> -++--+
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
> +-+--+---+
> -++--+
> | 1743.0  | 238  | 94786.123 |
> 398.2610504201678   | 581.0439095219863  |
> | 429.0   | 231  | 94819.416 |
> 410.4736363636366   | 613.7057080691426  |
> | 2994.0  | 227  | 94862.616 |
> 417.89696035242315  | 623.1607772763742  |
> | 6395.0  | 268  | 97010.478 |
> 361.97940298507456  | 576.9120977984521  |
> | 12783.0 | 240  | 97573.546 |
> 406.556458332   | 590.4445500393804  |
> | 4974.0  | 235  | 98006.162 |
> 417.0474893617022   | 624.337482834059   |
> | 42167.0 | 266  | 98585.962 |
> 370.6239097744362   | 590.965120684093   |
> | 10747.0 | 256  | 99578.087 |
> 388.9769140624999   | 600.7615005975689  |
> | 11407.0 | 248  | 103412.655|
> 416.9865322580643   | 622.221465710723   |
> +-+--+---+
> -++--+
> 9 rows selected (209.699 seconds)
>  
>  
> Regards,
>  
> 
> Mich Talebzadeh
>  
> http://talebzadehmich.wordpress.com
>  
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache
>  
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries
> or their employees, unless expressly so stated. It is the responsibility of
> the recipient to ensure that this email is virus free, therefore neither
> Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>  
> 
> From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
> Sent: 29 March 2015 00:11
> To: user@hive.apache.org
> Subject: ORDER BY clause in Hive
> 
>  
> Hi,
>  
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive is
> working OK
>  
> I have a simple aggregate query as follows:
>  
> SELECT cust_id AS Customer_ID,
> COUNT(amount_sold) AS Number_of_orders,
> SUM(amount_sold) AS Total_customer_amount,
> AVG(amount_sold) AS Average_order,
> STDDEV(amount_sold) AS Standard_deviation
> FROM sales
> GROUP BY cust_id
> HAVING SUM(amount_sold) > 94000
> AND AVG(amount_sold) < STDDEV(amount_sold)
> ORDER BY 3 ;
>  
> The original table and

Re: ORDER BY clause in Hive

2015-03-29 Thread Lefty Leverenz
I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
section of the wiki.  Thanks, Mich and Gopal!

-- Lefty

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh 
wrote:

> Hi Lefty, Gopal,
>
>
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set
> column three” which standard SQL as evident from Oracle and Sybase does.
>
>
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below
> and it worked OK
>
>
>
> SELECT
>
>   rs.Customer_ID
>
> , rs.Number_of_orders
>
> , rs.Total_customer_amount
>
> , rs.Average_order
>
> , rs.Standard_deviation
>
> FROM
>
> (
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ) rs
>
> ORDER BY
>
>   *rs.Total_customer_amount*
>
> ;
>
>
>
>
>
>
> +-+--+---+-++--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-+--+---+-++--+
>
> | 1743.0  | 238  | 94786.123 |
> 398.2610504201678   | 581.0439095219863  |
>
> | 429.0   | 231  | 94819.416 |
> 410.4736363636366   | 613.7057080691426  |
>
> | 2994.0  | 227  | 94862.616 |
> 417.89696035242315  | 623.1607772763742  |
>
> | 6395.0  | 268  | 97010.478 |
> 361.97940298507456  | 576.9120977984521  |
>
> | 12783.0 | 240  | 97573.546 |
> 406.556458332   | 590.4445500393804  |
>
> | 4974.0  | 235  | 98006.162 |
> 417.0474893617022   | 624.337482834059   |
>
> | 42167.0 | 266  | 98585.962 |
> 370.6239097744362   | 590.965120684093   |
>
> | 10747.0 | 256  | 99578.087 |
> 388.9769140624999   | 600.7615005975689  |
>
> | 11407.0 | 248  | 103412.655|
> 416.9865322580643   | 622.221465710723   |
>
>
> +-+--+---+-++--+
>
> 9 rows selected (209.699 seconds)
>
>
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Mich Talebzadeh [mailto:m...@peridale.co.uk]
> *Sent:* 29 March 2015 00:11
> *To:* user@hive.apache.org
> *Subject:* ORDER BY clause in Hive
>
>
>
> Hi,
>
>
>
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
> is working OK
>
>
>
> I have a simple aggregate query as follows:
>
>
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ORDER BY 3 ;
>
>
>
> The original table and data are from Oracle sh.sales table
>
>
>
> Oracle comes bac

RE: ORDER BY clause in Hive

2015-03-29 Thread Mich Talebzadeh
Hi Lefty, Gopal,

 

It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set
column three" which standard SQL as evident from Oracle and Sybase does.

 

So I made it an ORDER BY from the result set EXPLICITELY as shown below and
it worked OK

 

SELECT

  rs.Customer_ID

, rs.Number_of_orders

, rs.Total_customer_amount

, rs.Average_order

, rs.Standard_deviation

FROM

(

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

) rs

ORDER BY

  rs.Total_customer_amount

;

 

 

+-+--+---+--
---++--+

| rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
rs.average_order   | rs.standard_deviation  |

+-+--+---+--
---++--+

| 1743.0  | 238  | 94786.123 |
398.2610504201678   | 581.0439095219863  |

| 429.0   | 231  | 94819.416 |
410.4736363636366   | 613.7057080691426  |

| 2994.0  | 227  | 94862.616 |
417.89696035242315  | 623.1607772763742  |

| 6395.0  | 268  | 97010.478 |
361.97940298507456  | 576.9120977984521  |

| 12783.0 | 240  | 97573.546 |
406.556458332   | 590.4445500393804  |

| 4974.0  | 235  | 98006.162 |
417.0474893617022   | 624.337482834059   |

| 42167.0 | 266  | 98585.962 |
370.6239097744362   | 590.965120684093   |

| 10747.0 | 256  | 99578.087 |
388.9769140624999   | 600.7615005975689  |

| 11407.0 | 248  | 103412.655|
416.9865322580643   | 622.221465710723   |

+-+--+---+--
---++--+

9 rows selected (209.699 seconds)

 

 

Regards,

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Mich Talebzadeh [mailto:m...@peridale.co.uk] 
Sent: 29 March 2015 00:11
To: user@hive.apache.org
Subject: ORDER BY clause in Hive

 

Hi,

 

Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK

 

I have a simple aggregate query as follows:

 

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

ORDER BY 3 ;

 

The original table and data are from Oracle sh.sales table

 

Oracle comes back for this query with

 

CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION

---  - -
--

   1743  238  94786.13 398.26105
582.26845

429  231  94819.41410.473636
615.038404

   2994  227  94862.61 417.89696
624.53793

   6395  268  97010.48361.979403
577.991448

  12783  240  97573.55406.556458
591.6785

   4974  235  98006.16417.047489
625.670115

  42167  266  98585.96 370.62391
592.079099

  10747  256  99578.09388.976914
601.938312

  11407  248 103412.66416.986532
623.479751

 

9 rows selected.

 

Ordered by TOTAL_CUSTOMER_AMOUNT

 

And hive retu

Re: ORDER BY clause in Hive

2015-03-28 Thread Lefty Leverenz
Have you looked at the Hive wiki?  Here's the section on ORDER BY

.

-- Lefty

On Sat, Mar 28, 2015 at 9:45 PM, Gopal Vijayaraghavan 
wrote:

>
> > SELECT cust_id AS Customer_ID,
> > Š
> > ORDER BY 3 ;
>
>
> You¹re sorting on a constant (literal value 3). The results are what you
> get when you run a non-stable sort on a constant.
>
> Cheers,
> Gopal
>
>
>


Re: ORDER BY clause in Hive

2015-03-28 Thread Gopal Vijayaraghavan
 
> SELECT cust_id AS Customer_ID,
> Š
> ORDER BY 3 ;
 

You¹re sorting on a constant (literal value 3). The results are what you
get when you run a non-stable sort on a constant.

Cheers,
Gopal




ORDER BY clause in Hive

2015-03-28 Thread Mich Talebzadeh
Hi,

 

Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK

 

I have a simple aggregate query as follows:

 

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

ORDER BY 3 ;

 

The original table and data are from Oracle sh.sales table

 

Oracle comes back for this query with

 

CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION

---  - -
--

   1743  238  94786.13 398.26105
582.26845

429  231  94819.41410.473636
615.038404

   2994  227  94862.61 417.89696
624.53793

   6395  268  97010.48361.979403
577.991448

  12783  240  97573.55406.556458
591.6785

   4974  235  98006.16417.047489
625.670115

  42167  266  98585.96 370.62391
592.079099

  10747  256  99578.09388.976914
601.938312

  11407  248 103412.66416.986532
623.479751

 

9 rows selected.

 

Ordered by TOTAL_CUSTOMER_AMOUNT

 

And hive returns for the same query

 

+--+---++---
--+-+--+

| customer_id  | number_of_orders  | total_customer_amount  |
average_order| standard_deviation  |

+--+---++---
--+-+--+

| 42167.0  | 266   | 98585.962  |
370.6239097744362   | 590.965120684093|

| 12783.0  | 240   | 97573.546  |
406.556458332   | 590.4445500393804   |

| 11407.0  | 248   | 103412.655 |
416.9865322580643   | 622.221465710723|

| 10747.0  | 256   | 99578.087  |
388.9769140624999   | 600.7615005975689   |

| 6395.0   | 268   | 97010.478  |
361.97940298507456  | 576.9120977984521   |

| 4974.0   | 235   | 98006.162  |
417.0474893617022   | 624.337482834059|

| 2994.0   | 227   | 94862.616  |
417.89696035242315  | 623.1607772763742   |

| 1743.0   | 238   | 94786.123  |
398.2610504201678   | 581.0439095219863   |

| 429.0| 231   | 94819.416  |
410.4736363636366   | 613.7057080691426   |

+--+---++---
--+-+--+

9 rows selected (215.774 seconds)

 

But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount

 

I also tried this in Sybase and got the same as Oracle.

 

Adaptive Server cpu time: 100 ms.

Customer_ID  Number_of_orders Total_customer_amount
Average_orderStandard_deviation

 
-

---

1743  238
94786.13398.2610504201680
582.268450

  429  231
94819.41410.4736363636363
615.038404

 2994  227
94862.61417.8969603524229
624.537930

 6395  268
97010.48361.9794029850746
577.991448

12783  240
97573.55406.556458333
591.678500

 4974  235
98006.16417.0474893617021
625.670115

42167  266
98585.96370.6239097744360
592.079099

10747  256
99578.09388.9769140625000
601.938312

11407  248
103412.66416.9865322580645
623.479751

 

I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediatel