Re: Hive on Spark performance

2016-03-14 Thread sjayatheertha
Thanks for your response. We were evaluating Spark and were curious to know how 
it is used today and the lowest latency it can provide. 

> On Mar 14, 2016, at 8:37 AM, Mich Talebzadeh  
> wrote:
> 
> Hi Wlodeck,
> 
> Let us look at this.
> 
> In Oracle I have two tables channels and sales. This code works in Oracle
> 
>   1  select c.channel_id, sum(c.channel_id * (select count(1) from sales s
>          WHERE c.channel_id = s.channel_id)) As R
>   2  from channels c
>   3* group by c.channel_id
> s...@mydb.mich.LOCAL> /
> 
> CHANNEL_ID          R
> ---------- ----------
>          2     516050
>          3    1620984
>          4     473664
>          5          0
>          9      18666
> 
> I have the same tables in Hive, but the same query fails with a parse error:
> 
> hive> select c.channel_id, sum(c.channel_id * (select count(1) from sales s 
> WHERE c.channel_id = s.channel_id)) As R
> > from channels c
> > group by c.channel_id
> > ;
> NoViableAltException(232@[435:1: precedenceEqualExpression : ( ( LPAREN 
> precedenceBitwiseOrExpression COMMA )=> precedenceEqualExpressionMutiple | 
> precedenceEqualExpressionSingle );])
> 
> The solution is to use a temporary table to hold the count/group by from the
> sales table as an intermediate stage (temporary tables are session specific:
> they exist only for the session that created them and are dropped
> automatically when the session ends).
> 
> hive> create temporary table tmp as select channel_id, count(channel_id) as 
> total from sales group by channel_id;
> 
> 
> OK, the rest is pretty easy:
> 
> hive> select c.channel_id, c.channel_id * t.total as results
> > from channels c, tmp t
> > where c.channel_id = t.channel_id;
> 
> 2.0 2800432.0
> 3.0 8802300.0
> 4.0 2583552.0
> 9.0 104013.0
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 14 March 2016 at 14:22, ws  wrote:
>> Hive 1.2.1.2.3.4.0-3485
>> Spark 1.5.2
>> Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
>> 
>> ### 
>> SELECT 
>>  f.description,
>>  f.item_number,
>>  sum(f.df_a * (select count(1) from e.mv_A_h_a where hb_h_name = 
>> r.h_id)) as df_a
>> FROM e.eng_fac_atl_sc_bf_qty f, wv_ATL_2_qty_df_rates r
>> where f.item_number NOT LIKE 'HR%' AND f.item_number NOT LIKE 'UG%' AND 
>> f.item_number NOT LIKE 'DEV%'
>> group by 
>>  f.description,
>>  f.item_number
>> ###
>> 
>> This query works fine in Oracle but not in Hive or Spark.
>> So the problem is the "sum(f.df_a * (select count(1) from e.mv_A_h_a where
>> hb_h_name = r.h_id)) as df_a" expression.
>> 
>> 
>> Thanks,
>> Wlodek
>> --
>> 
>> 
>> On Sunday, March 13, 2016 7:36 PM, Mich Talebzadeh 
>>  wrote:
>> 
>> 
>> It depends on the version of Hive and of the Spark engine.
>> 
>> As far as I am aware, the latest version of Hive that I am using (Hive 2) has
>> improvements over the previous versions (0.14, 1.2.1) on the Spark engine.
>> 
>> As of today I have managed to use Hive 2.0 on Spark version 1.3.1. So it is 
>> not the latest Spark but it is pretty good.
>> 
>> What specific concerns do you have in mind?
>> 
>> HTH
>> 
>> 
>> 
>> On 13 March 2016 at 23:27, sjayatheertha  wrote:
>> Just curious if you could share your experience on the performance of spark 
>> in your company? How much data do you process? And what's the latency you 
>> are getting with spark engine?
>> 
>> Vidya
> 


Re: Hive on Spark performance

2016-03-14 Thread Mich Talebzadeh
Hi Wlodeck,

Let us look at this.

In Oracle I have two tables channels and sales. This code works in Oracle

  1  select c.channel_id, sum(c.channel_id * (select count(1) from sales s
         WHERE c.channel_id = s.channel_id)) As R
  2  from channels c
  3* group by c.channel_id
s...@mydb.mich.LOCAL> /

CHANNEL_ID          R
---------- ----------
         2     516050
         3    1620984
         4     473664
         5          0
         9      18666

I have the same tables in Hive, but the same query fails with a parse error:

hive> select c.channel_id, sum(c.channel_id * (select count(1) from sales s
WHERE c.channel_id = s.channel_id)) As R
> from channels c
> group by c.channel_id
> ;
NoViableAltException(232@[435:1: precedenceEqualExpression : ( ( LPAREN
precedenceBitwiseOrExpression COMMA )=> precedenceEqualExpressionMutiple |
precedenceEqualExpressionSingle );])

The solution is to use a temporary table to hold the count/group by from the
sales table as an intermediate stage (temporary tables are session specific:
they exist only for the session that created them and are dropped
automatically when the session ends).

hive> create temporary table tmp as select channel_id, count(channel_id) as
total from sales group by channel_id;


OK, the rest is pretty easy:

hive> select c.channel_id, c.channel_id * t.total as results
> from channels c, tmp t
> where c.channel_id = t.channel_id;

2.0 2800432.0
3.0 8802300.0
4.0 2583552.0
9.0 104013.0
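The rewrite above can be sanity-checked outside Hive. Below is a minimal sketch using Python's built-in sqlite3 as a stand-in engine (the table names match the thread, but the sample data is invented for illustration): it runs the correlated-subquery form and the pre-aggregated-join form and compares them. One caveat worth noting: a plain inner join against the temporary table drops channels that have no sales (which is why channel 5 vanished from the Hive results above); a LEFT JOIN with COALESCE keeps them.

```python
import sqlite3

# In-memory database standing in for Oracle/Hive; the data is made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE channels (channel_id INTEGER);
    CREATE TABLE sales    (channel_id INTEGER);
    INSERT INTO channels VALUES (2), (3), (4), (5), (9);
    INSERT INTO sales    VALUES (2), (2), (3), (3), (3), (4), (9);
""")

# Correlated scalar subquery in the SELECT list (accepted by Oracle and
# SQLite, rejected by the Hive parser in the thread above).
correlated = con.execute("""
    SELECT c.channel_id,
           SUM(c.channel_id * (SELECT COUNT(1) FROM sales s
                               WHERE c.channel_id = s.channel_id)) AS r
    FROM channels c
    GROUP BY c.channel_id
""").fetchall()

# The workaround: pre-aggregate sales, then join. A LEFT JOIN plus
# COALESCE keeps channels with no sales; a plain inner join drops them.
rewritten = con.execute("""
    WITH tmp AS (SELECT channel_id, COUNT(channel_id) AS total
                 FROM sales GROUP BY channel_id)
    SELECT c.channel_id, c.channel_id * COALESCE(t.total, 0) AS r
    FROM channels c LEFT JOIN tmp t ON c.channel_id = t.channel_id
""").fetchall()

print(sorted(correlated))   # [(2, 4), (3, 9), (4, 4), (5, 0), (9, 9)]
assert sorted(correlated) == sorted(rewritten)
```

In Hive the `WITH` block would instead be the `CREATE TEMPORARY TABLE tmp AS ...` step shown above; the join itself is the same.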

HTH


On 14 March 2016 at 14:22, ws  wrote:

> Hive 1.2.1.2.3.4.0-3485
> Spark 1.5.2
> Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit
> Production
>
> ###
> SELECT
> f.description,
> f.item_number,
> sum(f.df_a * (select count(1) from e.mv_A_h_a where hb_h_name = r.h_id))
> as df_a
> FROM e.eng_fac_atl_sc_bf_qty f, wv_ATL_2_qty_df_rates r
> where f.item_number NOT LIKE 'HR%' AND f.item_number NOT LIKE 'UG%' AND
> f.item_number NOT LIKE 'DEV%'
> group by
> f.description,
> f.item_number
> ###
>
> This query works fine in Oracle but not in Hive or Spark.
> So the problem is the "sum(f.df_a * (select count(1) from e.mv_A_h_a where
> hb_h_name = r.h_id)) as df_a" expression.
>
>
> Thanks,
> Wlodek
> --
>
>
> On Sunday, March 13, 2016 7:36 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>
> It depends on the version of Hive and of the Spark engine.
>
> As far as I am aware, the latest version of Hive that I am using (Hive 2)
> has improvements over the previous versions (0.14, 1.2.1) on the Spark
> engine.
>
> As of today I have managed to use Hive 2.0 on Spark version 1.3.1. So it
> is not the latest Spark but it is pretty good.
>
> What specific concerns do you have in mind?
>
> HTH
>
>
>
>
> On 13 March 2016 at 23:27, sjayatheertha  wrote:
>
> Just curious if you could share your experience on the performance of
> spark in your company? How much data do you process? And what's the latency
> you are getting with spark engine?
>
> Vidya
>
>
>
>
>


Re: Hive on Spark performance

2016-03-14 Thread ws
Hive 1.2.1.2.3.4.0-3485
Spark 1.5.2
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

###
SELECT
 f.description,
 f.item_number,
 sum(f.df_a * (select count(1) from e.mv_A_h_a where hb_h_name = r.h_id)) as df_a
FROM e.eng_fac_atl_sc_bf_qty f, wv_ATL_2_qty_df_rates r
where f.item_number NOT LIKE 'HR%' AND f.item_number NOT LIKE 'UG%' AND f.item_number NOT LIKE 'DEV%'
group by
 f.description,
 f.item_number
###

This query works fine in Oracle but not in Hive or Spark.
So the problem is the "sum(f.df_a * (select count(1) from e.mv_A_h_a where
hb_h_name = r.h_id)) as df_a" expression.
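The same pre-aggregate-then-join rewrite should apply to this query too. Here is a hedged sketch, again on Python's sqlite3 rather than Hive: the table and column names follow the post (with the `e.` schema prefix dropped, since sqlite has no schemas), but all row values are invented for illustration, and the comma join between `f` and `r`, which has no join condition in the original, is kept as a CROSS JOIN.

```python
import sqlite3

# Toy stand-ins for the tables in the post; row values are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE eng_fac_atl_sc_bf_qty (description TEXT, item_number TEXT, df_a REAL);
    CREATE TABLE wv_ATL_2_qty_df_rates (h_id TEXT);
    CREATE TABLE mv_A_h_a (hb_h_name TEXT);
    INSERT INTO eng_fac_atl_sc_bf_qty VALUES
        ('widget', 'A1', 2.0), ('gadget', 'B2', 3.0), ('dev board', 'DEV1', 9.0);
    INSERT INTO wv_ATL_2_qty_df_rates VALUES ('h1'), ('h2');
    INSERT INTO mv_A_h_a VALUES ('h1'), ('h1'), ('h2');
""")

# Original shape: correlated scalar subquery inside the SELECT list.
original = con.execute("""
    SELECT f.description, f.item_number,
           SUM(f.df_a * (SELECT COUNT(1) FROM mv_A_h_a
                         WHERE hb_h_name = r.h_id)) AS df_a
    FROM eng_fac_atl_sc_bf_qty f, wv_ATL_2_qty_df_rates r
    WHERE f.item_number NOT LIKE 'HR%'
      AND f.item_number NOT LIKE 'UG%'
      AND f.item_number NOT LIKE 'DEV%'
    GROUP BY f.description, f.item_number
""").fetchall()

# Rewrite: pre-aggregate the counts per hb_h_name, then join them in.
# COALESCE covers h_id values with no matching rows in mv_A_h_a.
rewritten = con.execute("""
    WITH cnt AS (SELECT hb_h_name, COUNT(1) AS total
                 FROM mv_A_h_a GROUP BY hb_h_name)
    SELECT f.description, f.item_number,
           SUM(f.df_a * COALESCE(c.total, 0)) AS df_a
    FROM eng_fac_atl_sc_bf_qty f
    CROSS JOIN wv_ATL_2_qty_df_rates r
    LEFT JOIN cnt c ON c.hb_h_name = r.h_id
    WHERE f.item_number NOT LIKE 'HR%'
      AND f.item_number NOT LIKE 'UG%'
      AND f.item_number NOT LIKE 'DEV%'
    GROUP BY f.description, f.item_number
""").fetchall()

assert sorted(original) == sorted(rewritten)
print(sorted(original))  # [('gadget', 'B2', 9.0), ('widget', 'A1', 6.0)]
```

In Hive, `cnt` would be created as a temporary table (as in the earlier reply) instead of a `WITH` clause; this sketch has not been run against Hive itself.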

Thanks,
Wlodek
--

On Sunday, March 13, 2016 7:36 PM, Mich Talebzadeh 
 wrote:
 

It depends on the version of Hive and of the Spark engine.

As far as I am aware, the latest version of Hive that I am using (Hive 2) has
improvements over the previous versions (0.14, 1.2.1) on the Spark engine.

As of today I have managed to use Hive 2.0 on Spark version 1.3.1. So it is not
the latest Spark, but it is pretty good.

What specific concerns do you have in mind?

HTH

On 13 March 2016 at 23:27, sjayatheertha  wrote:

Just curious if you could share your experience on the performance of spark in 
your company? How much data do you process? And what's the latency you are 
getting with spark engine?

Vidya