Re: hive query to calculate percentage

2014-02-26 Thread Manish

Yong,
Thanks for your response..

I am getting an error :

FAILED: SemanticException [Error 10002]: Line 11:28 Invalid column 
reference 'timestamp_dt'



Sorry I didn't mention earlier, the  table is partitioned on timestamp_dt 


On 2/25/14, 5:57 PM, java8964 wrote:

one query won't work, as totalcount is not in group by.

You have 2 options:

1) use the sub query

select a.timestamp_dt, a.totalcount/b.total_sum
from daily_count_per_kg_domain a join
(
select timestamp_dt, sum(totalcount) as total_sum
from
daily_count_per_kg_domain
group by timestamp_dt
) b on (a.timestamp_dt = b.timestamp_dt)

2) If you are using hive 11 or above, using windows functions.

Yong


Date: Tue, 25 Feb 2014 18:27:34 -0600
Subject: Re: hive query to calculate percentage
From: kkrishna...@gmail.com
To: user@hive.apache.org

Modfiy the query to :
select totalcount / sum(totalcount) from daily_count_per_kg_domain 
where timestamp_dt = '20140219' group by timestamp_dt;


if you dont specify the where clause, you will get result for all 
partitions.



On Tue, Feb 25, 2014 at 3:14 PM, Manish maa...@gmail.com 
mailto:maa...@gmail.com wrote:


I have a partitioned table  on timestamp_dt:

 desc daily_count_per_kg_domain;
OK
ddnamesyskg string
totalcount  int
timestamp_dtstring

hive select * from daily_count_per_kg_domain;
OK
sys_kg_band 224 20140219
sys_kg_event343520140219
sys_kg_movies   44987   20140219
sys_kg_oly  417220140219
sys_kg_sp_countr549920140219
sys_kg_sports   395420140219
sys_kg_tv   21387   20140219
sys_kg_venue152 20140219
sys_kgathlete   900020140219
sys_kgpeople300064  20140219


Looking to compute percentages for each row, (per day):

100* totalcount / sum(totalcount)


Intuitively i tried :
 select totalcount / sum(totalcount) from
daily_count_per_kg_domain where timestamp_dt = '20140219' ;

FAILED: SemanticException [Error 10025]: Line 1:7 Expression not
in GROUP BY key 'totalcount'

I am not sure what group by on totalcount means..

Any ides ?







Re: hive query to calculate percentage

2014-02-25 Thread Krishnan K
Modfiy the query to :
select totalcount / sum(totalcount) from daily_count_per_kg_domain where
timestamp_dt = '20140219' group by timestamp_dt;

if you dont specify the where clause, you will get result for all
partitions.


On Tue, Feb 25, 2014 at 3:14 PM, Manish maa...@gmail.com wrote:

 I have a partitioned table  on timestamp_dt:

  desc daily_count_per_kg_domain;
 OK
 ddnamesyskg string
 totalcount  int
 timestamp_dtstring

 hive select * from daily_count_per_kg_domain;
 OK
 sys_kg_band 224 20140219
 sys_kg_event343520140219
 sys_kg_movies   44987   20140219
 sys_kg_oly  417220140219
 sys_kg_sp_countr549920140219
 sys_kg_sports   395420140219
 sys_kg_tv   21387   20140219
 sys_kg_venue152 20140219
 sys_kgathlete   900020140219
 sys_kgpeople300064  20140219


 Looking to compute percentages for each row, (per day):

 100* totalcount / sum(totalcount)


 Intuitively i tried :
  select totalcount / sum(totalcount) from daily_count_per_kg_domain where
 timestamp_dt = '20140219' ;

 FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP
 BY key 'totalcount'

 I am not sure what group by on totalcount means..

 Any ides ?





RE: hive query to calculate percentage

2014-02-25 Thread java8964
one query won't work, as totalcount is not in group by.
You have 2 options:
1) use the sub query
select a.timestamp_dt, a.totalcount/b.total_sumfrom daily_count_per_kg_domain a 
join(select timestamp_dt, sum(totalcount) as 
total_sumfromdaily_count_per_kg_domaingroup by timestamp_dt) b on 
(a.timestamp_dt = b.timestamp_dt)
2) If you are using hive 11 or above, using windows functions.
Yong

Date: Tue, 25 Feb 2014 18:27:34 -0600
Subject: Re: hive query to calculate percentage
From: kkrishna...@gmail.com
To: user@hive.apache.org

Modfiy the query to :select totalcount / sum(totalcount) from 
daily_count_per_kg_domain where timestamp_dt = '20140219' group by timestamp_dt;

if you dont specify the where clause, you will get result for all partitions.


On Tue, Feb 25, 2014 at 3:14 PM, Manish maa...@gmail.com wrote:

I have a partitioned table  on timestamp_dt:



 desc daily_count_per_kg_domain;

OK

ddnamesyskg string

totalcount  int

timestamp_dtstring



hive select * from daily_count_per_kg_domain;

OK

sys_kg_band 224 20140219

sys_kg_event343520140219

sys_kg_movies   44987   20140219

sys_kg_oly  417220140219

sys_kg_sp_countr549920140219

sys_kg_sports   395420140219

sys_kg_tv   21387   20140219

sys_kg_venue152 20140219

sys_kgathlete   900020140219

sys_kgpeople300064  20140219





Looking to compute percentages for each row, (per day):



100* totalcount / sum(totalcount)





Intuitively i tried :

 select totalcount / sum(totalcount) from daily_count_per_kg_domain where 
 timestamp_dt = '20140219' ;



FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY 
key 'totalcount'



I am not sure what group by on totalcount means..



Any ides ?