Re: hive query to calculate percentage
Yong, Thanks for your response.. I am getting an error : FAILED: SemanticException [Error 10002]: Line 11:28 Invalid column reference 'timestamp_dt' Sorry I didn't mention earlier, the table is partitioned on timestamp_dt On 2/25/14, 5:57 PM, java8964 wrote: one query won't work, as totalcount is not in group by. You have 2 options: 1) use the sub query select a.timestamp_dt, a.totalcount/b.total_sum from daily_count_per_kg_domain a join ( select timestamp_dt, sum(totalcount) as total_sum from daily_count_per_kg_domain group by timestamp_dt ) b on (a.timestamp_dt = b.timestamp_dt) 2) If you are using hive 11 or above, using windows functions. Yong Date: Tue, 25 Feb 2014 18:27:34 -0600 Subject: Re: hive query to calculate percentage From: kkrishna...@gmail.com To: user@hive.apache.org Modfiy the query to : select totalcount / sum(totalcount) from daily_count_per_kg_domain where timestamp_dt = '20140219' group by timestamp_dt; if you dont specify the where clause, you will get result for all partitions. On Tue, Feb 25, 2014 at 3:14 PM, Manish maa...@gmail.com mailto:maa...@gmail.com wrote: I have a partitioned table on timestamp_dt: desc daily_count_per_kg_domain; OK ddnamesyskg string totalcount int timestamp_dtstring hive select * from daily_count_per_kg_domain; OK sys_kg_band 224 20140219 sys_kg_event343520140219 sys_kg_movies 44987 20140219 sys_kg_oly 417220140219 sys_kg_sp_countr549920140219 sys_kg_sports 395420140219 sys_kg_tv 21387 20140219 sys_kg_venue152 20140219 sys_kgathlete 900020140219 sys_kgpeople300064 20140219 Looking to compute percentages for each row, (per day): 100* totalcount / sum(totalcount) Intuitively i tried : select totalcount / sum(totalcount) from daily_count_per_kg_domain where timestamp_dt = '20140219' ; FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'totalcount' I am not sure what group by on totalcount means.. Any ides ?
Re: hive query to calculate percentage
Modfiy the query to : select totalcount / sum(totalcount) from daily_count_per_kg_domain where timestamp_dt = '20140219' group by timestamp_dt; if you dont specify the where clause, you will get result for all partitions. On Tue, Feb 25, 2014 at 3:14 PM, Manish maa...@gmail.com wrote: I have a partitioned table on timestamp_dt: desc daily_count_per_kg_domain; OK ddnamesyskg string totalcount int timestamp_dtstring hive select * from daily_count_per_kg_domain; OK sys_kg_band 224 20140219 sys_kg_event343520140219 sys_kg_movies 44987 20140219 sys_kg_oly 417220140219 sys_kg_sp_countr549920140219 sys_kg_sports 395420140219 sys_kg_tv 21387 20140219 sys_kg_venue152 20140219 sys_kgathlete 900020140219 sys_kgpeople300064 20140219 Looking to compute percentages for each row, (per day): 100* totalcount / sum(totalcount) Intuitively i tried : select totalcount / sum(totalcount) from daily_count_per_kg_domain where timestamp_dt = '20140219' ; FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'totalcount' I am not sure what group by on totalcount means.. Any ides ?
RE: hive query to calculate percentage
one query won't work, as totalcount is not in group by. You have 2 options: 1) use the sub query select a.timestamp_dt, a.totalcount/b.total_sumfrom daily_count_per_kg_domain a join(select timestamp_dt, sum(totalcount) as total_sumfromdaily_count_per_kg_domaingroup by timestamp_dt) b on (a.timestamp_dt = b.timestamp_dt) 2) If you are using hive 11 or above, using windows functions. Yong Date: Tue, 25 Feb 2014 18:27:34 -0600 Subject: Re: hive query to calculate percentage From: kkrishna...@gmail.com To: user@hive.apache.org Modfiy the query to :select totalcount / sum(totalcount) from daily_count_per_kg_domain where timestamp_dt = '20140219' group by timestamp_dt; if you dont specify the where clause, you will get result for all partitions. On Tue, Feb 25, 2014 at 3:14 PM, Manish maa...@gmail.com wrote: I have a partitioned table on timestamp_dt: desc daily_count_per_kg_domain; OK ddnamesyskg string totalcount int timestamp_dtstring hive select * from daily_count_per_kg_domain; OK sys_kg_band 224 20140219 sys_kg_event343520140219 sys_kg_movies 44987 20140219 sys_kg_oly 417220140219 sys_kg_sp_countr549920140219 sys_kg_sports 395420140219 sys_kg_tv 21387 20140219 sys_kg_venue152 20140219 sys_kgathlete 900020140219 sys_kgpeople300064 20140219 Looking to compute percentages for each row, (per day): 100* totalcount / sum(totalcount) Intuitively i tried : select totalcount / sum(totalcount) from daily_count_per_kg_domain where timestamp_dt = '20140219' ; FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'totalcount' I am not sure what group by on totalcount means.. Any ides ?