Re: Hive JSON Serde question
Try get_json_object UDF. No iterations need. :) On Mon, Jan 26, 2015 at 12:25 AM, Sanjay Subramanian sanjaysubraman...@yahoo.com wrote: Thanks Ed. Let me try a few more iterations. Somehow I am not doing this correctly :-) regards sanjay -- *From:* Edward Capriolo edlinuxg...@gmail.com *To:* user@hive.apache.org user@hive.apache.org; Sanjay Subramanian sanjaysubraman...@yahoo.com *Sent:* Sunday, January 25, 2015 8:11 AM *Subject:* Re: Hive JSON Serde question Nested lists require nested lateral views. On Sun, Jan 25, 2015 at 11:02 AM, Sanjay Subramanian sanjaysubraman...@yahoo.com wrote: hey guys This is the Hive table definition I have created based on the JSON I am using this version of hive json serde https://github.com/rcongiu/Hive-JSON-Serde ADD JAR /home/sanjay/mycode/jar/jsonserde/json-serde-1.3.1-SNAPSHOT-jar-with-dependencies.jar ; DROP TABLE IF EXISTS datafeed_json ; CREATE EXTERNAL TABLE IF NOT EXISTS datafeed_json ( object STRING, entry array struct id:STRING, time:BIGINT, changes:array struct field:STRING, value:struct item:STRING, verb:STRING, parent_id:STRING, sender_id:BIGINT, created_time:BIGINT ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE LOCATION '/data/sanjay/datafeed' ; QUERY 1 === ADD JAR /home/sanjay/mycode/jar/jsonserde/json-serde-1.3.1-SNAPSHOT-jar-with-dependencies.jar ; SELECT object, entry[0].id, entry[0].time, entry[0].changes[0].field, entry[0].changes[0].value.item, entry[0].changes[0].value.verb, entry[0].changes[0].value.parent_id, entry[0].changes[0].value.sender_id, entry[0].changes[0].value.created_time FROM datafeed_json ; RESULT1 == foo123 113621765320467 1418608223 leads song1 rock 113621765320467_1107142375968396 14748082019 1418608223 QUERY2 == ADD JAR /home/sanjay/mycode/jar/jsonserde/json-serde-1.3.1-SNAPSHOT-jar-with-dependencies.jar ; SELECT object, entry.id, entry.time, ntry FROM datafeed_json LATERAL VIEW EXPLODE (datafeed_json.entry.changes) oc1 AS ntry ; RESULT2 === This gives 4 rows but I was not able to iteratively do the LATERAL VIEW EXPLODE I tried various combinations of LATERAL VIEW , LATERAL VIEW EXPLODE, json_tuple to extract all fields in an exploded view from the JSON in tab separated format but no luck. Any thoughts ? Thanks sanjay
Re: possible to pass in a list of values as param ?
Hi Yang, try: hive -hiveconf myargs=1','2','3','4 SELECT * from mytable where my_id in ('${hiveconf:myargs}'); On Tue, Oct 28, 2014 at 5:57 AM, Yang tedd...@gmail.com wrote: I have a query that does something like SELECT * from mytable where my_id in ('1', '2', '3', '4' ); now instead of hard coding the value list in the (), I'd like to pass it in as a param I tried in (${hiveconf:myargs}) and hive -hiveconf myargs='1','2','3','4' or ='1,2,3,4' neither seems to work what is the best way to do this? thanks! yang
Re: Queries : partitioning
Hi Dhaval, 1. you can add a new partitioned column based on the range of you interested column. 2. try to use dynamic partition. https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions On Tue, Sep 23, 2014 at 10:49 AM, Dhaval Patel er.dcpa...@gmail.com wrote: -- Forwarded message -- From: Dhaval Patel er.dcpa...@gmail.com Date: 22/09/2014 7:33 pm Subject: Queries : partitioning To: user@hive.apache.org Cc: Hey folks, 1) Hive table can be partitioned by column, is there any way to partition by range? 2) while loading data into hive table we specify the partition column value. Is there any way to auto partition the data, rather than we specify logic to partition? As i am very new to hive, please suggest the way. Thanks, -- 丁桂涛
Re: How to use joins and averages both in the same hive query
try this: SELECT A.vendor, AVG(totalmatchesperuser) as avgmatches FROM (SELECT A.vendor, A.uid, count(*) as totalmatchesperuser FROM Table1 A INNER JOIN Table2 B ON A.uid = B.uid AND B.txid =A.txid GROUP BY A.vendor, A.uid ) t GROUP BY A.vendor On Thu, Sep 4, 2014 at 3:38 AM, Mohit Durgapal durgapalmo...@gmail.com wrote: I have two tables in hive: Table1: uid,txid,amt,vendor Table2: uid,txid Now I need to join the tables on txid which basically confirms a transaction is finally recorded. There will be some transactions which will be present only in Table1 and not in Table2. I need to find out number of avg of transaction matches found per user(uid) per vendor. Then I need to find the avg of these averages by adding all the averages and divide them by the number of unique users per vendor. Let's say I have the data: Table1: u1,120,44,vend1 u1,199,33,vend1 u1,100,23,vend1 u1,101,24,vend1 u2,200,34,vend1 u2,202,32,vend2 Table2: u1,100 u1,101 u2,200 u2,202 Example For vendor vend1: u1- Avg transaction find rate = 2(matches found in both Tables,Table1 and Table2)/4(total occurrence in Table1) =0.5 u2 - Avg transaction find rate = 1/1 = 1 Avg of avgs = 0.5+1(sum of avgs)/2(total unique users) = 0.75 Required output: vend1,0.75 vend2,1 I can't seem to find count of both matches and occurrence in just Table1 in one hive query per user per vendor. I have reached to this query and can't find how to change it further. SELECT A.vendor,A.uid,count(*) as totalmatchesperuser FROM Table1 A JOIN Table2 B ON A.uid = B.uid AND B.txid =A.txid group by vendor,A.uid Any help would be great. -- 丁桂涛
Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Yeah. After setting hive.cache.expr.evaluation=false, all queries output expected results. And I found that it's related to the getDisplayString function in the UDF. At first the function returns a string regardless of its parameters. And I had to set hive.cache.expr.evaluation = false. But after I changed the function to return string in depend of parameters, all queries returned expected results even when the hive.cache.expr.evaluation was set to true. Thanks Navis. It really helps me a lot. Best Regards, Guitao On Thu, Jul 24, 2014 at 2:55 PM, Navis류승우 navis@nexr.com wrote: Looks like it's caused by HIVE-7314. Could you try that with hive.cache.expr.evaluation=false? Thanks, Navis 2014-07-24 14:34 GMT+09:00 丁桂涛(桂花) dinggui...@baixing.com: Yes. The output is correct: [tp,p,sp]. I developed the UDF using JAVA in eclipse and exported the jar file into the auxlib directory of hive. Then add the following line into the ~/.hiverc file. create temporary function getad as 'xxx'; The hive version is 0.12.0. Perhaps the problem resulted from the mis-optimization of hive. On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin hellojin...@gmail.com wrote: Have you tried this query without UDF, say: select array(tp, p, sp) as ps from ( select 'tp' as tp, 'p' as p, 'sp' as sp from table_name where id = ) t; And how you implement the UDF? 谢谢 金杰 (Jie Jin) On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花) dinggui...@baixing.com wrote: Recently I developed a Hive Generic UDF *getad*. It accepts a map type and a string type parameter and outputs a string value. But I found the UDF output really confusing in different conditions. Condition A: select getad(map_col, 'tp') as tp, getad(map_col, 'p') as p, getad(map_col, 'sp') as sp from table_name where id = ; The output is right: 'tp', 'p', 'sp'. Condition B: select array(tp, p, sp) as ps from ( select getad(map_col, 'tp') as tp, getad(map_col, 'p') as p, getad(map_col, 'sp') as sp from table_name where id = ) t; The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs the same result: select array( getad(map_col, 'tp'), getad(map_col, 'p'), getad(map_col, 'sp') ) as ps from table_name where id = ; Could you please provide me some hints on this? Thanks! -- 丁桂涛 -- 丁桂涛 -- 丁桂涛
Re: HIVE 0.12 SUM() returning NULL for decimal values
try select sum(sales) from salestemp where sales is not null; On Thu, Jul 24, 2014 at 11:10 PM, Abhishek Gayakwad a.gayak...@gmail.com wrote: I am trying to aggregate one column of decimal type, which is returning me null. If I cast this column to double it returns me some value. following are the steps to recreate this scenario. CREATE TABLE salestemp(sku int, sales decimal); LOAD DATA LOCAL INPATH '00_0' OVERWRITE INTO TABLE salestemp; select sum(sales) from salestemp; // returns null select sum(cast(sales AS double)) from salestemp; // returns 353.9609497977414 data in '00_0' looks like 24687329978.000156909543021743116491818900326715 246873303113.06617408356827568833561026591436615 246873314156.00031381908604348623298363780065343 2468733246.8943049855440756796081014881223248771 2468733280 Please let me know what is going wrong here. Thanks Abhishek -- 丁桂涛
Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Yes. The output is correct: [tp,p,sp]. I developed the UDF using JAVA in eclipse and exported the jar file into the auxlib directory of hive. Then add the following line into the ~/.hiverc file. create temporary function getad as 'xxx'; The hive version is 0.12.0. Perhaps the problem resulted from the mis-optimization of hive. On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin hellojin...@gmail.com wrote: Have you tried this query without UDF, say: select array(tp, p, sp) as ps from ( select 'tp' as tp, 'p' as p, 'sp' as sp from table_name where id = ) t; And how you implement the UDF? 谢谢 金杰 (Jie Jin) On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花) dinggui...@baixing.com wrote: Recently I developed a Hive Generic UDF *getad*. It accepts a map type and a string type parameter and outputs a string value. But I found the UDF output really confusing in different conditions. Condition A: select getad(map_col, 'tp') as tp, getad(map_col, 'p') as p, getad(map_col, 'sp') as sp from table_name where id = ; The output is right: 'tp', 'p', 'sp'. Condition B: select array(tp, p, sp) as ps from ( select getad(map_col, 'tp') as tp, getad(map_col, 'p') as p, getad(map_col, 'sp') as sp from table_name where id = ) t; The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs the same result: select array( getad(map_col, 'tp'), getad(map_col, 'p'), getad(map_col, 'sp') ) as ps from table_name where id = ; Could you please provide me some hints on this? Thanks! -- 丁桂涛 -- 丁桂涛
Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Recently I developed a Hive Generic UDF *getad*. It accepts a map type and a string type parameter and outputs a string value. But I found the UDF output really confusing in different conditions. Condition A: select getad(map_col, 'tp') as tp, getad(map_col, 'p') as p, getad(map_col, 'sp') as sp from table_name where id = ; The output is right: 'tp', 'p', 'sp'. Condition B: select array(tp, p, sp) as ps from ( select getad(map_col, 'tp') as tp, getad(map_col, 'p') as p, getad(map_col, 'sp') as sp from table_name where id = ) t; The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs the same result: select array( getad(map_col, 'tp'), getad(map_col, 'p'), getad(map_col, 'sp') ) as ps from table_name where id = ; Could you please provide me some hints on this? Thanks! -- 丁桂涛