Re: Hive JSON Serde question

2015-01-25 Thread
Try get_json_object UDF. No iterations need. :)

On Mon, Jan 26, 2015 at 12:25 AM, Sanjay Subramanian 
sanjaysubraman...@yahoo.com wrote:

 Thanks Ed. Let me try a few more iterations. Somehow I am not doing this
 correctly :-)

 regards

 sanjay
   --
  *From:* Edward Capriolo edlinuxg...@gmail.com
 *To:* user@hive.apache.org user@hive.apache.org; Sanjay Subramanian 
 sanjaysubraman...@yahoo.com
 *Sent:* Sunday, January 25, 2015 8:11 AM
 *Subject:* Re: Hive JSON Serde question

 Nested lists require nested lateral views.



 On Sun, Jan 25, 2015 at 11:02 AM, Sanjay Subramanian 
 sanjaysubraman...@yahoo.com wrote:

 hey guys

 This is the Hive table definition I have created based on the JSON
 I am using this version of hive json serde
 https://github.com/rcongiu/Hive-JSON-Serde

 ADD JAR
 /home/sanjay/mycode/jar/jsonserde/json-serde-1.3.1-SNAPSHOT-jar-with-dependencies.jar
 ;
 DROP TABLE IF EXISTS
   datafeed_json
 ;
 CREATE EXTERNAL TABLE IF NOT EXISTS
datafeed_json (
object STRING,
entry array
   struct
 id:STRING,
  time:BIGINT,
  changes:array
struct
  field:STRING,
   value:struct
 item:STRING,
  verb:STRING,
  parent_id:STRING,
  sender_id:BIGINT,
  created_time:BIGINT
 ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE
 LOCATION '/data/sanjay/datafeed'
 ;


 QUERY 1
 ===
 ADD JAR
 /home/sanjay/mycode/jar/jsonserde/json-serde-1.3.1-SNAPSHOT-jar-with-dependencies.jar
 ;
  SELECT
 object,
 entry[0].id,
 entry[0].time,
 entry[0].changes[0].field,
 entry[0].changes[0].value.item,
 entry[0].changes[0].value.verb,
 entry[0].changes[0].value.parent_id,
 entry[0].changes[0].value.sender_id,
 entry[0].changes[0].value.created_time
   FROM
 datafeed_json
 ;

 RESULT1
 ==
 foo123  113621765320467 1418608223 leads song1 rock
 113621765320467_1107142375968396 14748082019 1418608223


 QUERY2
 ==
 ADD JAR
 /home/sanjay/mycode/jar/jsonserde/json-serde-1.3.1-SNAPSHOT-jar-with-dependencies.jar
 ;
  SELECT
 object,
 entry.id,
 entry.time,
 ntry
   FROM
 datafeed_json
   LATERAL VIEW EXPLODE
 (datafeed_json.entry.changes) oc1 AS ntry
 ;

 RESULT2
 ===
 This gives 4 rows but I was not able to iteratively do the LATERAL VIEW
 EXPLODE


 I tried various combinations of LATERAL VIEW , LATERAL VIEW EXPLODE,
 json_tuple to extract all fields in an exploded view from the JSON in tab
 separated format but no luck.

 Any thoughts ?



 Thanks

 sanjay








Re: possible to pass in a list of values as param ?

2014-10-27 Thread
Hi Yang,

try:

hive -hiveconf myargs=1','2','3','4

SELECT * from mytable where my_id in ('${hiveconf:myargs}');


On Tue, Oct 28, 2014 at 5:57 AM, Yang tedd...@gmail.com wrote:

 I have a query that does something like


 SELECT * from mytable where my_id in ('1', '2', '3', '4' );


 now instead of hard coding the value list in the (), I'd like to pass it
 in as a param

 I tried in (${hiveconf:myargs}) 

 and hive -hiveconf myargs='1','2','3','4'
 or ='1,2,3,4'

 neither seems to work


 what is the best way to do this?

 thanks!
 yang



Re: Queries : partitioning

2014-09-22 Thread
Hi Dhaval,

1. you can add a new partitioned column based on the range of you
interested column.

2.  try to use dynamic partition.

https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions

On Tue, Sep 23, 2014 at 10:49 AM, Dhaval Patel er.dcpa...@gmail.com wrote:

 -- Forwarded message --
 From: Dhaval Patel er.dcpa...@gmail.com
 Date: 22/09/2014 7:33 pm
 Subject: Queries : partitioning
 To: user@hive.apache.org
 Cc:

 Hey folks,
 1) Hive table can be partitioned by column, is there any way to partition
 by range?
 2) while loading data into hive table we specify the partition column
 value. Is there any way to auto partition the data, rather than we specify
 logic to partition?

 As i am very new to hive, please suggest the way.

 Thanks,




-- 
丁桂涛


Re: How to use joins and averages both in the same hive query

2014-09-03 Thread
try this:


SELECT
A.vendor,
AVG(totalmatchesperuser) as avgmatches
FROM
(SELECT
A.vendor,
A.uid,
count(*) as totalmatchesperuser
FROM
Table1 A INNER JOIN
Table2 B
ON A.uid = B.uid AND B.txid =A.txid
GROUP BY
A.vendor,
A.uid
) t
GROUP BY
A.vendor
​


On Thu, Sep 4, 2014 at 3:38 AM, Mohit Durgapal durgapalmo...@gmail.com
wrote:

 I have two tables in hive:

 Table1: uid,txid,amt,vendor Table2: uid,txid

 Now I need to join the tables on txid which basically confirms a
 transaction is finally recorded. There will be some transactions which will
 be present only in Table1 and not in Table2.

 I need to find out number of avg of transaction matches found per
 user(uid) per vendor. Then I need to find the avg of these averages by
 adding all the averages and divide them by the number of unique users per
 vendor.

 Let's say I have the data:

 Table1:

 u1,120,44,vend1
 u1,199,33,vend1
 u1,100,23,vend1
 u1,101,24,vend1
 u2,200,34,vend1
 u2,202,32,vend2

 Table2:

 u1,100
 u1,101
 u2,200
 u2,202

 Example For vendor vend1:

 u1- Avg transaction find rate = 2(matches found in both Tables,Table1 and
 Table2)/4(total occurrence in Table1) =0.5

 u2 - Avg transaction find rate = 1/1 = 1

 Avg of avgs = 0.5+1(sum of avgs)/2(total unique users) = 0.75

 Required output:

 vend1,0.75
 vend2,1

 I can't seem to find count of both matches and occurrence in just Table1
 in one hive query per user per vendor. I have reached to this query and
 can't find how to change it further.

 SELECT A.vendor,A.uid,count(*) as totalmatchesperuser FROM Table1 A JOIN
 Table2 B ON A.uid = B.uid AND B.txid =A.txid group by vendor,A.uid

 Any help would be great.




-- 
丁桂涛


Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-24 Thread
Yeah. After setting hive.cache.expr.evaluation=false, all queries output
expected results.

And I found that it's related to the getDisplayString function in the UDF.
At first the function returns a string regardless of its parameters. And I
had to set hive.cache.expr.evaluation = false.

But after I changed the function to return string in depend of parameters,
all queries returned expected results even when the hive.cache.expr.evaluation
was set to true.

Thanks Navis. It really helps me a lot.

Best Regards,

Guitao


On Thu, Jul 24, 2014 at 2:55 PM, Navis류승우 navis@nexr.com wrote:

 Looks like it's caused by HIVE-7314. Could you try that with
 hive.cache.expr.evaluation=false?

 Thanks,
 Navis


 2014-07-24 14:34 GMT+09:00 丁桂涛(桂花) dinggui...@baixing.com:

 Yes. The output is correct: [tp,p,sp].

 I developed the UDF using JAVA in eclipse and exported the jar file into
 the auxlib directory of hive. Then add the following line into the
 ~/.hiverc file.

 create temporary function getad as 'xxx';

 The hive version is 0.12.0. Perhaps the problem resulted from the
 mis-optimization of hive.


 On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin hellojin...@gmail.com wrote:

 Have you tried this query without UDF, say:


 select
   array(tp, p, sp) as ps
 from
   (
   select
 'tp' as tp,
 'p' as p,
 'sp' as sp
   from
 table_name
   where
 id = 
   ) t;


 ​And how you implement the UDF?​


 谢谢
 金杰 (Jie Jin)


 On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花) dinggui...@baixing.com wrote:

  Recently I developed a Hive Generic UDF *getad*. It accepts a map
 type and a string type parameter and outputs a string value. But I found
 the UDF output really confusing in different conditions.

 Condition A:


 select
   getad(map_col, 'tp') as tp,
   getad(map_col, 'p') as p,
   getad(map_col, 'sp') as sp
 from
   table_name
 where
   id = ;

 The output is right: 'tp', 'p', 'sp'.

 Condition B:


 select
   array(tp, p, sp) as ps
 from
   (
   select
 getad(map_col, 'tp') as tp,
 getad(map_col, 'p') as p,
 getad(map_col, 'sp') as sp
   from
 table_name
   where
 id = 
   ) t;

 The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs
 the same result:


 select
   array(
 getad(map_col, 'tp'),
 getad(map_col, 'p'),
 getad(map_col, 'sp')
   ) as ps
 from
   table_name
 where
   id = ;

 Could you please provide me some hints on this? Thanks!

 --
 丁桂涛





 --
 丁桂涛





-- 
丁桂涛


Re: HIVE 0.12 SUM() returning NULL for decimal values

2014-07-24 Thread
try
select sum(sales) from salestemp where sales is not null;


On Thu, Jul 24, 2014 at 11:10 PM, Abhishek Gayakwad a.gayak...@gmail.com
wrote:

 I am trying to aggregate one column of decimal type, which is returning me
 null. If I cast this column to double it returns me some value. following
 are the steps to recreate this scenario.

 
 CREATE  TABLE salestemp(sku int, sales decimal);

 LOAD DATA LOCAL INPATH '00_0' OVERWRITE INTO TABLE salestemp;

 select sum(sales) from salestemp; // returns null

 select sum(cast(sales AS double)) from salestemp; // returns
 353.9609497977414

 
 data in  '00_0'  looks like
 
 24687329978.000156909543021743116491818900326715
 246873303113.06617408356827568833561026591436615
 246873314156.00031381908604348623298363780065343
 2468733246.8943049855440756796081014881223248771
 2468733280
 

 Please let me know what is going wrong here.

 Thanks
 Abhishek




-- 
丁桂涛


Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-23 Thread
Yes. The output is correct: [tp,p,sp].

I developed the UDF using JAVA in eclipse and exported the jar file into
the auxlib directory of hive. Then add the following line into the
~/.hiverc file.

create temporary function getad as 'xxx';

The hive version is 0.12.0. Perhaps the problem resulted from the
mis-optimization of hive.


On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin hellojin...@gmail.com wrote:

 Have you tried this query without UDF, say:

 select
   array(tp, p, sp) as ps
 from
   (
   select
 'tp' as tp,
 'p' as p,
 'sp' as sp
   from
 table_name
   where
 id = 
   ) t;


 ​And how you implement the UDF?​


 谢谢
 金杰 (Jie Jin)


 On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花) dinggui...@baixing.com wrote:

  Recently I developed a Hive Generic UDF *getad*. It accepts a map type
 and a string type parameter and outputs a string value. But I found the UDF
 output really confusing in different conditions.

 Condition A:

 select
   getad(map_col, 'tp') as tp,
   getad(map_col, 'p') as p,
   getad(map_col, 'sp') as sp
 from
   table_name
 where
   id = ;

 The output is right: 'tp', 'p', 'sp'.

 Condition B:

 select
   array(tp, p, sp) as ps
 from
   (
   select
 getad(map_col, 'tp') as tp,
 getad(map_col, 'p') as p,
 getad(map_col, 'sp') as sp
   from
 table_name
   where
 id = 
   ) t;

 The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs
 the same result:

 select
   array(
 getad(map_col, 'tp'),
 getad(map_col, 'p'),
 getad(map_col, 'sp')
   ) as ps
 from
   table_name
 where
   id = ;

 Could you please provide me some hints on this? Thanks!

 --
 丁桂涛





-- 
丁桂涛


Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-22 Thread
Recently I developed a Hive Generic UDF *getad*. It accepts a map type and
a string type parameter and outputs a string value. But I found the UDF
output really confusing in different conditions.

Condition A:

select
  getad(map_col, 'tp') as tp,
  getad(map_col, 'p') as p,
  getad(map_col, 'sp') as sp
from
  table_name
where
  id = ;

The output is right: 'tp', 'p', 'sp'.

Condition B:

select
  array(tp, p, sp) as ps
from
  (
  select
getad(map_col, 'tp') as tp,
getad(map_col, 'p') as p,
getad(map_col, 'sp') as sp
  from
table_name
  where
id = 
  ) t;

The output is wrong: 'tp', 'tp', 'tp'. And the following query outputs the
same result:

select
  array(
getad(map_col, 'tp'),
getad(map_col, 'p'),
getad(map_col, 'sp')
  ) as ps
from
  table_name
where
  id = ;

Could you please provide me some hints on this? Thanks!

-- 
丁桂涛