Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-24 Thread 丁桂涛(桂花)
Yes. After setting hive.cache.expr.evaluation=false, all queries return the
expected results.

I also found that the problem is related to the getDisplayString function in
the UDF. Originally the function returned the same string regardless of its
parameters, and I had to set hive.cache.expr.evaluation=false to get correct
results.

After I changed the function to return a string that depends on its
parameters, all queries returned the expected results even with
hive.cache.expr.evaluation set to true.
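
For illustration, here is a minimal Java sketch of why a constant display
string can produce 'tp', 'tp', 'tp'. It is a toy model, not Hive's actual
code: it assumes the evaluation cache is keyed by each expression's display
string, and that map_col maps each key to itself. The names
DisplayStringCacheDemo, Call, evaluateWithCache and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DisplayStringCacheDemo {

    // Hypothetical stand-in for one getad(map_col, '<key>') call.
    static final class Call {
        final String keyArg;
        final Function<String, String> displayString;

        Call(String keyArg, Function<String, String> displayString) {
            this.keyArg = keyArg;
            this.displayString = displayString;
        }

        // Plays the role of getDisplayString: the text that identifies this call.
        String display() {
            return displayString.apply(keyArg);
        }

        // Plays the role of evaluate: look the key up in the map column.
        String evaluate(Map<String, String> mapCol) {
            return mapCol.get(keyArg);
        }
    }

    // ASSUMPTION: results are cached per row under the display string, so two
    // calls with the same display string share a single cached value.
    static List<String> evaluateWithCache(List<Call> calls, Map<String, String> mapCol) {
        Map<String, String> cache = new HashMap<>();
        List<String> results = new ArrayList<>();
        for (Call call : calls) {
            results.add(cache.computeIfAbsent(call.display(), key -> call.evaluate(mapCol)));
        }
        return results;
    }

    // Evaluates getad for 'tp', 'p' and 'sp' using the given display-string rule.
    static List<String> run(Function<String, String> displayString) {
        Map<String, String> mapCol = new HashMap<>();
        List<Call> calls = new ArrayList<>();
        for (String key : new String[] {"tp", "p", "sp"}) {
            mapCol.put(key, key); // assumed data: each key maps to itself
            calls.add(new Call(key, displayString));
        }
        return evaluateWithCache(calls, mapCol);
    }

    public static void main(String[] args) {
        // Display string ignores the arguments: all three calls collide.
        System.out.println(run(arg -> "getad()"));
        // Display string includes the arguments: each call gets its own entry.
        System.out.println(run(arg -> "getad(map_col, '" + arg + "')"));
    }
}
```

In this model the first call prints [tp, tp, tp] and the second prints
[tp, p, sp], which matches what I saw before and after changing
getDisplayString.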

Thanks, Navis. That really helped me a lot.

Best Regards,

Guitao


On Thu, Jul 24, 2014 at 2:55 PM, Navis류승우  wrote:

> Looks like it's caused by HIVE-7314. Could you try that with
> "hive.cache.expr.evaluation=false"?
>
> Thanks,
> Navis

-- 
丁桂涛


Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-23 Thread Navis류승우
Looks like it's caused by HIVE-7314. Could you try that with
"hive.cache.expr.evaluation=false"?

Thanks,
Navis


2014-07-24 14:34 GMT+09:00 丁桂涛(桂花) :

> Yes. The output is correct: ["tp","p","sp"].
>
> I developed the UDF in Java using Eclipse and exported the jar file into
> the auxlib directory of Hive. Then I added the following line to the
> ~/.hiverc file:
>
> create temporary function getad as 'xxx';
>
> The Hive version is 0.12.0. Perhaps the problem results from a
> mis-optimization in Hive.


Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-23 Thread 丁桂涛(桂花)
Yes. The output is correct: ["tp","p","sp"].

I developed the UDF in Java using Eclipse and exported the jar file into
the auxlib directory of Hive. Then I added the following line to the
~/.hiverc file:

create temporary function getad as 'xxx';

The Hive version is 0.12.0. Perhaps the problem results from a
mis-optimization in Hive.


On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin  wrote:

> Have you tried this query without the UDF, say:
>
> select
>   array(tp, p, sp) as ps
> from
>   (
>   select
>     'tp' as tp,
>     'p' as p,
>     'sp' as sp
>   from
>     table_name
>   where
>     id = 
>   ) t;
>
>
> And how did you implement the UDF?
>
>
> Thanks,
> 金杰 (Jie Jin)


-- 
丁桂涛


Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-23 Thread Jie Jin
Have you tried this query without the UDF, say:

select
  array(tp, p, sp) as ps
from
  (
  select
    'tp' as tp,
    'p' as p,
    'sp' as sp
  from
    table_name
  where
    id = 
  ) t;


And how did you implement the UDF?


Thanks,
金杰 (Jie Jin)


On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花)  wrote:

> I recently developed a Hive Generic UDF, *getad*. It accepts a map
> parameter and a string parameter and returns a string value. However, I
> found its output confusing under different conditions.
>
> Condition A:
>
> select
>   getad(map_col, 'tp') as tp,
>   getad(map_col, 'p') as p,
>   getad(map_col, 'sp') as sp
> from
>   table_name
> where
>   id = ;
>
> The output is right: 'tp', 'p', 'sp'.
>
> Condition B:
>
> select
>   array(tp, p, sp) as ps
> from
>   (
>   select
>     getad(map_col, 'tp') as tp,
>     getad(map_col, 'p') as p,
>     getad(map_col, 'sp') as sp
>   from
>     table_name
>   where
>     id = 
>   ) t;
>
> The output is wrong: 'tp', 'tp', 'tp'. The following query produces the
> same wrong result:
>
> select
>   array(
>     getad(map_col, 'tp'),
>     getad(map_col, 'p'),
>     getad(map_col, 'sp')
>   ) as ps
> from
>   table_name
> where
>   id = ;
>
> Could you please provide me some hints on this? Thanks!
>
> --
> 丁桂涛
>


Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-22 Thread 丁桂涛(桂花)
I recently developed a Hive Generic UDF, *getad*. It accepts a map
parameter and a string parameter and returns a string value. However, I
found its output confusing under different conditions.

Condition A:

select
  getad(map_col, 'tp') as tp,
  getad(map_col, 'p') as p,
  getad(map_col, 'sp') as sp
from
  table_name
where
  id = ;

The output is right: 'tp', 'p', 'sp'.

Condition B:

select
  array(tp, p, sp) as ps
from
  (
  select
    getad(map_col, 'tp') as tp,
    getad(map_col, 'p') as p,
    getad(map_col, 'sp') as sp
  from
    table_name
  where
    id = 
  ) t;

The output is wrong: 'tp', 'tp', 'tp'. The following query produces the
same wrong result:

select
  array(
    getad(map_col, 'tp'),
    getad(map_col, 'p'),
    getad(map_col, 'sp')
  ) as ps
from
  table_name
where
  id = ;

Could you please provide me some hints on this? Thanks!

-- 
丁桂涛