Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Yeah. After setting hive.cache.expr.evaluation=false, all queries output the expected results.

I also found that the problem is related to the getDisplayString function in the UDF. At first, the function returned the same string regardless of its parameters, and I had to set hive.cache.expr.evaluation=false. After I changed the function to return a string that depends on its parameters, all queries returned the expected results even with hive.cache.expr.evaluation set to true.

Thanks, Navis. It really helped me a lot.

Best Regards,
Guitao

On Thu, Jul 24, 2014 at 2:55 PM, Navis류승우 wrote:
> Looks like it's caused by HIVE-7314. Could you try that with
> "hive.cache.expr.evaluation=false"?
>
> Thanks,
> Navis

--
丁桂涛
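The getDisplayString observation in the message above can be illustrated with a toy model. This is only a sketch, assuming the expression cache identifies sub-expressions by their display string, which is what the reported behavior suggests. The names DisplayStringCacheSketch, Call, calls and evaluateAll are hypothetical, and nothing here is Hive code or the actual getad implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Hive classes.
class DisplayStringCacheSketch {

    /** Stand-in for one call such as getad(map_col, 'tp'). */
    static final class Call {
        final String key;
        final boolean displayStringUsesArgs;

        Call(String key, boolean displayStringUsesArgs) {
            this.key = key;
            this.displayStringUsesArgs = displayStringUsesArgs;
        }

        // Plays the role of the UDF's getDisplayString.
        String getDisplayString() {
            return displayStringUsesArgs
                    ? "getad(map_col, '" + key + "')" // distinct per call
                    : "getad";                        // identical for every call
        }

        // Plays the role of evaluating the UDF; here it just echoes the key.
        String evaluate() {
            return key;
        }
    }

    /** Builds the three calls used in the thread's queries. */
    static List<Call> calls(boolean displayStringUsesArgs) {
        List<Call> list = new ArrayList<>();
        for (String key : new String[] {"tp", "p", "sp"}) {
            list.add(new Call(key, displayStringUsesArgs));
        }
        return list;
    }

    /** Evaluates each call, optionally reusing results keyed by display string. */
    static List<String> evaluateAll(List<Call> calls, boolean cacheEnabled) {
        Map<String, String> cache = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Call c : calls) {
            if (!cacheEnabled) {
                out.add(c.evaluate());
                continue;
            }
            String cacheKey = c.getDisplayString();
            if (!cache.containsKey(cacheKey)) {
                cache.put(cacheKey, c.evaluate());
            }
            out.add(cache.get(cacheKey));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(evaluateAll(calls(false), true));  // [tp, tp, tp]
        System.out.println(evaluateAll(calls(false), false)); // [tp, p, sp]
        System.out.println(evaluateAll(calls(true), true));   // [tp, p, sp]
    }
}
```

In this model, a constant display string makes all three calls share one cache entry, which reproduces the 'tp', 'tp', 'tp' output. Disabling the cache or making the display string depend on the arguments both restore 'tp', 'p', 'sp', matching the two fixes described in the thread.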
Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Looks like it's caused by HIVE-7314. Could you try that with "hive.cache.expr.evaluation=false"?

Thanks,
Navis

2014-07-24 14:34 GMT+09:00 丁桂涛(桂花):
> Yes. The output is correct: ["tp","p","sp"].
>
> I developed the UDF using JAVA in eclipse and exported the jar file into
> the auxlib directory of hive. Then add the following line into the
> ~/.hiverc file.
>
> create temporary function getad as 'xxx';
>
> The hive version is 0.12.0. Perhaps the problem resulted from the
> mis-optimization of hive.
Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Yes. The output is correct: ["tp","p","sp"].

I developed the UDF in Java using Eclipse and exported the jar file into Hive's auxlib directory. Then I added the following line to the ~/.hiverc file:

    create temporary function getad as 'xxx';

The Hive version is 0.12.0. Perhaps the problem results from a mis-optimization in Hive.

On Thu, Jul 24, 2014 at 1:11 PM, Jie Jin wrote:
> Have you tried this query without UDF, say:
>
> select
>     array(tp, p, sp) as ps
> from
> (
>     select
>         'tp' as tp,
>         'p' as p,
>         'sp' as sp
>     from
>         table_name
>     where
>         id =
> ) t;
>
> And how you implement the UDF?
>
> Thanks
> 金杰 (Jie Jin)

--
丁桂涛
Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Have you tried this query without the UDF, say:

    select
        array(tp, p, sp) as ps
    from
    (
        select
            'tp' as tp,
            'p' as p,
            'sp' as sp
        from
            table_name
        where
            id =
    ) t;

And how did you implement the UDF?

Thanks
金杰 (Jie Jin)

On Wed, Jul 23, 2014 at 1:34 PM, 丁桂涛(桂花) wrote:
> Recently I developed a Hive Generic UDF *getad*. It accepts a map type
> and a string type parameter and outputs a string value. But I found the UDF
> output really confusing in different conditions.
> [...]
Hive UDF gives duplicate result regardless of parameters, when nested in a subquery
Recently I developed a Hive Generic UDF *getad*. It accepts a map-type parameter and a string-type parameter and outputs a string value. But I found the UDF output confusing under different conditions.

Condition A:

    select
        getad(map_col, 'tp') as tp,
        getad(map_col, 'p') as p,
        getad(map_col, 'sp') as sp
    from
        table_name
    where
        id = ;

The output is right: 'tp', 'p', 'sp'.

Condition B:

    select
        array(tp, p, sp) as ps
    from
    (
        select
            getad(map_col, 'tp') as tp,
            getad(map_col, 'p') as p,
            getad(map_col, 'sp') as sp
        from
            table_name
        where
            id =
    ) t;

The output is wrong: 'tp', 'tp', 'tp'. The following query outputs the same wrong result:

    select
        array(
            getad(map_col, 'tp'),
            getad(map_col, 'p'),
            getad(map_col, 'sp')
        ) as ps
    from
        table_name
    where
        id = ;

Could you please give me some hints on this? Thanks!

--
丁桂涛