[jira] [Updated] (HIVE-7644) hive custom udf cannot be used in the join_condition(on)

Hayok (JIRA) Thu, 07 Aug 2014 02:59:28 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hayok updated HIVE-7644:
------------------------

    Description: 
console:
hive> ADD JAR xxxxx;
Added xxxxx to class path
Added resource: xxxxx
hive> create temporary function func1 as 'xxx';
OK
Time taken: 0.009 seconds
hive> list jars;
xxx.jar
hive> select /*+ MAPJOIN(certain column1) */
    > *
    > from tb1
    > join tb2 on tb1.column2 = func1(tb2.column3)
    > ;
Total MapReduce jobs = 1
14/08/07 17:38:04 WARN conf.Configuration: file:/tmp/[username]hive_2014-08-07_17-38-01_048_6199454015323812186-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/08/07 17:38:04 WARN conf.Configuration: file:/tmp/[username]/hive_2014-08-07_17-38-01_048_6199454015323812186-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/08/07 17:38:04 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/08/07 17:38:04 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/08/07 17:38:04 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/08/07 17:38:04 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/08/07 17:38:04 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/08/07 17:38:04 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/08/07 17:38:04 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/08/07 17:38:05 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
Execution log at: /tmp/[username]/[username]_20140807173838_d673690f-c452-4ebb-bf53-9d663c49d04e.log
2014-08-07 05:38:05     Starting to launch local task to process map join;      maximum memory = 2027290624
Execution failed with exit status: 2
Obtaining error information

Task failed!
Task ID:
  Stage-4

Logs:

/tmp/[username]/hive.log
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

--------------------------------------------------------------------------------------
Then I checked the execution log at
/tmp/[username]/[username]_20140807173838_d673690f-c452-4ebb-bf53-9d663c49d04e.log,
which contains:
2014-08-07 16:46:59,105 INFO  mr.MapredLocalTask (SessionState.java:printInfo(417)) - 2014-08-07 04:46:59       Starting to launch local task to process map join;      maximum memory = 2027290624
2014-08-07 16:46:59,114 INFO  mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(389)) - fetchoperator for tmp_compete created
2014-08-07 16:46:59,196 INFO  exec.TableScanOperator (Operator.java:initialize(338)) - Initializing Self 0 TS
2014-08-07 16:46:59,197 INFO  exec.TableScanOperator (Operator.java:initializeChildren(403)) - Operator 0 TS initialized
2014-08-07 16:46:59,197 INFO  exec.TableScanOperator (Operator.java:initializeChildren(407)) - Initializing children of 0 TS
2014-08-07 16:46:59,197 INFO  exec.HashTableSinkOperator (Operator.java:initialize(442)) - Initializing child 1 HASHTABLESINK
2014-08-07 16:46:59,197 INFO  exec.HashTableSinkOperator (Operator.java:initialize(338)) - Initializing Self 1 HASHTABLESINK
2014-08-07 16:46:59,198 INFO  mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(72)) - JVM Max Heap Size: 2027290624
2014-08-07 16:46:59,222 ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(324)) - Hive Runtime Error: Map local work failed
org.apache.hadoop.hive.ql.exec.UDFArgumentException: The UDF implementation class 'xxx' is not present in the class path
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:142)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:116)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:127)
        at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:66)
        at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:140)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:188)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
        at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:408)
        at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:302)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:728)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


------------------------------------------------------------------------------------
I have verified that this is not a permissions problem. When the UDF is used
outside the join condition, for example in select udf(column_name) or
where udf(column_name), it works correctly.
Has anyone else encountered this problem?
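
For illustration, the contrast described above can be sketched with the same placeholder names as the console session (func1, tb1, tb2, column2, column3). The exact queries that worked were not included in the report, so the query shapes, the literal 'some_value', and the MAPJOIN(tb2) hint argument are assumptions. The last statement is an untested way to check whether the failure is specific to the map-join local task: hive.auto.convert.join is Hive's setting for automatic map-join conversion, and with it disabled and the hint removed the join is expected to be planned as an ordinary shuffle join.

```sql
-- Reported to work: UDF in the select list.
select func1(column3) from tb2;

-- Reported to work: UDF in a where clause
-- ('some_value' is a hypothetical literal).
select * from tb2 where func1(column3) = 'some_value';

-- Reported to fail: UDF in the join condition of a map join
-- (the hint argument tb2 is an assumption; the report shows a placeholder).
select /*+ MAPJOIN(tb2) */ *
from tb1
join tb2 on tb1.column2 = func1(tb2.column3);

-- Untested check (assumption): the same join without the hint and with
-- automatic map-join conversion disabled, so that no local map-join task
-- should be launched.
set hive.auto.convert.join=false;
select *
from tb1
join tb2 on tb1.column2 = func1(tb2.column3);
```

If the last query succeeds while the hinted one fails, that would point at the class path of the local task's child JVM rather than at the UDF itself; this is a diagnostic idea, not a confirmed fix.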


> hive custom udf cannot be used in the join_condition(on)
> --------------------------------------------------------
>
>                 Key: HIVE-7644
>                 URL: https://issues.apache.org/jira/browse/HIVE-7644
>             Project: Hive
>          Issue Type: Bug
>          Components: Clients
>    Affects Versions: 0.12.0
>            Reporter: Hayok
>



