Alex Liu created SPARK-12051:
--------------------------------

             Summary: Can't register UDF from Hive thrift server
                 Key: SPARK-12051
                 URL: https://issues.apache.org/jira/browse/SPARK-12051
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Alex Liu
Start the thrift server, then from beeline:
{code}
0: jdbc:hive2://localhost:10000> create temporary function c_to_string as 'org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString';
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (0.483 seconds)
0: jdbc:hive2://localhost:10000> select c_to_string(c4, 'time') from test_table2;
Error: org.apache.spark.sql.AnalysisException: undefined function c_to_string; line 1 pos 23 (state=,code=0)
{code}
The log shows:
{code}
OK
ERROR 2015-11-30 08:29:37 org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
org.apache.spark.sql.AnalysisException: undefined function c_to_string; line 1 pos 23
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58) ~[spark-hive_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58) ~[spark-hive_2.10-1.5.0.3.jar:1.5.0.3]
    at scala.Option.getOrElse(Option.scala:120) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57) ~[spark-hive_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53) ~[spark-hive_2.10-1.5.0.3.jar:1.5.0.3]
    at scala.util.Try.getOrElse(Try.scala:77) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53) ~[spark-hive_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$23.apply(Analyzer.scala:490) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:489) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:486) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) ~[scala-library-2.10.5.jar:na]
    at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.to(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:279) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:232) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:76) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:86) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:90) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractTraversable.map(Traversable.scala:105) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:90) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:94) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) ~[scala-library-2.10.5.jar:na]
    at scala.collection.Iterator$class.foreach(Iterator.scala:727) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.to(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) ~[scala-library-2.10.5.jar:na]
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:94) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:65) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10.applyOrElse(Analyzer.scala:486) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10.applyOrElse(Analyzer.scala:484) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$.apply(Analyzer.scala:484) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$.apply(Analyzer.scala:483) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111) ~[scala-library-2.10.5.jar:na]
    at scala.collection.immutable.List.foldLeft(List.scala:84) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at scala.collection.immutable.List.foreach(List.scala:318) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72) ~[spark-catalyst_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:910) ~[spark-sql_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:910) ~[spark-sql_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:908) ~[spark-sql_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:132) ~[spark-sql_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) ~[spark-sql_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:719) ~[spark-sql_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:224) ~[spark-hive-thriftserver_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) [spark-hive-thriftserver_2.10-1.5.0.3.jar:1.5.0.3]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_66]
    at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_66]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1125) [hadoop-core-1.0.4.18.jar:na]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182) [spark-hive-thriftserver_2.10-1.5.0.3.jar:1.5.0.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
ERROR 2015-11-30 08:29:37 org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: undefined function c_to_string; line 1 pos 23
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:259) ~[spark-hive-thriftserver_2.10-1.5.0.3.jar:1.5.0.3]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171) ~[spark-hive-thriftserver_2.10-1.5.0.3.jar:1.5.0.3]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_66]
    at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_66]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1125) [hadoop-core-1.0.4.18.jar:na]
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182) [spark-hive-thriftserver_2.10-1.5.0.3.jar:1.5.0.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa
{code}
Spark SQL implements its own UDFs and its own function registry. Beeline sends the UDF registration command to the thrift server, which registers the function in Hive's UDF registry instead of Spark SQL's, so the subsequent lookup fails. We need a custom command that registers the function in Spark SQL's own UDF registry.
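To make the failure mode concrete, here is a toy sketch of the mismatch described above (plain Scala with hypothetical map-based registries, not actual Spark code): the registration from beeline writes into one registry, while the analyzer's lookup, which per the trace goes through HiveFunctionRegistry.lookupFunction (hiveUDFs.scala:53-58), consults another, so the function is "undefined" there.
{code}
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-ins for Spark SQL's FunctionRegistry and the Hive
// registry the thrift server writes to. Not Spark source; a model only.
object RegistryMismatch {
  private val sparkSqlRegistry = mutable.Map.empty[String, String]
  private val hiveRegistry     = mutable.Map.empty[String, String]

  // What beeline's CREATE TEMPORARY FUNCTION effectively does today:
  // the registration lands in a Hive registry the analyzer never sees.
  def registerFromBeeline(name: String, className: String): Unit =
    hiveRegistry(name) = className

  // What the analyzer effectively does when resolving c_to_string(...):
  // it only finds functions in Spark SQL's own registry.
  def lookupFunction(name: String): String =
    Try(sparkSqlRegistry(name)).getOrElse(
      throw new RuntimeException(s"undefined function $name"))

  def main(args: Array[String]): Unit = {
    registerFromBeeline("c_to_string",
      "org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString")
    lookupFunction("c_to_string") // throws "undefined function c_to_string"
  }
}
{code}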
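By contrast, a UDF registered through Spark SQL's programmatic API does land in Spark SQL's own registry and resolves from the same context. A minimal sketch against the 1.5 API; the local master and the placeholder UDF body are assumptions (the real UDFCassandraBinaryToString conversion logic is not reproduced here):
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object RegisterUdfExample {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(
      new SparkConf().setAppName("udf-example").setMaster("local[*]"))
    val ctx = new HiveContext(sc)

    // udf.register puts the function into Spark SQL's own FunctionRegistry,
    // which is the registry the analyzer consults first.
    // The body is a placeholder, not the real Cassandra binary conversion.
    ctx.udf.register("c_to_string",
      (bytes: Array[Byte], kind: String) => new String(bytes))

    ctx.sql("SELECT c_to_string(cast('ab' as binary), 'time')").show()

    sc.stop()
  }
}
{code}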