[ https://issues.apache.org/jira/browse/SPARK-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002462#comment-15002462 ]

Study Hsueh commented on SPARK-11191:
-------------------------------------

I think this is a Spark ThriftServer issue, not a Spark SQL one.

Currently, by default the Spark ThriftServer creates an embedded metastore in the 
directory it is launched from, and you can also configure a local or remote 
metastore service. In either case, the Spark ThriftServer should support accessing 
permanent user-defined functions registered in the metastore.
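
For example, assuming an external Hive metastore service is already running (the 
host and port below are placeholders), the Thrift Server can be pointed at it 
with {{--hiveconf}}; without this it falls back to the embedded Derby metastore:

{code}
# Sketch: start the Thrift Server against a remote Hive metastore.
# hive.metastore.uris is a standard Hive setting; metastore-host:9083 is a
# placeholder for a real metastore service endpoint.
# --master matches the YARN client-mode setup described in this issue.
./sbin/start-thriftserver.sh \
  --master yarn-client \
  --hiveconf hive.metastore.uris=thrift://metastore-host:9083
{code}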


The Spark SQL documentation also says: "In addition to the basic SQLContext, you 
can also create a HiveContext, which provides a superset of the functionality 
provided by the basic SQLContext. Additional features include the ability to 
write queries using the more complete HiveQL parser, access to Hive UDFs, and 
the ability to read data from Hive tables."

http://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext
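
For reference, the entry point the guide describes looks like this in Scala (a 
minimal sketch; {{sc}} is an existing SparkContext, and the function and table 
names are the ones from this issue):

{code}
// sc: an existing SparkContext. HiveContext picks up hive-site.xml when
// present, which is how it locates the configured metastore.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// HiveQL statements, including the UDF registration from this report,
// can then be issued directly:
sqlContext.sql("CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF'")
sqlContext.sql("SELECT testUDF(col1) FROM table1").show()
{code}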


> [1.5] Can't create UDF's using hive thrift service
> --------------------------------------------------
>
>                 Key: SPARK-11191
>                 URL: https://issues.apache.org/jira/browse/SPARK-11191
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: David Ross
>            Priority: Blocker
>
> Since upgrading to Spark 1.5, we've been unable to create and use UDFs when 
> we run in Thrift Server mode.
> Our setup:
> We start the thrift-server running against YARN in client mode (we've also 
> built our own Spark from the GitHub branch-1.5 with the following args: {{-Pyarn 
> -Phive -Phive-thriftserver}}).
> If I run the following after connecting via JDBC (in this case via beeline):
> {{add jar 'hdfs://path/to/jar'}}
> (this command succeeds with no errors)
> {{CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';}}
> (this command succeeds with no errors; a minimal sketch of such a UDF class 
> appears at the end of this report)
> {{select testUDF(col1) from table1;}}
> I get the following error in the logs:
> {code}
> org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 
> pos 8
>         at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>         at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
>         at scala.Option.getOrElse(Option.scala:120)
>         at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
>         at 
> org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
>         at scala.util.Try.getOrElse(Try.scala:77)
>         at 
> org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
>         at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>         at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
>         at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>         at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
>         at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
>         at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
>         at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
> {code}
> (cutting the bulk of the stack trace for brevity; more than happy to send the 
> full output)
> {code}
> 15/10/12 14:34:37 ERROR SparkExecuteStatementOperation: Error running hive 
> query:
> org.apache.hive.service.cli.HiveSQLException: 
> org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 
> pos 100
>         at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:259)
>         at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> When I ran the same against 1.4, it worked.
> I've also changed {{spark.sql.hive.metastore.version}} to 0.13 (similar to what 
> it was in 1.4) and to 0.14, but I still get the same errors.
> Also, in 1.5, the same statements work when run in the {{spark-sql}} shell.
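> For completeness: such a UDF is just a class extending 
> {{org.apache.hadoop.hive.ql.exec.UDF}} with an {{evaluate}} method. A minimal 
> Scala sketch (the package and class name are hypothetical stand-ins for 
> {{com.foo.class.UDF}}, which can't be written literally since {{class}} is a 
> reserved word):
> {code}
> // Hypothetical stand-in for the com.foo.class.UDF registered above.
> package com.foo.udfs
>
> import org.apache.hadoop.hive.ql.exec.UDF
>
> class TestUDF extends UDF {
>   // Hive resolves evaluate() reflectively by name and argument types.
>   def evaluate(input: String): String =
>     if (input == null) null else input.toUpperCase
> }
> {code}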


