Re: Can't create UDF's in spark 1.5 while running using the hive thrift service
It is fixed in 1.5.3: https://issues.apache.org/jira/browse/SPARK-11191

On Wed, Dec 9, 2015 at 12:58 AM, Deenar Toraskar wrote:
> Hi Trystan
>
> I am facing the same issue. It only appears with the Thrift server; the
> same call works fine via the spark-sql shell. Do you have any workarounds,
> and have you filed a JIRA/bug for this?
>
> Regards
> Deenar
>
> [quoted original message from Trystan Leftwich snipped; the full post
> appears later in this thread]
Re: Can't create UDF's in spark 1.5 while running using the hive thrift service
Hi Trystan

I am facing the same issue. It only appears with the Thrift server; the same call works fine via the spark-sql shell. Do you have any workarounds, and have you filed a JIRA/bug for this?

Regards
Deenar

On 12 October 2015 at 18:01, Trystan Leftwich wrote:
> [quoted original message snipped; the full post appears later in this
> thread]
Can't create UDF's in spark 1.5 while running using the hive thrift service
Hi everyone,

Since upgrading to Spark 1.5 I've been unable to create and use UDFs when we run in Thrift server mode.

Our setup: we start the Thrift server running against YARN in client mode. (We've also built our own Spark from the GitHub branch-1.5 with the following args: -Pyarn -Phive -Phive-thriftserver.)

If I run the following after connecting via JDBC (in this case via beeline):

add jar 'hdfs://path/to/jar';
(this command succeeds with no errors)

CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';
(this command succeeds with no errors)

select testUDF(col1) from table1;

I get the following error in the logs:

org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 8
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
    at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
    at scala.util.Try.getOrElse(Try.scala:77)
    at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
    ... (cutting the bulk for ease of email; more than happy to send the full output)

15/10/12 14:34:37 ERROR SparkExecuteStatementOperation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 100
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:259)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

When I ran the same against 1.4 it worked.

I've also changed spark.sql.hive.metastore.version to 0.13 (similar to what it was in 1.4) and to 0.14, but I still get the same errors.

Any suggestions?

Thanks,
Trystan
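For readers trying to reproduce this, the statement sequence from the report, with the quoting made consistent, is below. The JAR path, UDF class, column, and table names are placeholders from the original post, not real values:

```sql
-- Issued over JDBC (e.g. via beeline) against the Spark Thrift server.
-- Path, class, column, and table names are placeholders.
ADD JAR 'hdfs://path/to/jar';
CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';
SELECT testUDF(col1) FROM table1;
```

Per the report, on the affected 1.5.x builds the first two statements succeed and only the SELECT fails with the AnalysisException, and the same sequence works from the spark-sql shell.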
Re: Udf's in spark
Sure, Spark SQL supports Hive UDFs. It seems to me that the UDF 'DATE_FORMAT' is just not registered in your metastore. Did you run 'CREATE FUNCTION' in advance?

Thanks,

On Tue, Jul 14, 2015 at 6:30 PM, Ravisankar Mani wrote:
> [quoted original message snipped; the full post appears later in this
> thread]

--
Takeshi Yamamuro
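The 'CREATE FUNCTION' suggestion above can be sketched as follows. Unlike CREATE TEMPORARY FUNCTION, a permanent function is recorded in the Hive metastore, so it can be resolved by name from any session. The JAR location and class name here are hypothetical, and the USING JAR clause assumes Hive 0.13+ DDL:

```sql
-- Hypothetical JAR path and class name; Hive 0.13+ syntax.
CREATE FUNCTION date_format AS 'com.example.hive.udf.DateFormat'
  USING JAR 'hdfs:///libs/my-udfs.jar';

-- Afterwards the function should resolve by name:
SELECT date_format(some_date_col, 'yyyy-MM-dd') FROM some_table;
```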
Udf's in spark
Hi Everyone,

As mentioned in the Spark SQL programming guide, Spark SQL supports Hive UDFs. I have built the UDFs in the Hive metastore, and they work perfectly over a Hive connection. But they do not work in Spark ("java.lang.RuntimeException: Couldn't find function DATE_FORMAT").

Could you please help with how to use these Hive UDFs in Spark?

Regards,

Ravisankar M R
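For this particular function there is also a hedged alternative: Spark 1.5 and later ship a built-in date_format function in Spark SQL, which sidesteps the Hive UDF lookup entirely. The column and table names below are illustrative:

```sql
-- date_format is built into Spark SQL from 1.5 onward,
-- so no Hive UDF registration is needed for this case.
-- Column and table names are illustrative.
SELECT date_format(order_ts, 'yyyy-MM-dd') AS order_day FROM orders;
```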