Re: Can't create UDFs in Spark 1.5 when running via the Hive thrift service

2015-12-08 Thread Deenar Toraskar
Hi Trystan

I am facing the same issue. It only appears with the thrift server; the
same call works fine via the spark-sql shell. Do you have any workarounds,
and have you filed a JIRA/bug for this?
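
For reference, this is the sequence that fails through the thrift server
but works for me in the spark-sql shell (a sketch; the jar path and class
name are the placeholders from Trystan's mail below):

  $ bin/spark-sql
  spark-sql> add jar hdfs://path/to/jar;
  spark-sql> CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';
  spark-sql> select testUDF(col1) from table1;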

Regards
Deenar

On 12 October 2015 at 18:01, Trystan Leftwich wrote:

> [Trystan's original message snipped; it appears in full below.]


Re: Can't create UDFs in Spark 1.5 when running via the Hive thrift service

2015-12-08 Thread Jeff Zhang
It is fixed in 1.5.3:

https://issues.apache.org/jira/browse/SPARK-11191
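
Once you are on 1.5.3 (or any build including that patch), the same
beeline sequence from the original mail should resolve the function, e.g.
(a sketch using Trystan's placeholder names):

  beeline> add jar 'hdfs://path/to/jar';
  beeline> CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';
  beeline> select testUDF(col1) from table1;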


On Wed, Dec 9, 2015 at 12:58 AM, Deenar Toraskar wrote:

> [Earlier messages snipped; Trystan's original appears in full below.]

Can't create UDFs in Spark 1.5 when running via the Hive thrift service

2015-10-12 Thread Trystan Leftwich
Hi everyone,

Since upgrading to Spark 1.5, I've been unable to create and use UDFs when
we run in thrift server mode.

Our setup:
We start the thrift server against YARN in client mode. (We've also built
our own Spark from the GitHub branch-1.5 with the following build args:
-Pyarn -Phive -Phive-thriftserver.)
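
For concreteness, a sketch of the build and start commands we use (these
are the standard Spark scripts; exact options beyond the profiles above
may differ per environment):

  build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
  ./sbin/start-thriftserver.sh --master yarn-client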

If I run the following after connecting via JDBC (in this case via beeline):

add jar 'hdfs://path/to/jar';
(this command succeeds with no errors)

CREATE TEMPORARY FUNCTION testUDF AS 'com.foo.class.UDF';
(this command succeeds with no errors)
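
(For context, com.foo.class.UDF is a placeholder for a standard Hive-style
UDF, roughly like this Scala sketch; the real class lives in the jar added
above, and Hive locates the evaluate method by reflection:)

  import org.apache.hadoop.hive.ql.exec.UDF

  // Minimal Hive-style UDF: uppercases a string, passing nulls through.
  class MyUDF extends UDF {
    def evaluate(s: String): String =
      if (s == null) null else s.toUpperCase
  }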

select testUDF(col1) from table1;

I get the following error in the logs:

org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 8
at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2$$anonfun$1.apply(hiveUDFs.scala:58)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:57)
at org.apache.spark.sql.hive.HiveFunctionRegistry$$anonfun$lookupFunction$2.apply(hiveUDFs.scala:53)
at scala.util.Try.getOrElse(Try.scala:77)
at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUDFs.scala:53)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:506)
at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:505)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:502)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:227)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:226)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:232)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:249)
.
. (cutting the bulk for ease of email, more than happy to send the full output)
.
15/10/12 14:34:37 ERROR SparkExecuteStatementOperation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.sql.AnalysisException: undefined function testUDF; line 1 pos 100
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:259)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:182)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



When I ran the same against Spark 1.4, it worked.

I've also changed the spark.sql.hive.metastore.version setting to 0.13
(similar to what it was in 1.4) and to 0.14, but I still get the same errors.
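
(That setting was passed at server start, roughly as below; treat the
values as an example, since the exact version strings accepted, e.g. 0.13
vs 0.13.1, depend on the Spark build:)

  ./sbin/start-thriftserver.sh --master yarn-client \
    --conf spark.sql.hive.metastore.version=0.13.1 \
    --conf spark.sql.hive.metastore.jars=maven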


Any suggestions?

Thanks,
Trystan