Re: Spark can not access jar from HDFS !!
Hi All,

Thanks for the suggestions. What I tried is hiveContext.sql("ADD JAR ..."), and that lets the CREATE TEMPORARY FUNCTION statement complete, but when I use the function I get a ClassNotFoundException for the class that implements it. That same class is present in the jar that was added. Please note that the same sequence works fine from the Hive shell.

Is there an issue with Spark distributing jars across the workers? Maybe that is what is causing the problem. Also, can you please suggest a manual way of copying the jars to the workers? I just want to verify my assumption.

Thanks,
Ravi

On Sun, May 10, 2015 at 1:40 AM Michael Armbrust mich...@databricks.com wrote:

That code path is entirely delegated to Hive. Does Hive support this? You might try instead using sparkContext.addJar.

On Sat, May 9, 2015 at 12:32 PM, Ravindra ravindra.baj...@gmail.com wrote:

[snip: quoted message and stack trace; see the original post below]
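For reference, a minimal sketch of the sequence described above, assuming a Spark 1.x spark-shell where hiveContext is already defined. The jar path and class name are the ones from the original post; the table and column in the last statement are hypothetical.

    // Workaround described above: ADD JAR instead of USING JAR.
    hiveContext.sql("ADD JAR hdfs:///users/ravindra/customUDF2.jar")

    // This now completes without the "Unable to load JAR" error ...
    hiveContext.sql(
      "CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper'")

    // ... but invoking the function is where the reported
    // ClassNotFoundException appears (`some_table` and `name` are hypothetical).
    hiveContext.sql("SELECT sample_to_upper(name) FROM some_table").collect()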
Re: Spark can not access jar from HDFS !!
After upgrading to Spark 1.3, these statements on hiveContext are working fine.

Thanks

On Mon, May 11, 2015, 12:15 Ravindra ravindra.baj...@gmail.com wrote:

[snip: quoted earlier messages and stack trace; see below]
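Assuming "these statements" refers to the unchanged commands from the original post, this is the statement reported to succeed on Spark 1.3:

    // Unchanged from the original post; reported to work after the 1.3 upgrade.
    hiveContext.sql(
      "CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper' " +
      "USING JAR 'hdfs:///users/ravindra/customUDF2.jar'")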
Re: Spark can not access jar from HDFS !!
That code path is entirely delegated to Hive. Does Hive support this? You might try instead using sparkContext.addJar.

On Sat, May 9, 2015 at 12:32 PM, Ravindra ravindra.baj...@gmail.com wrote:

[snip: quoted message and stack trace; see the original post below]
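A minimal sketch of the sparkContext.addJar suggestion, assuming a Spark 1.x spark-shell where sc and hiveContext are already in scope; whether this also makes the class visible to Hive's function registry is exactly what the rest of the thread probes.

    // sc.addJar ships the jar to the executors; the temporary function is
    // then created without the USING JAR clause that Hive failed to handle.
    sc.addJar("hdfs:///users/ravindra/customUDF2.jar")
    hiveContext.sql(
      "CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper'")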
Spark can not access jar from HDFS !!
Hi All,

I am trying to create custom UDFs with hiveContext, as given below:

    scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper' USING JAR 'hdfs:///users/ravindra/customUDF2.jar'")

I have put the UDF jar in HDFS at the path given above. The same command works well in the Hive shell but fails in the Spark shell, as given below:

15/05/10 00:41:51 ERROR Task: FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
15/05/10 00:41:51 INFO FunctionTask: create function: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
	at org.apache.hadoop.hive.ql.exec.FunctionTask.addFunctionResources(FunctionTask.java:305)
	at org.apache.hadoop.hive.ql.exec.FunctionTask.createTemporaryFunction(FunctionTask.java:179)
	at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:81)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
	at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
	at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
	at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
	at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
	at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
	at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
	at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
	at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
	at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
	at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
	at $line17.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
	at $line17.$read$$iwC$$iwC$$iwC.<init>(<console>:27)
	at $line17.$read$$iwC$$iwC.<init>(<console>:29)
	at $line17.$read$$iwC.<init>(<console>:31)
	at $line17.$read.<init>(<console>:33)
	at $line17.$read$.<init>(<console>:37)
	at $line17.$read$.<clinit>(<console>)
	at $line17.$eval$.<init>(<console>:7)
	at $line17.$eval$.<clinit>(<console>)
	at $line17.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
	at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/05/10 00:41:51 ERROR Driver: FAILED: Execution
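For context, the thread never shows the source of com.abc.api.udf.MyUpper. A hypothetical minimal implementation, assuming it uses Hive's simple reflection-based UDF API, might look like this:

    // Hypothetical sketch only; the actual MyUpper source is not in the thread.
    package com.abc.api.udf

    import org.apache.hadoop.hive.ql.exec.UDF

    class MyUpper extends UDF {
      // Hive resolves evaluate() by reflection; null-safe upper-casing.
      def evaluate(input: String): String =
        if (input == null) null else input.toUpperCase
    }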