Hi,
I am playing with the following example code:

    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.api.java.Row;
    import org.apache.spark.sql.hive.api.java.JavaHiveContext;

    public class SparkTest {
        public static void main(String[] args) {
            String appName = "This is a test application";
            String master = "spark://lix1.bh.com:7077";

            SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
            JavaSparkContext sc = new JavaSparkContext(conf);
            JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);

            //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
            //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

            // Queries are expressed in HiveQL.
            List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
            //List<Row> rows = sqlCtx.sql("show tables").collect();
            System.out.print("I got " + rows.size() + " rows \r\n");
            sc.close();
        }
    }

With the CREATE TABLE and LOAD DATA commands commented out, the query executes successfully. However, when either of these two commands is executed through the HiveContext, I get a ClassNotFoundException, with a different error message for each command.

The CREATE TABLE command causes the following:

    Exception in thread "main" org.apache.spark.sql.execution.QueryExecutionException: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
        at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
        at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
        at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
        at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
        at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
        at org.apache.spark.sql.api.java.JavaSchemaRDD.<init>(JavaSchemaRDD.scala:42)
        at org.apache.spark.sql.hive.api.java.JavaHiveContext.sql(JavaHiveContext.scala:37)
        at com.blackhorse.SparkTest.main(SparkTest.java:24)
    [delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
    [delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called

The LOAD DATA command causes the following:

    Exception in thread "main" org.apache.spark.sql.execution.QueryExecutionException: FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
        at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
        at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
        at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
        at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
        at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
        at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
        at org.apache.spark.sql.api.java.JavaSchemaRDD.<init>(JavaSchemaRDD.scala:42)
        at org.apache.spark.sql.hive.api.java.JavaHiveContext.sql(JavaHiveContext.scala:37)
        at com.blackhorse.SparkTest.main(SparkTest.java:25)
    [delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called
    [delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
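To narrow this down, here is a minimal sketch of a check for whether the two classes named in the exceptions are visible to the JVM at all. The class name ClasspathCheck and the helper isOnClasspath are hypothetical and not part of Spark or Hive. The result is only meaningful if the check is run with the same classpath as SparkTest.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by this class's class loader. The class is not initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two class names reported in the stack traces above.
        String[] names = {
            "org.apache.hadoop.hive.ql.hooks.ATSHook",
            "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "NOT found"));
        }
    }
}
```

If both classes are reported as not found, that would suggest the Hive configuration being picked up refers to classes that are missing from the Hive jars on the classpath, though this sketch alone cannot confirm the cause.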