I'm not sure about Spark standalone mode, but on Spark-on-YARN it should work. You can
check the following link:

 http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
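
For example, a minimal sketch of selecting YARN client mode programmatically in
Spark 1.2 (the app name is hypothetical, and HADOOP_CONF_DIR or YARN_CONF_DIR
must point at your Hadoop configuration for the resource manager to be found):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class YarnClientSketch {
        public static void main(String[] args) {
            // "yarn-client" tells Spark 1.2 to submit to the YARN resource
            // manager found via HADOOP_CONF_DIR / YARN_CONF_DIR, instead of
            // connecting to a standalone master URL.
            SparkConf conf = new SparkConf()
                    .setAppName("HiveOnYarnTest") // hypothetical name
                    .setMaster("yarn-client");
            JavaSparkContext sc = new JavaSparkContext(conf);
            System.out.println("Default parallelism: " + sc.defaultParallelism());
            sc.stop();
        }
    }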

Thanks.

Zhan Zhang

On Feb 5, 2015, at 5:02 PM, Cheng Lian <lian.cs....@gmail.com> wrote:


Please note that Spark 1.2.0 only supports Hive 0.13.1 and 0.12.0; no other
versions are supported.

Best,
Cheng

On 1/25/15 12:18 AM, guxiaobo1982 wrote:


Hi,
I built and started a single-node standalone Spark 1.2.0 cluster alongside a
single-node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and
Hive node I can create and query tables inside Hive, and from remote machines I
can submit the SparkPi example to the Spark master. But I failed to run the
following example code:


import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class SparkTest {

    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        // JavaHiveContext reads hive-site.xml from the driver classpath to
        // locate the Hive metastore.
        JavaHiveContext sqlCtx = new JavaHiveContext(sc);

        // sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        // sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}


Exception in thread "main" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
    at org.apache.spark.sql.hive.HiveContext$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$super$lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
    at org.apache.spark.sql.hive.HiveContext$anon$2.lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:162)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:61)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:59)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:59)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:51)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
    at com.blackhorse.SparkTest.main(SparkTest.java:27)

[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called



But if I change the query to "show tables", the program runs and returns 0 rows,
even though I have many tables inside Hive. So I suspect that my program, or the
Spark instance, is not connecting to my Hive instance and may have started a
local Hive metastore instead. I have put the hive-site.xml file from the Hive
installation into Spark's conf directory. Can you help figure out what's wrong
here? Thanks.
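
For reference, here is a minimal check I could run on the driver (a sketch; the
class name is mine) to see which metastore configuration is actually picked up.
If hive-site.xml is not on the driver classpath, HiveConf falls back to an
embedded local Derby metastore, which would explain the empty "show tables"
result:

    import org.apache.hadoop.hive.conf.HiveConf;

    public class MetastoreCheck {
        public static void main(String[] args) {
            // HiveConf loads hive-site.xml from the classpath. A null location
            // or an empty hive.metastore.uris means Hive will use an embedded
            // (local Derby) metastore rather than the real one.
            HiveConf conf = new HiveConf();
            System.out.println("hive-site.xml found at: "
                    + HiveConf.getHiveSiteLocation());
            System.out.println("hive.metastore.uris = "
                    + conf.getVar(HiveConf.ConfVars.METASTOREURIS));
        }
    }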


