Hi Lian,
Will the latest version of Hive, 0.14.0, which Ambari 1.7.0 installs by
default, be supported by the next release of Spark?


Regards,

------------------ Original ------------------
From: "Cheng Lian" <lian.cs....@gmail.com>
Sent: Friday, Feb 6, 2015 9:02 AM
To: <guxiaobo1...@qq.com>; "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Can't access remote Hive table from spark



                          
Please note that Spark 1.2.0 only supports Hive 0.13.1 and 0.12.0; no other
versions are supported.
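The Hive bindings are fixed when Spark itself is built. The Spark 1.2.0
building guide selects the version with Maven profiles roughly as follows
(the -Pyarn and -Phadoop-2.4 profiles below are just examples; match them
to your own cluster):

    # Hive 0.13.1 bindings (the default with -Phive):
    mvn -Pyarn -Phadoop-2.4 -Phive -Phive-thriftserver -DskipTests clean package

    # Hive 0.12.0 bindings instead:
    mvn -Pyarn -Phadoop-2.4 -Phive -Phive-0.12.0 -Phive-thriftserver -DskipTests clean package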
       
Best,
Cheng
On 1/25/15 12:18 AM, guxiaobo1982 wrote:

                

Hi,

I built and started a single-node standalone Spark 1.2.0 cluster along with
a single-node Hive 0.14.0 instance installed by Ambari 1.7.0. On the Spark
and Hive node I can create and query tables inside Hive, and from remote
machines I can submit the SparkPi example to the Spark master. But I failed
to run the following example code:

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class SparkTest {
    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new JavaHiveContext(sc);

        // sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        // sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}

Exception in thread "main" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
    at org.apache.spark.sql.hive.HiveContext$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$super$lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
    at org.apache.spark.sql.hive.HiveContext$anon$2.lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:162)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:61)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:59)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:59)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:51)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
    at com.blackhorse.SparkTest.main(SparkTest.java:27)

[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called

But if I change the query to "show tables", the program runs and returns 0
rows, even though I have many tables inside Hive. So I suspect that my
program, or the Spark instance, did not connect to my Hive instance and
instead started a local Hive metastore. I have put the hive-site.xml file
from the Hive installation into Spark's conf directory. Can you help figure
out what's wrong here? Thanks.
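
A rough sketch of a check I could run to see whether the driver actually
picks up hive-site.xml (if it is not on the driver's classpath, HiveContext
falls back to a local embedded Derby metastore; the exact output format of
the SET command below is my assumption):

import java.net.URL;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class MetastoreCheck {
    public static void main(String[] args) {
        // If this prints null, hive-site.xml is not on the driver's
        // classpath and HiveContext will use a local embedded metastore.
        URL hiveSite = MetastoreCheck.class.getClassLoader()
                .getResource("hive-site.xml");
        System.out.println("hive-site.xml on classpath: " + hiveSite);

        SparkConf conf = new SparkConf()
                .setAppName("Metastore check")
                .setMaster("spark://lix1.bh.com:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext sqlCtx = new JavaHiveContext(sc);

        // "SET <key>" echoes the current setting; an empty value for
        // hive.metastore.uris means no remote metastore is configured.
        List<Row> rows = sqlCtx.sql("SET hive.metastore.uris").collect();
        for (Row row : rows) {
            System.out.println(row.getString(0));
        }
        sc.close();
    }
}

If hive-site.xml turns out to be missing from the classpath, adding
$SPARK_HOME/conf to the application classpath, or launching the job through
spark-submit, should make the remote metastore visible to the driver.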