Back to the first question: does this mandate that Hive is up and running? When I try it, I get the exception below. The documentation says that this method works only on a SchemaRDD. I thought that was why countries.saveAsTable did not work, so I created a tmp that contains the results from the registered temp table, which I could validate is a SchemaRDD, as shown below.
@Judy, I really do appreciate your kind support; I want to understand, and of course I don't want to waste your time. If you can point me to the documentation describing these details, that would be great.

scala> val tmp = sqlContext.sql("select * from countries")
tmp: org.apache.spark.sql.SchemaRDD =
SchemaRDD[12] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

scala> tmp.saveAsTable("Countries")
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
'CreateTableAsSelect None, Countries, false, None
 Project [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
  Subquery countries
   LogicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
        at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
        at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
        at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
        at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
        at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
        at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC.<init>(<console>:31)
        at $iwC.<init>(<console>:33)
        at <init>(<console>:35)
        at .<init>(<console>:39)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
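For reference, here is a minimal sketch of the flow that sidesteps this error by going through a HiveContext; it assumes Spark 1.2-era APIs, and the Country case class, its three-column schema, and the countries.csv path are illustrative stand-ins for the actual ten-column data:

import org.apache.spark.sql.hive.HiveContext

// saveAsTable plans a CreateTableAsSelect against the Hive metastore (that is
// the unresolved node in the trace above), so the query must be built from a
// HiveContext rather than a plain SQLContext.
val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD

// Illustrative schema; the real countries table has ten COUNTRY_* columns.
case class Country(countryId: Int, isoCode: String, name: String)

val countries = sc.textFile("countries.csv")   // hypothetical path
  .map(_.split(","))
  .map(r => Country(r(0).toInt, r(1), r(2)))

// Session-scoped only: visible to this shell, never written to the metastore.
countries.registerTempTable("countries")

// Persists a real metastore table that outlives the shell session.
hiveContext.sql("SELECT * FROM countries").saveAsTable("countries_persisted")

The same plan that fails under a plain SQLContext should resolve here, because CreateTableAsSelect is only analyzable with Hive support; without a hive-site.xml, the HiveContext falls back to a local embedded metastore.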
On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

> How did you register the table under spark-shell? Two things to notice:
>
> 1. To interact with Hive, HiveContext instead of SQLContext must be used.
> 2. `registerTempTable` doesn't persist the table into the Hive metastore,
>    and the table is lost after quitting spark-shell. Instead, you must use
>    `saveAsTable`.
>
> On 12/9/14 5:27 PM, Anas Mosaad wrote:
>
> Thanks Cheng,
>
> I thought spark-sql uses the exact same metastore, right? However, it
> didn't work as expected. Here's what I did.
>
> In spark-shell, I loaded a csv file and registered the table, say countries.
> Started the thrift server.
> Connected using beeline. When I run show tables or !tables, I get an empty
> list of tables, as follows:
>
> 0: jdbc:hive2://localhost:10000> !tables
> +------------+--------------+-------------+-------------+----------+
> | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
> +------------+--------------+-------------+-------------+----------+
> +------------+--------------+-------------+-------------+----------+
>
> 0: jdbc:hive2://localhost:10000> show tables;
> +---------+
> | result  |
> +---------+
> +---------+
> No rows selected (0.106 seconds)
> 0: jdbc:hive2://localhost:10000>
>
> Kindly advise, what am I missing? I want to read the RDD using SQL from
> outside spark-shell (i.e. like any other relational database).
>
> On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
>> HiveServer2. You don't need to run Hive, but you do need a working
>> Metastore.
>>
>> On 12/9/14 3:59 PM, Anas Mosaad wrote:
>>
>> Thanks Judy, this is exactly what I'm looking for. However, and please
>> forgive me if it's a dumb question: it seems to me that Thrift is the
>> same as the hive2 JDBC driver; does this mean that starting Thrift will
>> start Hive as well on the server?
>>
>> On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash
>> <judyn...@exchange.microsoft.com> wrote:
>>
>>> You can use the thrift server for this purpose, then test it with beeline.
>>>
>>> See doc:
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
>>>
>>> From: Anas Mosaad [mailto:anas.mos...@incorta.com]
>>> Sent: Monday, December 8, 2014 11:01 AM
>>> To: user@spark.apache.org
>>> Subject: Spark-SQL JDBC driver
>>>
>>> Hello Everyone,
>>>
>>> I'm brand new to Spark and was wondering if there's a JDBC driver to
>>> access Spark SQL directly. I'm running Spark in standalone mode and
>>> don't have Hadoop in this environment.
>>>
>>> --
>>> Best Regards/أطيب المنى,
>>>
>>> Anas Mosaad
>>
>> --
>> Best Regards/أطيب المنى,
>>
>> Anas Mosaad
>> Incorta Inc.
>> +20-100-743-4510
>
> --
> Best Regards/أطيب المنى,
>
> Anas Mosaad
> Incorta Inc.
> +20-100-743-4510

--
Best Regards/أطيب المنى,

Anas Mosaad
Incorta Inc.
+20-100-743-4510
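For completeness, a minimal sketch in Scala of the JDBC access the original question asks about. It assumes the Thrift server from the linked doc is running on localhost:10000, that the standard HiveServer2 JDBC driver (org.apache.hive.jdbc.HiveDriver, from the hive-jdbc artifact) is on the classpath, and that a countries table was persisted with saveAsTable:

import java.sql.DriverManager

// The Spark SQL Thrift server speaks the HiveServer2 wire protocol, so the
// ordinary HiveServer2 JDBC driver and jdbc:hive2:// URL scheme apply.
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT COUNT(*) FROM countries")
  while (rs.next()) {
    println(s"countries rows: ${rs.getLong(1)}")
  }
  rs.close()
  stmt.close()
} finally {
  conn.close()
}

This is the same route beeline takes above: any JDBC client that can load the hive2 driver can query the tables the Thrift server exposes, with no Hive service required beyond a working metastore.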