Back to the first question: does this mandate that Hive is up and running? When I try it, I get the exception below. The documentation says that this method works only on a SchemaRDD. I thought that was why countries.saveAsTable did not work, so I created a tmp that contains the results from the registered temp table, which I could validate is a SchemaRDD, as shown below.
@Judy, I really do appreciate your kind support; I want to understand, and of course I don't want to waste your time. If you can point me to the documentation describing these details, that would be great.

scala> val tmp = sqlContext.sql("select * from countries")
tmp: org.apache.spark.sql.SchemaRDD =
SchemaRDD[12] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

scala> tmp.saveAsTable("Countries")
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
'CreateTableAsSelect None, Countries, false, None
 Project [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
  Subquery countries
   LogicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
        at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
        at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
        at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
        at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
        at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
        at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
        at $iwC$$iwC$$iwC.<init>(<console>:29)
        at $iwC$$iwC.<init>(<console>:31)
        at $iwC.<init>(<console>:33)
        at <init>(<console>:35)
        at .<init>(<console>:39)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
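For reference, here is a minimal sketch of the flow that sidesteps this error by going through a HiveContext; it assumes Spark 1.2-era APIs, and the Country case class, its three-column schema, and the countries.csv path are illustrative stand-ins for the actual ten-column data:

import org.apache.spark.sql.hive.HiveContext

// saveAsTable plans a CreateTableAsSelect against the Hive metastore (that is
// the unresolved node in the trace above), so the query must be built from a
// HiveContext rather than a plain SQLContext.
val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD

// Illustrative schema; the real countries table has ten COUNTRY_* columns.
case class Country(countryId: Int, isoCode: String, name: String)

val countries = sc.textFile("countries.csv")   // hypothetical path
  .map(_.split(","))
  .map(r => Country(r(0).toInt, r(1), r(2)))

// Session-scoped only: visible to this shell, never written to the metastore.
countries.registerTempTable("countries")

// Persists a real metastore table that outlives the shell session.
hiveContext.sql("SELECT * FROM countries").saveAsTable("countries_persisted")

The same plan that fails under a plain SQLContext should resolve here, because CreateTableAsSelect is only analyzable with Hive support; without a hive-site.xml, the HiveContext falls back to a local embedded metastore.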
On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

> How did you register the table under spark-shell? Two things to notice:
>
> 1. To interact with Hive, HiveContext instead of SQLContext must be used.
> 2. `registerTempTable` doesn't persist the table into the Hive metastore,
>    and the table is lost after quitting spark-shell. Instead, you must use
>    `saveAsTable`.
>
> On 12/9/14 5:27 PM, Anas Mosaad wrote:
>
> Thanks Cheng,
>
> I thought spark-sql uses the exact same metastore, right? However, it
> didn't work as expected. Here's what I did.
>
> In spark-shell, I loaded a csv file and registered the table, say countries.
> Started the thrift server.
> Connected using beeline. When I run show tables or !tables, I get an empty
> list of tables, as follows:
>
> 0: jdbc:hive2://localhost:10000> !tables
> +------------+--------------+-------------+-------------+----------+
> | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
> +------------+--------------+-------------+-------------+----------+
> +------------+--------------+-------------+-------------+----------+
>
> 0: jdbc:hive2://localhost:10000> show tables;
> +---------+
> | result  |
> +---------+
> +---------+
> No rows selected (0.106 seconds)
> 0: jdbc:hive2://localhost:10000>
>
> Kindly advise, what am I missing? I want to read the RDD using SQL from
> outside spark-shell (i.e. like any other relational database).
>
> On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
>> HiveServer2. You don't need to run Hive, but you do need a working
>> Metastore.
>>
>> On 12/9/14 3:59 PM, Anas Mosaad wrote:
>>
>> Thanks Judy, this is exactly what I'm looking for. However, and please
>> forgive me if it's a dumb question: it seems to me that Thrift is the
>> same as the hive2 JDBC driver; does this mean that starting Thrift will
>> start Hive as well on the server?
>>
>> On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash
>> <judyn...@exchange.microsoft.com> wrote:
>>
>>> You can use the thrift server for this purpose, then test it with beeline.
>>>
>>> See doc:
>>> https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
>>>
>>> From: Anas Mosaad [mailto:anas.mos...@incorta.com]
>>> Sent: Monday, December 8, 2014 11:01 AM
>>> To: user@spark.apache.org
>>> Subject: Spark-SQL JDBC driver
>>>
>>> Hello Everyone,
>>>
>>> I'm brand new to Spark and was wondering if there's a JDBC driver to
>>> access Spark SQL directly. I'm running Spark in standalone mode and
>>> don't have Hadoop in this environment.
>>>
>>> --
>>> Best Regards/أطيب المنى,
>>>
>>> Anas Mosaad
>>
>> --
>> Best Regards/أطيب المنى,
>>
>> Anas Mosaad
>> Incorta Inc.
>> +20-100-743-4510
>
> --
> Best Regards/أطيب المنى,
>
> Anas Mosaad
> Incorta Inc.
> +20-100-743-4510

--
Best Regards/أطيب المنى,

Anas Mosaad
Incorta Inc.
+20-100-743-4510
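For completeness, a minimal sketch in Scala of the JDBC access the original question asks about. It assumes the Thrift server from the linked doc is running on localhost:10000, that the standard HiveServer2 JDBC driver (org.apache.hive.jdbc.HiveDriver, from the hive-jdbc artifact) is on the classpath, and that a countries table was persisted with saveAsTable:

import java.sql.DriverManager

// The Spark SQL Thrift server speaks the HiveServer2 wire protocol, so the
// ordinary HiveServer2 JDBC driver and jdbc:hive2:// URL scheme apply.
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT COUNT(*) FROM countries")
  while (rs.next()) {
    println(s"countries rows: ${rs.getLong(1)}")
  }
  rs.close()
  stmt.close()
} finally {
  conn.close()
}

This is the same route beeline takes above: any JDBC client that can load the hive2 driver can query the tables the Thrift server exposes, with no Hive service required beyond a working metastore.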