Re: Spark SQL JDBC teradata syntax error

2019-05-03 Thread Gourav Sengupta
What is the query?

On Fri, May 3, 2019 at 5:28 PM KhajaAsmath Mohammed 
wrote:

> Hi
>
> I have followed the link
> https://community.teradata.com/t5/Connectivity/Teradata-JDBC-Driver-returns-the-wrong-schema-column-nullability/m-p/77824
> to connect to Teradata from Spark.
>
> I was able to print the schema if I give a table name instead of a SQL query.
>
> I am getting the below error if I give a query (code snippet from the above link).
> Any help is appreciated.
>
> Exception in thread "main" java.sql.SQLException: [Teradata Database]
> [TeraJDBC 16.20.00.10] [Error 3707] [SQLState 42000] Syntax error, expected
> something like an 'EXCEPT' keyword or an 'UNION' keyword or a 'MINUS'
> keyword between the word 'VEHP91_BOM' and '?'.
> at
> com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDatabaseSQLException(ErrorFactory.java:309)
> at
> com.teradata.jdbc.jdbc_4.statemachine.ReceiveInitSubState.action(ReceiveInitSubState.java:103)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.subStateMachine(StatementReceiveState.java:311)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.action(StatementReceiveState.java:200)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementController.runBody(StatementController.java:137)
> at
> com.teradata.jdbc.jdbc_4.statemachine.StatementController.run(StatementController.java:128)
> at
> com.teradata.jdbc.jdbc_4.TDStatement.executeStatement(TDStatement.java:389)
> at
> com.teradata.jdbc.jdbc_4.TDStatement.prepareRequest(TDStatement.java:576)
> at
> com.teradata.jdbc.jdbc_4.TDPreparedStatement.(TDPreparedStatement.java:131)
> at
> com.teradata.jdbc.jdk6.JDK6_SQL_PreparedStatement.(JDK6_SQL_PreparedStatement.java:30)
> at
> com.teradata.jdbc.jdk6.JDK6_SQL_Connection.constructPreparedStatement(JDK6_SQL_Connection.java:82)
> at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1337)
> at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1381)
> at com.teradata.jdbc.jdbc_4.TDSession.prepareStatement(TDSession.java:1367)
>
>
> Thanks,
> Asmath
>
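A note on the error above: the Spark JDBC source treats the dbtable option as a table
reference and wraps it in SQL of its own (for example a schema-probing statement), so
passing a bare SELECT typically produces exactly this kind of Teradata syntax error. The
usual fix is to wrap the query in parentheses and give it an alias. A minimal, hedged
sketch using the Spark 2.x API; the URL, credentials and query body are placeholders:

  // Wrap the query in parentheses with an alias so the SQL Spark generates around
  // it stays valid. Connection details below are placeholders, not working values.
  val jdbcUrl = "jdbc:teradata://tdhost/DATABASE=mydb"

  val df = spark.read
    .format("jdbc")
    .option("url", jdbcUrl)
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "(select * from VEHP91_BOM) t")   // parentheses + alias, not a bare query
    .option("user", "user")
    .option("password", "password")
    .load()

  df.printSchema()

Spark 2.4 and later also accept a separate "query" option instead of "dbtable", which
avoids the wrapping altogether.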


Re: spark-sql jdbc dataframe mysql data type issue

2016-06-25 Thread Mich Talebzadeh
Select 10 sample rows for the columns id and ctime from each table (MySQL and Spark)
and post the output, please.

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 25 June 2016 at 13:36, 刘虓  wrote:

> Hi,
> I came across this strange behavior of Apache Spark 1.6.1:
> when I was reading a MySQL table into a Spark DataFrame, a column of data type
> float got mapped to double.
>
> dataframe schema:
>
> root
>
>  |-- id: long (nullable = true)
>
>  |-- ctime: double (nullable = true)
>
>  |-- atime: double (nullable = true)
>
> mysql schema:
>
> mysql> desc test.user_action_2;
>
> +-------+------------------+------+-----+---------+-------+
> | Field | Type             | Null | Key | Default | Extra |
> +-------+------------------+------+-----+---------+-------+
> | id    | int(10) unsigned | YES  |     | NULL    |       |
> | ctime | float            | YES  |     | NULL    |       |
> | atime | double           | YES  |     | NULL    |       |
> +-------+------------------+------+-----+---------+-------+
> I wonder if anyone has seen this behavior before.
>
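A hedged explanation: this looks like the JDBC type mapping rather than data corruption.
The MySQL driver reports FLOAT columns as java.sql.Types.REAL, which Spark's JDBC source
appears to map to DoubleType in the 1.x releases, so ctime comes back as double. A
minimal workaround sketch, assuming the schema above; connection details are placeholders:

  import org.apache.spark.sql.types.FloatType

  // Placeholder credentials/URL; "ctime" is the affected column from the schema above.
  val props = new java.util.Properties()
  props.setProperty("user", "user")
  props.setProperty("password", "password")

  val df = sqlContext.read.jdbc("jdbc:mysql://dbhost:3306/test", "user_action_2", props)

  // Cast the column back to float after loading; values are unchanged, only the
  // declared type is narrowed.
  val fixed = df.withColumn("ctime", df("ctime").cast(FloatType))
  fixed.printSchema()   // ctime: float (nullable = true)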


Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-06 Thread Richard Hillegas

Hi Rajesh,

The 1.6 schedule is available on the front page of the Spark wiki:
https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage. I don't
know of any workarounds for this problem.

Thanks,
Rick


Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote on 11/05/2015
06:35:22 PM:

> From: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> To: Richard Hillegas/San Francisco/IBM@IBMUS
> Cc: "u...@spark.incubator.apache.org"
> <u...@spark.incubator.apache.org>, "user@spark.apache.org"
> <user@spark.apache.org>
> Date: 11/05/2015 06:35 PM
> Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi Richard,

> Thank you for the updates. Do you know tentative timeline for 1.6
> release? Mean while, any workaround solution for this issue?

> Regards,
> Rajesh
>

>
> On Thu, Nov 5, 2015 at 10:57 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:
> Or you may be referring to
https://issues.apache.org/jira/browse/SPARK-10648
> . That issue has a couple pull requests but I think that the limited
> bandwidth of the committers still applies.
>
> Thanks,
> Rick
>
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > Cc: "user@spark.apache.org" <user@spark.apache.org>,
> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > Date: 11/05/2015 09:17 AM
> > Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> >
> > Hi Rajesh,
> >
> > I think that you may be referring to https://issues.apache.org/jira/
> > browse/SPARK-10909. A pull request on that issue was submitted more
> > than a month ago but it has not been committed. I think that the
> > committers are busy working on issues which were targeted for 1.6
> > and I doubt that they will have the spare cycles to vet that pull
request.
> >
> > Thanks,
> > Rick
> >
> >
> > Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote on 11/05/2015
> > 05:51:29 AM:
> >
> > > From: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > > To: "user@spark.apache.org" <user@spark.apache.org>,
> > > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > > Date: 11/05/2015 05:51 AM
> > > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> > >
> > > Hi,
> >
> > > Is this issue fixed in 1.5.1 version?
> >
> > > Regards,
> > > Rajesh

Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Richard Hillegas

Or you may be referring to
https://issues.apache.org/jira/browse/SPARK-10648. That issue has a couple
pull requests but I think that the limited bandwidth of the committers
still applies.

Thanks,
Rick


Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:

> From: Richard Hillegas/San Francisco/IBM@IBMUS
> To: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>,
> "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> Date: 11/05/2015 09:17 AM
> Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi Rajesh,
>
> I think that you may be referring to https://issues.apache.org/jira/
> browse/SPARK-10909. A pull request on that issue was submitted more
> than a month ago but it has not been committed. I think that the
> committers are busy working on issues which were targeted for 1.6
> and I doubt that they will have the spare cycles to vet that pull
request.
>
> Thanks,
> Rick
>
>
> Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote on 11/05/2015
> 05:51:29 AM:
>
> > From: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > To: "user@spark.apache.org" <user@spark.apache.org>,
> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > Date: 11/05/2015 05:51 AM
> > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> >
> > Hi,
>
> > Is this issue fixed in 1.5.1 version?
>
> > Regards,
> > Rajesh

Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Richard Hillegas

Hi Rajesh,

I think that you may be referring to
https://issues.apache.org/jira/browse/SPARK-10909. A pull request on that
issue was submitted more than a month ago but it has not been committed. I
think that the committers are busy working on issues which were targeted
for 1.6 and I doubt that they will have the spare cycles to vet that pull
request.

Thanks,
Rick


Madabhattula Rajesh Kumar  wrote on 11/05/2015
05:51:29 AM:

> From: Madabhattula Rajesh Kumar 
> To: "user@spark.apache.org" ,
> "u...@spark.incubator.apache.org" 
> Date: 11/05/2015 05:51 AM
> Subject: Spark sql jdbc fails for Oracle NUMBER type columns
>
> Hi,

> Is this issue fixed in 1.5.1 version?

> Regards,
> Rajesh
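For anyone stuck on this before the JIRA lands, one workaround that has been passed
around on the list is to register a custom JdbcDialect (available from Spark 1.4 onward)
that maps Oracle's unqualified NUMBER columns to a concrete DecimalType before schema
inference runs. A minimal, hedged sketch; the precision/scale of 38,10 are placeholders
and must be chosen to fit the actual data:

  import java.sql.Types
  import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
  import org.apache.spark.sql.types._

  // Force Oracle NUMBER (reported as NUMERIC) to an explicit decimal type so that
  // reading the table does not fail on an unspecified precision/scale.
  val oracleNumberDialect = new JdbcDialect {
    override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
    override def getCatalystType(
        sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
      if (sqlType == Types.NUMERIC) Some(DecimalType(38, 10)) else None
    }
  }

  JdbcDialects.registerDialect(oracleNumberDialect)
  // Subsequent sqlContext.read.jdbc(...) calls against Oracle use this mapping.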

Re: Spark sql jdbc fails for Oracle NUMBER type columns

2015-11-05 Thread Madabhattula Rajesh Kumar
Hi Richard,

Thank you for the updates. Do you know the tentative timeline for the 1.6 release?
Meanwhile, is there any workaround for this issue?

Regards,
Rajesh



On Thu, Nov 5, 2015 at 10:57 PM, Richard Hillegas <rhil...@us.ibm.com>
wrote:

> Or you may be referring to
> https://issues.apache.org/jira/browse/SPARK-10648. That issue has a
> couple pull requests but I think that the limited bandwidth of the
> committers still applies.
>
> Thanks,
> Rick
>
>
> Richard Hillegas/San Francisco/IBM@IBMUS wrote on 11/05/2015 09:16:42 AM:
>
> > From: Richard Hillegas/San Francisco/IBM@IBMUS
> > To: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > Cc: "user@spark.apache.org" <user@spark.apache.org>,
> > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > Date: 11/05/2015 09:17 AM
> > Subject: Re: Spark sql jdbc fails for Oracle NUMBER type columns
>
> >
> > Hi Rajesh,
> >
> > I think that you may be referring to https://issues.apache.org/jira/
> > browse/SPARK-10909. A pull request on that issue was submitted more
> > than a month ago but it has not been committed. I think that the
> > committers are busy working on issues which were targeted for 1.6
> > and I doubt that they will have the spare cycles to vet that pull
> request.
> >
> > Thanks,
> > Rick
> >
> >
> > Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote on 11/05/2015
> > 05:51:29 AM:
> >
> > > From: Madabhattula Rajesh Kumar <mrajaf...@gmail.com>
> > > To: "user@spark.apache.org" <user@spark.apache.org>,
> > > "u...@spark.incubator.apache.org" <u...@spark.incubator.apache.org>
> > > Date: 11/05/2015 05:51 AM
> > > Subject: Spark sql jdbc fails for Oracle NUMBER type columns
> > >
> > > Hi,
> >
> > > Is this issue fixed in 1.5.1 version?
> >
> > > Regards,
> > > Rajesh
>
>


Re: Spark SQL JDBC Source data skew

2015-06-25 Thread Sathish Kumaran Vairavelu
Can someone help me here, please?
On Sat, Jun 20, 2015 at 9:54 AM Sathish Kumaran Vairavelu 
vsathishkuma...@gmail.com wrote:

 Hi,

 In the Spark SQL JDBC data source there is an option to specify the upper/lower
 bound and the number of partitions. How does Spark handle data distribution if we do
 not give the upper/lower bounds and the number of partitions? Will all the data from the
 external data source end up skewed onto one executor?

 In many situations, we do not know the upper/lower bound of the underlying
 dataset until the query is executed, so it is not possible to pass
 upper/lower bound values.


 Thanks

 Sathish
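For what it's worth: when partitionColumn/lowerBound/upperBound/numPartitions are not
supplied, the JDBC source reads the whole table through a single partition, i.e. one task
on one executor, so everything does end up in one place. The bounds only decide how the
partition ranges are strided; they do not filter rows, so a cheap min/max query can
supply them at run time. A minimal sketch assuming the Spark 1.4+ API; the table, column
and connection details are placeholders:

  val url = "jdbc:postgresql://dbhost:5432/sales"   // placeholder
  val props = new java.util.Properties()
  props.setProperty("user", "user")                 // placeholder
  props.setProperty("password", "password")         // placeholder

  // Fetch the real bounds first with a tiny pushed-down query.
  val bounds = sqlContext.read
    .jdbc(url, "(select min(id) as lo, max(id) as hi from orders) b", props)
    .collect()(0)
  val lo = bounds.getAs[Number](0).longValue   // adjust if the column type differs
  val hi = bounds.getAs[Number](1).longValue

  // Then split the main read into parallel partitions on the numeric id column.
  val orders = sqlContext.read.jdbc(
    url,
    "orders",   // or a "(subquery) alias"
    "id",       // partitionColumn (must be numeric)
    lo, hi,     // lowerBound/upperBound: stride hints only, rows outside are still read
    8,          // numPartitions
    props)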



Re: Spark SQL JDBC Source Join Error

2015-06-14 Thread Michael Armbrust
Sounds like SPARK-5456 (https://issues.apache.org/jira/browse/SPARK-5456),
which is fixed in Spark 1.4.

On Sun, Jun 14, 2015 at 11:57 AM, Sathish Kumaran Vairavelu 
vsathishkuma...@gmail.com wrote:

 Hello Everyone,

 I pulled 2 different tables from the JDBC source and then joined them
 using the cust_id *decimal* column. A simple join like the one below. This
 simple join works perfectly in the database but not in Spark SQL. I am
 importing the 2 tables as data frames (registerTempTable) and firing SQL on top
 of them. Please let me know what could be the error.

 select b.customer_type, sum(a.amount) total_amount from
 customer_activity a,
 account b
 where
 a.cust_id = b.cust_id
 group by b.customer_type

 CastException: java.math.BigDecimal cannot be cast to
 org.apache.spark.sql.types.Decimal

 at
 org.apache.spark.sql.types.Decimal$DecimalIsFractional$.plus(Decimal.scala:330)

 at
 org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)

 at
 org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:50)

 at
 org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:83)

 at
 org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:571)

 at
 org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:163)

 at
 org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:147)

 at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)

 at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)

 at
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)

 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

 at
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)

 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)

 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)

 at org.apache.spark.scheduler.Task.run(Task.scala:64)

 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

 at java.lang.Thread.run(Thread.java:745)
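If upgrading to 1.4 is not an option straight away, one hedged workaround is to push a
cast down to the source database via the dbtable subquery, so the decimal columns arrive
as doubles and the java.math.BigDecimal / Decimal mismatch never materializes (with the
usual precision caveats of double). A minimal sketch for the 1.3.x API; the cast target
syntax depends on the database, and all names and the URL are placeholders:

  // Apply the same pattern to both tables taking part in the join.
  val activity = sqlContext.load("jdbc", Map(
    "url"     -> "jdbc:postgresql://dbhost:5432/sales?user=user&password=password",
    "dbtable" -> "(select cast(cust_id as double precision) cust_id, " +
                 "cast(amount as double precision) amount from customer_activity) a"
  ))
  activity.registerTempTable("customer_activity")
  // sum(a.amount) and the join on cust_id now run on doubles instead of decimals.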



Re: Spark SQL JDBC Source Join Error

2015-06-14 Thread Sathish Kumaran Vairavelu
Thank you, it works in Spark 1.4.

On Sun, Jun 14, 2015 at 3:51 PM Michael Armbrust mich...@databricks.com
wrote:

 Sounds like SPARK-5456 https://issues.apache.org/jira/browse/SPARK-5456.
 Which is fixed in Spark 1.4.

 On Sun, Jun 14, 2015 at 11:57 AM, Sathish Kumaran Vairavelu 
 vsathishkuma...@gmail.com wrote:

 Hello Everyone,

 I pulled 2 different tables from the JDBC source and then joined them
 using the cust_id *decimal* column. A simple join like as below. This
 simple join works perfectly in the database but not in Spark SQL. I am
 importing 2 tables as a data frame/registertemptable and firing sql on top
 of it. Please let me know what could be the error..

 select b.customer_type, sum(a.amount) total_amount from
 customer_activity a,
 account b
 where
 a.cust_id = b.cust_id
 group by b.customer_type

 CastException: java.math.BigDecimal cannot be cast to
 org.apache.spark.sql.types.Decimal

 at
 org.apache.spark.sql.types.Decimal$DecimalIsFractional$.plus(Decimal.scala:330)

 at
 org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127)

 at
 org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:50)

 at
 org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:83)

 at
 org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:571)

 at
 org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:163)

 at
 org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:147)

 at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)

 at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)

 at
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)

 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

 at
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)

 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)

 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)

 at
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)

 at org.apache.spark.scheduler.Task.run(Task.scala:64)

 at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)

 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

 at java.lang.Thread.run(Thread.java:745)





Re: Spark-SQL JDBC driver

2014-12-14 Thread Michael Armbrust
I'll add that there is an experimental method that allows you to start the
JDBC server with an existing HiveContext (which might have registered
temporary tables).

https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L42
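
A minimal, hedged sketch of using that entry point (the method the link points to,
HiveThriftServer2.startWithContext, is experimental, so treat the exact name and
availability as version-dependent). Temp tables registered on the shared HiveContext are
then visible to JDBC/ODBC clients of the embedded Thrift server:

  import org.apache.spark.sql.hive.HiveContext
  import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

  val hive = new HiveContext(sc)   // sc: the existing SparkContext
  // ...load data and call registerTempTable on this same context here...
  HiveThriftServer2.startWithContext(hive)   // Thrift server now shares those temp tables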


On Thu, Dec 11, 2014 at 6:52 AM, Denny Lee denny.g@gmail.com wrote:

 Yes, that is correct. A quick reference on this is the post
 https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
 with the pertinent section being:

 It is important to note that when you create Spark tables (for example,
 via the .registerTempTable) these are operating within the Spark
 environment which resides in a separate process than the Hive Metastore.
 This means that currently tables that are created within the Spark context
 are not available through the Thrift server. To achieve this, within the
 Spark context save your temporary table into Hive - then the Spark Thrift
 Server will be able to see the table.

 HTH!


 On Thu, Dec 11, 2014 at 04:09 Anas Mosaad anas.mos...@incorta.com wrote:

 Actually I came to a conclusion that RDDs has to be persisted in hive in
 order to be able to access through thrift.
 Hope I didn't end up with incorrect conclusion.
 Please someone correct me if I am wrong.
 On Dec 11, 2014 8:53 AM, Judy Nash judyn...@exchange.microsoft.com
 wrote:

  Looks like you are wondering why you cannot see the RDD table you have
 created via thrift?



 Based on my own experience with spark 1.1, RDD created directly via
 Spark SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift,
 since thrift has its own session containing its own RDD.

 Spark SQL experts on the forum can confirm on this though.



 *From:* Cheng Lian [mailto:lian.cs@gmail.com]
 *Sent:* Tuesday, December 9, 2014 6:42 AM
 *To:* Anas Mosaad
 *Cc:* Judy Nash; user@spark.apache.org
 *Subject:* Re: Spark-SQL JDBC driver



 According to the stacktrace, you were still using SQLContext rather than
 HiveContext. To interact with Hive, HiveContext *must* be used.

 Please refer to this page
 http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

  On 12/9/14 6:26 PM, Anas Mosaad wrote:

  Back to the first question, this will mandate that hive is up and
 running?



 When I try it, I get the following exception. The documentation says
 that this method works only on SchemaRDD. I though that
 countries.saveAsTable did not work for that a reason so I created a tmp
 that contains the results from the registered temp table. Which I could
 validate that it's a SchemaRDD as shown below.




 * @Judy,* I do really appreciate your kind support and I want to
 understand and off course don't want to wast your time. If you can direct
 me the documentation describing this details, this will be great.



 scala val tmp = sqlContext.sql(select * from countries)

 tmp: org.apache.spark.sql.SchemaRDD =

 SchemaRDD[12] at RDD at SchemaRDD.scala:108

 == Query Plan ==

 == Physical Plan ==

 PhysicalRDD
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



 scala tmp.saveAsTable(Countries)

 org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
 Unresolved plan found, tree:

 'CreateTableAsSelect None, Countries, false, None

  Project
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]

   Subquery countries

LogicalRDD
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)

 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)

 at
 scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51

RE: Spark-SQL JDBC driver

2014-12-11 Thread Anas Mosaad
Actually I came to the conclusion that RDDs have to be persisted in Hive in
order to be accessible through Thrift.
Hope I didn't end up with an incorrect conclusion.
Please correct me if I am wrong.
On Dec 11, 2014 8:53 AM, Judy Nash judyn...@exchange.microsoft.com
wrote:

  Looks like you are wondering why you cannot see the RDD table you have
 created via thrift?



 Based on my own experience with spark 1.1, RDD created directly via Spark
 SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift, since
 thrift has its own session containing its own RDD.

 Spark SQL experts on the forum can confirm on this though.



 *From:* Cheng Lian [mailto:lian.cs@gmail.com]
 *Sent:* Tuesday, December 9, 2014 6:42 AM
 *To:* Anas Mosaad
 *Cc:* Judy Nash; user@spark.apache.org
 *Subject:* Re: Spark-SQL JDBC driver



 According to the stacktrace, you were still using SQLContext rather than
 HiveContext. To interact with Hive, HiveContext *must* be used.

 Please refer to this page
 http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

  On 12/9/14 6:26 PM, Anas Mosaad wrote:

  Back to the first question, this will mandate that hive is up and
 running?



 When I try it, I get the following exception. The documentation says that
 this method works only on SchemaRDD. I though that countries.saveAsTable
 did not work for that a reason so I created a tmp that contains the results
 from the registered temp table. Which I could validate that it's a
 SchemaRDD as shown below.




 * @Judy,* I do really appreciate your kind support and I want to
 understand and off course don't want to wast your time. If you can direct
 me the documentation describing this details, this will be great.



 scala val tmp = sqlContext.sql(select * from countries)

 tmp: org.apache.spark.sql.SchemaRDD =

 SchemaRDD[12] at RDD at SchemaRDD.scala:108

 == Query Plan ==

 == Physical Plan ==

 PhysicalRDD
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



 scala tmp.saveAsTable(Countries)

 org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
 plan found, tree:

 'CreateTableAsSelect None, Countries, false, None

  Project
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]

   Subquery countries

LogicalRDD
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)

 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)

 at
 scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)

 at
 scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)

 at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)

 at scala.collection.immutable.List.foreach(List.scala:318)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)

 at
 org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)

 at
 org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)

 at
 org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)

 at
 org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)

 at
 org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)

 at
 org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)

 at
 org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)

 at
 org.apache.spark.sql.SQLContext

Re: Spark-SQL JDBC driver

2014-12-11 Thread Denny Lee
Yes, that is correct. A quick reference on this is the post
https://www.linkedin.com/pulse/20141007143323-732459-an-absolutely-unofficial-way-to-connect-tableau-to-sparksql-spark-1-1?_mSplash=1
with the pertinent section being:

It is important to note that when you create Spark tables (for example, via
the .registerTempTable) these are operating within the Spark environment
which resides in a separate process than the Hive Metastore. This means
that currently tables that are created within the Spark context are not
available through the Thrift server. To achieve this, within the Spark
context save your temporary table into Hive - then the Spark Thrift Server
will be able to see the table.

HTH!

On Thu, Dec 11, 2014 at 04:09 Anas Mosaad anas.mos...@incorta.com wrote:

 Actually I came to a conclusion that RDDs has to be persisted in hive in
 order to be able to access through thrift.
 Hope I didn't end up with incorrect conclusion.
 Please someone correct me if I am wrong.
 On Dec 11, 2014 8:53 AM, Judy Nash judyn...@exchange.microsoft.com
 wrote:

  Looks like you are wondering why you cannot see the RDD table you have
 created via thrift?



 Based on my own experience with spark 1.1, RDD created directly via Spark
 SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift, since
 thrift has its own session containing its own RDD.

 Spark SQL experts on the forum can confirm on this though.



 *From:* Cheng Lian [mailto:lian.cs@gmail.com]
 *Sent:* Tuesday, December 9, 2014 6:42 AM
 *To:* Anas Mosaad
 *Cc:* Judy Nash; user@spark.apache.org
 *Subject:* Re: Spark-SQL JDBC driver



 According to the stacktrace, you were still using SQLContext rather than
 HiveContext. To interact with Hive, HiveContext *must* be used.

 Please refer to this page
 http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

  On 12/9/14 6:26 PM, Anas Mosaad wrote:

  Back to the first question, this will mandate that hive is up and
 running?



 When I try it, I get the following exception. The documentation says that
 this method works only on SchemaRDD. I though that countries.saveAsTable
 did not work for that a reason so I created a tmp that contains the results
 from the registered temp table. Which I could validate that it's a
 SchemaRDD as shown below.




 * @Judy,* I do really appreciate your kind support and I want to
 understand and off course don't want to wast your time. If you can direct
 me the documentation describing this details, this will be great.



 scala val tmp = sqlContext.sql(select * from countries)

 tmp: org.apache.spark.sql.SchemaRDD =

 SchemaRDD[12] at RDD at SchemaRDD.scala:108

 == Query Plan ==

 == Physical Plan ==

 PhysicalRDD
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



 scala tmp.saveAsTable(Countries)

 org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
 Unresolved plan found, tree:

 'CreateTableAsSelect None, Countries, false, None

  Project
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]

   Subquery countries

LogicalRDD
 [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)

 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

 at
 org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)

 at
 org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)

 at
 scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)

 at
 scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)

 at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)

 at
 org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)

 at scala.collection.immutable.List.foreach(List.scala:318

RE: Spark-SQL JDBC driver

2014-12-10 Thread Judy Nash
Looks like you are wondering why you cannot see the RDD table you have created 
via thrift?

Based on my own experience with spark 1.1, RDD created directly via Spark SQL 
(i.e. Spark Shell or Spark-SQL.sh) is not visible on thrift, since thrift has 
its own session containing its own RDD.
Spark SQL experts on the forum can confirm on this though.

From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Tuesday, December 9, 2014 6:42 AM
To: Anas Mosaad
Cc: Judy Nash; user@spark.apache.org
Subject: Re: Spark-SQL JDBC driver

According to the stacktrace, you were still using SQLContext rather than 
HiveContext. To interact with Hive, HiveContext *must* be used.

Please refer to this page 
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

On 12/9/14 6:26 PM, Anas Mosaad wrote:
Back to the first question, this will mandate that hive is up and running?

When I try it, I get the following exception. The documentation says that this 
method works only on SchemaRDD. I though that countries.saveAsTable did not 
work for that a reason so I created a tmp that contains the results from the 
registered temp table. Which I could validate that it's a SchemaRDD as shown 
below.


@Judy, I do really appreciate your kind support and I want to understand and 
off course don't want to wast your time. If you can direct me the documentation 
describing this details, this will be great.


scala val tmp = sqlContext.sql(select * from countries)

tmp: org.apache.spark.sql.SchemaRDD =

SchemaRDD[12] at RDD at SchemaRDD.scala:108

== Query Plan ==

== Physical Plan ==

PhysicalRDD 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



scala tmp.saveAsTable(Countries)

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan 
found, tree:

'CreateTableAsSelect None, Countries, false, None

 Project 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]

  Subquery countries

   LogicalRDD 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
 MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)

at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)

at 
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)

at 
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)

at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)

at scala.collection.immutable.List.foreach(List.scala:318)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)

at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)

at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)

at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)

at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)

at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)

at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)

at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)

at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)

at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)

at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)

at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)

at org.apache.spark.sql.SQLContext

Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Thanks Judy, this is exactly what I'm looking for. However, and please forgive
me if it's a dumb question: it seems to me that Thrift is the same as the
hive2 JDBC driver; does this mean that starting Thrift will start Hive as
well on the server?

On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash judyn...@exchange.microsoft.com
wrote:

  You can use thrift server for this purpose then test it with beeline.



 See doc:


 https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server





 *From:* Anas Mosaad [mailto:anas.mos...@incorta.com]
 *Sent:* Monday, December 8, 2014 11:01 AM
 *To:* user@spark.apache.org
 *Subject:* Spark-SQL JDBC driver



 Hello Everyone,



 I'm brand new to spark and was wondering if there's a JDBC driver to
 access spark-SQL directly. I'm running spark in standalone mode and don't
 have hadoop in this environment.



 --



 *Best Regards/أطيب المنى,*



 *Anas Mosaad*






-- 

*Best Regards/أطيب المنى,*

*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*


Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian
Essentially, the Spark SQL JDBC Thrift server is just a Spark port of 
HiveServer2. You don't need to run Hive, but you do need a working 
Metastore.


On 12/9/14 3:59 PM, Anas Mosaad wrote:
Thanks Judy, this is exactly what I'm looking for. However, and plz 
forgive me if it's a dump question is: It seems to me that thrift is 
the same as hive2 JDBC driver, does this mean that starting thrift 
will start hive as well on the server?


On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash 
judyn...@exchange.microsoft.com 
mailto:judyn...@exchange.microsoft.com wrote:


You can use thrift server for this purpose then test it with beeline.

See doc:


https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server

*From:*Anas Mosaad [mailto:anas.mos...@incorta.com
mailto:anas.mos...@incorta.com]
*Sent:* Monday, December 8, 2014 11:01 AM
*To:* user@spark.apache.org mailto:user@spark.apache.org
*Subject:* Spark-SQL JDBC driver

Hello Everyone,

I'm brand new to spark and was wondering if there's a JDBC driver
to access spark-SQL directly. I'm running spark in standalone mode
and don't have hadoop in this environment.

-- 


*Best Regards/أطيب المنى,*

*Anas Mosaad*




--

*Best Regards/أطيب المنى,*
*
*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*




Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Thanks Cheng,

I thought spark-sql is using the same exact metastore, right? However, it
didn't work as expected. Here's what I did.

In spark-shell, I loaded a csv files and registered the table, say
countries.
Started the thrift server.
Connected using beeline. When I run show tables or !tables, I get an empty
list of tables, as follows:

0: jdbc:hive2://localhost:1 !tables

+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+

0: jdbc:hive2://localhost:1 show tables ;

+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)

0: jdbc:hive2://localhost:1



Kindly advise: what am I missing? I want to read the RDD using SQL from
outside spark-shell (i.e. like any other relational database).


On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian lian.cs@gmail.com wrote:

  Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
 HiveServer2. You don't need to run Hive, but you do need a working
 Metastore.


 On 12/9/14 3:59 PM, Anas Mosaad wrote:

 Thanks Judy, this is exactly what I'm looking for. However, and plz
 forgive me if it's a dump question is: It seems to me that thrift is the
 same as hive2 JDBC driver, does this mean that starting thrift will start
 hive as well on the server?

 On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash judyn...@exchange.microsoft.com
  wrote:

  You can use thrift server for this purpose then test it with beeline.



 See doc:


 https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server





 *From:* Anas Mosaad [mailto:anas.mos...@incorta.com]
 *Sent:* Monday, December 8, 2014 11:01 AM
 *To:* user@spark.apache.org
 *Subject:* Spark-SQL JDBC driver



 Hello Everyone,



 I'm brand new to spark and was wondering if there's a JDBC driver to
 access spark-SQL directly. I'm running spark in standalone mode and don't
 have hadoop in this environment.



 --



 *Best Regards/أطيب المنى,*



 *Anas Mosaad*






  --

 *Best Regards/أطيب المنى,*

  *Anas Mosaad*
 *Incorta Inc.*
 *+20-100-743-4510*





-- 

*Best Regards/أطيب المنى,*

*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*


Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian

How did you register the table under spark-shell? Two things to notice:

1. To interact with Hive, HiveContext instead of SQLContext must be used.
2. `registerTempTable` doesn't persist the table into Hive metastore, 
and the table is lost after quitting spark-shell. Instead, you must use 
`saveAsTable`.
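
A minimal sketch of the difference, assuming Spark 1.3+ syntax (the API names differ
slightly in 1.1/1.2, where saveAsTable lives on SchemaRDD); the tuple data below is a
hypothetical stand-in for the CSV-loaded countries table:

  import org.apache.spark.sql.hive.HiveContext

  val hive = new HiveContext(sc)
  import hive.implicits._

  // Hypothetical stand-in for the CSV data loaded in spark-shell.
  val countries = sc.parallelize(Seq(("EG", "Egypt"), ("US", "United States")))
                    .toDF("country_iso_code", "country_name")

  countries.registerTempTable("countries_tmp")  // session-local, invisible to beeline
  countries.saveAsTable("countries")            // persisted via the Hive metastore,
                                                // visible from the Thrift server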


On 12/9/14 5:27 PM, Anas Mosaad wrote:

Thanks Cheng,

I thought spark-sql is using the same exact metastore, right? However, 
it didn't work as expected. Here's what I did.


In spark-shell, I loaded a csv files and registered the table, say 
countries.

Started the thrift server.
Connected using beeline. When I run show tables or !tables, I get 
empty list of tables as follow:


0: jdbc:hive2://localhost:1 !tables

+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+

0: jdbc:hive2://localhost:1 show tables ;

+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)

0: jdbc:hive2://localhost:1



Kindly advice, what am I missing? I want to read the RDD using SQL 
from outside spark-shell (i.e. like any other relational database)



On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian lian.cs@gmail.com 
mailto:lian.cs@gmail.com wrote:


Essentially, the Spark SQL JDBC Thrift server is just a Spark port
of HiveServer2. You don't need to run Hive, but you do need a
working Metastore.


On 12/9/14 3:59 PM, Anas Mosaad wrote:

Thanks Judy, this is exactly what I'm looking for. However, and
plz forgive me if it's a dump question is: It seems to me that
thrift is the same as hive2 JDBC driver, does this mean that
starting thrift will start hive as well on the server?

On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash
judyn...@exchange.microsoft.com
mailto:judyn...@exchange.microsoft.com wrote:

You can use thrift server for this purpose then test it with
beeline.

See doc:


https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server

*From:*Anas Mosaad [mailto:anas.mos...@incorta.com
mailto:anas.mos...@incorta.com]
*Sent:* Monday, December 8, 2014 11:01 AM
*To:* user@spark.apache.org mailto:user@spark.apache.org
*Subject:* Spark-SQL JDBC driver

Hello Everyone,

I'm brand new to spark and was wondering if there's a JDBC
driver to access spark-SQL directly. I'm running spark in
standalone mode and don't have hadoop in this environment.

-- 


*Best Regards/أطيب المنى,*

*Anas Mosaad*




-- 


*Best Regards/أطيب المنى,*
*
*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*





--

*Best Regards/أطيب المنى,*
*
*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*




Re: Spark-SQL JDBC driver

2014-12-09 Thread Anas Mosaad
Back to the first question: will this mandate that Hive is up and running?

When I try it, I get the following exception. The documentation says that
this method works only on a SchemaRDD. I thought that countries.saveAsTable
did not work for that reason, so I created a tmp that contains the results
from the registered temp table, which I could validate is a
SchemaRDD, as shown below.


*@Judy,* I do really appreciate your kind support and I want to understand,
and of course don't want to waste your time. If you can direct me to the
documentation describing these details, that would be great.

scala val tmp = sqlContext.sql(select * from countries)

tmp: org.apache.spark.sql.SchemaRDD =

SchemaRDD[12] at RDD at SchemaRDD.scala:108

== Query Plan ==

== Physical Plan ==

PhysicalRDD
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36


scala tmp.saveAsTable(Countries)

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
plan found, tree:

'CreateTableAsSelect None, Countries, false, None

 Project
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]

  Subquery countries

   LogicalRDD
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36


 at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)

at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)

at
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)

at
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)

at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)

at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)

at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)

at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)

at
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)

at
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)

at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)

at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)

at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)

at scala.collection.immutable.List.foreach(List.scala:318)

at
org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)

at
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)

at
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)

at
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)

at
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)

at
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)

at
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)

at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)

at
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)

at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)

at
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)

at
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)

at
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)

at
org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)

at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)

at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)

at $iwC$$iwC$$iwC$$iwC.init(console:27)

at $iwC$$iwC$$iwC.init(console:29)

at $iwC$$iwC.init(console:31)

at $iwC.init(console:33)

at init(console:35)

at .init(console:39)

at .clinit(console)

at .init(console:7)

at .clinit(console)

at $print(console)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at 

Re: Spark-SQL JDBC driver

2014-12-09 Thread Cheng Lian
According to the stacktrace, you were still using SQLContext rather than 
HiveContext. To interact with Hive, HiveContext *must* be used.


Please refer to this page 
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables



On 12/9/14 6:26 PM, Anas Mosaad wrote:
Back to the first question, this will mandate that hive is up and
running?


When I try it, I get the following exception. The documentation says 
that this method works only on SchemaRDD. I though that 
countries.saveAsTable did not work for that a reason so I created a 
tmp that contains the results from the registered temp table. Which I 
could validate that it's a SchemaRDD as shown below.


*
@Judy,* I do really appreciate your kind support and I want to 
understand and off course don't want to wast your time. If you can 
direct me the documentation describing this details, this will be great.


scala val tmp = sqlContext.sql(select * from countries)

tmp: org.apache.spark.sql.SchemaRDD =

SchemaRDD[12] at RDD at SchemaRDD.scala:108

== Query Plan ==

== Physical Plan ==

PhysicalRDD 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], 
MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



scala tmp.saveAsTable(Countries)

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
Unresolved plan found, tree:


'CreateTableAsSelect None, Countries, false, None

 Project 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]


  Subquery countries

   LogicalRDD 
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], 
MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36



at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)


at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)


at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)


at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)


at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)


at 
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)


at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)


at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)


at 
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)


at 
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)


at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)


at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)


at scala.collection.immutable.List.foreach(List.scala:318)

at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)


at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)


at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)


at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)


at 
org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)


at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)


at 
org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)


at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)


at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)


at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)


at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)


at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)


at 
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)


at 
org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)


at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)

at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:22)

at $iwC$$iwC$$iwC$$iwC.init(console:27)

at $iwC$$iwC$$iwC.init(console:29)

at $iwC$$iwC.init(console:31)

at $iwC.init(console:33)

at init(console:35)

at .init(console:39)

at .clinit(console)

at .init(console:7)

at .clinit(console)

at $print(console)

RE: Spark-SQL JDBC driver

2014-12-08 Thread Judy Nash
You can use thrift server for this purpose then test it with beeline.

See doc:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server


From: Anas Mosaad [mailto:anas.mos...@incorta.com]
Sent: Monday, December 8, 2014 11:01 AM
To: user@spark.apache.org
Subject: Spark-SQL JDBC driver

Hello Everyone,

I'm brand new to spark and was wondering if there's a JDBC driver to access 
spark-SQL directly. I'm running spark in standalone mode and don't have hadoop 
in this environment.

--

Best Regards/أطيب المنى,

Anas Mosaad



Re: Spark SQL JDBC

2014-09-11 Thread alexandria1101
Even when I comment out those 3 lines, I still get the same error.  Did
someone solve this?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-tp11369p13992.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark SQL JDBC

2014-09-11 Thread Denny Lee
When you re-ran sbt, did you clear out the packages first and ensure that
the datanucleus jars were generated within lib_managed? I remember
having to do that when I was testing out different configs.

On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 
alexandria.shea...@gmail.com wrote:

 Even when I comment out those 3 lines, I still get the same error.  Did
 someone solve this?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-tp11369p13992.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




RE: Spark SQL JDBC

2014-09-11 Thread Cheng, Hao
I copied the 3 datanucleus jars (datanucleus-api-jdo-3.2.1.jar,
datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar) to the folder lib/
manually, and it works for me.

From: Denny Lee [mailto:denny.g@gmail.com]
Sent: Friday, September 12, 2014 11:28 AM
To: alexandria1101
Cc: u...@spark.incubator.apache.org
Subject: Re: Spark SQL JDBC

When you re-ran sbt did you clear out the packages first and ensure that the 
datanucleus jars were generated within lib_managed?  I remembered having to do 
that when I was working testing out different configs.

On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101
alexandria.shea...@gmail.com wrote:
Even when I comment out those 3 lines, I still get the same error.  Did
someone solve this?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-tp11369p13992.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark SQL JDBC

2014-08-13 Thread Cheng Lian
Oh, thanks for reporting this. This should be a bug: since SPARK_HIVE was
deprecated, we shouldn't rely on it any more.


On Wed, Aug 13, 2014 at 1:23 PM, ZHENG, Xu-dong dong...@gmail.com wrote:

 Just found that this is because the lines below in make-distribution.sh don't work:

 if [ $SPARK_HIVE == true ]; then
   cp $FWDIR/lib_managed/jars/datanucleus*.jar $DISTDIR/lib/
 fi

 There is no definition of $SPARK_HIVE in make-distribution.sh. I should
 set it explicitly.



 On Wed, Aug 13, 2014 at 1:10 PM, ZHENG, Xu-dong dong...@gmail.com wrote:

 Hi Cheng,

 I also met some issues when I tried to start the ThriftServer based on a build
 from the master branch (I could successfully run it from the branch-1.0-jdbc
 branch). Below is my build command:

 ./make-distribution.sh --skip-java-test -Phadoop-2.4 -Phive -Pyarn
 -Dyarn.version=2.4.0 -Dhadoop.version=2.4.0 -Phive-thriftserver

 And below are the printed errors:

 ERROR CompositeService: Error starting services HiveServer2
 org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
 at
 org.apache.hive.service.cli.CLIService.start(CLIService.java:85)
 at
 org.apache.hive.service.CompositeService.start(CompositeService.java:70)
 at
 org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)
 at
 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:71)
 at
 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:314)
  at
 org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Caused by: javax.jdo.JDOFatalUserException: Class
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
 NestedThrowables:
 java.lang.ClassNotFoundException:
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory
 at
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
 at
 javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
 at
 javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at
 org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:64)
 at
 org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:286)
 at
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.init(RetryingHMSHandler.java:54)
 at
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:121)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:104)
 at
 org.apache.hive.service.cli.CLIService.start(CLIService.java:82)
 ... 11 more
 Caused by: java.lang.ClassNotFoundException:
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at 

Re: Spark SQL JDBC

2014-08-12 Thread John Omernik
Yin helped me with that, and I appreciate the on-list follow-up. A few
questions: why is this the case? Does building it with the thrift server
add much more time/size to the final build? It seems that unless this is
documented well, people will miss it and this situation will keep coming up,
so why not just build the thrift server in? (I am not a programming expert,
and not trying to judge the decision to have it in a separate profile; I
would just like to understand why it's done that way.)




On Mon, Aug 11, 2014 at 11:47 AM, Cheng Lian lian.cs@gmail.com wrote:

 Hi John, the JDBC Thrift server resides in its own build profile and needs
 to be enabled explicitly by ./sbt/sbt -Phive-thriftserver assembly.
 ​


 On Tue, Aug 5, 2014 at 4:54 AM, John Omernik j...@omernik.com wrote:

 I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar with
 the JDBC thrift server.  I have everything compiled correctly, I can access
 data in spark-shell on yarn from my hive installation. Cached tables, etc
 all work.

 When I execute ./sbin/start-thriftserver.sh

 I get the error below. Shouldn't it just read my spark-env? I guess I am
 lost on how to make this work.

 Thanks!

 $ ./start-thriftserver.sh


 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath

 Exception in thread main java.lang.ClassNotFoundException:
 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

 at java.security.AccessController.doPrivileged(Native Method)

 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

 at java.lang.Class.forName0(Native Method)

 at java.lang.Class.forName(Class.java:270)

 at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)





Re: Spark SQL JDBC

2014-08-12 Thread Michael Armbrust
Hive pulls in a ton of dependencies that we were afraid would break
existing Spark applications. For this reason, all Hive submodules are
optional.


On Tue, Aug 12, 2014 at 7:43 AM, John Omernik j...@omernik.com wrote:

 Yin helped me with that, and I appreciate the on-list follow-up. A few
 questions: why is this the case? Does building it with the thrift server
 add much more time/size to the final build? It seems that unless this is
 documented well, people will miss it and this situation will keep coming up,
 so why not just build the thrift server in? (I am not a programming expert,
 and not trying to judge the decision to have it in a separate profile; I
 would just like to understand why it's done that way.)




 On Mon, Aug 11, 2014 at 11:47 AM, Cheng Lian lian.cs@gmail.com
 wrote:

 Hi John, the JDBC Thrift server resides in its own build profile and needs
 to be enabled explicitly by ./sbt/sbt -Phive-thriftserver assembly.
 ​


 On Tue, Aug 5, 2014 at 4:54 AM, John Omernik j...@omernik.com wrote:

 I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar
 with the JDBC thrift server.  I have everything compiled correctly, I can
 access data in spark-shell on yarn from my hive installation. Cached
 tables, etc all work.

 When I execute ./sbin/start-thriftserver.sh

 I get the error below. Shouldn't it just read my spark-env? I guess I
 am lost on how to make this work.

 Thanks!

 $ ./start-thriftserver.sh


 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath

 Exception in thread main java.lang.ClassNotFoundException:
 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

 at java.security.AccessController.doPrivileged(Native Method)

 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

 at java.lang.Class.forName0(Native Method)

 at java.lang.Class.forName(Class.java:270)

 at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
Hi Cheng,

I also met some issues when I tried to start the ThriftServer based on a build from
the master branch (I could successfully run it from the branch-1.0-jdbc
branch). Below is my build command:

./make-distribution.sh --skip-java-test -Phadoop-2.4 -Phive -Pyarn
-Dyarn.version=2.4.0 -Dhadoop.version=2.4.0 -Phive-thriftserver

And below are the printed errors:

ERROR CompositeService: Error starting services HiveServer2
org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
at org.apache.hive.service.cli.CLIService.start(CLIService.java:85)
at
org.apache.hive.service.CompositeService.start(CompositeService.java:70)
at
org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:71)
at
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:314)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: javax.jdo.JDOFatalUserException: Class
org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException:
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
at
javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at
javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
at
org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
at
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
at
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at
org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:64)
at
org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:286)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.init(RetryingHMSHandler.java:54)
at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at
org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:121)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:104)
at org.apache.hive.service.cli.CLIService.start(CLIService.java:82)
... 11 more
Caused by: java.lang.ClassNotFoundException:
org.datanucleus.api.jdo.JDOPersistenceManagerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
at
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
... 32 more
14/08/13 13:08:48 INFO AbstractService: Service:OperationManager is stopped.
14/08/13 13:08:48 INFO AbstractService: Service:SessionManager is stopped.
14/08/13 13:08:48 INFO AbstractService: Service:CLIService is stopped.
14/08/13 13:08:48 ERROR 

Re: Spark SQL JDBC

2014-08-12 Thread ZHENG, Xu-dong
Just found that this is because the lines below in make-distribution.sh don't work:

if [ $SPARK_HIVE == true ]; then
  cp $FWDIR/lib_managed/jars/datanucleus*.jar $DISTDIR/lib/
fi

There is no definition of $SPARK_HIVE in make-distribution.sh. I should set
it explicitly.



On Wed, Aug 13, 2014 at 1:10 PM, ZHENG, Xu-dong dong...@gmail.com wrote:

 Hi Cheng,

 I also met some issues when I tried to start the ThriftServer based on a build
 from the master branch (I could successfully run it from the branch-1.0-jdbc
 branch). Below is my build command:

 ./make-distribution.sh --skip-java-test -Phadoop-2.4 -Phive -Pyarn
 -Dyarn.version=2.4.0 -Dhadoop.version=2.4.0 -Phive-thriftserver

 And below are the printed errors:

 ERROR CompositeService: Error starting services HiveServer2
 org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
 at org.apache.hive.service.cli.CLIService.start(CLIService.java:85)
 at
 org.apache.hive.service.CompositeService.start(CompositeService.java:70)
 at
 org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)
 at
 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:71)
 at
 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:314)
  at
 org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Caused by: javax.jdo.JDOFatalUserException: Class
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
 NestedThrowables:
 java.lang.ClassNotFoundException:
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory
 at
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
 at
 javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
 at
 javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at
 org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:64)
 at
 org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:286)
 at
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.init(RetryingHMSHandler.java:54)
 at
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:121)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:104)
 at org.apache.hive.service.cli.CLIService.start(CLIService.java:82)
 ... 11 more
 Caused by: java.lang.ClassNotFoundException:
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
 at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
 at java.security.AccessController.doPrivileged(Native 

Re: Spark SQL JDBC

2014-08-11 Thread Cheng Lian
Hi John, the JDBC Thrift server resides in its own build profile and needs
to be enabled explicitly by ./sbt/sbt -Phive-thriftserver assembly.
​


On Tue, Aug 5, 2014 at 4:54 AM, John Omernik j...@omernik.com wrote:

 I am using spark-1.1.0-SNAPSHOT right now and trying to get familiar with
 the JDBC thrift server.  I have everything compiled correctly, I can access
 data in spark-shell on yarn from my hive installation. Cached tables, etc
 all work.

 When I execute ./sbin/start-thriftserver.sh

 I get the error below. Shouldn't it just read my spark-env? I guess I am
 lost on how to make this work.

 Thanks!

 $ ./start-thriftserver.sh


 Spark assembly has been built with Hive, including Datanucleus jars on
 classpath

 Exception in thread main java.lang.ClassNotFoundException:
 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

 at java.security.AccessController.doPrivileged(Native Method)

 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

 at java.lang.Class.forName0(Native Method)

 at java.lang.Class.forName(Class.java:270)

 at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:311)

 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:73)

 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



Re: Spark SQL JDBC Connectivity

2014-07-30 Thread Venkat Subramanian
For the time being, we decided to take a different route. We created a REST
API layer in our app and allowed SQL queries to be passed in via REST. Internally
we pass that query to the Spark SQL layer over the RDD and return the
results. With this, Spark SQL is now supported for our RDDs via the REST API.
It was easy to do, took just a few hours, and it works for our use case.
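
As a rough illustration of that pass-through (the HTTP layer itself is application-specific and omitted), the core is just handing the SQL string from the request to the SQLContext that already knows about the registered RDD. A sketch using Spark 1.1-style APIs, with a hypothetical Event case class and table name:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type, for illustration only.
case class Event(id: Long, ctime: Double)

object SqlOverRestCore {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sql-over-rest"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

    // The RDD the application has already built, registered under a table name.
    sc.parallelize(Seq(Event(1L, 1.0), Event(2L, 2.5))).registerTempTable("events")

    // This is what a REST handler would call, with the query taken from the request.
    def runQuery(query: String): Array[String] =
      sqlContext.sql(query).collect().map(_.mkString("\t"))

    runQuery("SELECT id, ctime FROM events WHERE id > 1").foreach(println)
    sc.stop()
  }
}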



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p10986.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark SQL JDBC Connectivity

2014-07-30 Thread Michael Armbrust
Very cool.  Glad you found a solution that works.


On Wed, Jul 30, 2014 at 1:04 PM, Venkat Subramanian vsubr...@gmail.com
wrote:

 For the time being, we decided to take a different route. We created a REST
 API layer in our app and allowed SQL queries to be passed in via REST. Internally
 we pass that query to the Spark SQL layer over the RDD and return the
 results. With this, Spark SQL is now supported for our RDDs via the REST API.
 It was easy to do, took just a few hours, and it works for our use case.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p10986.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Spark SQL JDBC Connectivity and more

2014-06-09 Thread Venkat Subramanian
1) If I have a standalone spark application that has already built a RDD,
how can SharkServer2 or for that matter Shark access 'that' RDD and do
queries on it. All the examples I have seen for Shark, the RDD (tables) are
created within Shark's spark context and processed.

This is not possible out of the box with Shark. If you look at the code for
SharkServer2, though, you'll see that it's just a standard HiveContext under
the covers. If you modify this startup code, any SchemaRDD you register as
a table in this context will be exposed over JDBC.

[Venkat] Are you saying: pull the SharkServer2 code into my standalone
Spark application (as a part of the standalone application process), pass the
SparkContext of the standalone app to SharkServer2's SparkContext at
startup, and voilà, we get SQL/JDBC interfaces for the RDDs of the
standalone app that are exposed as tables? Thanks for the clarification.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p7264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark SQL JDBC Connectivity and more

2014-06-09 Thread Michael Armbrust
 [Venkat] Are you saying: pull the SharkServer2 code into my standalone
 Spark application (as a part of the standalone application process), pass the
 SparkContext of the standalone app to SharkServer2's SparkContext at
 startup, and voilà, we get SQL/JDBC interfaces for the RDDs of the
 standalone app that are exposed as tables? Thanks for the clarification.


Yeah, that should work, although it is pretty hacky and is not officially
supported. It might be interesting to augment Shark to allow the user to
invoke custom applications using the same SQLContext. If this is something
you'd have time to implement, I'd be happy to discuss the design further.
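
As a rough sketch of that (admittedly hacky, unofficial) idea: build the RDD in the standalone application and register it in the very HiveContext the embedded server is started with, and it becomes queryable over JDBC. The Metric type and the metrics table below are hypothetical, and the server-startup call itself has to be lifted from the SharkServer2/Thrift server code (newer Spark releases expose this step as HiveThriftServer2.startWithContext):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record type, for illustration only.
case class Metric(name: String, value: Double)

object EmbeddedSqlEndpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("embedded-sql-endpoint"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

    // An RDD built by the standalone application itself.
    val metrics = sc.parallelize(Seq(Metric("latency_ms", 12.3), Metric("qps", 450.0)))

    // Registering it in the HiveContext shared with the embedded server makes it
    // visible to JDBC clients as the table 'metrics'.
    metrics.registerTempTable("metrics")

    // Start the JDBC/Thrift endpoint with this same hiveContext here, then keep
    // the application alive; a local sanity check in the meantime:
    hiveContext.sql("SELECT name, value FROM metrics").collect().foreach(println)
  }
}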


Re: Spark SQL JDBC Connectivity

2014-05-29 Thread Michael Armbrust
On Wed, May 28, 2014 at 11:39 PM, Venkat Subramanian vsubr...@gmail.comwrote:

 We are planning to use the latest Spark SQL on RDDs. If a third-party
 application wants to connect to Spark via JDBC, does Spark SQL have
 support?
 (We want to avoid going through the Shark/Hive JDBC layer as we need good
 performance.)


 We don't have a full release yet, but there is a branch on the Shark
github repository that has a version of SharkServer2 that uses Spark SQL.
 We also plan to port the Shark CLI, but this is not yet finished.  You can
find this branch along with documentation here:
https://github.com/amplab/shark/tree/sparkSql

Note that this version has not yet received much testing (outside of the
integration tests that are run on Spark SQL).  That said, I would love for
people to test it out and report any problems or missing features.  Any
help here would be greatly appreciated!


 BTW, we also want to do the same for Spark Streaming: will Spark SQL work
 on DStreams (since the underlying structure is an RDD anyway), and can we
 expose the streaming DStream RDDs through JDBC via Spark SQL for real-time
 analytics?


 We have talked about doing this, but this is not currently on the near-term
roadmap.
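
Exposing a live DStream over JDBC is not supported today, but as a workaround sketch the application itself can run Spark SQL over each micro-batch. The Reading type, socket source, and query below are placeholders, and the APIs are Spark 1.1-style:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical record type, for illustration only.
case class Reading(sensor: String, value: Double)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("streaming-sql"), Seconds(10))
    val sqlContext = new SQLContext(ssc.sparkContext)
    import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

    // Assumed source: comma-separated "sensor,value" lines arriving on a socket.
    val readings = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(sensor, value) = line.split(",")
      Reading(sensor, value.toDouble)
    }

    // Re-register each micro-batch as a table and query it with Spark SQL.
    readings.foreachRDD { rdd =>
      rdd.registerTempTable("readings")
      sqlContext.sql("SELECT sensor, value FROM readings WHERE value > 100")
        .collect()
        .foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}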


Re: Spark SQL JDBC Connectivity and more

2014-05-29 Thread Venkat Subramanian
Thanks, Michael.
OK, will try SharkServer2.

But I have some basic questions on a related area:

1) If I have a standalone spark application that has already built a RDD,
how can SharkServer2 or for that matter Shark access 'that' RDD and do
queries on it. All the examples I have seen for Shark, the RDD (tables) are
created within Shark's spark context and processed. 

 I have stylized the real problem we have, which is: we have a standalone
Spark application that is processing DStreams and producing output DStreams.
I want to expose that near real-time DStream data to a third-party app via
JDBC and allow the SharkServer2 CLI to operate on and query the DStreams in
real time, all from memory. Currently we are writing the output stream to
Cassandra and exposing it to the third-party app through it via JDBC, but we
want to avoid that extra disk write, which increases latency.

2) I have two applications: one processes an input and computes an output RDD,
and another post-processes the resultant RDD into multiple persistent stores
and does other things with it. These are split into separate processes
intentionally. How do we share the output RDD from the first application with
the second application without writing to disk (we are thinking of serializing
the RDD and streaming it through Kafka, but then we lose time and all the fault
tolerance that the RDD brings)? Is Tachyon the only other way? Are there other
models/design patterns for applications that share RDDs, as this may be a very
common use case?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p6543.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.