Spark SQL - JDBC connectivity

2016-08-09 Thread Soni spark
Hi,

I would like to know the steps to connect to Spark SQL from the Spring
framework (web UI). Also, how do I run and deploy the web application?


Re: Spark SQL JDBC Connectivity

2014-07-30 Thread Venkat Subramanian
For the time being, we decided to take a different route. We created a REST
API layer in our app and allowed SQL queries to be passed in via REST.
Internally we pass each query to the Spark SQL layer on the RDD and return
the results. With this, Spark SQL is now supported for our RDDs via the REST
API. It was easy to do, took just a few hours, and works for our use case.
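
Roughly, the passthrough looks like the sketch below. This is a minimal
sketch only, using the JDK's built-in HttpServer in place of our real REST
stack; the table name, endpoint, and data are illustrative, and it assumes
Spark 1.0-era APIs:

import java.net.InetSocketAddress
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Stand-in for our real record type.
case class Record(id: Int, value: String)

object SqlRestServer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sql-rest"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD   // implicit RDD[Product] -> SchemaRDD

    // Register an existing RDD as a queryable table.
    sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))
      .registerAsTable("records")

    // POST a SQL string to /sql; the body is handed straight to Spark SQL.
    val server = HttpServer.create(new InetSocketAddress(8080), 0)
    server.createContext("/sql", new HttpHandler {
      def handle(ex: HttpExchange): Unit = {
        val query = scala.io.Source.fromInputStream(ex.getRequestBody).mkString
        val rows  = sqlContext.sql(query).collect().mkString("\n")
        val bytes = rows.getBytes("UTF-8")
        ex.sendResponseHeaders(200, bytes.length)
        ex.getResponseBody.write(bytes)
        ex.close()
      }
    })
    server.start()
  }
}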





Re: Spark SQL JDBC Connectivity

2014-07-30 Thread Michael Armbrust
Very cool.  Glad you found a solution that works.


On Wed, Jul 30, 2014 at 1:04 PM, Venkat Subramanian vsubr...@gmail.com
wrote:

 For the time being, we decided to take a different route. We created a REST
 API layer in our app and allowed SQL queries to be passed in via REST.
 Internally we pass each query to the Spark SQL layer on the RDD and return
 the results. With this, Spark SQL is now supported for our RDDs via the
 REST API. It was easy to do, took just a few hours, and works for our use
 case.






Re: Spark SQL JDBC Connectivity and more

2014-06-09 Thread Venkat Subramanian
1) If I have a standalone Spark application that has already built an RDD,
how can SharkServer2, or for that matter Shark, access 'that' RDD and run
queries on it? In all the examples I have seen for Shark, the RDDs (tables)
are created within Shark's own SparkContext and processed there.

This is not possible out of the box with Shark.  If you look at the code for
SharkServer2, though, you'll see that it's just a standard HiveContext under
the covers.  If you modify this startup code, any SchemaRDD you register as
a table in this context will be exposed over JDBC.

[Venkat] Are you saying: pull the SharkServer2 code into my standalone Spark
application (as part of the standalone application process), pass the
standalone app's SparkContext to SharkServer2's SparkContext at startup, and
voilà, we get SQL/JDBC interfaces for the standalone app's RDDs, exposed as
tables? Thanks for the clarification.





Re: Spark SQL JDBC Connectivity and more

2014-06-09 Thread Michael Armbrust
 [Venkat] Are you saying: pull the SharkServer2 code into my standalone
 Spark application (as part of the standalone application process), pass
 the standalone app's SparkContext to SharkServer2's SparkContext at
 startup, and voilà, we get SQL/JDBC interfaces for the standalone app's
 RDDs, exposed as tables? Thanks for the clarification.


Yeah, that should work, although it is pretty hacky and not officially
supported.  It might be interesting to augment Shark to allow the user to
invoke custom applications using the same SQLContext.  If this is something
you'd have time to implement, I'd be happy to discuss the design further.
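
Roughly, the hack would look like the sketch below. To be clear, this is
illustrative only: SharkServer2's real startup code differs, and the schema,
path, and table name here are made up, assuming Spark 1.0-era APIs:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical schema for the standalone app's data.
case class Event(id: Long, payload: String)

object AppWithJdbc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("app-with-jdbc"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD   // implicit RDD[Product] -> SchemaRDD

    // Build the app's RDD as usual, then register it on the same
    // HiveContext that the modified SharkServer2 startup will serve.
    val events = sc.textFile("hdfs:///path/to/events")
      .map(_.split(","))
      .map(a => Event(a(0).toLong, a(1)))
    events.registerAsTable("events")

    // ...start the modified SharkServer2 against `hiveContext` here, and
    // JDBC clients can then run: SELECT * FROM events
  }
}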


Spark SQL JDBC Connectivity

2014-05-29 Thread Venkat Subramanian
We are planning to use the latest Spark SQL on RDDs. If a third-party
application wants to connect to Spark via JDBC, does Spark SQL have support
for that? (We want to avoid going through the Shark/Hive JDBC layer, as we
need good performance.)

BTW, we also want to do the same for Spark Streaming: will Spark SQL work on
DStreams (since the underlying structure is an RDD anyway), and can we expose
the streaming DStream RDDs through JDBC via Spark SQL for real-time analytics?

Any pointers on this will greatly help.

Regards,

Venkat





Re: Spark SQL JDBC Connectivity

2014-05-29 Thread Michael Armbrust
On Wed, May 28, 2014 at 11:39 PM, Venkat Subramanian vsubr...@gmail.com wrote:

 We are planning to use the latest Spark SQL on RDDs. If a third-party
 application wants to connect to Spark via JDBC, does Spark SQL have support
 for that? (We want to avoid going through the Shark/Hive JDBC layer, as we
 need good performance.)


We don't have a full release yet, but there is a branch on the Shark GitHub
repository that has a version of SharkServer2 that uses Spark SQL.  We also
plan to port the Shark CLI, but this is not yet finished.  You can find this
branch, along with documentation, here:
https://github.com/amplab/shark/tree/sparkSql

Note that this version has not yet received much testing (outside of the
integration tests that are run on Spark SQL).  That said, I would love for
people to test it out and report any problems or missing features.  Any
help here would be greatly appreciated!


 BTW, we also want to do the same for Spark Streaming: will Spark SQL work
 on DStreams (since the underlying structure is an RDD anyway), and can we
 expose the streaming DStream RDDs through JDBC via Spark SQL for real-time
 analytics?


We have talked about doing this, but it is not currently on the near-term
roadmap.
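
In the meantime, a common workaround is to re-register each micro-batch as a
temp table from foreachRDD and query it per batch. A minimal sketch (the
source, schema, and table names are illustrative; Spark 1.0-era APIs):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical record type for the stream.
case class Tick(symbol: String, price: Double)

object StreamingSql {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("streaming-sql"))
    val ssc = new StreamingContext(sc, Seconds(5))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD   // implicit RDD[Product] -> SchemaRDD

    val ticks = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(a => Tick(a(0), a(1).toDouble))

    // Each batch replaces the "latest" table, so queries always see the
    // most recent micro-batch.
    ticks.foreachRDD { rdd =>
      rdd.registerAsTable("latest")
      sqlContext.sql("SELECT symbol, AVG(price) FROM latest GROUP BY symbol")
        .collect()
        .foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}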


Re: Spark SQL JDBC Connectivity and more

2014-05-29 Thread Venkat Subramanian
Thanks, Michael.
OK, will try SharkServer2.

But I have some basic questions on a related area:

1) If I have a standalone Spark application that has already built an RDD,
how can SharkServer2, or for that matter Shark, access 'that' RDD and run
queries on it? In all the examples I have seen for Shark, the RDDs (tables)
are created within Shark's own SparkContext and processed there.

I have stylized the real problem we have, which is this: we have a standalone
Spark application that processes DStreams and produces output DStreams. I
want to expose that near-real-time DStream data to a third-party app via
JDBC and allow the SharkServer2 CLI to operate and query on the DStreams in
real time, all from memory. Currently we write the output stream to
Cassandra and expose it to the third-party app from there via JDBC, but we
want to avoid that extra disk write, which increases latency.

2) I have two applications: one computes an output RDD from an input, and
another post-processes the resultant RDD into multiple persistent stores,
plus doing other things with it. These are split into separate processes
intentionally. How do we share the output RDD from the first application
with the second application without writing to disk (we are thinking of
serializing the RDD and streaming it through Kafka, but then we lose time
and all the fault tolerance that the RDD brings)? Is Tachyon the only other
way? Are there other models/design patterns for applications that share
RDDs, as this may be a very common use case?
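
For reference, the Tachyon route we are weighing would look roughly like
this. The paths, the element type, and the master address are illustrative,
and it assumes the Tachyon client is on both applications' classpaths:

// App 1: park the output RDD in Tachyon's in-memory filesystem.
outputRdd.saveAsObjectFile("tachyon://tachyon-master:19998/shared/output")

// App 2: read it back without a spinning-disk write in between.
val shared = sc.objectFile[Result]("tachyon://tachyon-master:19998/shared/output")

Here `outputRdd` is the first app's output RDD, `Result` its element type,
and `sc` the second app's SparkContext.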


