Re: Spark as sql engine on S3

2016-07-08 Thread Mich Talebzadeh
There are two approaches here.

Use Hive as it is and replace the Hive execution engine with Spark. You can
use beeline with the Hive thrift server to access your Hive tables.

beeline connects to a thrift server (either Hive or Spark). If you use the
Spark thrift server with beeline, then you take advantage of Spark SQL.

If you use beeline with the Hive thrift server, with Hive running on Spark
or Tez (I don't use Tez myself), then you get the Hive CBO + Spark.
Hive SQL is a superset of Spark SQL, so you can try either.
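
As a rough illustration of the point that the same beeline client can target
either thrift server, here is a minimal Python sketch that only builds the
JDBC URL and the beeline command line. The host names are hypothetical, port
10000 is assumed for the Hive thrift server (the usual HiveServer2 default),
and port 10001 is the one mentioned in this thread for the Spark thrift
server. Nothing is executed against a cluster.

```python
import shlex


def jdbc_url(host, port, database="default"):
    """Build a hive2 JDBC URL; both thrift servers speak this protocol."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(host, port, user=None):
    """Return the beeline argv for a thrift server (-u URL, optional -n user)."""
    argv = ["beeline", "-u", jdbc_url(host, port)]
    if user:
        argv += ["-n", user]
    return argv


# Hypothetical endpoints, for illustration only.
ENDPOINTS = {
    "hive thrift server": ("hive-host.example.com", 10000),
    "spark thrift server": ("spark-host.example.com", 10001),
}

if __name__ == "__main__":
    for label, (host, port) in ENDPOINTS.items():
        # Only the target host and port differ between the two approaches.
        print(f"{label}: {shlex.join(beeline_command(host, port))}")
```

The client side is identical in both cases. Which engine plans and runs the
query depends only on which server is listening at the URL.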

HTH


Dr Mich Talebzadeh






On 8 July 2016 at 10:49, Ashok Kumar  wrote:

> Hi
>
> As I said, we have been using Hive as our SQL engine for the datasets, but
> we are storing data externally in Amazon S3.
>
> Now you suggested the Spark thrift server.
>
> I started the Spark thrift server on port 10001 and used beeline to
> access it.
>
> Connecting to jdbc:hive2://<host>:10001
> Connected to: Spark SQL (version 1.6.1)
> Driver: Spark Project Core (version 1.6.1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.6.1 by Apache Hive
>
> Can I now access my external tables on S3 the same way I do in Hive, with
> beeline connected to the Hive thrift server?
>
> Is the advantage that Spark SQL will be much faster?
>
> regards
>
>
>
>
> On Friday, 8 July 2016, 6:30, ayan guha  wrote:
>
>
> Yes, it can.
>
> On Fri, Jul 8, 2016 at 3:03 PM, Ashok Kumar  wrote:
>
> Thanks. So basically the Spark Thrift Server listens on a port, and beeline
> connects to it over JDBC, much as it does to Hive?
>
> Can the Spark thrift server access Hive tables?
>
> regards
>
>
> On Friday, 8 July 2016, 5:27, ayan guha  wrote:
>
>
> Spark Thrift Server works as a JDBC server. You can connect to it from
> any JDBC tool, such as SQuirreL.
>
> On Fri, Jul 8, 2016 at 3:50 AM, Ashok Kumar 
> wrote:
>
> Hello gurus,
>
> We are storing data externally on Amazon S3
>
> What is the best way to use Spark as a SQL engine to access data
> on S3?
>
> Any info/write up will be greatly appreciated.
>
> Regards


Re: Spark as sql engine on S3

2016-07-08 Thread Ashok Kumar
Hi,

As I said, we have been using Hive as our SQL engine for the datasets, but we
are storing data externally in Amazon S3.

Now you suggested the Spark thrift server.

I started the Spark thrift server on port 10001 and used beeline to access it.

Connecting to jdbc:hive2://<host>:10001
Connected to: Spark SQL (version 1.6.1)
Driver: Spark Project Core (version 1.6.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.6.1 by Apache Hive

Can I now access my external tables on S3 the same way I do in Hive, with
beeline connected to the Hive thrift server?

Is the advantage that Spark SQL will be much faster?

Regards
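
For readers wondering what "external tables on S3" might look like in
practice, here is a small Python sketch that only generates a Hive-style
CREATE EXTERNAL TABLE statement, which could then be pasted into a beeline
session. The table name, columns, bucket and the s3a:// scheme are all
assumptions made for illustration. Whether the statement succeeds depends on
the S3 connector and credentials configured on the cluster, which this thread
does not describe.

```python
def external_table_ddl(table, columns, location, stored_as="PARQUET"):
    """Render a Hive-style DDL statement for an external table.

    columns is a list of (name, type) pairs; location is the storage URI.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table and bucket names, not taken from the thread.
    ddl = external_table_ddl(
        "sales",
        [("id", "INT"), ("amount", "DOUBLE")],
        "s3a://example-bucket/sales/",
    )
    print(ddl + ";")
```

Because the table is external, dropping it removes only the metastore entry
and leaves the files at the given location in place.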

 



  

Re: Spark as sql engine on S3

2016-07-07 Thread ayan guha
Yes, it can.

On Fri, Jul 8, 2016 at 3:03 PM, Ashok Kumar  wrote:

> Thanks. So basically the Spark Thrift Server listens on a port, and beeline
> connects to it over JDBC, much as it does to Hive?
>
> Can the Spark thrift server access Hive tables?
>
> regards
>
>


-- 
Best Regards,
Ayan Guha


Re: Spark as sql engine on S3

2016-07-07 Thread Ashok Kumar
Thanks. So basically the Spark Thrift Server listens on a port, and beeline
connects to it over JDBC, much as it does to Hive?

Can the Spark thrift server access Hive tables?

Regards



   

Re: Spark as sql engine on S3

2016-07-07 Thread ayan guha
Spark Thrift Server works as a JDBC server. You can connect to it from
any JDBC tool, such as SQuirreL.

On Fri, Jul 8, 2016 at 3:50 AM, Ashok Kumar 
wrote:

> Hello gurus,
>
> We are storing data externally on Amazon S3
>
> What is the best way to use Spark as a SQL engine to access data
> on S3?
>
> Any info/write up will be greatly appreciated.
>
> Regards
>



-- 
Best Regards,
Ayan Guha


Spark as sql engine on S3

2016-07-07 Thread Ashok Kumar
Hello gurus,
We are storing data externally on Amazon S3.
What is the best way to use Spark as a SQL engine to access data on S3?
Any info or write-up will be greatly appreciated.
Regards