Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
I didn't mean that. When you try the above approach, only one client will
have access to the cached data.

But when you expose your data through a Thrift server, the case is quite
different.

In that case, all requests go to the Thrift server, and Spark is able to
take advantage of caching.

That is, the Thrift server becomes your sole client to the Spark cluster.

Check this link:
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#running-the-thrift-jdbc-server

Your applications can connect to your Spark cluster through the JDBC
driver. It works similarly to the Hive Thrift server.
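
For example, something along these lines should work from any Java
application (an untested sketch; the host, port, and table name are
placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftClient {
    public static void main(String[] args) throws Exception {
        // The Spark Thrift server speaks the HiveServer2 protocol,
        // so the standard Hive JDBC driver works against it.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // All clients go through the same server process, so each of
        // them benefits from the tables cached there.
        ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM my_table");
        while (rs.next()) {
            System.out.println(rs.getLong(1));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}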

Thanks,
Vishnu

On Wed, Feb 11, 2015 at 10:31 PM, Ashish Mukherjee <
ashish.mukher...@gmail.com> wrote:

> Thanks for your reply, Vishnu.
>
> I assume you are suggesting I build Hive tables and cache them in memory
> and query on top of that for fast, real-time querying.
>
> Perhaps I should write a generic piece of code like this and submit it as
> a Spark job, with the SQL clause as an argument based on user selections
> on the Web interface:
>
> String sqlClause = args[0];
> ...
> // sc is an existing JavaSparkContext.
> JavaHiveContext sqlContext = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
> // Queries are expressed in HiveQL.
> Row[] results = sqlContext.sql(sqlClause).collect();
>
>
> Is my understanding right?
>
> Regards,
> Ashish
>
> On Wed, Feb 11, 2015 at 4:42 PM, VISHNU SUBRAMANIAN <
> johnfedrickena...@gmail.com> wrote:
>
>> Hi Ashish,
>>
>> To answer your question, I assume that you are planning to process data
>> and cache it in memory. If you are using the Thrift server that comes
>> with Spark, then you can query on top of it, and multiple applications
>> can use the cached data, since internally all requests go to the Thrift
>> server.
>>
>> Spark exposes the Hive query language and allows you to access its data
>> through Spark, so you can consider using HiveQL for querying.
>>
>> Thanks,
>> Vishnu
>>
>> On Wed, Feb 11, 2015 at 4:12 PM, Ashish Mukherjee <
>> ashish.mukher...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am planning to use Spark for a Web-based ad-hoc reporting tool on
>>> massive data-sets on S3. Real-time queries with filters, aggregations and
>>> joins could be constructed from UI selections.
>>>
>>> Online documentation seems to suggest that SharkQL is deprecated and
>>> users should move away from it. I understand Hive is generally not used
>>> for real-time querying, and for Spark SQL to work with other data stores,
>>> tables need to be registered explicitly in code. Also, this would not be
>>> suitable for a dynamic query construction scenario.
>>>
>>> For a real-time, dynamic querying scenario like mine, what is the proper
>>> tool to use with Spark SQL?
>>>
>>> Regards,
>>> Ashish
>>>
>>
>>
>


Re: Question related to Spark SQL

2015-02-11 Thread VISHNU SUBRAMANIAN
Hi Ashish,

To answer your question, I assume that you are planning to process data
and cache it in memory. If you are using the Thrift server that comes with
Spark, then you can query on top of it, and multiple applications can use
the cached data, since internally all requests go to the Thrift server.

Spark exposes the Hive query language and allows you to access its data
through Spark, so you can consider using HiveQL for querying.
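
For example (an untested sketch; "logs" is a made-up table name, and sc is
an existing JavaSparkContext), caching a table and then querying it with
HiveQL could look like:

// Imports assumed: java.util.List, org.apache.spark.sql.api.java.Row.
JavaHiveContext sqlContext = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
// Pin the table in Spark's in-memory columnar store.
sqlContext.sql("CACHE TABLE logs");
// Later HiveQL queries are served from the cached copy.
List<Row> results = sqlContext.sql("SELECT COUNT(*) FROM logs").collect();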

Thanks,
Vishnu

On Wed, Feb 11, 2015 at 4:12 PM, Ashish Mukherjee <
ashish.mukher...@gmail.com> wrote:

> Hi,
>
> I am planning to use Spark for a Web-based ad-hoc reporting tool on massive
> data-sets on S3. Real-time queries with filters, aggregations and joins
> could be constructed from UI selections.
>
> Online documentation seems to suggest that SharkQL is deprecated and users
> should move away from it. I understand Hive is generally not used for
> real-time querying, and for Spark SQL to work with other data stores, tables
> need to be registered explicitly in code. Also, this would not be suitable
> for a dynamic query construction scenario.
>
> For a real-time, dynamic querying scenario like mine, what is the proper
> tool to use with Spark SQL?
>
> Regards,
> Ashish
>


Re: Question related to Spark SQL

2015-02-11 Thread Arush Kharbanda
I am implementing this approach currently.

1. Create data tables in spark-sql and cache them.
2. Configure the Hive metastore to read the cached tables, sharing the
same metastore as spark-sql (you get the Spark caching advantage).
3. Run Spark code to fetch from the cached tables. In the Spark code you
can generate queries at runtime, as in the sketch below.
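
A rough sketch of step 3 (untested; sc is an existing JavaSparkContext,
and the "sales" table, its columns, and the helper are made-up
placeholders):

// Imports assumed: java.util.List, org.apache.spark.sql.api.java.Row.
JavaHiveContext sqlContext = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
// The WHERE clause is assembled at runtime from the user's UI
// selections; buildWhereFromUi() is a hypothetical helper. Whatever it
// returns should be validated or whitelisted before being spliced in.
String where = buildWhereFromUi();
String query = "SELECT region, SUM(amount) FROM sales WHERE " + where
        + " GROUP BY region";
// Runs against the cached table, so repeated queries stay fast.
List<Row> results = sqlContext.sql(query).collect();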


On Wed, Feb 11, 2015 at 4:12 PM, Ashish Mukherjee <
ashish.mukher...@gmail.com> wrote:

> Hi,
>
> I am planning to use Spark for a Web-based ad-hoc reporting tool on massive
> data-sets on S3. Real-time queries with filters, aggregations and joins
> could be constructed from UI selections.
>
> Online documentation seems to suggest that SharkQL is deprecated and users
> should move away from it. I understand Hive is generally not used for
> real-time querying, and for Spark SQL to work with other data stores, tables
> need to be registered explicitly in code. Also, this would not be suitable
> for a dynamic query construction scenario.
>
> For a real-time, dynamic querying scenario like mine, what is the proper
> tool to use with Spark SQL?
>
> Regards,
> Ashish
>



-- 


*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


Question related to Spark SQL

2015-02-11 Thread Ashish Mukherjee
Hi,

I am planning to use Spark for a Web-based ad-hoc reporting tool on massive
data-sets on S3. Real-time queries with filters, aggregations and joins
could be constructed from UI selections.

Online documentation seems to suggest that SharkQL is deprecated and users
should move away from it. I understand Hive is generally not used for
real-time querying, and for Spark SQL to work with other data stores, tables
need to be registered explicitly in code. Also, this would not be suitable
for a dynamic query construction scenario.
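
The kind of explicit registration I mean looks roughly like this, going by
my reading of the 1.1 Java API (the bucket and table names are made up,
and sc is an existing JavaSparkContext):

JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(sc);
// Load a JSON data-set from S3 and register it under a name so that it
// can be queried through SQL.
JavaSchemaRDD events = sqlContext.jsonFile("s3n://my-bucket/events.json");
events.registerTempTable("events");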

For a real-time, dynamic querying scenario like mine, what is the proper
tool to use with Spark SQL?

Regards,
Ashish