Re: [shark-users] SQL on Spark - Shark or SparkSQL

Matei Zaharia Sun, 30 Mar 2014 19:36:12 -0700

Hi Manoj,

For now, if you need a drop-in replacement for Hive, it is best to stick with 
Shark. Over time, Shark will move to the Spark SQL backend, but it should remain 
deployable the way it is today (including launching the SharkServer, using the 
Hive CLI, etc.). Spark SQL is the better choice for accessing Hive data from 
within a Spark program, though: its APIs are richer and easier to link against 
than the SharkContext.sql2rdd we previously provided in Shark.


So in a nutshell, if you have a Shark deployment today, or need the HiveServer, 
then going with Shark will be fine, and we will switch out the backend in a 
future release (we’ll probably release a preview of this even before we’re 
ready to fully switch). If you just want to run SQL queries or load SQL data 
within a Spark program, try out Spark SQL.
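
For illustration only (this is not from the original message): a minimal 
sketch of running a SQL query from inside a Spark program. It assumes a 
present-day Spark release, whose SparkSession entry point postdates this 
thread, and the Person type, the "people" view name, and the sample rows are 
all hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a real deployment would
    // configure the master and application name differently.
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Build a small in-memory dataset and expose it to SQL as a view.
    val people = Seq(Person("Ann", 34), Person("Bob", 15)).toDS()
    people.createOrReplaceTempView("people")

    // The query result is an ordinary Dataset, so it can be combined
    // with the rest of the program without a separate SQL server.
    val adults = spark.sql("SELECT name FROM people WHERE age >= 18")
    adults.as[String].collect().foreach(println)

    spark.stop()
  }
}
```

Reading existing Hive tables this way would additionally need Hive support 
enabled on the session and a reachable metastore, which this sketch leaves out.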

Matei

On Mar 30, 2014, at 4:46 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> +1. I have done a few installations of Shark with customers using Hive, and 
> they love it. It would be good to maintain compatibility with the Hive 
> Metastore and HiveQL until we have a substantial reason to break off (like 
> BlinkDB).
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> 
> On Sun, Mar 30, 2014 at 2:46 AM, Nicholas Chammas 
> <nicholas.cham...@gmail.com> wrote:
> This is a great question. We are in the same position, having not invested in 
> Hive yet and looking at various options for SQL-on-Hadoop.
> 
> 
> On Sat, Mar 29, 2014 at 9:48 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> Hi,
> 
> In the context of the recent Spark SQL announcement 
> (http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html):
> 
> If there is no existing investment in Hive/Shark, would it be worth starting 
> new SQL work using SparkSQL rather than Shark?
> 
> * It seems Shark's SQL core will use more and more of SparkSQL.
> * From the blog, it seems Shark carries baggage from Hive that is not needed 
> in this case.
> 
> On the other hand, there seem to be two shortcomings of SparkSQL (from a 
> quick scan of the blog and docs):
> 
> * SparkSQL will have fewer features than Shark/Hive QL, at least for now.
> * The standalone SharkServer feature will not be available in SparkSQL.
> 
> Can someone from Databricks shed light on the long-term roadmap? It would 
> help us avoid investing in an older technology, or in two technologies, for 
> work that has no Hive requirements.
> 
> Thanks,
> 
> PS: Great work on SparkSQL
