Re: Announcing Spark SQL
On Fri, Mar 28, 2014 at 9:53 PM, Rohit Rai ro...@tuplejump.com wrote:

> Upon discussion with a couple of our clients, it seems the reason they would prefer using Hive is that they have already invested a lot in it, mostly in UDFs and HiveQL.
>
> 1. Are there any plans to develop the SQL parser to handle more complex queries like HiveQL? Can we just plug in a custom parser instead of bringing in the whole Hive deps?

We definitely want to have a more complete SQL parser without having to pull in all of Hive. I think there are a couple of ways to do this:

1. Use a SQL-92 parser from something like Optiq, or write our own.
2. I haven't fully investigated the Hive published artifacts, but if there is some way to depend on only the parser, that would be great.

If someone has resources to investigate using the Hive parser without needing to depend on all of Hive, this is a place where we would certainly welcome contributions. We could then consider making HiveQL an option in a standard SQLContext.

> 2. Is there any way we can support UDFs in Catalyst without using Hive? It will be fine if we don't support Hive UDFs as-is and need minor porting effort.

All of the execution support for native Scala UDFs is already there, and in fact when you use the DSL where clause (http://people.apache.org/~pwendell/catalyst-docs/api/sql/core/index.html#org.apache.spark.sql.SchemaRDD) you are using this machinery. For Spark 1.1 we will find a more general way to expose this to users.
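For readers following along, here is a minimal sketch of the native Scala DSL Michael refers to, based on the Spark 1.0-era SchemaRDD API; the case class and data are illustrative:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Illustrative schema; any case class (Product) works.
case class Person(name: String, age: Int)

val sc = new SparkContext("local", "dsl-example")
val sqlContext = new SQLContext(sc)
import sqlContext._  // implicit conversion RDD[Person] => SchemaRDD

val people = sc.parallelize(Seq(Person("Alice", 34), Person("Bob", 19)))

// The DSL where/select clauses run native Scala expressions through
// Catalyst -- the same machinery that would back user-defined functions.
val adults = people.where('age >= 21).select('name)
adults.collect().foreach(println)
```

The symbols (`'age`, `'name`) are resolved against the schema inferred from the case class, so the predicate executes as native Scala rather than going through Hive.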
Re: Announcing Spark SQL
Thanks Patrick, I was thinking about that... Upon analysis I realized it would be something similar to the way HiveContext uses the custom Catalog stuff. I will review it again, along the lines of implementing SchemaRDD with Cassandra. Thanks for the pointer.

Upon discussion with a couple of our clients, it seems the reason they would prefer using Hive is that they have already invested a lot in it, mostly in UDFs and HiveQL.

1. Are there any plans to develop the SQL parser to handle more complex queries like HiveQL? Can we just plug in a custom parser instead of bringing in the whole Hive deps?
2. Is there any way we can support UDFs in Catalyst without using Hive? It will be fine if we don't support Hive UDFs as-is and need minor porting effort.

Regards,
Rohit

*Founder & CEO, Tuplejump, Inc.*
www.tuplejump.com
*The Data Engineering Platform*

On Fri, Mar 28, 2014 at 12:48 AM, Patrick Wendell pwend...@gmail.com wrote:

> Hey Rohit, I think external tables based on Cassandra or other datastores will work out of the box if you build Catalyst with Hive support. [...]
Re: Announcing Spark SQL
On 27 Mar 2014 at 09:47, andy petrella andy.petre...@gmail.com wrote:

> I hijack the thread, but my 2c is that this feature is also important to enable ad-hoc queries, which are done at runtime. It doesn't remove interest in such a macro for precompiled jobs of course, but it may not be the first use case envisioned for this Spark SQL.

I'm not sure I see what you call ad-hoc queries... Any sample?

> Again, only my 0.2c (ok, I divided by 10 after writing my thoughts ^^)
>
> Andy
>
> On Thu, Mar 27, 2014 at 9:16 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:
>
>> Hi, quite interesting! Suggestion: why not go even fancier and parse SQL queries at compile time with a macro? ;)
>>
>> Pascal
Re: Announcing Spark SQL
On Thu, Mar 27, 2014 at 10:22 AM, andy petrella andy.petre...@gmail.com wrote:

> I just mean queries sent at runtime ^^, like for any RDBMS. In our project we have such a requirement: a layer to play with the data (a custom, low-level service layer of a lambda architecture), and something like this is interesting.

Ok, that's what I thought! But for these runtime queries, is a macro useful to you?
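To make the distinction concrete: an ad-hoc query in Spark SQL is just a string handed to the context at runtime, so a compile-time macro cannot see it. A minimal sketch using the Spark 1.0 API (table and data are illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class User(name: String, age: Int)

val sc = new SparkContext("local", "adhoc-example")
val sqlContext = new SQLContext(sc)
import sqlContext._

sc.parallelize(Seq(User("Alice", 34), User("Bob", 19))).registerAsTable("users")

// The query string can arrive from anywhere at runtime (a REPL, a web
// form, a service call) -- which is what makes it "ad-hoc" and out of
// reach of compile-time checking:
val names = sqlContext.sql("SELECT name FROM users WHERE age >= 21")
names.collect().foreach(println)
```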
Re: Announcing Spark SQL
Nope (what I said :-P)

On Thu, Mar 27, 2014 at 11:05 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

> Ok, that's what I thought! But for these runtime queries, is a macro useful to you?
Re: Announcing Spark SQL
On Thu, Mar 27, 2014 at 11:08 AM, andy petrella andy.petre...@gmail.com wrote:

> Nope (what I said :-P)

That's also my answer to my own question :D, but I didn't understand that from your sentence:

> my 2c is that this feature is also important to enable ad-hoc queries, which are done at runtime.
Re: Announcing Spark SQL
Does Shark not suit your needs? That's what we use at the moment and it's been good.

Sent from my Samsung Galaxy S®4

-------- Original message --------
From: andy petrella andy.petre...@gmail.com
Date: 03/27/2014 6:08 AM (GMT-05:00)
To: user@spark.apache.org
Subject: Re: Announcing Spark SQL

Nope (what I said :-P)
Re: Announcing Spark SQL
Yes it could, of course. I didn't say that there is no tool to do it, though ;-).

Andy

On Thu, Mar 27, 2014 at 12:49 PM, yana yana.kadiy...@gmail.com wrote:

> Does Shark not suit your needs? That's what we use at the moment and it's been good.
Re: Announcing Spark SQL
When there is something new, it's also cool to let imagination fly far away ;)

On Thu, Mar 27, 2014 at 2:20 PM, andy petrella andy.petre...@gmail.com wrote:

> Yes it could, of course. I didn't say that there is no tool to do it, though ;-).
>
> Andy
Re: Announcing Spark SQL
Hey Rohit,

I think external tables based on Cassandra or other datastores will work out of the box if you build Catalyst with Hive support. Michael may have feelings about this, but I'd guess the longer-term design for schema support for Cassandra/HBase etc. likely wouldn't rely on Hive external tables, because that's an unnecessary layer of indirection. Spark should be able to directly load a SchemaRDD from Cassandra by just letting the user give the relevant information about the Cassandra schema. And it should let you write back to Cassandra by giving a mapping of fields to the respective Cassandra columns. I think all of this would be fairly easy to implement on SchemaRDD, and it likely will make it into Spark 1.1.

- Patrick

On Wed, Mar 26, 2014 at 10:59 PM, Rohit Rai ro...@tuplejump.com wrote:

> Great work guys! Have been looking forward to this...
>
> In the blog it mentions support for reading from HBase/Avro... What will be the recommended approach for this? Will it be writing custom wrappers for SQLContext like in HiveContext, or using Hive's EXTERNAL TABLE support?
>
> I ask this because a few days back (based on your pull request on GitHub) I started analyzing what it would take to support Spark SQL on Cassandra. One obvious approach would be to use Hive external table support with our cassandra-hive handler. But the second approach sounds tempting, as it would give more fidelity.
>
> Regards,
> Rohit
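A hypothetical sketch of what Patrick describes: loading Cassandra rows directly into a SchemaRDD by describing the schema with a case class. Note that `fetchCassandraRows` stands in for whatever connector call would actually read the table; it is not a real API, and the keyspace/table/column names are invented:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// The user describes the Cassandra schema with a case class.
case class User(id: String, name: String, age: Int)

val sc = new SparkContext("local", "cassandra-sketch")
val sqlContext = new SQLContext(sc)
import sqlContext._  // brings in the RDD => SchemaRDD conversion

// Imagined connector call: an RDD of (column -> value) maps read from
// Cassandra. Any real connector would supply something equivalent.
def fetchCassandraRows(keyspace: String, table: String): RDD[Map[String, Any]] = ???

// The user supplies the mapping from Cassandra columns to fields:
val users = fetchCassandraRows("app", "users").map { row =>
  User(row("id").toString, row("name").toString, row("age").toString.toInt)
}

users.registerAsTable("users")  // now queryable via sqlContext.sql(...)
```

Writing back would be the inverse mapping: fields of the case class to the respective Cassandra columns, as Patrick suggests.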
Re: Announcing Spark SQL
This is so, so COOL. YES. I'm excited about using this once I'm a bit more comfortable with Spark. Nice work, people!

On Wed, Mar 26, 2014 at 5:58 PM, Michael Armbrust mich...@databricks.com wrote:

> Hey Everyone,
>
> This already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0.
>
> http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html
>
> Michael
RE: Announcing Spark SQL
Fantastic! Although, I think they missed an obvious name choice: SparkQL (pronounced "sparkle") :)

Skyler

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, March 26, 2014 3:58 PM
To: user@spark.apache.org
Subject: Announcing Spark SQL
Re: Announcing Spark SQL
Congrats Michael & co for putting this together — this is probably the neatest piece of technology added to Spark in the past few months, and it will greatly change what users can do as more data sources are added.

Matei

On Mar 26, 2014, at 3:22 PM, Ognen Duzlevski og...@plainvanillagames.com wrote:

> Wow!
>
> Ognen
Re: Announcing Spark SQL
+1 Michael, Reynold et al. This is key to some of the things we're doing.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao
http://adatao.com
linkedin.com/in/ctnguyen
Re: Announcing Spark SQL
Very nice. Any plans to make the SQL type-safe using something like Slick (http://slick.typesafe.com/)?

Thanks!
Re: Announcing Spark SQL
> Any plans to make the SQL type-safe using something like Slick (http://slick.typesafe.com/)?

I would really like to do something like that, and maybe we will in a couple of months. However, in the near term, I think the top priorities are going to be performance and stability.

Michael
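For context, a rough illustration of what a Slick-style type-safe query looks like — this is Slick's own 2.x lifted-embedding API, not anything in Spark SQL, and the table definition is invented for the example:

```scala
import scala.slick.driver.H2Driver.simple._

// Table definition: column names and types are part of the Scala types.
class Users(tag: Tag) extends Table[(String, Int)](tag, "users") {
  def name = column[String]("name")
  def age  = column[Int]("age")
  def *    = (name, age)
}
val users = TableQuery[Users]

// A misspelled column or a type mismatch here fails at compile time,
// unlike a raw SQL string, which fails only when executed:
val adultNames = users.filter(_.age >= 21).map(_.name)
```

This compile-time checking is what the question is asking Spark SQL to eventually offer on top of SchemaRDDs.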