Hi Everyone,

I'm very excited about merging this new feature into Spark! We have a lot of cool things in the pipeline, including: porting Shark's in-memory columnar format to Spark SQL, code generation for expression evaluation, and improved support for complex types in Parquet.
I would love to hear feedback on the interfaces, and on what is missing. In particular, while we have pretty good test coverage for Hive, there has not been a lot of testing with real Hive deployments, and there is certainly a lot more work to do. So please test it out, and if there are any missing features let me know!

Michael

On Thu, Mar 20, 2014 at 6:11 PM, Reynold Xin <r...@databricks.com> wrote:

> Hi All,
>
> I'm excited to announce a new module in Spark (SPARK-1251). After an
> initial review we've merged this into Spark as an alpha component to be
> included in Spark 1.0. This new component adds some exciting features,
> including:
>
> - schema-aware RDD programming via an experimental DSL
> - native Parquet support
> - support for executing SQL against RDDs
>
> The pull request itself contains more information:
> https://github.com/apache/spark/pull/146
>
> You can also find the documentation for this new component here:
> http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
>
> This contribution was led by Michael Armbrust, with work from several other
> contributors whom I'd like to highlight here: Yin Huai, Cheng Lian, Andre
> Schumacher, Timothy Chen, Henry Cook, and Mark Hamstra.
>
> - Reynold
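
To make the "executing SQL against RDDs" feature above concrete, here is a minimal sketch in the style of the alpha sql-programming-guide linked in the quoted message. This is illustrative only: it assumes the 1.0-era alpha API (`SQLContext`, `registerAsTable`, `SchemaRDD`), which is explicitly experimental and may change in later releases, and the `Person` case class and table name are made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type for the example; the schema is inferred
// from the case class fields via reflection.
case class Person(name: String, age: Int)

object SqlOnRdds {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sql-demo").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    // Brings sql() and the implicit RDD-to-SchemaRDD conversion into scope
    // (per the alpha programming guide).
    import sqlContext._

    // An ordinary RDD of case classes, exposed to SQL as a named table.
    val people = sc.parallelize(Seq(Person("Alice", 15), Person("Bob", 30)))
    people.registerAsTable("people")

    // Run SQL directly against the RDD; the result is itself a SchemaRDD,
    // so normal RDD operations compose with it.
    val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.map(row => "Name: " + row(0)).collect().foreach(println)

    sc.stop()
  }
}
```

Since the result of `sql(...)` is again an RDD, relational queries and the schema-aware DSL mentioned above can be freely mixed with standard transformations like `map` and `filter`.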