Hi Everyone,

I'm very excited about merging this new feature into Spark! We have a lot of cool things in the pipeline, including: porting Shark's in-memory columnar format to Spark SQL, code generation for expression evaluation, and improved support for complex types in Parquet.
I would love to hear feedback on the interfaces, and on what is missing. In particular, while we have pretty good test coverage for Hive, there has not been a lot of testing with real Hive deployments, and there is certainly a lot more work to do. So please test it out, and if there are any missing features let me know!

Michael

On Thu, Mar 20, 2014 at 6:11 PM, Reynold Xin <r...@databricks.com> wrote:

> Hi All,
>
> I'm excited to announce a new module in Spark (SPARK-1251). After an
> initial review we've merged this into Spark as an alpha component to be
> included in Spark 1.0. This new component adds some exciting features,
> including:
>
> - schema-aware RDD programming via an experimental DSL
> - native Parquet support
> - support for executing SQL against RDDs
>
> The pull request itself contains more information:
> https://github.com/apache/spark/pull/146
>
> You can also find the documentation for this new component here:
> http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
>
> This contribution was led by Michael Armbrust, with work from several other
> contributors whom I'd like to highlight here: Yin Huai, Cheng Lian, Andre
> Schumacher, Timothy Chen, Henry Cook, and Mark Hamstra.
>
> - Reynold
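
To make the "executing SQL against RDDs" feature above concrete, here is a minimal sketch in the style of the alpha sql-programming-guide linked in the quoted message. This is illustrative only: it assumes the 1.0-era alpha API (`SQLContext`, `registerAsTable`, `SchemaRDD`), which is explicitly experimental and may change in later releases, and the `Person` case class and table name are made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type for the example; the schema is inferred
// from the case class fields via reflection.
case class Person(name: String, age: Int)

object SqlOnRdds {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("sql-demo").setMaster("local"))
    val sqlContext = new SQLContext(sc)
    // Brings sql() and the implicit RDD-to-SchemaRDD conversion into scope
    // (per the alpha programming guide).
    import sqlContext._

    // An ordinary RDD of case classes, exposed to SQL as a named table.
    val people = sc.parallelize(Seq(Person("Alice", 15), Person("Bob", 30)))
    people.registerAsTable("people")

    // Run SQL directly against the RDD; the result is itself a SchemaRDD,
    // so normal RDD operations compose with it.
    val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.map(row => "Name: " + row(0)).collect().foreach(println)

    sc.stop()
  }
}
```

Since the result of `sql(...)` is again an RDD, relational queries and the schema-aware DSL mentioned above can be freely mixed with standard transformations like `map` and `filter`.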