Re: Are there any plans to run Spark on top of Succinct

2015-01-26 Thread Michael Armbrust
There was work being done at Berkeley on prototyping support for Succinct in Spark SQL. Rachit might have more information. On Thu, Jan 22, 2015 at 7:04 AM, Dean Wampler deanwamp...@gmail.com wrote: Interesting. I was wondering recently if anyone has explored working with compressed data

[VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-26 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec The release files, including signatures, digests, etc.

Re: renaming SchemaRDD -> DataFrame

2015-01-26 Thread Matei Zaharia
(Actually when we designed Spark SQL we thought of giving it another name, like Spark Schema, but we decided to stick with SQL since that was the most obvious use case to many users.) Matei On Jan 26, 2015, at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote: While it might be possible

Re: renaming SchemaRDD -> DataFrame

2015-01-26 Thread Matei Zaharia
While it might be possible to move this concept to Spark Core long-term, supporting structured data efficiently does require quite a bit of the infrastructure in Spark SQL, such as query planning and columnar storage. The intent of Spark SQL though is to be more than a SQL server -- it's meant

Re: renaming SchemaRDD -> DataFrame

2015-01-26 Thread Kushal Datta
I want to address the issue that Matei raised about the heavy lifting required for full SQL support. It is amazing that even after 30 years of research there is not a single good open source columnar database like Vertica. There is a column store option in MySQL, but it is not nearly as

Re: talk on interface design

2015-01-26 Thread Andrew Ash
In addition to the references you have at the end of the presentation, there's a great set of practical examples based on the learnings from Qt posted here: http://www21.in.tum.de/~blanchet/api-design.pdf Chapter 4's way of showing a principle and then an example from Qt is particularly

renaming SchemaRDD -> DataFrame

2015-01-26 Thread Reynold Xin
Hi, We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to get the community's opinion. The context is that SchemaRDD is becoming a common data format used for bringing data into Spark from external systems, and used for various components of Spark, e.g. MLlib's new pipeline

Re: renaming SchemaRDD -> DataFrame

2015-01-26 Thread Koert Kuipers
The context is that SchemaRDD is becoming a common data format used for bringing data into Spark from external systems, and used for various components of Spark, e.g. MLlib's new pipeline API. i agree. this to me also implies it belongs in spark core, not sql On Mon, Jan 26, 2015 at 6:11 PM,