There was work being done at Berkeley on prototyping support for Succinct
in Spark SQL. Rachit might have more information.
On Thu, Jan 22, 2015 at 7:04 AM, Dean Wampler deanwamp...@gmail.com wrote:
Interesting. I was wondering recently if anyone has explored working with
compressed data
Please vote on releasing the following candidate as Apache Spark version 1.2.1!
The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec
The release files, including signatures, digests, etc.
(Actually when we designed Spark SQL we thought of giving it another name, like
Spark Schema, but we decided to stick with SQL since that was the most obvious
use case to many users.)
Matei
On Jan 26, 2015, at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
While it might be possible to move this concept to Spark Core long-term,
supporting structured data efficiently does require quite a bit of the
infrastructure in Spark SQL, such as query planning and columnar storage. The
intent of Spark SQL though is to be more than a SQL server -- it's meant
I want to address the issue that Matei raised about the heavy lifting
required for full SQL support. It is amazing that even after 30 years of
research there is not a single good open-source columnar database comparable
to Vertica. There is a column store option in MySQL, but it is not nearly as
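To make concrete why columnar storage keeps coming up in this thread, here is a toy sketch (plain Python with hypothetical data, not Spark's actual columnar machinery) of the two layouts: a column store keeps each column contiguous, so scanning or compressing a single column does not touch whole records.

```python
# Hypothetical rows; illustrates row vs column layout.
rows = [
    {"id": 1, "country": "US", "clicks": 10},
    {"id": 2, "country": "US", "clicks": 7},
    {"id": 3, "country": "DE", "clicks": 3},
]

# Row layout: aggregating one column walks every whole record.
total_row = sum(r["clicks"] for r in rows)

# Column layout: each column stored contiguously, scanned in isolation.
columns = {k: [r[k] for r in rows] for k in rows[0]}
total_col = sum(columns["clicks"])

def rle(values):
    """Run-length encode a column; effective on low-cardinality data."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

# The contiguous country column compresses to [["US", 2], ["DE", 1]].
encoded = rle(columns["country"])
```

This per-column contiguity is also what makes encodings like run-length and dictionary compression practical, which is part of the infrastructure Matei mentions Spark SQL already carries.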
In addition to the references you have at the end of the presentation,
there's a great set of practical examples based on lessons learned from Qt
posted here: http://www21.in.tum.de/~blanchet/api-design.pdf
Chapter 4's way of showing a principle and then an example from Qt is
particularly
Hi,
We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to
get the community's opinion.
The context is that SchemaRDD is becoming a common data format used for
bringing data into Spark from external systems, and used for various
components of Spark, e.g. MLlib's new pipeline API.
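To illustrate what the proposed abstraction adds over a plain RDD-like collection, here is a minimal, hypothetical sketch in plain Python (the class and method names are illustrative, not Spark's actual API): rows paired with a named schema that downstream components, such as an ML pipeline stage, can inspect and project against.

```python
class DataFrame:
    """Toy sketch: rows plus a named schema, not Spark's implementation."""

    def __init__(self, schema, rows):
        self.schema = schema  # list of (name, type) pairs
        self.rows = rows      # list of tuples matching the schema

    def select(self, *names):
        # Project to the named columns, keeping schema order.
        idx = [i for i, (n, _) in enumerate(self.schema) if n in names]
        schema = [self.schema[i] for i in idx]
        return DataFrame(schema, [tuple(r[i] for i in idx) for r in self.rows])

# A downstream component can ask for just the column it needs by name.
df = DataFrame([("id", int), ("label", float)], [(1, 0.0), (2, 1.0)])
labels = df.select("label")
```

The point of the rename debate is exactly this: the value is in the schema-carrying container, not in SQL per se, which is why "DataFrame" (a term familiar from R and pandas) arguably describes it better than "SchemaRDD".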
i agree. this to me also implies it belongs in spark core, not sql
On Mon, Jan 26, 2015 at 6:11 PM,