Reynold,

Thanks for the heads up. In general, I strongly oppose the use of "private"
to restrict access to certain parts of the API, because I may need to use
some of a library's internals from my own project. I find that a
@DeveloperApi annotation serves the same purpose as "private" without
imposing unnecessary restrictions: it discourages people from using the
annotated API and reserves the right for the core developers to change it
suddenly in backwards-incompatible ways.
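
To illustrate what I mean, here is a minimal sketch (ExampleInternalHook
is made up, but org.apache.spark.annotation.DeveloperApi is the actual
annotation Spark already uses for exactly this purpose):

    import org.apache.spark.annotation.DeveloperApi

    /**
     * :: DeveloperApi ::
     * Public to callers who accept the risk, but may change between
     * releases in backwards-incompatible ways.
     */
    @DeveloperApi
    class ExampleInternalHook {
      def run(): Unit = ()
    }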

In particular, I would like the APIs for programmatically constructing
SchemaRDDs from an RDD[Row] and a StructType to remain public. All the
Spark SQL data type objects should be exposed by the API, and the Jekyll
build should not hide their docs as it currently does.
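
Concretely, this is the construction path I mean. A rough sketch against
the current 1.2 API, assuming sc and sqlContext are already in scope and
people.txt is a hypothetical two-column CSV file:

    import org.apache.spark.sql._

    // Build an RDD[Row] by hand from raw text.
    val rowRDD = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Row(p(0), p(1).trim.toInt))

    // Describe its shape with a StructType of the public data types.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))

    // The programmatic entry point that should stay public.
    val people = sqlContext.applySchema(rowRDD, schema)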

Thanks.

Alex

On Wed, Jan 14, 2015 at 9:45 PM, Reynold Xin <r...@databricks.com> wrote:

> Hi Spark devs,
>
> Given the growing number of developers that are building on Spark SQL, we
> would like to stabilize the API in 1.3 so users and developers can be
> confident to build on it. This also gives us a chance to improve the API.
>
> In particular, we are proposing the following major changes. These should
> have no impact on most users (i.e. those running SQL through the JDBC
> client or the SQLContext.sql method).
>
> 1. Everything in the sql.catalyst package becomes private to the project.
>
> 2. Redesign SchemaRDD DSL (SPARK-5097): We initially added the DSL for
> SchemaRDD and logical plans in order to construct test cases. We have
> received feedback from a lot of users that the DSL can be incredibly
> powerful. In 1.3, we'd like to refactor the DSL to make it suitable not
> only for constructing test cases, but also for everyday data pipelines.
> The new SchemaRDD API is inspired by the data frame concept in Pandas
> and R.
>
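
(For anyone following along who hasn't used it: the current DSL looks
roughly like the sketch below, written against 1.2. It assumes sc and
sqlContext are in scope; Person and the sample data are made up.)

    // Today's symbol-based DSL, originally built for test cases.
    case class Person(name: String, age: Int)
    val people = sc.parallelize(Seq(Person("Alice", 15), Person("Bob", 25)))

    // Brings in the implicit RDD[Product] -> SchemaRDD conversion and
    // the Symbol -> column-expression conversions.
    import sqlContext._

    val teenagers = people
      .where('age >= 10)
      .where('age <= 19)
      .select('name)
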
> 3. Reconcile Java and Scala APIs (SPARK-5193): We would like to expose one
> set of APIs that works for both Java and Scala. The current Java API
> (sql.api.java) does not share any common ancestor with the Scala API. This
> has led to a high maintenance burden for us as Spark developers and for
> library developers. We propose to eliminate the Java-specific API and
> simply rework the existing Scala API to make it usable from Java as well.
> This will make Java a first-class citizen alongside Scala. In effect, all
> public classes should be usable from both Scala and Java, including
> SQLContext, HiveContext, SchemaRDD, data types, and the aforementioned DSL.
>
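
(The duplication is easy to see from the types: both contexts below are
real classes in 1.2, while the two defs are just illustrative. A library
author today has to write this glue twice, once per hierarchy.)

    import org.apache.spark.sql.SQLContext              // Scala API
    import org.apache.spark.sql.api.java.JavaSQLContext // parallel Java API

    // The two hierarchies share no common ancestor today:
    def runScala(ctx: SQLContext) = ctx.sql("SELECT 1")    // returns SchemaRDD
    def runJava(ctx: JavaSQLContext) = ctx.sql("SELECT 1") // returns JavaSchemaRDD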
>
> Again, this should have no impact on most users, since the existing DSL is
> rarely used by end users. However, library developers might need to change
> their import statements because we are moving certain classes around. We
> will keep you posted as patches are merged.
>
