Re: renaming SchemaRDD -> DataFrame

Dmitriy Lyubimov Tue, 27 Jan 2015 12:07:39 -0800

It has been pretty evident for some time that's what it is, hasn't it?

Yes, that's a better name IMO.


On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin <r...@databricks.com> wrote:

> Hi,
>
> We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to
> get the community's opinion.
>
> The context is that SchemaRDD is becoming a common data format used for
> bringing data into Spark from external systems, and used for various
> components of Spark, e.g. MLlib's new pipeline API. We also expect more and
> more users to be programming directly against SchemaRDD API rather than the
> core RDD API. SchemaRDD, through its less commonly used DSL originally
> designed for writing test cases, has always had a data-frame-like API. In
> 1.3, we are redesigning that API to make it usable for end users.
>
>
> There are two motivations for the renaming:
>
> 1. DataFrame seems to be a more self-evident name than SchemaRDD.
>
> 2. SchemaRDD/DataFrame is actually not going to be an RDD anymore (even
> though it would contain some RDD functions like map, flatMap, etc.), and
> calling it Schema*RDD* while it is not an RDD is highly confusing. Instead,
> DataFrame.rdd will return the underlying RDD for all RDD methods.
>
>
> My understanding is that very few users program directly against the
> SchemaRDD API at the moment, because it is not well documented. However,
> to maintain backward compatibility, we can keep SchemaRDD as a type alias
> for DataFrame. This will maintain source compatibility for Scala. That
> said, we will have to update all existing materials to use DataFrame
> rather than SchemaRDD.
>
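
For readers unfamiliar with Scala type aliases, here is a minimal sketch of the compatibility idea described in the quoted message. It does not use Spark: SimpleRDD, this DataFrame class, legacyCount, and the AliasSketch object are hypothetical stand-ins, assumed only for illustration. The alias sits inside an object because Scala 2 does not allow top-level type aliases.

```scala
// Hypothetical stand-ins for illustration only; these are not the Spark classes.
object AliasSketch {
  // Stand-in for an RDD: a thin wrapper around a sequence.
  final case class SimpleRDD[A](items: Seq[A]) {
    def map[B](f: A => B): SimpleRDD[B] = SimpleRDD(items.map(f))
  }

  // Stand-in for the renamed class. It is not an RDD itself,
  // but it exposes the underlying one through `rdd`.
  final class DataFrame(rows: Seq[Seq[String]]) {
    def rdd: SimpleRDD[Seq[String]] = SimpleRDD(rows)
    def count: Long = rows.size.toLong
  }

  // The old name kept as an alias, so source written against it still compiles.
  type SchemaRDD = DataFrame

  // Code written against the old name accepts the new class unchanged.
  def legacyCount(data: SchemaRDD): Long = data.count

  def main(args: Array[String]): Unit = {
    val df = new DataFrame(Seq(Seq("1", "a"), Seq("2", "b")))
    println(legacyCount(df))
    println(df.rdd.map(_.head).items)
  }
}
```

An alias of this kind only preserves source compatibility: both names refer to the same type, so existing code recompiles as it is, but documentation and examples still need updating to the new name.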
