For DataFrames there are also transformations and actions, and the
transformations are likewise lazily evaluated. However, DataFrame
transformations like filter(), select(), and agg() return a DataFrame rather
than an RDD. Methods like show() and collect() are actions.
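A minimal sketch of the distinction, assuming an existing DataFrame `df` with `name` and `age` columns (the column names here are illustrative only):

```scala
// filter() and select() are transformations: each returns a new
// DataFrame immediately, without touching the data.
val adults = df.filter(df("age") > 21).select("name", "age")

// show() is an action: only here is the query planned and executed.
adults.show()
```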
Cheng
On 6/8/15 1:33 PM,
I would think DF=RDD+Schema+some additional methods. In fact, a DF object
has a DF.rdd in it so you can (if needed) convert DF=RDD really easily.
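For illustration, a sketch of that relationship, assuming an existing SQLContext `sqlContext` and a Parquet file at `path`:

```scala
// A DataFrame is roughly an RDD[Row] plus a schema.
val df = sqlContext.parquetFile(path)

// Drop down to the underlying RDD[Row] when needed.
val rows: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = df.rdd

// The schema is the part a plain RDD does not carry.
println(df.schema)
```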
On Mon, Jun 8, 2015 at 5:41 PM, kiran lonikar loni...@gmail.com wrote:
Thanks. Can you point me to a place in the SQL programming guide or the
DataFrame scaladoc where these transformations and actions are grouped as
they are in the case of RDDs?
Also if you can tell me if sqlContext.load and unionAll are transformations
or actions...
You may refer to DataFrame Scaladoc
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame
Methods listed under Language Integrated Queries and RDD Operations can be
viewed as transformations, and those listed under Actions are, of
course, actions.
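On the question about load and unionAll: both are transformations, so nothing is read until an action runs. A sketch, assuming a SQLContext `sqlContext` (the paths are illustrative):

```scala
// Both loads are lazy: no I/O happens yet.
val a = sqlContext.load("data/part1.parquet")
val b = sqlContext.load("data/part2.parquet")

// unionAll is also a transformation: still lazy.
val all = a.unionAll(b)

// count() is an action: the files are actually read here.
val n = all.count()
```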
Hi Cheng, Ayan,
Thanks for the answers. I like the rule of thumb. I cursorily went through
the DataFrame, SQLContext and sql.execution.basicOperators.scala code. It
is apparent that these functions are lazily evaluated. The SQLContext.load
functions are similar to SparkContext.textFile in this respect.
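That parallel can be sketched as follows, assuming a SparkContext `sc` and a SQLContext `sqlContext` (paths illustrative):

```scala
// Both calls are lazy: neither touches storage when invoked.
val lines = sc.textFile("data/logs.txt")           // RDD[String], lazy
val table = sqlContext.load("data/table.parquet")  // DataFrame, lazy

// Data is read only when an action such as count() runs.
lines.count()
table.count()
```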
Interesting, just posted on another thread asking exactly the same
question :) My answer there quoted below:
For the following code:
val df = sqlContext.parquetFile(path)
`df` remains columnar (actually it just reads from the columnar
Parquet file on disk).
Thanks for replying twice :) I think I sent this question by email and
somehow thought I had not sent it, hence created the other one on the web
interface. Let's retain this thread since you have provided more details
here.
Great, it confirms my intuition about DataFrame. It's similar to Shark.
When Spark reads Parquet files (sqlContext.parquetFile), it creates a
DataFrame. I would like to know if the resulting DataFrame has a columnar
structure (many rows of a column coalesced together in memory) or the
row-wise structure that a Spark RDD has. The section Spark SQL and DataFrames