Re: Quick one on evaluation

2017-08-04 Thread Daniel Darabos
On Fri, Aug 4, 2017 at 4:36 PM, Jean Georges Perrin wrote:

> Thanks Daniel,
>
> I like your answer for #1. It makes sense.
>
> However, I don't get why you say that there are always pending
> transformations... After you call an action, you should be "clean" from
> pending transformations, no?
>

Nope. Say you have val df = spark.read.csv("data.csv") followed by
println(df.count + df.count). The first "df.count" reads in the file and
counts the rows. The action was executed, but "df" still represents the same
pending transformations. The second "df.count" again reads in the file and
counts the rows. Actions do not modify DataFrames/RDDs. (The only exception
is "cache()", which is not an action itself but marks the data to be kept
once an action has computed it.)

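The point about the two df.count calls can be illustrated with a small toy model in plain Java. This is only a sketch and not Spark's API: LazyFrame, filter, recordedSteps, evens, and the reads counter are hypothetical names made up for the illustration. A LazyFrame only records its steps; each count() replays them from the source, and the recorded steps are still there afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Toy model only (NOT Spark): a LazyFrame is a source plus a list of recorded steps.
final class LazyFrame {
    private final Supplier<List<Integer>> source;  // stands in for "read the file"
    private final List<Predicate<Integer>> steps;  // the recorded (pending) filters

    LazyFrame(Supplier<List<Integer>> source) {
        this(source, new ArrayList<>());
    }

    private LazyFrame(Supplier<List<Integer>> source, List<Predicate<Integer>> steps) {
        this.source = source;
        this.steps = steps;
    }

    // "Transformation": returns a new recipe and computes nothing.
    LazyFrame filter(Predicate<Integer> keep) {
        List<Predicate<Integer>> extended = new ArrayList<>(steps);
        extended.add(keep);
        return new LazyFrame(source, extended);
    }

    // Answered from the recipe alone, without touching the source.
    int recordedSteps() {
        return steps.size();
    }

    // "Action": reads the source and replays every recorded step, on every call.
    long count() {
        long n = 0;
        for (Integer row : source.get()) {
            boolean kept = true;
            for (Predicate<Integer> step : steps) {
                if (!step.test(row)) {
                    kept = false;
                    break;
                }
            }
            if (kept) {
                n++;
            }
        }
        return n;
    }
}

class LazyPlanDemo {
    static int reads = 0;  // how many times the pretend file was read

    // Builds a plan over six rows that keeps only the even ones.
    static LazyFrame evens() {
        return new LazyFrame(() -> {
            reads++;
            return Arrays.asList(1, 2, 3, 4, 5, 6);
        }).filter(x -> x % 2 == 0);
    }

    public static void main(String[] args) {
        LazyFrame df = evens();
        System.out.println("reads after building the plan: " + reads);
        System.out.println("count + count = " + (df.count() + df.count()));
        System.out.println("reads after two actions: " + reads);
        System.out.println("steps still recorded: " + df.recordedSteps());
    }
}
```

In this toy, building the plan reads nothing, the two count() calls read the source twice, and the recorded step is unchanged by either call, which mirrors the behaviour described above for "df".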

Re: Quick one on evaluation

2017-08-04 Thread Jean Georges Perrin
Thanks Daniel,

I like your answer for #1. It makes sense.

However, I don't get why you say that there are always pending 
transformations... After you call an action, you should be "clean" from pending 
transformations, no?

> On Aug 3, 2017, at 5:53 AM, Daniel Darabos wrote:
> 
> On Wed, Aug 2, 2017 at 2:16 PM, Jean Georges Perrin wrote:
>> Hi Sparkians,
>> 
>> I understand the lazy evaluation mechanism with transformations and actions.
>> My question is simpler: 1) are show() and/or printSchema() actions? I would
>> assume so...
> 
> show() is an action (it prints data) but printSchema() is not an action.
> Spark can tell you the schema of the result without computing the result.
> 
>> and optional question: 2) is there a way to know if there are transformations
>> "pending"?
> 
> There are always transformations pending :). An RDD or DataFrame is a series
> of pending transformations. If you say val df = spark.read.csv("foo.csv"),
> that is a pending transformation. Even spark.emptyDataFrame is best
> understood as a pending transformation: it does not do anything on the
> cluster, but records locally what it will have to do on the cluster.



Re: Quick one on evaluation

2017-08-03 Thread Daniel Darabos
On Wed, Aug 2, 2017 at 2:16 PM, Jean Georges Perrin wrote:

> Hi Sparkians,
>
> I understand the lazy evaluation mechanism with transformations and
> actions. My question is simpler: 1) are show() and/or printSchema()
> actions? I would assume so...
>

show() is an action (it prints data) but printSchema() is not an action.
Spark can tell you the schema of the result without computing the result.

> and optional question: 2) is there a way to know if there are
> transformations "pending"?
>

There are always transformations pending :). An RDD or DataFrame is a
series of pending transformations. If you say val df =
spark.read.csv("foo.csv"), that is a pending transformation. Even
spark.emptyDataFrame is best understood as a pending transformation: it
does not do anything on the cluster, but records locally what it will have
to do on the cluster.


Re: Quick one on evaluation

2017-08-02 Thread Jean Georges Perrin
Hey Jörn,

By "pending" I meant something more like a flag, e.g. myDf.hasCatalystWorkToDo()
or myDf.isPendingActions(). Or maybe access to the DAG?

I just did this:

// datediff comes from: import static org.apache.spark.sql.functions.datediff;
ordersDf = ordersDf.withColumn(
    "time_to_ship",
    datediff(ordersDf.col("ship_date"), ordersDf.col("order_date")));

ordersDf.printSchema();
ordersDf.show();

and the schema and data are correct, so I was wondering what triggered
Catalyst...

jg



> On Aug 2, 2017, at 8:29 AM, Jörn Franke wrote:
> 
> I assume printSchema() would not trigger an evaluation. show() might trigger
> only a partial evaluation (not all data is shown, only a certain number of
> rows by default).
> Keep in mind that even a count might not trigger evaluation of all rows
> (especially in the future) due to updates to the optimizer.
> 
> What do you mean by pending? You can see the status of the job in the UI.
> 
>> On 2. Aug 2017, at 14:16, Jean Georges Perrin wrote:
>> 
>> Hi Sparkians,
>> 
>> I understand the lazy evaluation mechanism with transformations and actions. 
>> My question is simpler: 1) are show() and/or printSchema() actions? I would 
>> assume so...
>> 
>> and optional question: 2) is there a way to know if there are 
>> transformations "pending"?
>> 
>> Thanks!
>> 
>> jg
>> 
>> 
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> 
> 
> 



Re: Quick one on evaluation

2017-08-02 Thread Jörn Franke
I assume printSchema() would not trigger an evaluation. show() might trigger
only a partial evaluation (not all data is shown, only a certain number of rows
by default).
Keep in mind that even a count might not trigger evaluation of all rows
(especially in the future) due to updates to the optimizer.

What do you mean by pending? You can see the status of the job in the UI.

> On 2. Aug 2017, at 14:16, Jean Georges Perrin wrote:
> 
> Hi Sparkians,
> 
> I understand the lazy evaluation mechanism with transformations and actions. 
> My question is simpler: 1) are show() and/or printSchema() actions? I would 
> assume so...
> 
> and optional question: 2) is there a way to know if there are transformations 
> "pending"?
> 
> Thanks!
> 
> jg
> 
> 
> 




Quick one on evaluation

2017-08-02 Thread Jean Georges Perrin
Hi Sparkians,

I understand the lazy evaluation mechanism with transformations and actions. My 
question is simpler: 1) are show() and/or printSchema() actions? I would assume 
so...

and optional question: 2) is there a way to know if there are transformations 
"pending"?

Thanks!

jg

 