Ok, I see, thanks for the correction, but this should be optimized.
From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 2:08 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
That's two jobs. `SparkPlan.executeTake`
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Tuesday, August 25, 2015 8:11 AM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
Hi Cheng,
I know that sqlContext.read will trigger one spark job to infer the schema. What I mean is that DataFrame#show costs 2 spark jobs. So overall it would cost 3 jobs, not 2 tasks.
Here's the command I use:
val df = sqlContext.read.json("file:///Users/hadoop/github/spark/examples/src/main/resources/people.json")
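Jeff's arithmetic (one schema-inference job plus show's jobs) hinges on the fact that JSON carries no schema, so the reader must scan the data once before any query runs. A toy illustration of why that pass is needed, written in Python for brevity and not taken from Spark's actual inference code: field types are merged record by record, and `age` only appears from the second record on, so no prefix of the file is enough.

```python
import json

def infer_schema(lines):
    """Merge field -> type over every record: one full pass over the data.

    Toy stand-in for what a JSON reader must do when no schema is given --
    not Spark's real inference code.
    """
    schema = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            t = type(value).__name__
            # Widen to string on conflicting types, as JSON readers often do.
            if field in schema and schema[field] != t:
                schema[field] = "str"
            else:
                schema.setdefault(field, t)
    return schema

# Same shape as the people.json used in the thread.
data = [
    '{"name": "Michael"}',
    '{"name": "Andy", "age": 30}',
    '{"name": "Justin", "age": 19}',
]

print(infer_schema(data))  # {'name': 'str', 'age': 'int'}
```

Supplying the schema up front (e.g. `sqlContext.read.schema(mySchema).json(path)`, where `mySchema` is a `StructType` you build yourself) lets Spark skip this pass entirely, dropping the count from 3 jobs to 2.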
From: Shixiong Zhu [mailto:zsxw...@gmail.com]
Sent: Tuesday, August 25, 2015 1:29 PM
To: Cheng, Hao
Cc: Jeff Zhang; user@spark.apache.org
Subject: Re: DataFrame#show cost 2 Spark Jobs ?
Hao,
I can reproduce it using the master branch. I'm curious why you cannot.
From: Jeff Zhang [mailto:zjf...@gmail.com]
To: user@spark.apache.org
Subject: DataFrame#show cost 2 Spark Jobs ?
It's weird to me that the simple show function will cost 2 spark jobs. DataFrame#explain shows it is a very simple operation, so I'm not sure why it needs 2 jobs.
== Parsed Logical Plan ==
Relation[age#0L,name#1]
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]
loading the data for JSON, it probably causes a longer ramp-up time with a large number of files/partitions.
/org/apache/spark/sql/execution/SparkPlan.scala#L185
Best Regards,
Shixiong Zhu
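The two jobs behind show() come from the incremental strategy in SparkPlan.executeTake that Shixiong links to: scan one partition optimistically, and only launch a bigger follow-up job when that doesn't return enough rows. A simplified sketch of that strategy, in Python rather than Spark's Scala (the real code also estimates how many partitions to try from the rows seen so far and caps the ramp-up):

```python
def take(partitions, n, scale_up_factor=4):
    """Collect the first n rows while scanning as few partitions as possible.

    Simplified sketch of the incremental strategy in SparkPlan.executeTake,
    not the real code: each while-iteration corresponds to one Spark job.
    """
    rows, jobs, scanned = [], 0, 0
    while len(rows) < n and scanned < len(partitions):
        if scanned == 0:
            num_to_try = 1                          # optimistic first job: 1 partition
        else:
            num_to_try = scanned * scale_up_factor  # ramp up if that wasn't enough
        for part in partitions[scanned:scanned + num_to_try]:
            rows.extend(part)
        scanned = min(scanned + num_to_try, len(partitions))
        jobs += 1
    return rows[:n], jobs

# Four small partitions; show() asks for 20 rows by default, so the first
# one-partition job comes up short and a second job scans the rest.
parts = [[1], [2], [3], [4]]
rows, jobs = take(parts, n=20)
print(jobs)  # 2
print(rows)  # [1, 2, 3, 4]
```

This is why even a trivial query can cost two jobs under show(): with a small multi-partition input, the first one-partition job yields fewer than 20 rows, so a second, larger job runs to fetch the remainder.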