RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-25 Thread Cheng, Hao
Ok, I see, thanks for the correction, but this should be optimized. From: Shixiong Zhu [mailto:zsxw...@gmail.com] Sent: Tuesday, August 25, 2015 2:08 PM To: Cheng, Hao Cc: Jeff Zhang; user@spark.apache.org Subject: Re: DataFrame#show cost 2 Spark Jobs ? That's two jobs. `SparkPlan.executeTake

RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-25 Thread Cheng, Hao
, 2015 8:11 AM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re: DataFrame#show cost 2 Spark Jobs ? Hi Cheng, I know that sqlContext.read will trigger one Spark job to infer the schema. What I mean is that DataFrame#show costs 2 Spark jobs. So overall it would cost 3 jobs

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-25 Thread Shixiong Zhu
, not 2 tasks. *From:* Shixiong Zhu [mailto:zsxw...@gmail.com] *Sent:* Tuesday, August 25, 2015 1:29 PM *To:* Cheng, Hao *Cc:* Jeff Zhang; user@spark.apache.org *Subject:* Re: DataFrame#show cost 2 Spark Jobs ? Hao, I can reproduce it using the master branch. I'm curious why you cannot

RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Cheng, Hao
@spark.apache.org Subject: DataFrame#show cost 2 Spark Jobs ? It seems odd to me that the simple show function costs 2 Spark jobs. DataFrame#explain shows it is a very simple operation, so I am not sure why it needs 2 jobs. == Parsed Logical Plan == Relation[age#0L,name#1] JSONRelation[file:/Users/hadoop

DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Jeff Zhang
It seems odd to me that the simple show function costs 2 Spark jobs. DataFrame#explain shows it is a very simple operation, so I am not sure why it needs 2 jobs. == Parsed Logical Plan == Relation[age#0L,name#1] JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json] ==

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Jeff Zhang
Hi Cheng, I know that sqlContext.read will trigger one Spark job to infer the schema. What I mean is that DataFrame#show costs 2 Spark jobs. So overall it would cost 3 jobs. Here's the command I use: val df = sqlContext.read.json(file:///Users/hadoop/github/spark/examples/src/main/resources

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Shixiong Zhu
*Subject:* Re: DataFrame#show cost 2 Spark Jobs ? Hi Cheng, I know that sqlContext.read will trigger one Spark job to infer the schema. What I mean is that DataFrame#show costs 2 Spark jobs. So overall it would cost 3 jobs. Here's the command I use: val df = sqlContext.read.json( file

RE: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Cheng, Hao
loading the data for JSON, it probably causes a longer ramp-up time with a large number of files/partitions. From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Tuesday, August 25, 2015 8:11 AM To: Cheng, Hao Cc: user@spark.apache.org Subject: Re: DataFrame#show cost 2 Spark Jobs ? Hi Cheng, I

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Shixiong Zhu
/org/apache/spark/sql/execution/SparkPlan.scala#L185 Best Regards, Shixiong Zhu 2015-08-25 8:11 GMT+08:00 Jeff Zhang zjf...@gmail.com: Hi Cheng, I know that sqlContext.read will trigger one Spark job to infer the schema. What I mean is that DataFrame#show costs 2 Spark jobs. So overall it would cost
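
Note on the thread above: the replies attribute the two jobs to `SparkPlan.executeTake`. As a rough illustration of why an incremental "take" can launch more than one job, here is a minimal plain-Scala sketch. It is a simulation with no Spark dependency, not the code in SparkPlan.scala; the object name `IncrementalTake`, the partition layout, and the "scan all remaining partitions on the second pass" policy are assumptions made for the example.

```scala
// Plain-Scala simulation of an incremental "take" (hypothetical, not Spark code).
// Assumption: each loop iteration stands in for one Spark job.
object IncrementalTake {
  /** Returns (rows collected, number of simulated jobs). */
  def take[T](partitions: Seq[Seq[T]], n: Int): (Seq[T], Int) = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[T]
    var partsScanned = 0
    var jobs = 0
    while (buf.size < n && partsScanned < partitions.size) {
      // First pass: try a single partition only.
      // Later passes: try every remaining partition (a simplification;
      // a real implementation may scale the number up more gradually).
      val numPartsToTry =
        if (partsScanned == 0) 1 else partitions.size - partsScanned
      val left = n - buf.size
      val batch = partitions.slice(partsScanned, partsScanned + numPartsToTry)
      // Each partition contributes at most `left` rows, capped overall.
      buf ++= batch.flatMap(_.take(left)).take(left)
      partsScanned += numPartsToTry
      jobs += 1
    }
    (buf.toSeq, jobs)
  }
}
```

Under these assumptions, asking for 20 rows (the default row count of `show`) from a small dataset split across two partitions, for example `IncrementalTake.take(Seq(Seq("a", "b"), Seq("c")), 20)`, needs a second pass because the first partition alone cannot supply 20 rows. The same request against a single partition finishes in one pass.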