The first job is to infer the json schema, and the second one is what you mean of the query. You can provide the schema while loading the json file, like below:
sqlContext.read.schema(xxx).json(“…”)? Hao From: Jeff Zhang [mailto:zjf...@gmail.com] Sent: Monday, August 24, 2015 6:20 PM To: user@spark.apache.org Subject: DataFrame#show cost 2 Spark Jobs ? It's weird to me that the simple show function will cost 2 spark jobs. DataFrame#explain shows it is a very simple operation, not sure why need 2 jobs. == Parsed Logical Plan == Relation[age#0L,name#1] JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json] == Analyzed Logical Plan == age: bigint, name: string Relation[age#0L,name#1] JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json] == Optimized Logical Plan == Relation[age#0L,name#1] JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json] == Physical Plan == Scan JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json][age#0L,name#1] -- Best Regards Jeff Zhang