Thank you Takeshi. After executing df3.explain(true) I realised that the optimiser batches are being executed after all, including predicate pushdown.
I think that only the analyser batches are executed when the DataFrame is created by context.sql(query). It seems that the optimiser batches are executed only when some action such as collect or explain takes place.

scala> d3.explain(true)
16/05/13 02:08:12 DEBUG Analyzer$ResolveReferences: Resolving 't.id to id#0L
16/05/13 02:08:12 DEBUG Analyzer$ResolveReferences: Resolving 'u.id to id#1L
16/05/13 02:08:12 DEBUG Analyzer$ResolveReferences: Resolving 't.id to id#0L
16/05/13 02:08:12 DEBUG SQLContext$$anon$1:
== Parsed Logical Plan ==
'Project [unresolvedalias(*)]
+- 'Filter ('t.id = 1)
   +- 'Join Inner, Some(('t.id = 'u.id))
      :- 'UnresolvedRelation `t`, None
      +- 'UnresolvedRelation `u`, None

== Analyzed Logical Plan ==
id: bigint, id: bigint
Project [id#0L,id#1L]
+- Filter (id#0L = cast(1 as bigint))
   +- Join Inner, Some((id#0L = id#1L))
      :- Subquery t
      :  +- Relation[id#0L] JSONRelation
      +- Subquery u
         +- Relation[id#1L] JSONRelation

== Optimized Logical Plan ==
Project [id#0L,id#1L]
+- Join Inner, Some((id#0L = id#1L))
   :- Filter (id#0L = 1)
   :  +- Relation[id#0L] JSONRelation
   +- Relation[id#1L] JSONRelation

== Physical Plan ==
Project [id#0L,id#1L]
+- BroadcastHashJoin [id#0L], [id#1L], BuildRight
   :- Filter (id#0L = 1)
   :  +- Scan JSONRelation[id#0L] InputPaths: file:/persons.json, PushedFilters: [EqualTo(id,1)]
   +- Scan JSONRelation[id#1L] InputPaths: file:/cars.json

2016-05-12 16:34 GMT+01:00 Takeshi Yamamuro <linguin....@gmail.com>:

> Hi,
>
> What's the result of `df3.explain(true)`?
>
> // maropu
>
> On Thu, May 12, 2016 at 10:04 AM, Telmo Rodrigues <
> telmo.galante.rodrig...@gmail.com> wrote:
>
>> I'm building Spark from the branch-1.6 source with `mvn -DskipTests
>> package` and I'm running the following code with the spark-shell.
>>
>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>
>> import sqlContext.implicits._
>>
>> val df = sqlContext.read.json("persons.json")
>> val df2 = sqlContext.read.json("cars.json")
>>
>> df.registerTempTable("t")
>> df2.registerTempTable("u")
>>
>> val d3 = sqlContext.sql("select * from t join u on t.id = u.id where t.id = 1")
>>
>> With the log4j root category level at WARN, the last printed messages
>> refer to the execution of the Batch Resolution rules.
>>
>> === Result of Batch Resolution ===
>> !'Project [unresolvedalias(*)]             Project [id#0L,id#1L]
>> !+- 'Filter ('t.id = 1)                    +- Filter (id#0L = cast(1 as bigint))
>> !   +- 'Join Inner, Some(('t.id = 'u.id))     +- Join Inner, Some((id#0L = id#1L))
>> !      :- 'UnresolvedRelation `t`, None          :- Subquery t
>> !      +- 'UnresolvedRelation `u`, None          :  +- Relation[id#0L] JSONRelation
>> !                                                +- Subquery u
>> !                                                   +- Relation[id#1L] JSONRelation
>>
>> I think that only the analyser rules are being executed.
>>
>> Shouldn't the optimiser rules also run in this case?
>>
>> 2016-05-11 19:24 GMT+01:00 Michael Armbrust <mich...@databricks.com>:
>>
>>>> logical plan after optimizer execution:
>>>>
>>>> Project [id#0L,id#1L]
>>>> !+- Filter (id#0L = cast(1 as bigint))
>>>> !   +- Join Inner, Some((id#0L = id#1L))
>>>> !      :- Subquery t
>>>> !      :  +- Relation[id#0L] JSONRelation
>>>> !      +- Subquery u
>>>> !         +- Relation[id#1L] JSONRelation
>>>
>>> I think you are mistaken. If this was the optimized plan there would be
>>> no subqueries.
>>
>
>
> --
> ---
> Takeshi Yamamuro
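P.S. The behaviour discussed in this thread can be sketched with a toy model. This is plain Scala, not Spark's actual code: the names mirror Catalyst's QueryExecution (analyzed, optimizedPlan, executedPlan, assertAnalyzed), but the bodies are illustrative stand-ins. The point is the `lazy val` pattern: analysis is forced eagerly when the DataFrame is created, while the optimiser and planner only run when an action such as collect or explain forces them.

```scala
// Toy model (NOT Spark's real code) of how Catalyst's QueryExecution
// defers work: each phase is a `lazy val`, so later phases only run
// once something (collect, explain, ...) forces them.
object LazyPlanDemo {
  class QueryExecution(sql: String) {
    // Records which phases have actually run, for demonstration only.
    val ran = scala.collection.mutable.Buffer[String]()

    lazy val analyzed: String      = { ran += "analyzer";  s"Analyzed($sql)" }
    lazy val optimizedPlan: String = { ran += "optimizer"; s"Optimized($analyzed)" }
    lazy val executedPlan: String  = { ran += "planner";   s"Physical($optimizedPlan)" }

    // sqlContext.sql(...) eagerly forces analysis, which is why only the
    // analyser batches show up in the logs at DataFrame creation time.
    def assertAnalyzed(): Unit = analyzed

    // explain(true) forces the whole pipeline, so the optimiser batches
    // (and predicate pushdown) finally appear.
    def explain(): String = executedPlan
  }

  def main(args: Array[String]): Unit = {
    val qe = new QueryExecution("select * from t join u on t.id = u.id")
    qe.assertAnalyzed()
    println(qe.ran.mkString(", "))   // analyzer
    qe.explain()
    println(qe.ran.mkString(", "))   // analyzer, optimizer, planner
  }
}
```

Note that forcing executedPlan does not re-run analysis: a `lazy val` is evaluated at most once, which matches seeing the analyser batches only at sql() time and the optimiser batches only at the first action.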