Re: Spark 1.6 Catalyst optimizer

2016-05-12 Thread Telmo Rodrigues
Thank you Takeshi. After executing df3.explain(true) I realised that the optimiser batches are being performed, and so is the predicate pushdown. I think that only the analyser batches are executed when the DataFrame is created by context.sql(query). It seems that the optimiser batches are
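The difference between the analysed and the optimised plan described in this message can be inspected directly. The following is a minimal sketch, assuming Spark 1.6 is on the classpath and that persons.json holds one JSON object per line with an `id` field. The table names `t` and `u` come from the plans quoted in this thread; the SQL text is an assumption inferred from those plans, not the original query.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PlanPhases {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("plan-phases").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Assumption: the same file is read twice so that `t` and `u` are separate relations.
    sqlContext.read.json("persons.json").registerTempTable("t")
    sqlContext.read.json("persons.json").registerTempTable("u")

    // Assumed query, reconstructed from the plans quoted in the thread.
    val df3 = sqlContext.sql("SELECT * FROM t JOIN u ON t.id = u.id WHERE t.id = 1")

    val qe = df3.queryExecution
    println(qe.logical)        // parsed, unresolved plan
    println(qe.analyzed)       // after the analyser batches
    println(qe.optimizedPlan)  // after the optimiser batches (computed lazily)
    println(qe.executedPlan)   // physical plan

    // Prints the parsed, analysed, optimised and physical plans in one call.
    df3.explain(true)

    sc.stop()
  }
}
```

Comparing `qe.analyzed` with `qe.optimizedPlan` should show whether the filter has moved below the join.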

Re: Spark 1.6 Catalyst optimizer

2016-05-12 Thread Takeshi Yamamuro
Hi,

What's the result of `df3.explain(true)`?

// maropu

On Thu, May 12, 2016 at 10:04 AM, Telmo Rodrigues <telmo.galante.rodrig...@gmail.com> wrote:
> I'm building spark from branch-1.6 source with mvn -DskipTests package and
> I'm running the following code with spark shell.
>
> val

Re: Spark 1.6 Catalyst optimizer

2016-05-11 Thread Telmo Rodrigues
I'm building Spark from branch-1.6 source with mvn -DskipTests package and I'm running the following code with spark shell.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val df = sqlContext.read.json("persons.json")
val df2 =

Re: Spark 1.6 Catalyst optimizer

2016-05-11 Thread Michael Armbrust
> logical plan after optimizer execution:
>
> Project [id#0L,id#1L]
> !+- Filter (id#0L = cast(1 as bigint))
> !   +- Join Inner, Some((id#0L = id#1L))
> !      :- Subquery t
> !      :  +- Relation[id#0L] JSONRelation
> !      +- Subquery u
> !         +- Relation[id#1L] JSONRelation
>

Re: Spark 1.6 Catalyst optimizer

2016-05-11 Thread Rishi Mishra
I will try with a JSON relation, but with Spark's temp tables (Spark version 1.6) I get an optimized plan as you have mentioned. It should not be much different though.

Query: "select t1.col2, t1.col3 from t1, t2 where t1.col1=t2.col1 and t1.col3=7"

Plan:
Project [COL2#1,COL3#2]
+- Join Inner,

Re: Spark 1.6 Catalyst optimizer

2016-05-11 Thread Telmo Rodrigues
In this case, isn't it better to perform the filter as early as possible, even if there could be unhandled predicates?

Telmo Rodrigues

On 11/05/2016, at 09:49, Rishi Mishra wrote:
> It does push the predicate. But as relations are generic and might or might
> not

Re: Spark 1.6 Catalyst optimizer

2016-05-11 Thread Rishi Mishra
It does push the predicate. But as relations are generic and might or might not handle some of the predicates, it needs to apply a filter for the unhandled predicates.

Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra

On Wed, May 11,
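The point about unhandled predicates can be illustrated with a toy model in plain Scala. This is only a sketch: `Pred`, `ToySource` and `ToyPushdown` are hypothetical names invented for illustration and are not Spark classes. The source receives every predicate, applies the ones it supports, and the caller re-applies the rest on top of the scan.

```scala
// Toy model only: these types are hypothetical and are not part of Spark.
sealed trait Pred { def eval(row: Map[String, Long]): Boolean }
final case class EqualTo(col: String, value: Long) extends Pred {
  def eval(row: Map[String, Long]): Boolean = row.get(col).exists(_ == value)
}
final case class GreaterThan(col: String, value: Long) extends Pred {
  def eval(row: Map[String, Long]): Boolean = row.get(col).exists(_ > value)
}

// A source that can evaluate only equality predicates itself.
final class ToySource(rows: Seq[Map[String, Long]]) {
  def handles(p: Pred): Boolean = p match {
    case _: EqualTo => true
    case _          => false
  }

  // Applies the predicates it supports and ignores the others.
  def scan(pushed: Seq[Pred]): Seq[Map[String, Long]] = {
    val supported = pushed.filter(handles)
    rows.filter(r => supported.forall(_.eval(r)))
  }
}

object ToyPushdown {
  // Push everything to the source, then filter again on what it could not handle.
  def query(src: ToySource, preds: Seq[Pred]): Seq[Map[String, Long]] = {
    val unhandled = preds.filterNot(src.handles)
    src.scan(preds).filter(r => unhandled.forall(_.eval(r)))
  }
}
```

In this model the residual filter only evaluates the predicates the source reported as unsupported, so the extra work is limited to rows that already passed the pushed-down part.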

Spark 1.6 Catalyst optimizer

2016-05-10 Thread Telmo Rodrigues
Hello,

I have a question about the Catalyst optimizer in Spark 1.6.

initial logical plan:

!'Project [unresolvedalias(*)]
!+- 'Filter ('t.id = 1)
!   +- 'Join Inner, Some(('t.id = 'u.id))
!      :- 'UnresolvedRelation `t`, None
!      +- 'UnresolvedRelation `u`, None

logical plan after
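For readers unfamiliar with the rewrite being asked about, here is a toy sketch of pushing a filter below an inner join, written in plain Scala. The `Plan`, `Relation`, `Join`, `Filter` and `PushFilterThroughJoin` types are hypothetical and much simpler than Catalyst's own plan classes and rules; the sketch assumes inner joins and single-column equality filters only.

```scala
// Toy plan algebra: hypothetical types, not Catalyst's LogicalPlan classes.
sealed trait Plan { def output: Set[String] }
final case class Relation(name: String, cols: Set[String]) extends Plan {
  def output: Set[String] = cols.map(c => s"$name.$c")
}
final case class Join(left: Plan, right: Plan, on: (String, String)) extends Plan {
  def output: Set[String] = left.output ++ right.output
}
final case class Filter(col: String, value: Long, child: Plan) extends Plan {
  def output: Set[String] = child.output
}

object PushFilterThroughJoin {
  // Move a filter below an inner join when its column comes from one side only.
  def apply(plan: Plan): Plan = plan match {
    case Filter(c, v, Join(l, r, on)) if l.output(c) =>
      Join(apply(Filter(c, v, l)), apply(r), on)
    case Filter(c, v, Join(l, r, on)) if r.output(c) =>
      Join(apply(l), apply(Filter(c, v, r)), on)
    case Filter(c, v, child) => Filter(c, v, apply(child))
    case Join(l, r, on)      => Join(apply(l), apply(r), on)
    case r: Relation         => r
  }
}
```

Applied to a plan shaped like the one above, `Filter("t.id", 1L, Join(t, u, ("t.id", "u.id")))`, the rule yields a join whose left child is the filtered relation `t`.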