RE: Data frames select and where clause dependency

2015-07-20 Thread Mohammed Guller
anywhere else)?
Mohammed

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Friday, July 17, 2015 1:39 PM
To: Mike Trienis
Cc: user@spark.apache.org
Subject: Re: Data frames select and where clause dependency

Each operation on a dataframe is completely independent and doesn't know what ...

Re: Data frames select and where clause dependency

2015-07-20 Thread Harish Butani
)?
Mohammed

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Friday, July 17, 2015 1:39 PM
To: Mike Trienis
Cc: user@spark.apache.org
Subject: Re: Data frames select and where clause dependency

Each operation on a dataframe is completely independent and doesn't ...

RE: Data frames select and where clause dependency

2015-07-20 Thread Mohammed Guller
@spark.apache.org
Subject: Re: Data frames select and where clause dependency

Yes, via org.apache.spark.sql.catalyst.optimizer.ColumnPruning. See DefaultOptimizer.batches for the list of logical rewrites. You can see the optimized plan by printing:

    df.queryExecution.optimizedPlan

On Mon, Jul 20, 2015 ...
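The idea behind the ColumnPruning pointer can be illustrated with a toy version: walk the plan from the output back toward the source, collect the columns each step references, and have the scan read only those. This is a hypothetical sketch in plain Python on a linear plan; `Scan`, `Filter`, `Project` and `prune_columns` are made-up names, not Spark classes. Catalyst's real rule rewrites logical plan trees, and printing `df.queryExecution.optimizedPlan` is the way to see what Spark actually produces.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Scan:
    columns: Tuple[str, ...]  # every column the (hypothetical) source can provide


@dataclass(frozen=True)
class Filter:
    column: str  # toy simplification: the predicate reads exactly one column


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]  # columns kept by a select(...)


Step = Union[Scan, Filter, Project]


def prune_columns(plan: List[Step]) -> List[Step]:
    """Toy column pruning: narrow the Scan to the columns later steps use.

    plan[0] must be a Scan; the remaining steps run in list order.
    """
    scan, ops = plan[0], plan[1:]
    if not isinstance(scan, Scan):
        raise ValueError("plan must start with a Scan")
    needed: Optional[Set[str]] = None  # None means "all columns reach the output"
    for op in reversed(ops):  # walk from the output back toward the source
        if isinstance(op, Project):
            cols = set(op.columns)
            needed = cols if needed is None else cols & needed
        elif isinstance(op, Filter) and needed is not None:
            needed = needed | {op.column}  # the predicate must still be able to read it
    if needed is None:
        return list(plan)  # nothing narrows the output, so read every column
    narrowed = Scan(tuple(c for c in scan.columns if c in needed))
    return [narrowed, *ops]


plan = [Scan(("id", "name", "age", "address")), Filter("age"), Project(("name",))]
print(prune_columns(plan)[0])  # Scan(columns=('name', 'age'))
```

The scan keeps `age` even though the final select drops it, because the filter below the select still needs it; only `id` and `address` are never read.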

Re: Data frames select and where clause dependency

2015-07-20 Thread Mike Trienis
[mailto:rhbutani.sp...@gmail.com]
Sent: Monday, July 20, 2015 5:37 PM
To: Mohammed Guller
Cc: Michael Armbrust; Mike Trienis; user@spark.apache.org
Subject: Re: Data frames select and where clause dependency

Yes, via org.apache.spark.sql.catalyst.optimizer.ColumnPruning. See ...

Re: Data frames select and where clause dependency

2015-07-17 Thread Michael Armbrust
Each operation on a dataframe is completely independent and doesn't know what operations happened before it. When you do a selection, you are removing the other columns from the dataframe, so the filter has nothing to operate on.

On Fri, Jul 17, 2015 at 11:55 AM, Mike Trienis ...
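The behaviour described here can be modelled with a toy, immutable frame in plain Python. `ToyFrame`, `select` and `where` are hypothetical names, not Spark's API: each call returns a new frame that only knows its current columns and keeps no history, so filtering on `age` works before `select("name")` but fails after it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a dataframe: a column list plus rows.

    Each method returns a new ToyFrame and records nothing about the steps
    that produced it, so a later step can only see the columns that exist now.
    """

    columns: Tuple[str, ...]
    rows: Tuple[Tuple[Any, ...], ...]

    def _index(self, col: str) -> int:
        if col not in self.columns:
            raise KeyError(f"cannot resolve {col!r} among {list(self.columns)}")
        return self.columns.index(col)

    def select(self, *cols: str) -> "ToyFrame":
        idx = [self._index(c) for c in cols]
        return ToyFrame(tuple(cols), tuple(tuple(r[i] for i in idx) for r in self.rows))

    def where(self, col: str, pred: Callable[[Any], bool]) -> "ToyFrame":
        i = self._index(col)  # raises if an earlier select dropped `col`
        return ToyFrame(self.columns, tuple(r for r in self.rows if pred(r[i])))


df = ToyFrame(("name", "age"), (("alice", 30), ("bob", 17)))

# Filter first, then select: the filter still sees "age".
print(df.where("age", lambda a: a >= 18).select("name").rows)  # (('alice',),)

# Select first, then filter: "age" no longer exists in the new frame.
try:
    df.select("name").where("age", lambda a: a >= 18)
except KeyError as err:
    print("error:", err)
```

Making the frame immutable is what forces the ordering question: `df` itself is never changed, so the only way to filter on a column you do not want in the output is to filter before you select it away.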