I don't think the Spark optimizer supports anything like a statement cache,
where the plan is cached and bind variables (as in an RDBMS) are supplied
for different values, thus saving the parsing.

What you are describing is that the source and the temp table change but the
plan itself stays the same. I have not seen this in 1.6.1, and as I
understand it Spark does not yet support CBO (cost-based optimization), not
even in 2.0.
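
For comparison, this is roughly what an RDBMS statement cache gives you,
sketched with plain JDBC (the connection URL and table name here are
hypothetical, just for illustration):

import java.sql.DriverManager

// A prepared statement is parsed and planned once; each execution only
// binds new values against the cached plan.
val conn = DriverManager.getConnection("jdbc:oracle:thin:@//host:1521/db")  // hypothetical
val ps = conn.prepareStatement("select col4 from t where col3 = ?")
for (i <- 1 to 700) {
  ps.setInt(1, i)            // bind variable, no re-parse
  val rs = ps.executeQuery()
  rs.close()
}
ps.close()

Spark SQL has no equivalent: every sqlContext.sql(...) call parses and
analyzes the statement from scratch.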


HTH



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 30 June 2016 at 22:53, Darshan Singh <darshan.m...@gmail.com> wrote:

> I am using 1.5.2.
>
> I have a data frame with 10 columns, and then I pivot 1 column to generate
> 700 columns.
>
> It is like this:
>
> val df1 = sqlContext.read.parquet("file1")
> df1.registerTempTable("df1")
> val df2 = sqlContext.sql("""select col1, col2,
>   sum(case when col3 = 1 then col4 else 0.0 end) as col4_1,
>   ....,
>   sum(case when col3 = 700 then col4 else 0.0 end) as col4_700
>   from df1 group by col1, col2""")
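>
> The same wide aggregation could be built programmatically with the
> DataFrame API instead of one huge SQL string (a sketch, untested; it
> removes the string building, but the 700 expressions still have to be
> analyzed):
>
> import org.apache.spark.sql.functions.{col, sum, when}
>
> // One conditional sum per bucket value, generated in a loop.
> val aggCols = (1 to 700).map { i =>
>   sum(when(col("col3") === i, col("col4")).otherwise(0.0)).as(s"col4_$i")
> }
> val df2 = df1.groupBy("col1", "col2").agg(aggCols.head, aggCols.tail: _*)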
>
> Now the statement that creates df2 takes around 20-30 seconds. I run this a
> number of times; the only difference is that the file behind df1 is
> different. Everything else is the same.
>
> The actual execution takes 2-3 seconds, so it is a bit frustrating that
> just generating the plan for df2 takes so long. Worse, this runs on the
> driver, so it is not parallelized.
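>
> To see that the time goes into planning rather than execution, something
> like this isolates the plan-building step (a sketch; wideQuery is my
> placeholder for the 700-column SQL string above):
>
> val t0 = System.nanoTime()
> val planned = sqlContext.sql(wideQuery)   // parse + analyze
> planned.queryExecution.optimizedPlan      // force the optimizer as well
> println(s"planning took ${(System.nanoTime() - t0) / 1e6} ms")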
>
> I have a similar issue in another query where, from these 700 columns, we
> generate more columns by adding and subtracting them, and it again takes a
> lot of time.
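>
> Those derived columns could be generated the same way (a sketch; the
> add/subtract pairing below is made up for illustration):
>
> import org.apache.spark.sql.functions.col
>
> // Derive new columns from the generated ones without hand-writing each one.
> val derived = (1 to 350).map { i =>
>   (col(s"col4_$i") + col(s"col4_${i + 350}")).as(s"sum_$i")
> }
> val df3 = df2.select(col("col1") +: col("col2") +: derived: _*)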
>
> Not sure what could be done here.
>
> Thanks
>
> On Thu, Jun 30, 2016 at 10:10 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> Which version are you using here? If the underlying files change,
>> technically we should go through optimization again.
>>
>> Perhaps the real "fix" is to figure out why logical plan creation is so
>> slow for 700 columns.
>>
>>
>> On Thu, Jun 30, 2016 at 1:58 PM, Darshan Singh <darshan.m...@gmail.com>
>> wrote:
>>
>>> Is there a way I can reuse the same logical plan for a query? Everything
>>> will be the same except that the underlying file will be different.
>>>
>>> The issue is that my query has around 700 columns, and generating the
>>> logical plan takes 20 seconds. It runs every 2 minutes, but each time the
>>> underlying file is different.
>>>
>>> I do not know these files in advance, so I can't create the table at the
>>> directory level. These files are created and then used in the final query.
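>>>
>>> Concretely, the pattern is (a sketch; runWideQuery and wideQuery are
>>> illustrative names, not real code from above):
>>>
>>> def runWideQuery(path: String): org.apache.spark.sql.DataFrame = {
>>>   val df = sqlContext.read.parquet(path)  // only this path changes per run
>>>   df.registerTempTable("df1")
>>>   sqlContext.sql(wideQuery)               // identical SQL text every time
>>> }
>>>
>>> Even though the SQL text is identical, Spark rebuilds and re-analyzes the
>>> logical plan on every call.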
>>>
>>> Thanks
>>>
>>
>>
>
