Can you try your query using Spark 1.4.0 RC2?

There have been some fixes since 1.2.0
e.g.
SPARK-7233 ClosureCleaner#clean blocks concurrent job submitter threads
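
For reference, here is a rough sketch (Spark 1.2-era API, meant for
spark-shell; the Event schema, table name, and data sizes are made up) of
the unionAll pattern you describe below, with a crude end-to-end timing
around the query:

  import org.apache.spark.sql.SQLContext

  case class Event(id: Int, value: Double)          // made-up schema

  val sqlContext = new SQLContext(sc)               // sc comes from the shell
  import sqlContext.createSchemaRDD                 // RDD[Event] -> SchemaRDD

  // Build 27 SchemaRDDs with the same schema and union them into one table.
  val parts = (1 to 27).map { i =>
    createSchemaRDD(sc.parallelize(1 to 1000).map(j => Event(j, j * i.toDouble)))
  }
  val unioned = parts.reduce(_ unionAll _)
  unioned.registerTempTable("events")               // hypothetical table name

  // Time the query end to end; as the number of union-ed RDDs grows, the
  // extra time shows up on the driver side (including ClosureCleaner#clean).
  val t0 = System.nanoTime()
  sqlContext.sql("SELECT COUNT(*) FROM events").collect()
  println(s"query took ${(System.nanoTime() - t0) / 1e6} ms")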

Cheers

On Wed, May 27, 2015 at 10:38 AM, Nitin Goyal <nitin2go...@gmail.com> wrote:

> Hi All,
>
> I am running a SQL query (Spark version 1.2) on a table created from the
> unionAll of 3 SchemaRDDs, and it executes in roughly 400ms (200ms at the
> driver and roughly 200ms at the executors).
>
> If I run the same query on a table created from the unionAll of 27
> SchemaRDDs, the executor time stays the same (because of the concurrency
> and the nature of my query) but the driver time shoots up to 600ms, making
> the total query time 600 + 200 = 800ms.
>
> I attached JProfiler and found that the ClosureCleaner clean method is
> taking time at the driver (some issue related to URLClassLoader), and that
> this time increases linearly with the number of RDDs being union-ed in the
> table the query is fired against. This makes my query take far longer than
> expected; I would expect it to complete within 400ms irrespective of the
> number of RDDs, since I have enough executors to cater to my needs. Please
> find below the links to the JProfiler screenshots:
>
> http://pasteboard.co/MnQtB4o.png
>
> http://pasteboard.co/MnrzHwJ.png
>
> Any help/suggestions to fix this would be highly appreciated, since this
> needs to be fixed for production.
>
> Thanks in Advance,
> Nitin
