Can you try your query using Spark 1.4.0 RC2? There have been some fixes since 1.2.0, e.g. SPARK-7233 (ClosureCleaner#clean blocks concurrent job submitter threads).
Cheers

On Wed, May 27, 2015 at 10:38 AM, Nitin Goyal <[email protected]> wrote:

> Hi All,
>
> I am running a SQL query (Spark version 1.2) on a table created from a
> unionAll of 3 SchemaRDDs, and it executes in roughly 400ms (200ms at the
> driver and roughly 200ms on the executors).
>
> If I run the same query on a table created from a unionAll of 27 SchemaRDDs,
> the executor time stays the same (because of concurrency and the nature of
> my query), but the driver time shoots up to 600ms (making the total query
> time 600 + 200 = 800ms).
>
> I attached JProfiler and found that the ClosureCleaner clean method is
> taking time at the driver (some issue related to URLClassLoader), and that
> this time increases linearly with the number of RDDs being unioned in the
> table the query runs against. This causes my query to take far longer than
> the roughly 400ms I expect, regardless of the number of RDDs (since I have
> enough executors available to cater to my need). Please find below links to
> screenshots from JProfiler:
>
> http://pasteboard.co/MnQtB4o.png
>
> http://pasteboard.co/MnrzHwJ.png
>
> Any help/suggestions to fix this will be highly appreciated, since it needs
> to be fixed for production.
>
> Thanks in advance,
> Nitin
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tp12466.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
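For reference, a minimal sketch of the kind of setup described above, assuming the Spark 1.2-era SQLContext/SchemaRDD API; the case class, table name, row counts, and query are illustrative placeholders, not taken from the original report:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Illustrative record type; the actual schema in the report is not shown.
case class Record(key: Int, value: String)

object UnionAllSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("unionAll-closurecleaner"))
    val sqlContext = new SQLContext(sc)

    // Build N SchemaRDDs and union them into one table; the reported
    // driver-side ClosureCleaner overhead grows with N.
    val n = 27
    val parts = (1 to n).map { i =>
      sqlContext.createSchemaRDD(
        sc.parallelize(1 to 1000).map(j => Record(j, s"part-$i")))
    }
    val unioned = parts.reduce(_ unionAll _)
    unioned.registerTempTable("t")

    // Time the query end to end; per the report, the extra time shows up
    // on the driver before/around task execution.
    val t0 = System.currentTimeMillis()
    sqlContext.sql("SELECT value, COUNT(*) FROM t GROUP BY value").collect()
    println(s"query took ${System.currentTimeMillis() - t0} ms")

    sc.stop()
  }
}

This only mirrors the shape of the setup; it won't necessarily reproduce the exact timings shown in the screenshots.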
