looking at the cached RDD i see a similar story: with spark.memory.useLegacyMode = true the cached RDD is spread out across all 10 executors, but with spark.memory.useLegacyMode = false the data for the cached RDD sits on only 3 executors (the rest all show 0s). the cached RDD is a key-value RDD that was partitioned (hash partitioner, 50 partitions) before being cached.
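for anyone who wants to check the same thing outside the UI, here is a minimal sketch of how the placement could be measured. it is not the code from my application: the object name, the helper names and the stand-in data are all made up for illustration, and it assumes the job is launched with spark-submit so the master is supplied externally. it runs one task per partition and records which executor ran it, which for a fully cached RDD with PROCESS_LOCAL tasks should mirror where the cached blocks live.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext, SparkEnv}
import org.apache.spark.rdd.RDD

// Hypothetical diagnostic, not part of the original application.
object CachedPartitionPlacement {

  // Pure helper: count how many partitions each executor reported.
  def partitionsPerExecutor(placement: Seq[(String, Int)]): Map[String, Int] =
    placement.groupBy(_._1).map { case (executorId, parts) => executorId -> parts.size }

  // Runs one task per partition and records the id of the executor that ran it.
  // The scheduler can still place a task away from its cached block, so treat
  // the result as an approximation of block placement.
  def placementOf[T](rdd: RDD[T]): Seq[(String, Int)] =
    rdd.mapPartitionsWithIndex { (idx, _) =>
      Iterator((SparkEnv.get.executorId, idx))
    }.collect().toSeq

  def main(args: Array[String]): Unit = {
    // Master and spark.memory.useLegacyMode are expected to come from
    // spark-submit (for example via --conf), so both modes can be compared.
    val conf = new SparkConf().setAppName("cached-partition-placement")
    val sc = new SparkContext(conf)

    // Stand-in data (assumption): a key-value RDD, hash partitioned into
    // 50 partitions and then cached, as described above.
    val cached = sc.parallelize(1 to 100000)
      .map(i => (i, i.toString))
      .partitionBy(new HashPartitioner(50))
      .cache()
    cached.count() // materialize the cache before measuring

    partitionsPerExecutor(placementOf(cached)).toSeq.sortBy(_._1).foreach {
      case (executorId, n) => println(s"executor $executorId: $n partitions")
    }

    sc.stop()
  }
}
```

running this once with spark.memory.useLegacyMode = true and once with false should make the difference visible as a per-executor partition count, rather than having to read it off the storage tab.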
On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hello all,
> we are just testing a semi-realtime application (it should return results
> in less than 20 seconds from cached RDDs) on spark 1.6.0. before this it
> used to run on spark 1.5.1
>
> in spark 1.6.0 the performance is similar to 1.5.1 if i set
> spark.memory.useLegacyMode = true, however if i switch to
> spark.memory.useLegacyMode = false the queries take about 50% to 100% more
> time.
>
> the issue becomes clear when i focus on a single stage: the individual
> tasks are not slower at all, but they run on less executors.
> in my test query i have 50 tasks and 10 executors. both with useLegacyMode
> = true and useLegacyMode = false the tasks finish in about 3 seconds and
> show as running PROCESS_LOCAL. however when useLegacyMode = false the
> tasks run on just 3 executors out of 10, while with useLegacyMode = true
> they spread out across 10 executors. all the tasks running on just a few
> executors leads to the slower results.
>
> any idea why this would happen?
> thanks! koert