looking at the cached rdd i see a similar story:
with useLegacyMode = true the cached rdd is spread out across 10 executors,
but with useLegacyMode = false the data for the cached rdd sits on only 3
executors (the rest all show 0s). my cached RDD is a key-value RDD that got
partitioned (hash partitioner, 50 partitions) before being cached.
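
to illustrate why running on fewer executors slows a stage even when each task is just as fast, here is a rough back-of-the-envelope sketch. the 50 tasks, the ~3 second task time, and the 10 vs 3 executors come from this thread. the one task slot per executor is purely an assumption for illustration (it is not a measured setting), and the helper name stage_seconds is hypothetical. it models tasks running in waves over the available slots, so it only shows the direction of the effect and will not reproduce the actual query times.

```python
import math


def stage_seconds(num_tasks, num_executors, slots_per_executor, task_seconds):
    """Idealized stage duration: tasks run in waves over the available slots."""
    slots = num_executors * slots_per_executor
    waves = math.ceil(num_tasks / slots)
    return waves * task_seconds


# 50 tasks at about 3 s each are taken from the thread.
# slots_per_executor = 1 is an ASSUMPTION, not a value from the cluster.
spread = stage_seconds(50, 10, 1, 3.0)  # tasks spread over all 10 executors
packed = stage_seconds(50, 3, 1, 3.0)   # tasks packed onto only 3 executors

print("spread over 10 executors:", spread, "s")
print("packed onto 3 executors: ", packed, "s")
```

plugging in the real number of cores per executor would change the absolute values, but as long as the 50 tasks need more waves on 3 executors than on 10, the packed case comes out slower.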

On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hello all,
> we are just testing a semi-realtime application (it should return results
> in less than 20 seconds from cached RDDs) on spark 1.6.0. before this it
> used to run on spark 1.5.1
>
> in spark 1.6.0 the performance is similar to 1.5.1 if i set
> spark.memory.useLegacyMode = true, however if i switch to
> spark.memory.useLegacyMode = false the queries take about 50% to 100% more
> time.
>
> the issue becomes clear when i focus on a single stage: the individual
> tasks are not slower at all, but they run on fewer executors.
> in my test query i have 50 tasks and 10 executors. both with useLegacyMode
> = true and useLegacyMode = false the tasks finish in about 3 seconds and
> show as running PROCESS_LOCAL. however when useLegacyMode = false the
> tasks run on just 3 executors out of 10, while with useLegacyMode = true
> they spread out across 10 executors. all the tasks running on just a few
> executors leads to the slower results.
>
> any idea why this would happen?
> thanks! koert
>
>
>
