Re: RDD Partitions not distributed evenly to executors

2016-04-06 Thread Mike Hynes
Hello All (and Devs in particular), Thank you again for your further responses. Please find a detailed email below which identifies the cause (I believe) of the partition imbalance problem, which occurs in Spark 1.5, 1.6, and a 2.0-SNAPSHOT. This is followed by questions for the dev

Re: RDD Partitions not distributed evenly to executors

2016-04-05 Thread Khaled Ammar
I have a similar experience. Using 32 machines, I can see that the number of tasks (partitions) assigned to executors (machines) is uneven. Moreover, the distribution changes with every stage (iteration). I wonder why Spark needs to move partitions around at all; shouldn't the scheduler reduce network
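One way to put a number on the unevenness described above is to count tasks per executor for a single stage and compare the busiest executor with the mean. The Python below is a minimal sketch, assuming the (task, executor) pairs have already been collected by hand, e.g. from the stage detail page of the Spark UI; the function names and the sample data are hypothetical and not part of any Spark API.

```python
from collections import Counter


def tasks_per_executor(assignments, executors=()):
    """Count how many tasks (partitions) each executor ran in one stage.

    `assignments` is a hypothetical list of (task_id, executor_id) pairs.
    `executors` lists every executor in the job, so that idle executors are
    counted as 0 instead of silently disappearing from the tally.
    """
    counts = Counter({executor: 0 for executor in executors})
    counts.update(executor for _task, executor in assignments)
    return counts


def imbalance_ratio(counts):
    """Ratio of the busiest executor's task count to the mean task count.

    1.0 means perfectly even; larger values mean more skew.
    """
    if not counts:
        raise ValueError("no executors recorded")
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


if __name__ == "__main__":
    # Hypothetical stage: 8 partitions over 4 executors, unevenly placed.
    stage = [(0, "exec-1"), (1, "exec-1"), (2, "exec-1"), (3, "exec-1"),
             (4, "exec-2"), (5, "exec-2"), (6, "exec-3"), (7, "exec-4")]
    counts = tasks_per_executor(stage)
    print(dict(counts))
    print(imbalance_ratio(counts))
```

Running the same count for consecutive stages would also show whether the placement shifts from one iteration to the next, as reported above.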

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Koert Kuipers
Can you try: spark.shuffle.reduceLocality.enabled=false On Mon, Apr 4, 2016 at 8:17 PM, Mike Hynes <91m...@gmail.com> wrote: > Dear all, > > Thank you for your responses. > > Michael Slavitch: > > Just to be sure: Have spark-env.sh and spark-defaults.conf been > correctly propagated to all
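To try this suggestion for a single job without editing cluster-wide files, the setting can be passed through spark-submit's --conf key=value option. The Python below is a hypothetical helper that only builds the argument list; the key is copied verbatim from the reply, whether it is honored depends on the Spark version in use, and my_app.py is a placeholder application name.

```python
import shlex


def conf_args(settings):
    """Render a dict of Spark settings as spark-submit '--conf key=value' arguments.

    Hypothetical helper for illustration; it does not validate keys.
    """
    args = []
    for key, value in settings.items():
        # Write Python booleans as lowercase true/false, the form Spark's docs use.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # The setting suggested in the reply; "my_app.py" is a placeholder.
    cmd = ["spark-submit",
           *conf_args({"spark.shuffle.reduceLocality.enabled": False}),
           "my_app.py"]
    print(shlex.join(cmd))
```

Passing the flag per job keeps the change easy to revert while comparing runs with and without it.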

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Mike Hynes
Dear all, Thank you for your responses. Michael Slavitch: > Just to be sure: Have spark-env.sh and spark-defaults.conf been correctly > propagated to all nodes? Are they identical? Yes; these files are stored on a shared memory directory accessible to all nodes. Koert Kuipers: > we ran into

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Koert Kuipers
We ran into similar issues, and it seems related to the new memory management. Can you try: spark.memory.useLegacyMode = true On Mon, Apr 4, 2016 at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote: > [ CC'ing dev list since nearly identical questions have occurred in > user list recently w/o
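A setting like this can also be made persistent in spark-defaults.conf, which takes one whitespace-separated key and value per line. The Python below is a hypothetical sketch that renders a dict of settings in that form. It assumes the key suggested in the reply, spark.memory.useLegacyMode; as far as I know, this key only has an effect on versions that ship the newer unified memory manager (1.6 onward).

```python
def defaults_conf(settings):
    """Render settings as spark-defaults.conf text: one 'key value' pair per line.

    Hypothetical helper for illustration; keys are sorted so the output is stable
    and easy to diff between nodes.
    """
    lines = []
    for key, value in sorted(settings.items()):
        # Write Python booleans as lowercase true/false, the form Spark's docs use.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(defaults_conf({"spark.memory.useLegacyMode": True}), end="")
```

Sorting the keys is a small design choice: it makes two copies of the file byte-identical whenever they hold the same settings, which matters if the files are later compared across nodes.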

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Ted Yu
bq. the modifications do not touch the scheduler If the changes can be ported over to 1.6.1, would you mind reproducing the issue there? I ask because the master branch changes very fast. It would be good to narrow down where the behavior you observed started showing up. On Mon, Apr 4, 2016 at 6:12

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Michael Slavitch
Just to be sure: Have spark-env.sh and spark-defaults.conf been correctly propagated to all nodes? Are they identical? > On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote: > > [ CC'ing dev list since nearly identical questions have occurred in > user list recently w/o
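Besides keeping the files on a shared directory, one quick way to answer "are they identical?" is to gather a copy of each node's configuration file and compare checksums. The Python below is a minimal sketch that assumes the copies have already been fetched to local paths (for example with scp); the helper names, node names, and file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def mismatched_copies(paths):
    """Return the paths whose contents differ from the first path's contents."""
    reference = file_digest(paths[0])
    return [p for p in paths[1:] if file_digest(p) != reference]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical copies of spark-defaults.conf gathered from three nodes.
        copies = []
        for node, text in [("node1", "spark.executor.cores 4\n"),
                           ("node2", "spark.executor.cores 4\n"),
                           ("node3", "spark.executor.cores 8\n")]:
            p = Path(d) / f"{node}-spark-defaults.conf"
            p.write_text(text)
            copies.append(p)
        print([p.name for p in mismatched_copies(copies)])
```

Because the check compares raw bytes, harmless differences such as trailing whitespace or reordered lines also count as mismatches; that is conservative but simple.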

RDD Partitions not distributed evenly to executors

2016-04-04 Thread Mike Hynes
[ CC'ing dev list since nearly identical questions have occurred on the user list recently w/o resolution; cf. http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html