[ https://issues.apache.org/jira/browse/SPARK-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171879#comment-15171879 ]
Lior Chaga commented on SPARK-10567:
------------------------------------

Hi,

I have a Spark job that does the following: given a list of days (usually one day), it extracts a list of file keys from an index table in Cassandra and then repartitions them with HashPartitioner according to file name. It then flat-maps each file name with a function that extracts the relevant file from Cassandra (stored as a blob) and breaks it into protostuff messages.

With Spark 1.4 the tasks were distributed beautifully among executors, but after upgrading to 1.6 I noticed that most of the tasks are offered to a single executor. After a lot of digging I found this ticket, and after changing spark.shuffle.reduceLocality.enabled to false the tasks were evenly distributed between executors again.

I think this ticket should be reopened.
I also opened a ticket about the missing documentation: SPARK-13567.

> Reducer locality follow-up for Spark 1.6
> ----------------------------------------
>
>                 Key: SPARK-10567
>                 URL: https://issues.apache.org/jira/browse/SPARK-10567
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Yin Huai
>            Assignee: Matei Zaharia
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> For Spark 1.6, let's check the issue mentioned in
> https://github.com/apache/spark/pull/8280 is fixed when
> spark.shuffle.reduceLocality.enabled is true.
> Otherwise, we should disable it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
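
For anyone who wants to try the workaround described in the comment above, here is a minimal sketch of how the property could be set. It assumes the setting is picked up at application start like other spark.* properties, and that the job is launched through spark-submit; the file location is the conventional one and may differ in your deployment.

```properties
# conf/spark-defaults.conf (assumed location; adjust to your installation)
# Turn off reducer locality preferences, as described in the comment.
spark.shuffle.reduceLocality.enabled   false

# Equivalent per-job form on the command line (other arguments omitted):
#   spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...
```

Setting it per job with --conf keeps the change limited to the affected application, which seems the safer choice while the behaviour is still under discussion in this ticket.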