[ 
https://issues.apache.org/jira/browse/SPARK-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171879#comment-15171879
 ] 

Lior Chaga commented on SPARK-10567:
------------------------------------

Hi,
I have a Spark job that does the following:

Given a list of days (usually one day), it extracts a list of file keys from an 
index table in Cassandra and then repartitions them with HashPartitioner by 
file name.
It then flat-maps each file name with a function that fetches the corresponding 
file from Cassandra (stored as a blob) and breaks it into protostuff messages.
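
To illustrate the repartitioning step, here is a minimal sketch in plain Python (not Spark code) of how hash partitioning maps file names to partitions. It assumes the keys are strings and mimics Java's String.hashCode followed by a non-negative modulo, which is how HashPartitioner picks a partition. The file names and the partition count are made up for illustration.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode for BMP characters (32-bit, wraps on overflow)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int, as Java would.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for(key: str, num_partitions: int) -> int:
    """Non-negative modulo of the key's hash, as a hash partitioner does."""
    # Python's % is already non-negative for a positive modulus.
    return java_string_hash(key) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many keys land in each partition."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[partition_for(key, num_partitions)] += 1
    return sizes


# Hypothetical file names and partition count, for illustration only.
file_names = [f"file-{i:05d}.bin" for i in range(1000)]
sizes = partition_sizes(file_names, 8)
print(sizes)
```

Note that this only shows how keys are spread over partitions. Which executor each resulting task is offered to is decided separately by the scheduler.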

With Spark 1.4 the tasks were distributed evenly among executors, but 
after upgrading to 1.6 I noticed that most of the tasks are offered to a single 
executor. 
After a lot of digging I found this ticket, and after setting 
spark.shuffle.reduceLocality.enabled to false the tasks were evenly distributed 
across executors again.
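
One way to apply this setting is to pass it to spark-submit as a --conf flag. The sketch below is a small plain-Python helper that renders such flags from a dict. The helper name conf_flags and the application file my_job.py are hypothetical; only the property name and the --conf key=value form come from Spark itself.

```python
import shlex


def conf_flags(settings):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# The workaround described above: turn off reducer locality preferences.
settings = {"spark.shuffle.reduceLocality.enabled": "false"}

# my_job.py is a hypothetical application file.
command = ["spark-submit", *conf_flags(settings), "my_job.py"]
print(shlex.join(command))
```

The same property could equally be set in spark-defaults.conf or on the job's SparkConf before the context is created.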

I think this ticket should be reopened.
I have also opened a ticket about the missing documentation: SPARK-13567.

> Reducer locality follow-up for Spark 1.6
> ----------------------------------------
>
>                 Key: SPARK-10567
>                 URL: https://issues.apache.org/jira/browse/SPARK-10567
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Yin Huai
>            Assignee: Matei Zaharia
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> For Spark 1.6, let's check the issue mentioned in 
> https://github.com/apache/spark/pull/8280 is fixed when 
> spark.shuffle.reduceLocality.enabled is true.
> Otherwise, we should disable it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

