GitHub user shivaram opened a pull request:

    https://github.com/apache/spark/pull/1697

    [SPARK-2774] - Set preferred locations for reduce tasks

    The motivation for the change is in the JIRA. There are a couple of things 
that I would like feedback on:
    
    1. Should we sort the map outputs by size for every task? This could be 
expensive if we have a large number of map outputs.
    
    2. The number of preferred locations to use. Technically we could set this 
to a larger number, but I am not sure how it will affect the locality / delay 
scheduling in TaskSetManager.
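
To make the two questions above concrete, here is a minimal sketch in Python of the general idea: rank hosts by how many map output bytes they hold for a given reduce task and keep only the top few as preferred locations. This is not the Scala code in the patch. The function name, the `(host, sizes)` input shape, and the `num_locs` parameter are all hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict


def preferred_locations(map_outputs, reduce_id, num_locs):
    """Return up to num_locs hosts holding the most bytes for one reduce task.

    map_outputs is a hypothetical list of (host, sizes) pairs, one per map
    task, where sizes[r] is the number of bytes that map task produced for
    reduce task r.
    """
    # Aggregate per host, since several map tasks may have run on one host.
    bytes_per_host = defaultdict(int)
    for host, sizes in map_outputs:
        bytes_per_host[host] += sizes[reduce_id]

    # heapq.nlargest selects the top num_locs entries without fully sorting
    # every host, which matters when there are many map outputs.
    top = heapq.nlargest(num_locs, bytes_per_host.items(), key=lambda kv: kv[1])

    # Hosts that hold no data for this reducer give no locality benefit.
    return [host for host, size in top if size > 0]


outputs = [
    ("hostA", [100, 10]),
    ("hostB", [50, 200]),
    ("hostC", [0, 30]),
    ("hostA", [25, 5]),
]
print(preferred_locations(outputs, reduce_id=0, num_locs=2))
print(preferred_locations(outputs, reduce_id=1, num_locs=2))
```

In this sketch the cost in question 1 shows up as one pass over all map outputs per reduce task, and `num_locs` is the knob that question 2 asks about.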
    
    cc @rxin

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shivaram/spark-1 reducer-locality

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1697.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1697
    
----
commit 3fe76f7a22c663ae1f28585b6e2c15696f46e0d9
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date:   2014-07-31T04:15:53Z

    Set preferred locations for reduce tasks

commit f8390ddb0e9f50ec734735ea4ec80f5af419d359
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date:   2014-07-31T04:55:12Z

    Amortize traversing array by doing it once per RDD.

commit 6782dea8aa36628b5319191e6f52a8f62fa6f98c
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date:   2014-07-31T18:04:00Z

    Merge branch 'master' of https://github.com/apache/spark into 
reducer-locality

commit 3666ff57fd1d25761edaadbf34d8287ff6df25f8
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date:   2014-07-31T18:29:57Z

    Add a unit test that checks for reducer locality

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
