GitHub user shubhamchopra opened a pull request:

    https://github.com/apache/spark/pull/13152

    [SPARK-15353] [CORE] Making peer selection for block replication pluggable

    ## What changes were proposed in this pull request?
    
    This PR makes block replication strategies pluggable. It provides two traits
    that can be implemented: one maps a host to its topology and is used in the
    master; the other prioritizes a list of peers for block replication and runs
    in the executors.
    
    This patch contains default implementations of both traits, which keep
    current Spark behavior unchanged.
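
A minimal sketch of how the two extension points described above could look. All names here (`PeerId`, `TopologyMapper`, `ReplicationPrioritizer`, and the two default objects) are hypothetical stand-ins chosen for illustration, not the identifiers used in the patch. The default prioritizer assumes a random ordering seeded from the block id, as the commit log suggests; filtering out the local node and already-used peers is also an assumption.

```scala
import scala.util.Random

// Hypothetical stand-in for a block manager identifier.
final case class PeerId(host: String, port: Int, topology: Option[String])

// Used in the master: maps a host name to its topology (for example, a rack).
trait TopologyMapper {
  def topologyForHost(host: String): Option[String]
}

// Default mapper: provides no topology information, so all hosts look alike.
object NoTopologyMapper extends TopologyMapper {
  def topologyForHost(host: String): Option[String] = None
}

// Runs in the executors: orders candidate peers for replicating one block.
trait ReplicationPrioritizer {
  def prioritize(
      local: PeerId,
      peers: Seq[PeerId],
      alreadyReplicatedTo: Set[PeerId],
      blockId: String): Seq[PeerId]
}

// Default prioritizer: random order, seeded from the block id so that
// different blocks do not keep choosing the same peers.
object RandomPrioritizer extends ReplicationPrioritizer {
  def prioritize(
      local: PeerId,
      peers: Seq[PeerId],
      alreadyReplicatedTo: Set[PeerId],
      blockId: String): Seq[PeerId] = {
    // Assumption: skip the local node and peers that already hold a replica.
    val candidates = peers.filterNot(p => p == local || alreadyReplicatedTo(p))
    new Random(blockId.hashCode).shuffle(candidates)
  }
}
```

A topology-aware strategy would implement the same two traits, for instance by reading host-to-rack mappings in the mapper and preferring peers whose `topology` differs from the local one in the prioritizer.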
    
    
    ## How was this patch tested?
    
    This patch should not change Spark behavior in any way. It was tested with
    the unit tests for storage.
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shubhamchopra/spark RackAwareBlockReplication

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13152
    
----
commit 779ce27dbeedd4d5c72e28782c9d38af51d2060c
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-05T22:06:14Z

    Adding capability to prioritize peer executors based on rack awareness 
while replicating blocks.

commit d0b6747f1fc9a0b701ab41fe5cf67939ed36cb9e
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-06T17:40:47Z

    Minor modifications to get past the style check errors.

commit 942908ac060fbdd29d0efd1f8541436bf9cd46d8
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-06T20:31:22Z

    Using blockId hashcode as a source of randomness, so we don't keep choosing 
the same peers for replication.

commit 0902e39fc7a2526539013e67c48bc13b6991bf07
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-09T20:36:53Z

    Several changes:
    1. Adding rack attribute to hashcode and equals to block manager id.
    2. Removing boolean check for rack awareness. Asking master for rack info, 
and master uses topology mapper.
    3. Adding a topology mapper trait and a default implementation that block 
manager master endpoint uses to discern topology information.

commit 86e1e0212b0dae0d598f0128c6a7b8f33429dc27
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-09T20:58:21Z

    Adding null check so a Block Manager can be initialized without the master.

commit a3b50ae9bcca7e871d384fa4614b2c77ac5ff5ad
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-12T21:09:16Z

    Renaming classes/variables from rack to a more general topology.

commit 1ee7948ce3994df08119418b779f8cc2e5aaca86
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-12T21:15:46Z

    Renaming classes/variables from rack to a more general topology.

commit 8de5c6e39cd0a868094803a0f53b3b50b7ed90d5
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-12T21:27:29Z

    We continue to randomly choose peers, so there is no change in current 
behavior.

commit 72ae37d64724423c65d3a23559a5f46649ffa4c3
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-13T15:36:17Z

    Spelling correction and minor changes in comments to use a more general 
topology instead of rack.

commit e071ca3a838193efad715764cc654507ee254e44
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-13T20:32:13Z

    Minor change. Changing replication info message to debug level.

commit 96aaf6ec50ae943c1345966cfc11fd4180ddfa3a
Author: Shubham Chopra <schopr...@bloomberg.net>
Date:   2016-05-16T21:47:33Z

    Providing peersReplicateTo to the prioritizer.

----

