Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17519#discussion_r123424505

    --- Diff: docs/configuration.md ---
    @@ -1004,14 +1004,48 @@ Apart from these, the following properties are also available, and may be useful
       </td>
     </tr>
     <tr>
    -  <td><code>spark.storage.replication.proactive<code></td>
    +  <td><code>spark.storage.replication.proactive</code></td>
       <td>false</td>
       <td>
         Enables proactive block replication for RDD blocks. Cached RDD block replicas lost due to executor failures are replenished if there are any existing available replicas. This tries to get the replication level of the block to the initial number.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.storage.replication.policy</code></td>
    +  <td>
    +    org.apache.spark.storage.<br />RandomBlockReplicationPolicy
    +  </td>
    +  <td>
    +    The policy to use for choosing peers when replicating blocks. The default policy would randomly
    +    choose the peers to replicate to. A more resilient replication policy is provided by
    +    <code>org.apache.spark.storage.BasicBlockReplicationPolicy</code>, which makes use of the
    +    topology information of the hosts to choose the peers, much like the HDFS blocks replication
    +    strategy: it would try to choose the first replica within the same rack, and a third replica on
    +    a different rack. See <code>spark.storage.replication.topologyMapper</code> below for how to
    +    provide the topology information for the hosts.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.storage.replication.topologyMapper</code></td>
    +  <td>
    +    org.apache.spark.storage.<br />DefaultTopologyMapper
    +  </td>
    +  <td>
    +    The topology information of a host is determined by a topology mapping service defined by the
    +    abstract class <code>org.apache.spark.storage.TopologyMapper</code>, which can be configured by
    +    this property. A default implementation that assumes all hosts are in the same rack is provided
    +    by <code>org.apache.spark.storage.DefaultTopologyMapper</code>. A file-based implementation is
    +    provided by <code>org.apache.spark.storage.FileBasedTopologyMapper</code>, which reads the
    +    topology information from the file <code>org.apache.spark.storage.topologyFile</code>. Each line
    +    of this file is of the format of <code>host1 = /rack1</code> and provides a mapping from a host
    +    name to its rack information. <em>Note:</em> This configuration only takes effect when
    +    <code>spark.storage.replication.policy</code> is set to a a policy that takes the topology
    --- End diff --

    nit: double `a`
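
For readers unfamiliar with the topology file mentioned in the diff, the `host1 = /rack1` line format could be read roughly as follows. This is a minimal Python sketch for illustration only and is not how Spark's `FileBasedTopologyMapper` is implemented; the function name `parse_topology`, the skipping of blank lines and `#` comments, and the sample host names are all assumptions made here.

```python
def parse_topology(lines):
    """Build a host -> rack mapping from lines shaped like 'host1 = /rack1'."""
    topology = {}
    for raw in lines:
        line = raw.strip()
        # Assumption of this sketch: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; sep is empty when no '=' is present.
        host, sep, rack = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'host = /rack', got: {line!r}")
        topology[host.strip()] = rack.strip()
    return topology


# Hypothetical sample content for a topology file.
sample = ["host1 = /rack1", "host2 = /rack1", "", "host3 = /rack2"]
topology = parse_topology(sample)
print(topology)
```

A topology-aware policy would then consult such a mapping to tell whether two hosts share a rack, for example by comparing `topology["host1"]` with `topology["host2"]`.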