Add snitch and range movements section on Operations

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4d444022
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4d444022
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4d444022

Branch: refs/heads/trunk
Commit: 4d444022c3ace5928a15b2a54af0f95f4191c0d7
Parents: cad277b
Author: Paulo Motta <pauloricard...@gmail.com>
Authored: Tue Jun 14 17:54:38 2016 -0300
Committer: Sylvain Lebresne <sylv...@datastax.com>
Committed: Wed Jun 15 10:31:17 2016 +0200

----------------------------------------------------------------------
 doc/source/operations.rst | 164 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 160 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/4d444022/doc/source/operations.rst
----------------------------------------------------------------------
diff --git a/doc/source/operations.rst b/doc/source/operations.rst
index 8228746..40b3e09 100644
--- a/doc/source/operations.rst
+++ b/doc/source/operations.rst
@@ -24,15 +24,171 @@ Replication Strategies
 
 .. todo:: todo
 
-Snitches
---------
+Snitch
+------
 
-.. todo:: todo
+In Cassandra, the snitch has two functions:
+
+- it teaches Cassandra enough about your network topology to route requests efficiently.
+- it allows Cassandra to spread replicas around your cluster to avoid correlated failures. It does this by grouping
+  machines into "datacenters" and "racks."  Cassandra will do its best not to have more than one replica on the same
+  "rack" (which may not actually be a physical location).
+
+Dynamic snitching
+^^^^^^^^^^^^^^^^^
+
+The dynamic snitch monitors read latencies to avoid reading from hosts that have slowed down. The dynamic snitch is
+configured with the following properties in ``cassandra.yaml``:
+
+- ``dynamic_snitch``: whether the dynamic snitch should be enabled or disabled.
+- ``dynamic_snitch_update_interval_in_ms``: controls how often to perform the more expensive part of host score
+  calculation.
+- ``dynamic_snitch_reset_interval_in_ms``: controls how often to reset all host scores, allowing a bad host to
+  possibly recover.
+- ``dynamic_snitch_badness_threshold``: if set greater than zero and read_repair_chance is < 1.0, this allows
+  'pinning' of replicas to hosts in order to increase cache capacity. The badness threshold controls how much worse
+  the pinned host has to be before the dynamic snitch will prefer other replicas over it.  This is expressed as a
+  double which represents a percentage.  Thus, a value of 0.2 means Cassandra would continue to prefer the static
+  snitch values until the pinned host was 20% worse than the fastest.
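+
+For reference, the corresponding settings in ``cassandra.yaml`` look roughly like the excerpt below. The values shown
+are only illustrative; check the ``cassandra.yaml`` shipped with your version for the actual defaults::
+
+    dynamic_snitch: true
+    # how often to perform the more expensive part of host score calculation
+    dynamic_snitch_update_interval_in_ms: 100
+    # how often to reset all host scores, allowing a bad host to possibly recover
+    dynamic_snitch_reset_interval_in_ms: 600000
+    # how much worse a pinned host may get before other replicas are preferred over it
+    dynamic_snitch_badness_threshold: 0.1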
+
+Snitch classes
+^^^^^^^^^^^^^^
+
+The ``endpoint_snitch`` parameter in ``cassandra.yaml`` should be set to the class that implements
+``IEndpointSnitch``, which will be wrapped by the dynamic snitch and decide if two endpoints are in the same data
+center or on the same rack. Out of the box, Cassandra provides the following snitch implementations:
+
+GossipingPropertyFileSnitch
+    This should be your go-to snitch for production use. The rack and datacenter for the local node are defined in
+    ``cassandra-rackdc.properties`` and propagated to other nodes via gossip (see the example after this list). If
+    ``cassandra-topology.properties`` exists, it is used as a fallback, allowing migration from the PropertyFileSnitch.
+
+SimpleSnitch
+    Treats Strategy order as proximity. This can improve cache locality when disabling read repair. Only appropriate for
+    single-datacenter deployments.
+
+PropertyFileSnitch
+    Proximity is determined by rack and data center, which are explicitly configured in
+    ``cassandra-topology.properties``.
+
+Ec2Snitch
+    Appropriate for EC2 deployments in a single Region. Loads Region and Availability Zone information from the EC2 API.
+    The Region is treated as the datacenter, and the Availability Zone as the rack. Only private IPs are used, so this
+    will not work across multiple regions.
+
+Ec2MultiRegionSnitch
+    Uses public IPs as ``broadcast_address`` to allow cross-region connectivity (thus, you should set seed addresses to
+    the public IP as well). You will need to open the ``storage_port`` or ``ssl_storage_port`` on the public IP firewall
+    (For intra-Region traffic, Cassandra will switch to the private IP after establishing a connection).
+
+RackInferringSnitch
+    Proximity is determined by rack and data center, which are assumed to correspond to the 3rd and 2nd octet of each
+    node's IP address, respectively.  Unless this happens to match your deployment conventions, this is best used as an
+    example of writing a custom Snitch class and is provided in that spirit.
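+
+As a concrete example, configuring the GossipingPropertyFileSnitch involves two files on each node. The datacenter
+and rack names below are only illustrative::
+
+    # cassandra.yaml
+    endpoint_snitch: GossipingPropertyFileSnitch
+
+    # cassandra-rackdc.properties
+    dc=DC1
+    rack=RAC1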
 
 Adding, replacing, moving and removing nodes
 --------------------------------------------
 
-.. todo:: todo
+Bootstrap
+^^^^^^^^^
+
+Adding new nodes is called "bootstrapping". The ``num_tokens`` parameter defines the number of virtual nodes
+(tokens) the joining node will be assigned during bootstrap. The tokens define the sections of the ring (token ranges)
+the node will become responsible for.
+
+Token allocation
+~~~~~~~~~~~~~~~~
+
+With the default token allocation algorithm the new node will pick ``num_tokens`` random tokens to become responsible
+for. Since tokens are distributed randomly, load distribution improves with a higher number of virtual nodes, but it
+also increases token management overhead. The default of 256 virtual nodes should provide a reasonable load balance with
+acceptable overhead.
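+
+A minimal sketch of the relevant ``cassandra.yaml`` setting for random allocation; the value matches the commonly
+shipped default, but verify it against your own configuration::
+
+    # number of virtual nodes (tokens) assigned to this node
+    num_tokens: 256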
+
+In 3.0+, a new token allocation algorithm was introduced to allocate tokens based on the load of existing virtual nodes
+for a given keyspace, and thus yield an improved load distribution with a lower number of tokens. To use this approach,
+the new node must be started with the JVM option ``-Dcassandra.allocate_tokens_for_keyspace=<keyspace>``, where
+``<keyspace>`` is the keyspace from which the algorithm can find the load information to optimize token assignment for.
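+
+One way to pass the option is through ``cassandra-env.sh`` before the first start of the joining node; this sketch
+assumes a keyspace named ``my_keyspace`` that already uses the desired replication settings::
+
+    # cassandra-env.sh on the joining node, before it is started for the first time
+    JVM_OPTS="$JVM_OPTS -Dcassandra.allocate_tokens_for_keyspace=my_keyspace"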
+
+Manual token assignment
+"""""""""""""""""""""""
+
+You may specify a comma-separated list of tokens manually with the ``initial_token`` parameter in ``cassandra.yaml``,
+and if that is specified Cassandra will skip the token allocation process. This may be useful when doing token
+assignment with an external tool or when restoring a node with its previous tokens.
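+
+A sketch of manual assignment in ``cassandra.yaml``; the token values below are purely illustrative and must match
+the cluster's partitioner::
+
+    # num_tokens should equal the number of tokens listed in initial_token
+    num_tokens: 3
+    initial_token: -9223372036854775808,-3074457345618258603,3074457345618258602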
+
+Range streaming
+~~~~~~~~~~~~~~~
+
+After the tokens are allocated, the joining node will pick current replicas of the token ranges it will become
+responsible for to stream data from. By default it will stream from the primary replica of each token range in order to
+guarantee data in the new node will be consistent with the current state.
+
+If any required replica is unavailable, the consistent bootstrap process will fail. To override this behavior and
+potentially miss data from an unavailable replica, set the JVM flag ``-Dcassandra.consistent.rangemovement=false``.
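+
+As with the other startup flags in this section, one possible way to set it (shown only as a sketch) is through
+``cassandra-env.sh`` on the joining node::
+
+    # accept the risk of missing data from the unavailable replica
+    JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"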
+
+Resuming failed/hung bootstrap
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On 2.2+, if the bootstrap process fails, it's possible to resume bootstrap from the previously saved state by calling
+``nodetool bootstrap resume``. If for some reason the bootstrap hangs or stalls, it may also be resumed by simply
+restarting the node. In order to clean up bootstrap state and start fresh, you may set the JVM startup flag
+``-Dcassandra.reset_bootstrap_progress=true``.
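+
+A short sketch of both options::
+
+    # on the joining node: resume a failed or stalled bootstrap from its saved state
+    nodetool bootstrap resume
+
+    # or, in cassandra-env.sh: discard the saved state and bootstrap from scratch on the next start
+    JVM_OPTS="$JVM_OPTS -Dcassandra.reset_bootstrap_progress=true"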
+
+On earlier versions, when the bootstrap process fails it is recommended to wipe the node (remove all the data), and
+restart the bootstrap process.
+
+Manual bootstrapping
+~~~~~~~~~~~~~~~~~~~~
+
+It's possible to skip the bootstrapping process entirely and join the ring straight away by setting the hidden parameter
+``auto_bootstrap: false``. This may be useful when restoring a node from a backup or creating a new datacenter.
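+
+A minimal sketch of the setting; since it is a hidden parameter, it is not present in the default ``cassandra.yaml``
+and has to be added by hand::
+
+    # skip streaming data from existing nodes and join the ring immediately
+    auto_bootstrap: false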
+
+Removing nodes
+^^^^^^^^^^^^^^
+
+You can take a live node out of the cluster with ``nodetool decommission`` (run on the node itself), or remove a dead
+one with ``nodetool removenode`` (run from any other machine). This will assign the ranges the old node was
+responsible for to other nodes, and replicate the appropriate data there. If decommission is used, the data will
+stream from the decommissioned node. If removenode is used, the data will stream from the remaining replicas.
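+
+For illustration (the host ID below is a placeholder; the real one comes from ``nodetool status``)::
+
+    # on the live node that should leave the cluster
+    nodetool decommission
+
+    # from any other node, to remove a dead node
+    nodetool removenode 192d3b45-8e1a-4b7f-9d2c-000000000000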
+
+No data is removed automatically from the node being decommissioned, so if you want to put the node back into service
+at a different token on the ring, it should be removed manually.
+
+Moving nodes
+^^^^^^^^^^^^
+
+When ``num_tokens: 1`` it's possible to move the node position in the ring with ``nodetool move``. Moving a node is
+both more convenient and more efficient than decommissioning it and bootstrapping it again. After moving a node,
+``nodetool cleanup`` should be run to remove any unnecessary data.
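+
+A sketch of the command with a single, arbitrary Murmur3 token::
+
+    # run on the node being moved (only valid when num_tokens is 1)
+    nodetool move 3074457345618258602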
+
+Replacing a dead node
+^^^^^^^^^^^^^^^^^^^^^
+
+In order to replace a dead node, start Cassandra with the JVM startup flag
+``-Dcassandra.replace_address_first_boot=<dead_node_ip>``. Once this property is enabled the node starts in a
+hibernate state, during which all the other nodes will see this node as down.
+
+The replacing node will now start to bootstrap the data from the rest of the nodes in the cluster. The main difference
+from normal bootstrapping of a new node is that this new node will not accept any writes during this phase.
+
+Once the bootstrapping is complete the node will be marked "UP"; we rely on hinted handoff for making this node
+consistent (since it does not accept writes during the bootstrap).
+
+.. Note:: If the replacement process takes longer than ``max_hint_window_in_ms`` you **MUST** run repair to make the
+   replaced node consistent again, since it missed ongoing writes during bootstrapping.
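+
+A sketch of how the flag could be set through ``cassandra-env.sh`` on the replacement node before its first start;
+the address below is a placeholder for the dead node's IP::
+
+    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.12"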
+
+Monitoring progress
+^^^^^^^^^^^^^^^^^^^
+
+Bootstrap, replace, move and remove progress can be monitored using ``nodetool netstats`` which will show the progress
+of the streaming operations.
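+
+For example, you might poll the joining or leaving node while the operation runs (the interval is arbitrary)::
+
+    # show active streams and how many bytes of each file have been transferred
+    nodetool netstats
+
+    # or keep refreshing every ten seconds
+    watch -n 10 nodetool netstats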
+
+Cleanup data after range movements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As a safety measure, Cassandra does not automatically remove data from nodes that "lose" part of their token range due
+to a range movement operation (bootstrap, move, replace). Run ``nodetool cleanup`` on the nodes that lost ranges to the
+joining node when you are satisfied the new node is up and working. If you do not do this the old data will still be
+counted against the load on that node.
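+
+A sketch of the command; the keyspace argument is optional and the name below is hypothetical::
+
+    # run on each node that lost ranges, one node at a time
+    nodetool cleanup my_keyspace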
 
 Repair
 ------
