[2/6] accumulo git commit: ACCUMULO-4072 Document how to run multiple tservers on one host in user manual
ACCUMULO-4072 Document how to run multiple tservers on one host in user manual Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/97a92a09 Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/97a92a09 Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/97a92a09 Branch: refs/heads/1.7 Commit: 97a92a0917fa9e55fb8b033846705c24f870e764 Parents: da7aaea Author: Josh Elser Authored: Mon Dec 28 13:28:56 2015 -0500 Committer: Josh Elser Committed: Mon Dec 28 13:28:56 2015 -0500 -- .../chapters/administration.tex | 48 +++- 1 file changed, 46 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/accumulo/blob/97a92a09/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex -- diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex index 120e88b..73044c1 100644 --- a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex +++ b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex @@ -342,7 +342,7 @@ $ACCUMULO_HOME/bin/accumulo admin start { ...} Alternatively, you can ssh to each of the hosts you want to add and run: \begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim} -$ACCUMULO\_HOME/bin/start-here.sh +$ACCUMULO_HOME/bin/start-here.sh \end{verbatim}\endgroup Make sure the host in question has the new configuration, or else the tablet @@ -361,7 +361,7 @@ $ACCUMULO_HOME/bin/accumulo admin stop { ...} Alternatively, you can ssh to each of the hosts you want to remove and run: \begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim} -$ACCUMULO\_HOME/bin/stop-here.sh +$ACCUMULO_HOME/bin/stop-here.sh \end{verbatim}\endgroup Be sure to update your \texttt{\$ACCUMULO\_HOME/conf/slaves} (or \texttt{\$ACCUMULO\_CONF\_DIR/slaves}) file to @@ -389,6 +389,50 @@ from the \texttt{\$ACCUMULO\_HOME/conf/slaves} file) to gracefully stop a node. 
 ensure that the tabletserver is cleanly stopped and recovery will not need to be performed when the tablets are re-hosted.
 
+\subsection{Running multiple TabletServers on a single node}
+
+With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
+node. This decision should be made carefully and with much deliberation, as Accumulo is designed
+to scale to tens of GB of RAM and tens of CPU cores.
+
+To run multiple TabletServers on a single host, it is necessary to create multiple Accumulo configuration
+directories. Ensuring that the properties in each directory are appropriately set (and remain consistent) is an exercise
+for the user.
+
+Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
+other nodes. This requires additional configuration values in \texttt{accumulo-site.xml}:
+
+\begin{itemize}
+  \item tserver.port.client
+\end{itemize}
+
+Normally, setting a value of \texttt{0} for these configuration properties is sufficient. In some
+environments, the ports used by Accumulo must be well-known for security reasons, which requires a
+separate copy of the configuration files with a static port for each TabletServer instance.
+
+It is also necessary to update the following exported variables in \texttt{accumulo-env.sh}.
+
+\begin{itemize}
+  \item ACCUMULO\_LOG\_DIR
+\end{itemize}
+
+The values for these properties are left up to the user to define; there are no constraints
+other than ensuring that the directory exists and that the user running Accumulo has permission
+to read and write in that directory.
+
+Accumulo's provided scripts for stopping a cluster operate under the assumption that one process
+is running per host. As such, starting and stopping multiple TabletServers on one host requires
+more effort from the user. It is important to ensure that \texttt{ACCUMULO\_CONF\_DIR} is correctly
+set for the instance of the TabletServer being started.
+
+\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
+ACCUMULO_CONF_DIR=$ACCUMULO_HOME/conf $ACCUMULO_HOME/bin/accumulo tserver --address &
+\end{verbatim}\endgroup
+
+To stop TabletServers, the normal \texttt{stop-all.sh} script stops all instances of TabletServers across all nodes.
+The \texttt{kill} command provided by your operating system is an option for stopping a single instance on
+a single node. \texttt{stop-server.sh} can be used to stop all TabletServers on a single node.
+
 \section{Monitoring}
 
 The Accumulo Master provides an interface for monitoring the status and health of
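The per-instance setup described in this commit can be sketched as a small shell helper. This is only an illustration, not part of the commit: the function name `make_tserver_conf` and the `-tsN` directory suffix are assumptions. It copies a base configuration directory once per instance and prints, rather than runs, the start command; the `--address` argument is left for the operator to supply.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Accumulo): clone a base configuration
# directory for TabletServer instance N and print the command that would start
# that instance with its own ACCUMULO_CONF_DIR.
make_tserver_conf() {
  base_conf=$1                          # existing configuration directory
  instance=$2                           # instance number, e.g. 1 or 2
  target="${base_conf}-ts${instance}"   # assumed naming scheme

  mkdir -p "$target" || return 1
  cp -R "$base_conf"/. "$target"/ || return 1

  # Dry run: show the invocation instead of launching a server.
  printf 'ACCUMULO_CONF_DIR=%s $ACCUMULO_HOME/bin/accumulo tserver\n' "$target"
}
```

After calling, say, `make_tserver_conf "$ACCUMULO_HOME/conf" 2`, each copy still has to be edited by hand so that its `accumulo-env.sh` exports a distinct `ACCUMULO_LOG_DIR`, as the text above requires.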
[4/6] accumulo git commit: Merge branch '1.6' into 1.7
Merge branch '1.6' into 1.7 Conflicts: docs/src/main/latex/accumulo_user_manual/chapters/administration.tex Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/14e88e4b Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/14e88e4b Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/14e88e4b Branch: refs/heads/master Commit: 14e88e4ba6fef601bca2fa1307b654173315f356 Parents: 5efa0bd 97a92a0 Author: Josh Elser Authored: Mon Dec 28 13:37:16 2015 -0500 Committer: Josh Elser Committed: Mon Dec 28 13:37:16 2015 -0500 -- .../main/asciidoc/chapters/administration.txt | 40 1 file changed, 40 insertions(+) -- http://git-wip-us.apache.org/repos/asf/accumulo/blob/14e88e4b/docs/src/main/asciidoc/chapters/administration.txt -- diff --cc docs/src/main/asciidoc/chapters/administration.txt index 0a29711,000..919ec8f mode 100644,00..100644 --- a/docs/src/main/asciidoc/chapters/administration.txt +++ b/docs/src/main/asciidoc/chapters/administration.txt @@@ -1,1106 -1,0 +1,1146 @@@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +== Administration + +=== Hardware + +Because we are running essentially two or three systems simultaneously layered +across the cluster: HDFS, Accumulo and MapReduce, it is typical for hardware to +consist of 4 to 8 cores, and 8 to 32 GB RAM. This is so each running process can have +at least one core and 2 - 4 GB each. + +One core running HDFS can typically keep 2 to 4 disks busy, so each machine may +typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks. + +It is possible to do with less than this, such as with 1u servers with 2 cores and 4GB +each, but in this case it is recommended to only run up to two processes per +machine -- i.e. DataNode and TabletServer or DataNode and MapReduce worker but +not all three. The constraint here is having enough available heap space for all the +processes on a machine. + +=== Network + +Accumulo communicates via remote procedure calls over TCP/IP for both passing +data and control messages. In addition, Accumulo uses HDFS clients to +communicate with HDFS. To achieve good ingest and query performance, sufficient +network bandwidth must be available between any two machines. + +In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will +use the following default ports. Please make sure that they are open, or change +their value in conf/accumulo-site.xml. 
 +
 +.Accumulo default ports
 +[width="75%",cols=">,^2,^2"]
 +[options="header"]
 +|
 +|Port | Description | Property Name
 +|4445 | Shutdown Port (Accumulo MiniCluster) | n/a
 +|4560 | Accumulo monitor (for centralized log display) | monitor.port.log4j
 +|9997 | Tablet Server | tserver.port.client
 +| | Master Server | master.port.client
 +|12234 | Accumulo Tracer | trace.port.client
 +|42424 | Accumulo Proxy Server | n/a
 +|50091 | Accumulo GC | gc.port.client
 +|50095 | Accumulo HTTP monitor | monitor.port.client
 +|10001 | Master Replication service | master.replication.coordinator.port
 +|10002 | TabletServer Replication service | replication.receipt.service.port
 +|
 +
 +In addition, the user can provide +0+ and an ephemeral port will be chosen instead. This
 +ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
 +use +0+ instead of an explicit value should, in most cases, work around any issues of
 +running multiple distinct Accumulo instances (or any other process which tries to use the
 +same default ports) on the same hardware.
 +
 +=== Installation
 +Choose a directory for the Accumulo installation. This directory will be referenced
 +by the environment variable +$ACCUMULO_HOME+. Run the following:
 +
 +  $ tar xzf accumulo-1.6.0-bin.tar.gz  # unpack to subdirectory
 +  $ mv accumulo-1.6.0 $ACCUMULO_HOME   # move to desired location
 +
 +Repeat this step at each
[6/6] accumulo git commit: Merge branch '1.7'
Merge branch '1.7'

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/7f4cb62b
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/7f4cb62b
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/7f4cb62b

Branch: refs/heads/master
Commit: 7f4cb62b01595fa3477f7d0917d64ba91ed1ba17
Parents: 7b1e26a 14e88e4
Author: Josh Elser
Authored: Mon Dec 28 13:37:25 2015 -0500
Committer: Josh Elser
Committed: Mon Dec 28 13:37:50 2015 -0500

--
 .../main/asciidoc/chapters/administration.txt | 41
 1 file changed, 41 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/accumulo/blob/7f4cb62b/docs/src/main/asciidoc/chapters/administration.txt
--
diff --cc docs/src/main/asciidoc/chapters/administration.txt
index 01c5c5c,919ec8f..26ec395
--- a/docs/src/main/asciidoc/chapters/administration.txt
+++ b/docs/src/main/asciidoc/chapters/administration.txt
@@@ -472,6 -429,46 +472,47 @@@ from the +$ACCUMULO_HOME/conf/slaves+ f
  ensure that the tabletserver is cleanly stopped and recovery will not need to be performed when the tablets are re-hosted.
  
+ Running multiple TabletServers on a single node
+
+ With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
+ node. This decision should be made carefully and with much deliberation, as Accumulo is designed
+ to scale to tens of GB of RAM and tens of CPU cores.
+
+ To run multiple TabletServers on a single host, it is necessary to create multiple Accumulo configuration
+ directories. Ensuring that the properties in each directory are appropriately set (and remain consistent) is an exercise
+ for the user.
+
+ Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
+ other nodes.
This requires additional configuration values in +accumulo-site.xml+:
+
+ * +tserver.port.client+
+ * +replication.receipt.service.port+
+
+ Normally, setting a value of +0+ for these configuration properties is sufficient. In some
+ environments, the ports used by Accumulo must be well-known for security reasons, which requires a
+ separate copy of the configuration files with a static port for each TabletServer instance.
+
+ It is also necessary to update the following exported variables in +accumulo-env.sh+.
+
+ * +ACCUMULO_LOG_DIR+
++* +ACCUMULO_PID_DIR+
+
+ The values for these properties are left up to the user to define; there are no constraints
+ other than ensuring that the directory exists and that the user running Accumulo has permission
+ to read and write in that directory.
+
+ Accumulo's provided scripts for stopping a cluster operate under the assumption that one process
+ is running per host. As such, starting and stopping multiple TabletServers on one host requires
+ more effort from the user. It is important to ensure that +ACCUMULO_CONF_DIR+ is correctly
+ set for the instance of the TabletServer being started.
+
+   ACCUMULO_CONF_DIR=$ACCUMULO_HOME/conf $ACCUMULO_HOME/bin/accumulo tserver --address &
+
+ To stop TabletServers, the normal +stop-all.sh+ script stops all instances of TabletServers across all nodes.
+ The +kill+ command provided by your operating system is an option for stopping a single instance on
+ a single node. +stop-server.sh+ can be used to stop all TabletServers on a single node.
+
  [[monitoring]]
  === Monitoring
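Stopping one instance with `kill`, as the added text suggests, could look roughly like the sketch below. It assumes each instance records its process ID in a file somewhere under its own `ACCUMULO_PID_DIR`. The function name and the PID file path are hypothetical, so pass whatever path your environment actually uses.

```shell
#!/bin/sh
# Hypothetical helper: stop one TabletServer instance given the path to a file
# that contains its process ID. The PID file location is an assumption.
stop_tserver_instance() {
  pid_file=$1

  if [ ! -r "$pid_file" ]; then
    echo "cannot read PID file: $pid_file" >&2
    return 1
  fi

  pid=$(cat "$pid_file")

  # kill sends TERM by default, which asks the process to exit.
  if ! kill "$pid" 2>/dev/null; then
    echo "no running process with PID $pid" >&2
    return 1
  fi
}
```

Because the function touches only the process named in that one file, other TabletServers on the same host keep running. The tablets hosted by the stopped instance may still need recovery when they are re-hosted.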
[1/6] accumulo git commit: ACCUMULO-4072 Document how to run multiple tservers on one host in user manual
Repository: accumulo Updated Branches: refs/heads/1.6 da7aaeae9 -> 97a92a091 refs/heads/1.7 5efa0bd9b -> 14e88e4ba refs/heads/master 7b1e26ae2 -> 7f4cb62b0 ACCUMULO-4072 Document how to run multiple tservers on one host in user manual Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/97a92a09 Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/97a92a09 Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/97a92a09 Branch: refs/heads/1.6 Commit: 97a92a0917fa9e55fb8b033846705c24f870e764 Parents: da7aaea Author: Josh Elser Authored: Mon Dec 28 13:28:56 2015 -0500 Committer: Josh Elser Committed: Mon Dec 28 13:28:56 2015 -0500 -- .../chapters/administration.tex | 48 +++- 1 file changed, 46 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/accumulo/blob/97a92a09/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex -- diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex index 120e88b..73044c1 100644 --- a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex +++ b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex @@ -342,7 +342,7 @@ $ACCUMULO_HOME/bin/accumulo admin start { ...} Alternatively, you can ssh to each of the hosts you want to add and run: \begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim} -$ACCUMULO\_HOME/bin/start-here.sh +$ACCUMULO_HOME/bin/start-here.sh \end{verbatim}\endgroup Make sure the host in question has the new configuration, or else the tablet @@ -361,7 +361,7 @@ $ACCUMULO_HOME/bin/accumulo admin stop { ...} Alternatively, you can ssh to each of the hosts you want to remove and run: \begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim} -$ACCUMULO\_HOME/bin/stop-here.sh +$ACCUMULO_HOME/bin/stop-here.sh \end{verbatim}\endgroup Be sure to update your 
\texttt{\$ACCUMULO\_HOME/conf/slaves} (or \texttt{\$ACCUMULO\_CONF\_DIR/slaves}) file to
@@ -389,6 +389,50 @@ from the \texttt{\$ACCUMULO\_HOME/conf/slaves} file) to gracefully stop a node.
 ensure that the tabletserver is cleanly stopped and recovery will not need to be performed when the tablets are re-hosted.
 
+\subsection{Running multiple TabletServers on a single node}
+
+With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
+node. This decision should be made carefully and with much deliberation, as Accumulo is designed
+to scale to tens of GB of RAM and tens of CPU cores.
+
+To run multiple TabletServers on a single host, it is necessary to create multiple Accumulo configuration
+directories. Ensuring that the properties in each directory are appropriately set (and remain consistent) is an exercise
+for the user.
+
+Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
+other nodes. This requires additional configuration values in \texttt{accumulo-site.xml}:
+
+\begin{itemize}
+  \item tserver.port.client
+\end{itemize}
+
+Normally, setting a value of \texttt{0} for these configuration properties is sufficient. In some
+environments, the ports used by Accumulo must be well-known for security reasons, which requires a
+separate copy of the configuration files with a static port for each TabletServer instance.
+
+It is also necessary to update the following exported variables in \texttt{accumulo-env.sh}.
+
+\begin{itemize}
+  \item ACCUMULO\_LOG\_DIR
+\end{itemize}
+
+The values for these properties are left up to the user to define; there are no constraints
+other than ensuring that the directory exists and that the user running Accumulo has permission
+to read and write in that directory.
+
+Accumulo's provided scripts for stopping a cluster operate under the assumption that one process
+is running per host.
As such, starting and stopping multiple TabletServers on one host requires
+more effort from the user. It is important to ensure that \texttt{ACCUMULO\_CONF\_DIR} is correctly
+set for the instance of the TabletServer being started.
+
+\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
+ACCUMULO_CONF_DIR=$ACCUMULO_HOME/conf $ACCUMULO_HOME/bin/accumulo tserver --address &
+\end{verbatim}\endgroup
+
+To stop TabletServers, the normal \texttt{stop-all.sh} script stops all instances of TabletServers across all nodes.
+The \texttt{kill} command provided by your operating system is an option for stopping a single instance on
+a single node. \texttt{stop-server.sh} can be used to stop all TabletServers on a single node.
+
 \section{Monitoring}
 
 The Accumulo Master provides an interface for monito
[5/6] accumulo git commit: Merge branch '1.6' into 1.7
Merge branch '1.6' into 1.7 Conflicts: docs/src/main/latex/accumulo_user_manual/chapters/administration.tex Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/14e88e4b Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/14e88e4b Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/14e88e4b Branch: refs/heads/1.7 Commit: 14e88e4ba6fef601bca2fa1307b654173315f356 Parents: 5efa0bd 97a92a0 Author: Josh Elser Authored: Mon Dec 28 13:37:16 2015 -0500 Committer: Josh Elser Committed: Mon Dec 28 13:37:16 2015 -0500 -- .../main/asciidoc/chapters/administration.txt | 40 1 file changed, 40 insertions(+) -- http://git-wip-us.apache.org/repos/asf/accumulo/blob/14e88e4b/docs/src/main/asciidoc/chapters/administration.txt -- diff --cc docs/src/main/asciidoc/chapters/administration.txt index 0a29711,000..919ec8f mode 100644,00..100644 --- a/docs/src/main/asciidoc/chapters/administration.txt +++ b/docs/src/main/asciidoc/chapters/administration.txt @@@ -1,1106 -1,0 +1,1146 @@@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +== Administration + +=== Hardware + +Because we are running essentially two or three systems simultaneously layered +across the cluster: HDFS, Accumulo and MapReduce, it is typical for hardware to +consist of 4 to 8 cores, and 8 to 32 GB RAM. This is so each running process can have +at least one core and 2 - 4 GB each. + +One core running HDFS can typically keep 2 to 4 disks busy, so each machine may +typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks. + +It is possible to do with less than this, such as with 1u servers with 2 cores and 4GB +each, but in this case it is recommended to only run up to two processes per +machine -- i.e. DataNode and TabletServer or DataNode and MapReduce worker but +not all three. The constraint here is having enough available heap space for all the +processes on a machine. + +=== Network + +Accumulo communicates via remote procedure calls over TCP/IP for both passing +data and control messages. In addition, Accumulo uses HDFS clients to +communicate with HDFS. To achieve good ingest and query performance, sufficient +network bandwidth must be available between any two machines. + +In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will +use the following default ports. Please make sure that they are open, or change +their value in conf/accumulo-site.xml. 
 +
 +.Accumulo default ports
 +[width="75%",cols=">,^2,^2"]
 +[options="header"]
 +|
 +|Port | Description | Property Name
 +|4445 | Shutdown Port (Accumulo MiniCluster) | n/a
 +|4560 | Accumulo monitor (for centralized log display) | monitor.port.log4j
 +|9997 | Tablet Server | tserver.port.client
 +| | Master Server | master.port.client
 +|12234 | Accumulo Tracer | trace.port.client
 +|42424 | Accumulo Proxy Server | n/a
 +|50091 | Accumulo GC | gc.port.client
 +|50095 | Accumulo HTTP monitor | monitor.port.client
 +|10001 | Master Replication service | master.replication.coordinator.port
 +|10002 | TabletServer Replication service | replication.receipt.service.port
 +|
 +
 +In addition, the user can provide +0+ and an ephemeral port will be chosen instead. This
 +ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
 +use +0+ instead of an explicit value should, in most cases, work around any issues of
 +running multiple distinct Accumulo instances (or any other process which tries to use the
 +same default ports) on the same hardware.
 +
 +=== Installation
 +Choose a directory for the Accumulo installation. This directory will be referenced
 +by the environment variable +$ACCUMULO_HOME+. Run the following:
 +
 +  $ tar xzf accumulo-1.6.0-bin.tar.gz  # unpack to subdirectory
 +  $ mv accumulo-1.6.0 $ACCUMULO_HOME   # move to desired location
 +
 +Repeat this step at each mac
[3/6] accumulo git commit: ACCUMULO-4072 Document how to run multiple tservers on one host in user manual
ACCUMULO-4072 Document how to run multiple tservers on one host in user manual Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/97a92a09 Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/97a92a09 Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/97a92a09 Branch: refs/heads/master Commit: 97a92a0917fa9e55fb8b033846705c24f870e764 Parents: da7aaea Author: Josh Elser Authored: Mon Dec 28 13:28:56 2015 -0500 Committer: Josh Elser Committed: Mon Dec 28 13:28:56 2015 -0500 -- .../chapters/administration.tex | 48 +++- 1 file changed, 46 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/accumulo/blob/97a92a09/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex -- diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex index 120e88b..73044c1 100644 --- a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex +++ b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex @@ -342,7 +342,7 @@ $ACCUMULO_HOME/bin/accumulo admin start { ...} Alternatively, you can ssh to each of the hosts you want to add and run: \begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim} -$ACCUMULO\_HOME/bin/start-here.sh +$ACCUMULO_HOME/bin/start-here.sh \end{verbatim}\endgroup Make sure the host in question has the new configuration, or else the tablet @@ -361,7 +361,7 @@ $ACCUMULO_HOME/bin/accumulo admin stop { ...} Alternatively, you can ssh to each of the hosts you want to remove and run: \begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim} -$ACCUMULO\_HOME/bin/stop-here.sh +$ACCUMULO_HOME/bin/stop-here.sh \end{verbatim}\endgroup Be sure to update your \texttt{\$ACCUMULO\_HOME/conf/slaves} (or \texttt{\$ACCUMULO\_CONF\_DIR/slaves}) file to @@ -389,6 +389,50 @@ from the \texttt{\$ACCUMULO\_HOME/conf/slaves} file) to gracefully stop a 
node.
 ensure that the tabletserver is cleanly stopped and recovery will not need to be performed when the tablets are re-hosted.
 
+\subsection{Running multiple TabletServers on a single node}
+
+With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
+node. This decision should be made carefully and with much deliberation, as Accumulo is designed
+to scale to tens of GB of RAM and tens of CPU cores.
+
+To run multiple TabletServers on a single host, it is necessary to create multiple Accumulo configuration
+directories. Ensuring that the properties in each directory are appropriately set (and remain consistent) is an exercise
+for the user.
+
+Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from
+other nodes. This requires additional configuration values in \texttt{accumulo-site.xml}:
+
+\begin{itemize}
+  \item tserver.port.client
+\end{itemize}
+
+Normally, setting a value of \texttt{0} for these configuration properties is sufficient. In some
+environments, the ports used by Accumulo must be well-known for security reasons, which requires a
+separate copy of the configuration files with a static port for each TabletServer instance.
+
+It is also necessary to update the following exported variables in \texttt{accumulo-env.sh}.
+
+\begin{itemize}
+  \item ACCUMULO\_LOG\_DIR
+\end{itemize}
+
+The values for these properties are left up to the user to define; there are no constraints
+other than ensuring that the directory exists and that the user running Accumulo has permission
+to read and write in that directory.
+
+Accumulo's provided scripts for stopping a cluster operate under the assumption that one process
+is running per host. As such, starting and stopping multiple TabletServers on one host requires
+more effort from the user. It is important to ensure that \texttt{ACCUMULO\_CONF\_DIR} is correctly
+set for the instance of the TabletServer being started.
+
+\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
+ACCUMULO_CONF_DIR=$ACCUMULO_HOME/conf $ACCUMULO_HOME/bin/accumulo tserver --address &
+\end{verbatim}\endgroup
+
+To stop TabletServers, the normal \texttt{stop-all.sh} script stops all instances of TabletServers across all nodes.
+The \texttt{kill} command provided by your operating system is an option for stopping a single instance on
+a single node. \texttt{stop-server.sh} can be used to stop all TabletServers on a single node.
+
 \section{Monitoring}
 
 The Accumulo Master provides an interface for monitoring the status and health of
accumulo git commit: ACCUMULO-4091 added MutationsRejectedException discussions
Repository: accumulo Updated Branches: refs/heads/master af040bfb4 -> 7b1e26ae2 ACCUMULO-4091 added MutationsRejectedException discussions Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/7b1e26ae Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/7b1e26ae Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/7b1e26ae Branch: refs/heads/master Commit: 7b1e26ae29bcb89c02aeb508864c48cb46f427fa Parents: af040bf Author: Eric C. Newton Authored: Mon Dec 28 11:41:19 2015 -0500 Committer: Eric C. Newton Committed: Mon Dec 28 11:41:19 2015 -0500 -- .../main/asciidoc/chapters/troubleshooting.txt | 55 1 file changed, 55 insertions(+) -- http://git-wip-us.apache.org/repos/asf/accumulo/blob/7b1e26ae/docs/src/main/asciidoc/chapters/troubleshooting.txt -- diff --git a/docs/src/main/asciidoc/chapters/troubleshooting.txt b/docs/src/main/asciidoc/chapters/troubleshooting.txt index ada0fbf..9546638 100644 --- a/docs/src/main/asciidoc/chapters/troubleshooting.txt +++ b/docs/src/main/asciidoc/chapters/troubleshooting.txt @@ -229,6 +229,61 @@ messages to zookeeper. *A*: Ensure the tablet server JVM is not running low on memory. +*Q*: I'm seeing errors in tablet server logs that include the words "MutationsRejectedException" and "# constraint violations: 1". Moments after that the server died. + +The error you are seeing is part of a failing tablet server scenario. +This is a bit complicated, so name two of your tablet servers A and B. + +Tablet server A is hosting a tablet, let's call it a-tablet. + +Tablet server B is hosting a metadata tablet, let's call it m-tablet. + +m-tablet records the information about a-tablet, for example, the names of the files it is using to store data. + +When A ingests some data, it eventually flushes the updates from memory to a file. + +Tablet server A then writes this new information to m-tablet, on Tablet server B. 
+
+Here's a likely failure scenario:
+
+Tablet server A does not have enough memory for all the processes running on it.
+The operating system sees a large chunk of the tablet server being unused, and swaps it out to disk to make room for other processes.
+Tablet server A does a java memory garbage collection, which causes it to start using all the memory allocated to it.
+As the server starts pulling data from swap, it runs very slowly.
+It fails to send the keep-alive messages to zookeeper in a timely fashion, and it loses its zookeeper session.
+
+But it's running so slowly that it takes a moment to realize it should no longer be hosting tablets.
+
+The thread that is flushing a-tablet memory attempts to update m-tablet with the new file information.
+
+Fortunately there's a constraint on m-tablet.
+Mutations to the metadata table must contain a valid zookeeper session.
+This prevents tablet server A from making updates to m-tablet when it no longer has the right to host the tablet.
+
+The "MutationsRejectedException" error is from tablet server A making an update to tablet server B's m-tablet.
+It's getting a constraint violation: tablet server A has lost its zookeeper session, and will fail momentarily.
+
+*A*: Ensure that memory is not over-allocated. Monitor swap usage, or turn swap off.
+
+*Q*: My accumulo client is getting a MutationsRejectedException. The monitor is displaying "No Such SessionID" errors.
+
+When your client starts sending mutations to accumulo, it creates a session. Once the session is created,
+mutations are streamed to accumulo, without acknowledgement, against this session. Once the client is done,
+it will close the session, and get an acknowledgement.
+
+If the client fails to communicate with accumulo, accumulo will release the session, assuming that the client has died.
+If the client then attempts to send more mutations against the session, you will see "No Such SessionID" errors on
+the server, and MutationsRejectedExceptions in the client.
+
+The client library should be either actively using the connection to the tablet servers,
+or closing the connection and sessions. If the session times out, something is causing your client
+to pause.
+
+The most frequent source of these pauses is java garbage collection
+due to the JVM running out of memory, or being swapped out to disk.
+
+*A*: Ensure your client has adequate memory and is not being swapped out to disk.
+
 ### Tools
 
 The accumulo script can be used to run classes from the command line.
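Both answers above come down to watching swap usage. As a minimal sketch, assuming a Linux host that exposes `SwapTotal` and `SwapFree` lines (in kB) in `/proc/meminfo`, the amount of swap in use can be computed as follows. The function name `swap_used_kb` is made up for this illustration and is not an Accumulo tool.

```shell
#!/bin/sh
# Hypothetical helper: read meminfo-style text on stdin and print the amount
# of swap in use, in kB (SwapTotal minus SwapFree).
swap_used_kb() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END           { print total - free }'
}
```

On such a host it could be run as `swap_used_kb < /proc/meminfo`. A value that keeps growing on a machine running a tablet server or a client JVM is a hint that memory is over-allocated.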