Repository: accumulo
Updated Branches:
  refs/heads/master af040bfb4 -> 7b1e26ae2


ACCUMULO-4091 added MutationsRejectedException discussions


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/7b1e26ae
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/7b1e26ae
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/7b1e26ae

Branch: refs/heads/master
Commit: 7b1e26ae29bcb89c02aeb508864c48cb46f427fa
Parents: af040bf
Author: Eric C. Newton <eric.new...@gmail.com>
Authored: Mon Dec 28 11:41:19 2015 -0500
Committer: Eric C. Newton <eric.new...@gmail.com>
Committed: Mon Dec 28 11:41:19 2015 -0500

----------------------------------------------------------------------
 .../main/asciidoc/chapters/troubleshooting.txt  | 55 ++++++++++++++++++++
 1 file changed, 55 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/7b1e26ae/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/troubleshooting.txt b/docs/src/main/asciidoc/chapters/troubleshooting.txt
index ada0fbf..9546638 100644
--- a/docs/src/main/asciidoc/chapters/troubleshooting.txt
+++ b/docs/src/main/asciidoc/chapters/troubleshooting.txt
@@ -229,6 +229,61 @@ messages to zookeeper.
 
 *A*: Ensure the tablet server JVM is not running low on memory.
 
+*Q*: I'm seeing errors in tablet server logs that include the words 
"MutationsRejectedException" and "# constraint violations: 1". Moments after 
that the server died.
+
+The error you are seeing is part of a failing tablet server scenario.
+This is a bit complicated, so name two of your tablet servers A and B.
+
+Tablet server A is hosting a tablet, let's call it a-tablet.
+
+Tablet server B is hosting a metadata tablet, let's call it m-tablet.
+
+m-tablet records the information about a-tablet, for example, the names of the 
files it is using to store data.
+
+When A ingests some data, it eventually flushes the updates from memory to a 
file.
+
+Tablet server A then writes this new information to m-tablet, on Tablet server 
B.
+
+Here's a likely failure scenario:
+
+Tablet server A does not have enough memory for all the processes running on 
it.
+The operating system sees a large chunk of the tablet server being unused, and 
swaps it out to disk to make room for other processes.
+Tablet server A performs a Java garbage collection, which causes it to 
start touching all the memory allocated to it.
+As the server starts pulling data from swap, it runs very slowly.
+It fails to send the keep-alive messages to zookeeper in a timely fashion, and 
it loses its zookeeper session.
+
+But it is running so slowly that it takes a moment to realize it should no 
longer be hosting tablets.
+
+The thread that is flushing a-tablet memory attempts to update m-tablet with 
the new file information.
+
+Fortunately there's a constraint on m-tablet.
+Mutations to the metadata table must contain a valid zookeeper session.
+This prevents tablet server A from making updates to m-tablet when it no longer 
has the right to host the tablet.
+
+The "MutationsRejectedException" error is from tablet server A making an 
update to tablet server B's m-tablet.
+It's getting a constraint violation: tablet server A has lost its zookeeper 
session, and will fail momentarily.
+
+*A*: Ensure that memory is not over-allocated.  Monitor swap usage, or turn 
swap off.
+
+*Q*: My accumulo client is getting a MutationsRejectedException. The monitor 
is displaying "No Such SessionID" errors.
+
+When your client starts sending mutations to accumulo, it creates a session. 
Once the session is created,
+mutations are streamed to accumulo, without acknowledgement, against this 
session.  Once the client is done,
+it will close the session, and get an acknowledgement.
+
+If the client fails to communicate with accumulo, accumulo will release the 
session, assuming that the client has died.
+If the client then attempts to send more mutations against the session, you 
will see "No Such SessionID" errors on
+the server, and MutationsRejectedExceptions in the client.
+
+The client library should be either actively using the connection to the 
tablet servers,
+or closing the connection and sessions. If the session times out, something is 
causing your client
+to pause.
+
+The most frequent source of these pauses is Java garbage collection,
+caused by the JVM running out of memory or being swapped out to disk.
+
+*A*: Ensure your client has adequate memory and is not being swapped out to 
disk.
+
 ### Tools
 
 The accumulo script can be used to run classes from the command line.
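
The session behavior described in the patch's second question can be illustrated with a toy model. The sketch below is not Accumulo code: `ToyServer`, `NoSuchSessionError`, `run_client`, and the 60-second idle timeout are all hypothetical names and values chosen for illustration. Time is passed in explicitly as `now` rather than read from a real clock, so the effect of a long client pause is easy to see.

```python
class NoSuchSessionError(Exception):
    """Raised when an update arrives for a session the server has released."""


class ToyServer:
    """Hypothetical stand-in for a server-side table of update sessions."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout  # seconds of silence tolerated
        self.sessions = {}                # session id -> time of last activity
        self.next_id = 0

    def start_update(self, now):
        """Create a session and return its id."""
        self.next_id += 1
        self.sessions[self.next_id] = now
        return self.next_id

    def _expire(self, now):
        """Release sessions whose client has been silent for too long."""
        for sid, last_seen in list(self.sessions.items()):
            if now - last_seen > self.idle_timeout:
                del self.sessions[sid]

    def apply_update(self, sid, now):
        """Accept a mutation against a session, if the session still exists."""
        self._expire(now)
        if sid not in self.sessions:
            raise NoSuchSessionError(sid)
        self.sessions[sid] = now

    def close_update(self, sid, now):
        """Close the session; this is the point where the client is acknowledged."""
        self._expire(now)
        if sid not in self.sessions:
            raise NoSuchSessionError(sid)
        del self.sessions[sid]


def run_client(pause):
    """Send one mutation, pause (for example, a long GC), then send another."""
    server = ToyServer(idle_timeout=60)
    sid = server.start_update(now=0)
    server.apply_update(sid, now=1)
    try:
        server.apply_update(sid, now=1 + pause)
    except NoSuchSessionError:
        return "rejected"
    return "accepted"
```

Under these assumptions, `run_client(5)` has its second mutation accepted, while `run_client(120)` outlasts the idle timeout and is rejected, which corresponds to the "No Such SessionID" case: the pause is on the client side, but the error surfaces when the client resumes.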
