If you get an error from a BatchWriter, you pretty much have to throw
away that instance of the BatchWriter and make a new one. See
ACCUMULO-2990. You should be able to catch the exception and recover
that way without having to restart the ingester.
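To make the catch-and-recreate idea concrete, here is a rough sketch. It is not Accumulo code: Writer is a hypothetical stand-in for the BatchWriter methods involved (addMutation and close), and RecreatingWriter, the factory, and maxAttempts are names I made up for illustration. In a real ingester the factory would wrap something like connector.createBatchWriter(...) and the caught exception would be MutationsRejectedException.

```java
import java.util.function.Supplier;

// Sketch only: Writer is a hypothetical stand-in for the BatchWriter
// methods used here, so the example compiles without Accumulo on the classpath.
class RecreatingWriter {
    interface Writer {
        void addMutation(String mutation) throws Exception;
        void close() throws Exception;
    }

    private final Supplier<Writer> factory; // assumed to build a fresh writer each call
    private final int maxAttempts;
    private Writer current;
    private int recreations = 0;

    RecreatingWriter(Supplier<Writer> factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.factory = factory;
        this.maxAttempts = maxAttempts;
        this.current = factory.get();
    }

    // Try the write; on any failure discard the writer and retry on a new one.
    void write(String mutation) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                current.addMutation(mutation);
                return;
            } catch (Exception e) {
                last = e;
                try {
                    current.close(); // best effort; the failed writer is discarded either way
                } catch (Exception ignored) {
                    // nothing useful to do with a close failure on a dead writer
                }
                current = factory.get();
                recreations++;
            }
        }
        throw new IllegalStateException(
                "write failed after " + maxAttempts + " attempts", last);
    }

    int recreations() {
        return recreations;
    }
}
```

One caveat with this sketch: it only retries the mutation that was being added when the error surfaced. A real BatchWriter buffers mutations, so anything queued but not yet flushed on the failed instance would probably need to be resubmitted by the caller as well.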
If the session ID is invalid, my guess is that the session hasn't been
used recently and the tserver cleaned it up. The exception handling
isn't the greatest (it is just presented to you as a RuntimeException).
https://issues.apache.org/jira/browse/ACCUMULO-2990
On 8/22/14, 4:35 PM, Corey Nolet wrote:
Eric & Keith, Chris mentioned to me that you guys have seen this issue
before. Any ideas from anyone else are much appreciated as well.
I recently updated a project's dependencies to Accumulo 1.6.0 built with
Hadoop 2.3.0. I've got CDH 5.0.2 deployed. The project has an ingest
component which is running all the time with a batch writer using many
threads to push mutations into Accumulo.
The issue I'm having is a show stopper. At different intervals of time,
sometimes an hour, sometimes 30 minutes, I'm getting
MutationsRejectedExceptions (server errors) from the
TabletServerBatchWriter. Once they start, I need to restart the ingester to
get them to stop. They always come back within 30 minutes to an hour...
rinse, repeat.
The exception always happens on different tablet servers. It's a thrift
error saying a message was received out of sequence. In the TabletServer
logs, I see an "Invalid session id" exception which happens only once
before the client-side batch writer starts spitting out the MREs.
I'm running some heavyweight processing in Storm alongside the tablet
servers. I shut that processing off in hopes that maybe it was the culprit
but that hasn't fixed the issue.
I'm surprised I haven't seen any other posts on the topic.
Thanks!