Eric & Keith, Chris mentioned to me that you guys have seen this issue before. Any ideas from anyone else are much appreciated as well.
I recently updated a project's dependencies to Accumulo 1.6.0 built against Hadoop 2.3.0, and I have CDH 5.0.2 deployed. The project has an ingest component that runs continuously, using a batch writer with many threads to push mutations into Accumulo.
The issue I'm having is a showstopper. At varying intervals, sometimes an hour, sometimes 30 minutes, I get MutationsRejectedExceptions (server errors) from the TabletServerBatchWriter. Once they start, I have to restart the ingester to make them stop. They always come back within 30 minutes to an hour... rinse, repeat.
The exceptions are not tied to any one tablet server; they show up on different ones. It's a Thrift error saying a message was received out of sequence. In the TabletServer logs, I see an "Invalid session id" exception, which appears only once, before the client-side batch writer starts spitting out the MREs.
I'm running some heavyweight processing in Storm alongside the tablet servers. I shut that processing off in the hope that it was the culprit, but that hasn't fixed the issue.
I'm surprised I haven't seen any other posts on the topic. Thanks!
