Josh Elser created ACCUMULO-3811:
------------------------------------
Summary: Improve exception during held commits sent back to
clients from BatchWriter
Key: ACCUMULO-3811
URL: https://issues.apache.org/jira/browse/ACCUMULO-3811
Project: Accumulo
Issue Type: Improvement
Components: client, tserver
Reporter: Josh Elser
Fix For: 1.8.0
Running CI on 1.7.0_rc3, I'm noticing that with datanode agitation, I'm
frequently seeing the BatchWriter die.
It seems that when the ingester tries to flush right after a datanode dies,
the system is still trying to minor compact, which blocks the flush and
ultimately results in a HoldTimeoutException being thrown.
It might be that, due to under-replication, there are no other datanodes
available to serve the necessary block, but it's a good example of how clients
have no way to recover from this case. Clients should be able to tell when the
system is blocking writes, so that they can wait and then retry their update.
Right now they just see an opaque AccumuloSecurityException with no indication
of the nature of the failure.
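To make the desired client behavior concrete, here is a rough sketch of the
wait-and-retry loop a client could run if held commits were reported with a
distinguishable exception. It does not use the Accumulo API: the class
HeldCommitRetry, the exception WritesHeldException and the method
retryWhileHeld are all hypothetical names made up for this example, and the
doubling wait capped at 30 seconds is just an assumed policy.

```java
public class HeldCommitRetry {

    /** Hypothetical signal that the server is currently holding commits. */
    public static class WritesHeldException extends RuntimeException {
        public WritesHeldException(String message) {
            super(message);
        }
    }

    /**
     * Runs the write, waiting and retrying while it reports held commits.
     * Returns the number of attempts used. Rethrows the last
     * WritesHeldException once maxAttempts is exhausted.
     */
    public static int retryWhileHeld(Runnable write, int maxAttempts, long initialWaitMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long waitMs = initialWaitMs;
        for (int attempt = 1; ; attempt++) {
            try {
                write.run();
                return attempt;
            } catch (WritesHeldException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the reason to the caller
                }
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                waitMs = Math.min(waitMs * 2, 30_000L); // assumed backoff policy
            }
        }
    }
}
```

The point is only that a specific exception type lets the caller separate
"writes are held, try again later" from real failures. How this would map onto
BatchWriter, which may not be reusable after a failed flush, is left open here.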