Hi Edward,

This is not good news for us. If you see 30 seconds under low load, then our 3 minutes is quite normal, especially since our records are quite big and there are lots of removals and inserts. I just wonder whether our use case falls outside the sweet spot of HBase, or whether HBase availability is simply low.

Do you know of any architecture changes in 0.21? As far as I can see, part of the problem is splitting the logs from the dead data node into per-table log files. Is there any way we could speed up recovery?

Also, can someone explain what happened when we shut down 3 of 6 region servers? Why did the cluster get into an inconsistent state with so many missing regions? Is this such an unusual situation that HBase can't handle it?

Thanks, Michal