[ https://issues.apache.org/jira/browse/HBASE-6490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438863#comment-13438863 ]
nkeywal commented on HBASE-6490:
--------------------------------

I don't think it's an issue to increase it globally. I haven't looked at the memstore flush yet, but I think it's going to be the same or worse: we don't really expect a write to fail. I need to check whether it can be a fixed value or whether we need to take into account the replication factor or the number of machines...

> 'dfs.client.block.write.retries' value could be increased in HBase
> ------------------------------------------------------------------
>
>                 Key: HBASE-6490
>                 URL: https://issues.apache.org/jira/browse/HBASE-6490
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.96.0
>         Environment: all
>            Reporter: nkeywal
>            Priority: Minor
>
> When allocating a new node during writing, hdfs tries
> 'dfs.client.block.write.retries' times (default 3) to write the block. When
> an attempt fails, it goes back to the namenode for a new list of datanodes,
> and raises an error once the number of retries is reached. In HBase, if the
> error occurs while we're writing an hlog file, it triggers a region server
> abort (as HBase does not trust the log anymore). For the simple case (a new,
> and as such empty, log file), this seems to be ok, and we don't lose data.
> There could be some complex cases if the error occurs on an hlog file that
> already has multiple blocks written.
> Log lines are:
> "Exception in createBlockOutputStream", then "Abandoning block " followed by
> "Excluding datanode " for a retry.
> IOException: "Unable to create new block.", when the number of retries is
> reached.
> Probability of occurrence seems quite low,
> (number of bad nodes / number of nodes)^(number of retries), and it implies
> that you have a region server without its datanode. But it's per new block.
> Increasing the default value of 'dfs.client.block.write.retries' could make
> sense to be better covered in chaotic conditions.
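
To get a feel for the estimate quoted in the issue, (number of bad nodes / number of nodes)^(number of retries), here is a minimal sketch that just evaluates it for a few retry counts. The function names and the example cluster (2 bad nodes out of 20) are made up for illustration. The second helper goes beyond the issue text and assumes that new-block allocations fail independently of each other, which is only a rough approximation.

```python
def block_failure_probability(bad_nodes: int, total_nodes: int, retries: int) -> float:
    """Rough per-block estimate from the issue: (bad nodes / nodes) ^ retries."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= bad_nodes <= total_nodes:
        raise ValueError("bad_nodes must be between 0 and total_nodes")
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return (bad_nodes / total_nodes) ** retries


def any_block_failure_probability(per_block: float, new_blocks: int) -> float:
    """Chance that at least one of `new_blocks` allocations fails.

    Assumption (not stated in the issue): allocations fail independently.
    """
    if not 0.0 <= per_block <= 1.0:
        raise ValueError("per_block must be a probability")
    if new_blocks < 0:
        raise ValueError("new_blocks must be non-negative")
    return 1.0 - (1.0 - per_block) ** new_blocks


if __name__ == "__main__":
    # Hypothetical cluster: 2 bad datanodes out of 20.
    for retries in (3, 5, 10):
        p = block_failure_probability(bad_nodes=2, total_nodes=20, retries=retries)
        over_many = any_block_failure_probability(p, new_blocks=1000)
        print(f"retries={retries}: per block ~ {p:.3g}, over 1000 new blocks ~ {over_many:.3g}")
```

Since the estimate is per new block, a small per-block value can still add up over many block allocations, which is the argument for a larger retry count.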