[jira] [Commented] (HBASE-6490) 'dfs.client.block.write.retries' value could be increased in HBase

nkeywal (JIRA) Tue, 21 Aug 2012 10:07:41 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-6490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438863#comment-13438863
 ]


nkeywal commented on HBASE-6490:
--------------------------------

I don't think it's an issue to increase it globally. I haven't yet looked at the 
memstore flush, but I think it's going to be the same or worse: we don't really 
expect a write to fail.
I need to check if it can be a fixed value or if we need to take into account 
the replication factor or the number of machines...
                
> 'dfs.client.block.write.retries' value could be increased in HBase
> ------------------------------------------------------------------
>
>                 Key: HBASE-6490
>                 URL: https://issues.apache.org/jira/browse/HBASE-6490
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.96.0
>         Environment: all
>            Reporter: nkeywal
>            Priority: Minor
>
> When allocating a new node during writing, hdfs tries 
> 'dfs.client.block.write.retries' times (default 3) to write the block. When 
> it fails, it goes back to the namenode for a new list, and raises an error if 
> the number of retries is reached. In HBase, if the error occurs while we're 
> writing an hlog file, it will trigger a region server abort (as hbase does not 
> trust the log anymore). For the simple case (a new, and as such empty, log 
> file), this seems to be ok, and we don't lose data. There could be some complex 
> cases if the error occurs on an hlog file that already has multiple blocks written.
> The log lines are:
> "Exception in createBlockOutputStream", then "Abandoning block " followed by 
> "Excluding datanode " for a retry.
> IOException: "Unable to create new block.", when the number of retries is 
> reached.
> The probability of occurrence seems quite low: (number of bad nodes / number 
> of nodes)^(number of retries), and it implies that you have a region server 
> without its datanode. But it's per new block.
> Increasing the default value of 'dfs.client.block.write.retries' could make 
> sense to be better covered in chaotic conditions.
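
To make the estimate in the description concrete, here is a minimal Python
sketch of the formula (number of bad nodes / number of nodes)^(number of
retries). It is only an illustration: the function names are hypothetical, the
draws are treated as independent, and the multi-block figure assumes every new
block fails independently with the same probability. None of this is HDFS or
HBase code.

```python
def block_failure_probability(bad_nodes: int, total_nodes: int, retries: int) -> float:
    """Estimate from the issue text: (bad nodes / nodes) ^ retries, per new block."""
    if total_nodes <= 0 or not 0 <= bad_nodes <= total_nodes:
        raise ValueError("need 0 <= bad_nodes <= total_nodes and total_nodes > 0")
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return (bad_nodes / total_nodes) ** retries


def any_block_failure_probability(per_block: float, new_blocks: int) -> float:
    """Chance that at least one of `new_blocks` allocations fails.

    Assumption (not from the issue): block allocations fail independently.
    """
    return 1.0 - (1.0 - per_block) ** new_blocks


def retries_for_target(bad_nodes: int, total_nodes: int, target: float) -> int:
    """Smallest retry count whose estimated per-block failure probability is <= target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if bad_nodes == total_nodes:
        raise ValueError("every node is bad; no retry count can help")
    retries = 0
    while block_failure_probability(bad_nodes, total_nodes, retries) > target:
        retries += 1
    return retries


if __name__ == "__main__":
    # Hypothetical cluster: 5 bad datanodes out of 100.
    for retries in (3, 5, 10):
        per_block = block_failure_probability(5, 100, retries)
        print(retries, per_block, any_block_failure_probability(per_block, 10_000))
```

Because the estimate applies to each new block, a small per-block probability
can still add up over many block allocations, which is the argument for a
larger default.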

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
