[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

nkeywal (JIRA) Fri, 20 Jul 2012 14:21:37 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419581#comment-13419581
 ]


nkeywal commented on HBASE-6435:
--------------------------------

My thinking was that it could make it into an HDFS release that accepts changes 
to public interfaces. I fully agree with you, Todd: we need to do our homework 
and push HDFS to ensure that what we need is understood and makes it into a 
release. On the other hand, if I look at how it went for much simpler stuff 
like JUnit and Surefire, our changes have been in their trunk for a few months 
and we're still waiting. These things take time. But I will do my homework on 
HDFS, I promise (I may need your help, actually). The JIRA will be created next 
week, and if I get enough feedback I will propose a patch.

I was also wondering whether proposing native support for interceptors would 
be of interest to HDFS. They were available for a long time in an ORB called 
Orbix and were great to use. But they would need to be per conf, so they cannot 
be made available through static code.

bq. Do we have to do this in both master and regionserver? Can't do it in 
HFileSystem constructor assuming it takes a Conf (or that'd be too late?)
It can be done quite late, basically any time before we start a recovery 
process. But we don't want it on the client side, so I will check this.

bq. Rather than have it called a reorderProxy, call it an HBaseDFSClient? Might 
want to add more customizations while waiting on HDFS fix to arrive.
I've intercepted a lower-level call: I'm between the DFSClient and the 
namenode. This is because the DFSClient does more than just forward calls: it 
contains some logic. Hence going in front of the namenode. But yes, I could 
make it more generic.
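
To illustrate what sitting between the DFSClient and the namenode can look 
like, here is a minimal sketch using a JDK dynamic proxy. It is not the code 
from the attached patch: the NameService interface, its getBlockLocations 
method, the intercept helper and the path used in main are hypothetical 
stand-ins for the real client-to-namenode interface, and hosts are plain 
strings.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

final class InterceptorSketch {
    /** Hypothetical stand-in for the interface the client uses to reach the namenode. */
    public interface NameService {
        List<String> getBlockLocations(String path);
    }

    /**
     * Wraps target so that the result of getBlockLocations goes through hook.
     * The hook is held by the handler instance, not by static state, so each
     * configuration can get its own interceptor.
     */
    @SuppressWarnings("unchecked")
    public static NameService intercept(final NameService target,
                                        final UnaryOperator<List<String>> hook) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                Object result;
                try {
                    result = method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the real call threw
                }
                if ("getBlockLocations".equals(method.getName())) {
                    return hook.apply((List<String>) result);
                }
                return result; // every other call passes through untouched
            }
        };
        return (NameService) Proxy.newProxyInstance(
                NameService.class.getClassLoader(),
                new Class<?>[] {NameService.class},
                handler);
    }

    public static void main(String[] args) {
        NameService real = path -> Arrays.asList("dn1", "dn2", "dn3");
        NameService wrapped = intercept(real, hosts -> {
            List<String> copy = new ArrayList<>(hosts);
            Collections.reverse(copy); // placeholder for any reordering policy
            return copy;
        });
        System.out.println(wrapped.getBlockLocations("/example/wal"));
    }
}
```

Because the handler only rewrites the return value of one method, the logic 
inside the client itself is left alone, which is the point of intercepting 
below it.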

                
> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6435
>                 URL: https://issues.apache.org/jira/browse/HBASE-6435
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>         Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase is usually informed in 30s that it should start 
> the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same box as the datanodes.
> It means that when the box stops, we've actually lost one of the edits, as we 
> lost both the regionserver and the datanode.
> As HDFS marks a node as dead only after ~10 minutes, the node still appears 
> available when we try to read the blocks to recover. As such, we are delaying 
> the recovery process by 60 seconds, as the read will usually fail with a 
> socket timeout. If the file is still open for writing, it adds an extra 20s 
> plus a risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead datanodes detection by the NN. Requires a NN code change.
> - better dead datanodes management in DFSClient. Requires a DFS code change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a kind of workaround.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify HDFS source code but adds a 
> proxy. This is for two reasons:
> - Some HDFS functions managing block orders are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require either implementing the fix only partially, changing the DFS 
> interface to make this function non-static, or making the hook static. None 
> of these solutions is very clean. 
> - Adding a proxy allows all the code to be put in HBase, simplifying 
> dependency management.
> Nevertheless, it would be better to have this in HDFS. But that solution 
> allows targeting the latest version only, and this could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to the non local DN would be an even better 
> solution long term.
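
The client-side reordering described above can be sketched in a few lines. 
This is only an illustration under simplifying assumptions, not the patch: 
replica locations are modelled as plain host names, and the ReplicaReorder 
class, the deprioritize method and the example host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReplicaReorder {
    /**
     * Returns a copy of replicaHosts in which every replica located on
     * suspectHost (the box of the dead regionserver) is moved to the end.
     * The relative order of the other replicas is preserved.
     */
    public static List<String> deprioritize(List<String> replicaHosts, String suspectHost) {
        List<String> preferred = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String host : replicaHosts) {
            if (host.equals(suspectHost)) {
                suspect.add(host);   // tried last, only if all others fail
            } else {
                preferred.add(host); // tried first, in the original order
            }
        }
        preferred.addAll(suspect);
        return preferred;
    }

    public static void main(String[] args) {
        // The local replica is usually listed first; here it sits on the dead box.
        List<String> replicas = Arrays.asList("rs1.example.org", "dn7.example.org", "dn9.example.org");
        System.out.println(deprioritize(replicas, "rs1.example.org"));
    }
}
```

The suspect replica is kept rather than dropped, so a read can still fall 
back to it if the other replicas are unavailable.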

