[ 
https://issues.apache.org/jira/browse/HADOOP-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616652#action_12616652
 ] 

Lohit Vijayarenu commented on HADOOP-3631:
------------------------------------------

On trunk, this is the sequence of events during a checkpoint:
- SecondaryNameNode requests rollEditLog(); NameNode closes edits and starts 
writing to edits.new. (This should not take much time.)
- SecondaryNameNode gets edits from NameNode via HTTP. (This might take time, 
depending on the size of edits.)
- SecondaryNameNode merges edits with the image, creating fsimage.ckpt. (This 
won't affect NameNode.)
- SecondaryNameNode transfers the image to NameNode via HTTP. (This again takes 
time, depending on the size of the new image.)
- SecondaryNameNode requests rollFSImage(); NameNode renames edits.new to 
edits and fsimage.ckpt to fsimage. (This should not take much time.)
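
To make the ordering concrete, here is a toy in-memory model of the five 
steps. It is only a sketch and not Hadoop code: CheckpointSketch, ToyNameNode 
and logEdit are hypothetical names, the "files" are plain lists, and the two 
HTTP transfers are replaced by list copies. Only the rollEditLog() and 
rollFSImage() names come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the checkpoint sequence; hypothetical, not Hadoop code. */
public class CheckpointSketch {

    /** Stand-in for the files in the NameNode's storage directory. */
    static class ToyNameNode {
        List<String> fsimage = new ArrayList<>();  // namespace as of the last checkpoint
        List<String> edits = new ArrayList<>();    // log that the next checkpoint will merge
        List<String> editsNew = null;              // edits.new, present only during a checkpoint
        List<String> fsimageCkpt = null;           // fsimage.ckpt, uploaded image awaiting rename

        /** Client operations go to edits.new while a checkpoint is in progress. */
        void logEdit(String op) {
            (editsNew != null ? editsNew : edits).add(op);
        }

        /** Step 1: close edits and start writing to edits.new (cheap). */
        void rollEditLog() {
            editsNew = new ArrayList<>();
        }

        /** Step 5: rename fsimage.ckpt to fsimage and edits.new to edits (cheap). */
        void rollFSImage() {
            fsimage = fsimageCkpt;
            edits = editsNew;
            fsimageCkpt = null;
            editsNew = null;
        }
    }

    /** One checkpoint, driven from the secondary's side. */
    static void checkpoint(ToyNameNode nn) {
        nn.rollEditLog();                                  // step 1
        // Step 2: fetch edits (HTTP in the real system). The toy also copies
        // the base image directly, which is a simplification.
        List<String> image = new ArrayList<>(nn.fsimage);
        List<String> edits = new ArrayList<>(nn.edits);
        image.addAll(edits);                               // step 3: merge, local to the secondary
        nn.fsimageCkpt = image;                            // step 4: upload (HTTP in the real system)
        nn.rollFSImage();                                  // step 5
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.logEdit("mkdir /a");
        checkpoint(nn);
        System.out.println("fsimage=" + nn.fsimage + " edits=" + nn.edits);
    }
}
```

In this model only steps 2 and 4 move data whose size grows with the 
namespace, which is why they are the candidates for optimization.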

Most of the time is spent transferring the image over HTTP. One way out of 
this is to not do the transfer at all: configure a shared directory that 
SecondaryNameNode reads from and writes to, and have it ask NameNode to read 
from there. This could be a special startup option for SecondaryNameNode.
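
A rough sketch of what the shared-directory handoff might look like, using 
only java.nio.file. Everything here is an assumption for illustration: 
SharedDirSketch, publishCheckpoint, adoptCheckpoint and the temporary file 
name are hypothetical, and writing to a temporary file before renaming is my 
own choice so that a reader never sees a half-written fsimage.ckpt. It does 
not reflect any existing Hadoop option.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical shared-directory handoff; not an existing Hadoop feature. */
public class SharedDirSketch {

    /** Secondary side: write the merged image into the shared directory. */
    static Path publishCheckpoint(Path sharedDir, byte[] mergedImage) {
        try {
            // Write under a temporary name first, then rename into place.
            Path tmp = sharedDir.resolve("fsimage.ckpt.tmp");
            Files.write(tmp, mergedImage);
            Path ckpt = sharedDir.resolve("fsimage.ckpt");
            return Files.move(tmp, ckpt, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** NameNode side: adopt the checkpoint with a rename instead of an HTTP transfer. */
    static Path adoptCheckpoint(Path sharedDir) {
        try {
            Path ckpt = sharedDir.resolve("fsimage.ckpt");
            Path image = sharedDir.resolve("fsimage");
            return Files.move(ckpt, image, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs both sides against a temporary directory and returns the adopted image text. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("shared-ckpt");
            publishCheckpoint(dir, "toy image".getBytes(StandardCharsets.UTF_8));
            Path image = adoptCheckpoint(dir);
            return new String(Files.readAllBytes(image), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whether a rename is actually cheap depends on the file system backing the 
shared directory, so this only removes the HTTP transfer and makes no claim 
about the remaining I/O cost.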

> Transfer of image from secondary name node should not interrupt service
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-3631
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3631
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Robert Chansler
>            Priority: Critical
>             Fix For: 0.19.0
>
>
> The transfer of the new image prepared by the secondary name node can 
> interfere with client services. Clients observe delays in completing RPCs. In 
> general, administrative activities should not be observed by the clients. For 
> large clusters, administrators are reluctant to run the secondary name node 
> leading to excessive edit logs. (Excessive in the sense that if the cluster 
> must be restarted, a long time is required to process the log.)
> Maybe the new image does not have to be transferred; it could be fetched when 
> needed.
> Maybe the priority of the transfer task can be reduced so that the transfer 
> is not observed.
> Maybe a different transfer protocol is more appropriate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
