[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

Todd Lipcon (JIRA) Tue, 28 Aug 2012 13:59:09 -0700

    [ https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443526#comment-13443526 ]


Todd Lipcon commented on HDFS-1490:
-----------------------------------

- I don't like reusing the IPC ping interval for this timeout. It's from an
entirely separate module, and I don't see why one should correlate to the
other. Why not introduce a new config which defaults to something like 1 minute?
- In the test case, shouldn't you somehow notify the servlet to exit? Currently
it waits on itself, but nothing notifies it.
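
To make both points concrete, here is a rough sketch, not the code from the attached patch. It assumes a dedicated timeout setting (the key `example.image.transfer.timeout.ms` and the class and method names are hypothetical, and a plain `Properties` stands in for the Hadoop configuration) applied to the HTTP connection, plus a test handler that stalls on a latch which the test releases in a `finally` block, so the stalled handler is always told to exit.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ImageTransferTimeoutSketch {
    // Hypothetical key and default (1 minute); independent of any IPC setting.
    static final String TIMEOUT_KEY = "example.image.transfer.timeout.ms";
    static final int TIMEOUT_DEFAULT_MS = 60_000;

    // Reads the dedicated timeout, falling back to the default when unset.
    static int getTimeoutMs(Properties conf) {
        String value = conf.getProperty(TIMEOUT_KEY);
        return value == null ? TIMEOUT_DEFAULT_MS : Integer.parseInt(value);
    }

    // Opens the connection with both connect and read timeouts applied.
    static HttpURLConnection open(URL url, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        return conn;
    }

    // Returns true if reading from a server that never answers times out.
    static boolean readTimesOut(int timeoutMs) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        ExecutorService pool = Executors.newCachedThreadPool();
        server.setExecutor(pool);
        server.createContext("/getimage", exchange -> {
            try {
                release.await(); // simulate a hung primary until the test releases us
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                exchange.close();
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            URL url = URI.create("http://127.0.0.1:" + port + "/getimage").toURL();
            HttpURLConnection conn = open(url, timeoutMs);
            try (InputStream in = conn.getInputStream()) {
                in.read();
                return false; // the server answered; no timeout occurred
            } catch (SocketTimeoutException e) {
                return true;
            } finally {
                conn.disconnect();
            }
        } finally {
            release.countDown(); // notify the stalled handler so it can exit
            server.stop(0);
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default timeout ms: " + getTimeoutMs(new Properties()));
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```

A latch is used instead of having the servlet wait on itself so that the test holds an explicit handle for releasing the handler.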

                
> TransferFSImage should timeout
> ------------------------------
>
>                 Key: HDFS-1490
>                 URL: https://issues.apache.org/jira/browse/HDFS-1490
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>            Priority: Minor
>         Attachments: HDFS-1490.patch, HDFS-1490.patch
>
>
> Sometimes, when the primary crashes during an image transfer, the secondary
> namenode hangs forever trying to read the image from the HTTP connection.
> It would be great to set timeouts on the connection so that, if something like
> this happens, there is no need to restart the secondary itself.
> In our case, restarting components is handled by a set of scripts, and since
> the Secondary process is still running, it just stays hung until we get
> an alarm saying that checkpointing isn't happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
