[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439755#comment-13439755 ] Aaron T. Myers commented on HDFS-3561: Great! Thanks for doing that. I'm going to co

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439750#comment-13439750 ] Vinay commented on HDFS-3561: Yes Aaron, we tested the described scenario after setting number

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439706#comment-13439706 ] Aaron T. Myers commented on HDFS-3561: +1, the latest patch looks good to me. Vinay,

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-22 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439573#comment-13439573 ] Hadoop QA commented on HDFS-3561: -1 overall. Here are the results of testing the latest

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-21 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439100#comment-13439100 ] Aaron T. Myers commented on HDFS-3561: Sounds good, Vinay. I'll be happy to review/co

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-20 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438463#comment-13438463 ] Vinay commented on HDFS-3561: Thanks Aaron, I agree with your preference. I will post a new p

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-17 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437160#comment-13437160 ] Aaron T. Myers commented on HDFS-3561: That's a good point, Vinay, that the method wi

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-08-12 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432946#comment-13432946 ] Vinay commented on HDFS-3561: Hi [~atm], do you have any more comments on this?

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-14 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414316#comment-13414316 ] Vinay commented on HDFS-3561: That sounds good. But as of now, in ZKFC, tryGracefulFence() is

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-13 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414093#comment-13414093 ] Aaron T. Myers commented on HDFS-3561: Instead of creating a new Configuration object

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-13 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413654#comment-13413654 ] Hadoop QA commented on HDFS-3561: -1 overall. Here are the results of testing the latest

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-10 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410903#comment-13410903 ] Aaron T. Myers commented on HDFS-3561: I'd think that we'd only want the lower number

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-10 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410218#comment-13410218 ] Vinay commented on HDFS-3561: Thanks Aaron for the suggestion. I have one question here. Shal

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-09 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409927#comment-13409927 ] Aaron T. Myers commented on HDFS-3561: Seems to me like these new configs should not

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-06 Thread Hadoop QA (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408385#comment-13408385 ] Hadoop QA commented on HDFS-3561: -1 overall. Here are the results of testing the latest

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-06 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408255#comment-13408255 ] Uma Maheswara Rao G commented on HDFS-3561: How about the configuration key name

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405238#comment-13405238 ] Uma Maheswara Rao G commented on HDFS-3561: Thanks Aaron :-) {quote} Regardless

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405227#comment-13405227 ] Aaron T. Myers commented on HDFS-3561: Ah, yes. Both in the case of BKJM or the QJM,

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405222#comment-13405222 ] Uma Maheswara Rao G commented on HDFS-3561: Hi Aaron, Thanks a lot. {code}

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-02 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405151#comment-13405151 ] Aaron T. Myers commented on HDFS-3561: bq. How we can do shared storage fencing from

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-29 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404358#comment-13404358 ] Uma Maheswara Rao G commented on HDFS-3561: Hi Aaron, Thanks a lot for the expla

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-29 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404354#comment-13404354 ] Uma Maheswara Rao G commented on HDFS-3561: Hi Aaron, Thanks a lot for the expla

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-29 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404285#comment-13404285 ] Aaron T. Myers commented on HDFS-3561: I think some wires are getting crossed here. S

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-28 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402994#comment-13402994 ] Uma Maheswara Rao G commented on HDFS-3561: Yes, we have multiple levels of fenci

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-28 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402978#comment-13402978 ] Vinay commented on HDFS-3561: {quote}This isn't acceptable. The point of fencing is to ensure

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400847#comment-13400847 ] Todd Lipcon commented on HDFS-3561: +1 for setting it to 0 or 1 for the graceful fence a

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Aaron T. Myers (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400833#comment-13400833 ] Aaron T. Myers commented on HDFS-3561: bq. Suggestion: If ZKFC is not able to reach o

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Uma Maheswara Rao G (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400504#comment-13400504 ] Uma Maheswara Rao G commented on HDFS-3561: I think we can set retries to 1/2 fo

[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-06-25 Thread Vinay (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400487#comment-13400487 ] Vinay commented on HDFS-3561: During the transition, the old active will be fenced. Here bef