[jira] [Commented] (HDFS-2742) HA: observed dataloss in replication stress test
[ https://issues.apache.org/jira/browse/HDFS-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183911#comment-13183911 ]

Todd Lipcon commented on HDFS-2742:
-----------------------------------

bq. What is the implication of ignoring RBW altogether at the standby?

That's an idea I've thought a little about, but I think it has some implications for lease recovery. In actuality, in order to fix the cases in HDFS-2691, I think we need to send RBW blockReceived messages to the SBN as soon as a pipeline is constructed. I do like it, though, as at least a stop-gap for now while we work on a more thorough solution.

bq. If editlog has a finalized record, can we just ignore the RBW from the block report?

Possibly - I haven't thought through the whole append state machine. I assumed that the code that marks an RBW replica as corrupt when one is received for a COMPLETED block is probably there for a good reason... so changing the behavior there might introduce some other bugs that could even hurt the non-HA case.

I'm going to keep working on this and see if I can come up with a simpler solution based on some of Suresh's ideas above.

> HA: observed dataloss in replication stress test
> ------------------------------------------------
>
> Key: HDFS-2742
> URL: https://issues.apache.org/jira/browse/HDFS-2742
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Attachments: hdfs-2742.txt, log-colorized.txt
>
> The replication stress test case failed over the weekend since one of the replicas went missing. Still diagnosing the issue, but it seems like the chain of events was something like:
> - a block report was generated on one of the nodes while the block was being written - thus the block report listed the block as RBW
> - when the standby replayed this queued message, it was replayed after the file was marked complete. Thus it marked this replica as corrupt.
> - it asked the DN holding the corrupt replica to delete it. And, I think, removed it from the block map at this time.
> - That DN then did another block report before receiving the deletion. This caused it to be re-added to the block map, since it was "FINALIZED" now.
> - Replication was lowered on the file, and it counted the above replica as non-corrupt, and asked for the other replicas to be deleted.
> - All replicas were lost.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
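The failure chain above can be sketched as a toy state machine. This is an illustration only; `StaleReportRace`, its enums, and `replayReport` are invented names rather than the real BlockManager code. It shows how replaying a stale RBW report after the file is COMPLETE marks a good replica corrupt, and how a subsequent FINALIZED report then silently re-adds it:

```java
import java.util.*;

/** Toy model of the race described above; all names are invented for illustration. */
class StaleReportRace {
    enum ReplicaState { RBW, FINALIZED }
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

    static BlockState blockState = BlockState.UNDER_CONSTRUCTION;
    static Map<String, ReplicaState> replicaMap = new HashMap<>(); // DN -> live replica
    static Set<String> corrupt = new HashSet<>();

    /** Standby replays a (possibly stale, queued) block report from datanode dn. */
    static void replayReport(String dn, ReplicaState reported) {
        if (blockState == BlockState.COMPLETE && reported == ReplicaState.RBW) {
            // An RBW replica reported against a COMPLETE block is treated as corrupt.
            corrupt.add(dn);
            replicaMap.remove(dn);
        } else {
            // A FINALIZED report (re-)adds the replica, wiping the corrupt marking.
            replicaMap.put(dn, reported);
            corrupt.remove(dn);
        }
    }

    public static void main(String[] args) {
        blockState = BlockState.COMPLETE;            // file already marked complete
        replayReport("dn1", ReplicaState.RBW);       // stale queued report replayed late
        System.out.println("corrupt after stale replay: " + corrupt);
        replayReport("dn1", ReplicaState.FINALIZED); // fresh report beats the deletion
        System.out.println("live after fresh report: " + replicaMap.keySet());
    }
}
```

Once the bad replica is back in the map, an excess-replica decision can count it as good and delete the genuinely good copies, which is the data loss observed in the test.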
[jira] [Commented] (HDFS-2766) HA: test for case where standby partially reads log and then performs checkpoint
[ https://issues.apache.org/jira/browse/HDFS-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183902#comment-13183902 ]

Todd Lipcon commented on HDFS-2766:
-----------------------------------

+1, lgtm.

> HA: test for case where standby partially reads log and then performs checkpoint
> --------------------------------------------------------------------------------
>
> Key: HDFS-2766
> URL: https://issues.apache.org/jira/browse/HDFS-2766
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Aaron T. Myers
> Attachments: HDFS-2766-HDFS-1623.patch, HDFS-2766-HDFS-1623.patch
>
> Here's a potential bug case that we don't currently test for:
> - SBN is reading a finalized edits file when NFS disappears halfway through (or some intermittent error happens)
> - SBN performs a checkpoint and uploads it to the NN
> - NN receives a checkpoint that doesn't correspond to the end of any log segment
> - Both NN and SBN should be able to restart at this point.
[jira] [Resolved] (HDFS-2775) HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-2775.
-------------------------------

    Resolution: Fixed
    Fix Version/s: HA branch (HDFS-1623)
    Hadoop Flags: Reviewed

Committed to branch, thx.

> HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
> ---------------------------------------------------------------------------
>
> Key: HDFS-2775
> URL: https://issues.apache.org/jira/browse/HDFS-2775
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, test
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
> Attachments: hdfs-2775.txt
>
> This test is failing periodically on this assertion:
> {code}
> assertEquals(12, nn0.getNamesystem().getFSImage().getStorage()
>     .getMostRecentCheckpointTxId());
> {code}
> My guess is it's a test race. Investigating...
[jira] [Commented] (HDFS-2775) HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183900#comment-13183900 ]

Todd Lipcon commented on HDFS-2775:
-----------------------------------

bq. Should FSImage#getMostRecentCheckpointTxId perhaps be marked @VisibleForTesting?

Eh, I don't see any reason it shouldn't be used elsewhere in the code either. I generally try to use that annotation only when exposing some piece of internal state that shouldn't normally be used from the main non-test code.

> HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
> ---------------------------------------------------------------------------
>
> Key: HDFS-2775
> URL: https://issues.apache.org/jira/browse/HDFS-2775
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, test
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
> Attachments: hdfs-2775.txt
>
> This test is failing periodically on this assertion:
> {code}
> assertEquals(12, nn0.getNamesystem().getFSImage().getStorage()
>     .getMostRecentCheckpointTxId());
> {code}
> My guess is it's a test race. Investigating...
[jira] [Commented] (HDFS-2738) FSEditLog.selectinputStreams is reading through in-progress streams even when non-in-progress are requested
[ https://issues.apache.org/jira/browse/HDFS-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183899#comment-13183899 ]

Todd Lipcon commented on HDFS-2738:
-----------------------------------

+1, looks good to me. Thanks for making those changes.

> FSEditLog.selectInputStreams is reading through in-progress streams even when non-in-progress are requested
> -----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-2738
> URL: https://issues.apache.org/jira/browse/HDFS-2738
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Aaron T. Myers
> Priority: Blocker
> Attachments: HDFS-2738-HDFS-1623.patch, HDFS-2738-HDFS-1623.patch, HDFS-2738-HDFS-1623.patch
>
> The new code in HDFS-1580 is causing an issue with selectInputStreams in the HA context. When the active is writing to the shared edits, selectInputStreams is called on the standby. This ends up calling {{journalSet.getInputStream}} but doesn't pass the {{inProgressOk=false}} flag. So, {{getInputStream}} ends up reading and validating the in-progress stream unnecessarily. Since the validation results are no longer properly cached, {{findMaxTransaction}} also re-validates the in-progress stream, and then breaks the corruption check in this code. The end result is a lot of errors like:
> 2011-12-30 16:45:02,521 ERROR namenode.FileJournalManager (FileJournalManager.java:getNumberOfTransactions(266)) - Gap in transactions, max txnid is 579, 0 txns from 578
> 2011-12-30 16:45:02,521 INFO ha.EditLogTailer (EditLogTailer.java:run(163)) - Got error, will try again.
> java.io.IOException: No non-corrupt logs for txid 578
>     at org.apache.hadoop.hdfs.server.namenode.JournalSet.getInputStream(JournalSet.java:229)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1081)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:115)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$0(EditLogTailer.java:100)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:154)
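The shape of the fix described above (thread the in-progress flag through to stream selection instead of always validating the segment the active NN is still writing) can be sketched as follows. This is a simplified illustration; `JournalSketch` and `Segment` are invented stand-ins, not the real JournalSet or EditLogInputStream types:

```java
import java.util.*;

/** Simplified stand-in for journal segment selection; not the real HDFS classes. */
class JournalSketch {
    static class Segment {
        final long firstTxId;
        final boolean inProgress;
        Segment(long firstTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.inProgress = inProgress;
        }
    }

    final List<Segment> segments = new ArrayList<>();

    /**
     * The essence of the fix: honor inProgressOk rather than unconditionally
     * reading (and re-validating) the segment the active NN is still writing.
     */
    List<Segment> selectInputStreams(long fromTxId, boolean inProgressOk) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            if (s.firstTxId < fromTxId) continue;         // already applied
            if (s.inProgress && !inProgressOk) continue;  // skip the active segment
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        JournalSketch js = new JournalSketch();
        js.segments.add(new Segment(1, false));
        js.segments.add(new Segment(100, false));
        js.segments.add(new Segment(200, true)); // being written by the active NN
        // The tailing standby asks with inProgressOk=false, so it only ever
        // sees finalized segments and never trips over the live one.
        System.out.println(js.selectInputStreams(100, false).size() + " finalized segment(s)");
    }
}
```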
[jira] [Resolved] (HDFS-2773) HA: reading edit logs from an earlier version leaves blocks in under-construction state
[ https://issues.apache.org/jira/browse/HDFS-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-2773.
-------------------------------

    Resolution: Fixed
    Fix Version/s: HA branch (HDFS-1623)
    Hadoop Flags: Reviewed

Committed to branch, thx for review.

> HA: reading edit logs from an earlier version leaves blocks in under-construction state
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-2773
> URL: https://issues.apache.org/jira/browse/HDFS-2773
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Fix For: HA branch (HDFS-1623)
> Attachments: hadoop-1.0-multiblock-file.tgz, hdfs-2773.txt
>
> In HDFS-2602, the code for applying OP_ADD and OP_CLOSE was changed a bit, and the new code has the following problem: if an OP_CLOSE includes new blocks (ie not previously seen in an OP_ADD) then those blocks will remain in the "under construction" state rather than being marked "complete". This is because {{updateBlocks}} always creates {{BlockInfoUnderConstruction}} regardless of the opcode. This bug only affects the upgrade path, since in trunk we always persist blocks with OP_ADDs before we call OP_CLOSE.
[jira] [Commented] (HDFS-2592) HA: Balancer support for HA namenodes
[ https://issues.apache.org/jira/browse/HDFS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183895#comment-13183895 ]

Uma Maheswara Rao G commented on HDFS-2592:
-------------------------------------------

Thanks a lot again, Todd. I will address all your comments in the next patch. In fact, I have already started the refactoring, mainly to avoid the duplication; I was waiting for initial feedback on the approach.

Thanks,
Uma

> HA: Balancer support for HA namenodes
> -------------------------------------
>
> Key: HDFS-2592
> URL: https://issues.apache.org/jira/browse/HDFS-2592
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer, ha
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2592.patch, HDFS-2592.patch
>
> The balancer currently interacts directly with namenode InetSocketAddresses and makes its own IPC proxies. We need to integrate it with HA so that it uses the same client failover infrastructure.
[jira] [Commented] (HDFS-2499) Fix RPC client creation bug from HDFS-2459
[ https://issues.apache.org/jira/browse/HDFS-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183896#comment-13183896 ]

Hudson commented on HDFS-2499:
------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1544 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1544/])
Add HDFS-2499 to CHANGES.txt.

szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229897
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

> Fix RPC client creation bug from HDFS-2459
> ------------------------------------------
>
> Key: HDFS-2499
> URL: https://issues.apache.org/jira/browse/HDFS-2499
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Fix For: 0.24.0
> Attachments: HDFS-2499.txt, HDFS-2499.txt
>
> HDFS-2459 incorrectly implemented the RPC getProxy for the JournalProtocol client side. It sets retry policies and other policies that are not necessary.
[jira] [Commented] (HDFS-2773) HA: reading edit logs from an earlier version leaves blocks in under-construction state
[ https://issues.apache.org/jira/browse/HDFS-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183894#comment-13183894 ]

Todd Lipcon commented on HDFS-2773:
-----------------------------------

I added the following:
{code}
+      // OP_CLOSE should add finalized blocks. This code path
+      // is only executed when loading edits written by prior
+      // versions of Hadoop. Current versions always log
+      // OP_ADD operations as each block is allocated.
+      newBI = new BlockInfo(newBlock, file.getReplication());
{code}
Will commit momentarily.

> HA: reading edit logs from an earlier version leaves blocks in under-construction state
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-2773
> URL: https://issues.apache.org/jira/browse/HDFS-2773
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Attachments: hadoop-1.0-multiblock-file.tgz, hdfs-2773.txt
>
> In HDFS-2602, the code for applying OP_ADD and OP_CLOSE was changed a bit, and the new code has the following problem: if an OP_CLOSE includes new blocks (ie not previously seen in an OP_ADD) then those blocks will remain in the "under construction" state rather than being marked "complete". This is because {{updateBlocks}} always creates {{BlockInfoUnderConstruction}} regardless of the opcode. This bug only affects the upgrade path, since in trunk we always persist blocks with OP_ADDs before we call OP_CLOSE.
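The rule the committed comment encodes (blocks arriving via OP_CLOSE in old-format edits are already finalized, while OP_ADD blocks are still being written) reduces to a one-line decision. The class below is a toy rendering with invented names, not the real FSEditLogLoader code:

```java
/** Toy rendering of the opcode-dependent block construction; names are invented. */
class UpdateBlocksSketch {
    enum Op { ADD, CLOSE }

    /** Which BlockInfo flavor to create for a block first seen under this opcode. */
    static String blockInfoKindFor(Op op) {
        // OP_CLOSE only introduces new blocks when loading edits written by
        // pre-trunk Hadoop, and those blocks are already finalized on disk.
        // OP_ADD blocks are still being written, so they stay under construction.
        return (op == Op.CLOSE) ? "BlockInfo" : "BlockInfoUnderConstruction";
    }

    public static void main(String[] args) {
        System.out.println("OP_ADD   -> " + blockInfoKindFor(Op.ADD));
        System.out.println("OP_CLOSE -> " + blockInfoKindFor(Op.CLOSE));
    }
}
```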
[jira] [Resolved] (HDFS-2753) Standby namenode stuck in safemode during a failover
[ https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-2753.
-------------------------------

    Resolution: Fixed
    Fix Version/s: HA branch (HDFS-1623)
    Hadoop Flags: Reviewed

> Standby namenode stuck in safemode during a failover
> ----------------------------------------------------
>
> Key: HDFS-2753
> URL: https://issues.apache.org/jira/browse/HDFS-2753
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Fix For: HA branch (HDFS-1623)
> Attachments: HDFS-2753.patch, hdfs-2753.txt, hdfs-2753.txt
>
> Write traffic initiated from the client. Manual failover is done by killing NN and converting a different standby to active. NN is restarted as standby. The restarted standby stays in safemode forever. More information in the description.
[jira] [Updated] (HDFS-2527) Remove the use of Range header from webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2527:
------------------------------

    Target Version/s: (was: 1.0.0)
    Fix Version/s: (was: 1.1.0)
                   (was: 0.23.0)

This was included in branch-23.0 but not shipped as part of the 23.0 release.

> Remove the use of Range header from webhdfs
> -------------------------------------------
>
> Key: HDFS-2527
> URL: https://issues.apache.org/jira/browse/HDFS-2527
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.24.0, 0.23.1, 1.0.0
> Attachments: h2527_2001b_0.20s.patch, h2527_2002.patch, h2527_2002_0.20s.patch
[jira] [Updated] (HDFS-2416) distcp with a webhdfs uri on a secure cluster fails
[ https://issues.apache.org/jira/browse/HDFS-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2416:
------------------------------

    Target Version/s: (was: 1.0.0)
    Fix Version/s: (was: 1.1.0)
                   (was: 0.23.0)
                   0.23.1

This was included in branch-23.0 but not shipped as part of the 23.0 release.

> distcp with a webhdfs uri on a secure cluster fails
> ---------------------------------------------------
>
> Key: HDFS-2416
> URL: https://issues.apache.org/jira/browse/HDFS-2416
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 0.20.205.0
> Reporter: Arpit Gupta
> Assignee: Jitendra Nath Pandey
> Fix For: 0.24.0, 0.23.1, 1.0.0
> Attachments: HDFS-2416-branch-0.20-security.6.patch, HDFS-2416-branch-0.20-security.7.patch, HDFS-2416-branch-0.20-security.8.patch, HDFS-2416-branch-0.20-security.patch, HDFS-2416-trunk.patch, HDFS-2416-trunk.patch, HDFS-2419-branch-0.20-security.patch, HDFS-2419-branch-0.20-security.patch
[jira] [Updated] (HDFS-2539) Support doAs and GETHOMEDIRECTORY in webhdfs
[ https://issues.apache.org/jira/browse/HDFS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2539:
------------------------------

    Fix Version/s: (was: 1.1.0)
                   (was: 0.23.0)

This was included in branch-23.0 but not shipped as part of the 23.0 release.

> Support doAs and GETHOMEDIRECTORY in webhdfs
> --------------------------------------------
>
> Key: HDFS-2539
> URL: https://issues.apache.org/jira/browse/HDFS-2539
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.24.0, 0.23.1, 1.0.0
> Attachments: h2539_2008.patch, h2539_2008_0.20s.patch, h2539_2008_0.20s.patch, h2539_2009.patch, h2539_2009_0.20s.patch, h2539_2009b.patch, h2539_2009b_0.20s.patch, h2539_2009c.patch, h2539_2009c_0.20s.patch, h2539_2010.patch, h2539_2010_0.20s.patch, h2539_2010b.patch, h2539_2010b_0.20s.patch
[jira] [Updated] (HDFS-2528) webhdfs rest call to a secure dn fails when a token is sent
[ https://issues.apache.org/jira/browse/HDFS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2528:
------------------------------

    Target Version/s: (was: 1.0.0)
    Fix Version/s: (was: 1.1.0)
                   (was: 0.23.0)

This was included in branch-23.0 but not shipped as part of the 23.0 release.

> webhdfs rest call to a secure dn fails when a token is sent
> -----------------------------------------------------------
>
> Key: HDFS-2528
> URL: https://issues.apache.org/jira/browse/HDFS-2528
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 0.20.205.0
> Reporter: Arpit Gupta
> Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.24.0, 0.23.1, 1.0.0
> Attachments: h2528_2001.patch, h2528_2001_0.20s.patch, h2528_2001b.patch, h2528_2001b_0.20s.patch, h2528_2002.patch, h2528_2002_0.20s.patch, h2528_2003.patch, h2528_2003_0.20s.patch, h2528_2003_0.20s.patch
>
> curl -L -u : --negotiate -i "http://NN:50070/webhdfs/v1/tmp/webhdfs_data/file_small_data.txt?op=OPEN"
> The following exception is thrown by the datanode when the redirect happens:
> {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Call to failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]"}}
> Interestingly, when using ./bin/hadoop with a webhdfs path we are able to cat or tail a file successfully.
[jira] [Updated] (HDFS-2540) Change WebHdfsFileSystem to two-step create/append
[ https://issues.apache.org/jira/browse/HDFS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2540:
------------------------------

    Target Version/s: (was: 1.0.0)
    Fix Version/s: (was: 1.1.0)
                   (was: 0.23.0)

This was included in branch-23.0 but not shipped as part of the 23.0 release.

> Change WebHdfsFileSystem to two-step create/append
> --------------------------------------------------
>
> Key: HDFS-2540
> URL: https://issues.apache.org/jira/browse/HDFS-2540
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.24.0, 0.23.1, 1.0.0
> Attachments: h2540_2007.patch, h2540_2007_0.20s.patch, h2540_2008.patch, h2540_2008_0.20s.patch
[jira] [Assigned] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G reassigned HDFS-2767:
-----------------------------------------

    Assignee: Uma Maheswara Rao G  (was: Todd Lipcon)

> HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
> -------------------------------------------------------------------
>
> Key: HDFS-2767
> URL: https://issues.apache.org/jira/browse/HDFS-2767
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, hdfs client
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Priority: Blocker
> Attachments: HDFS-2767.patch, hdfs-2767-what-todd-had.txt
>
> Presently, ConfiguredFailoverProxyProvider supports ClientProtocol.
> It should support NameNodeProtocol also, because the Balancer uses NameNodeProtocol for getting blocks.
[jira] [Updated] (HDFS-2753) Standby namenode stuck in safemode during a failover
[ https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2753:
------------------------------

    Attachment: hdfs-2753.txt

Adjusted the test case javadoc to explain the queueing more clearly. I also fixed a bad javadoc @link higher up in the test cases that I noticed while I was in there. Will commit this momentarily.

> Standby namenode stuck in safemode during a failover
> ----------------------------------------------------
>
> Key: HDFS-2753
> URL: https://issues.apache.org/jira/browse/HDFS-2753
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Attachments: HDFS-2753.patch, hdfs-2753.txt, hdfs-2753.txt
>
> Write traffic initiated from the client. Manual failover is done by killing NN and converting a different standby to active. NN is restarted as standby. The restarted standby stays in safemode forever. More information in the description.
[jira] [Commented] (HDFS-2499) Fix RPC client creation bug from HDFS-2459
[ https://issues.apache.org/jira/browse/HDFS-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183883#comment-13183883 ]

Hudson commented on HDFS-2499:
------------------------------

Integrated in Hadoop-Common-trunk-Commit #1525 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1525/])
Add HDFS-2499 to CHANGES.txt.

szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229897
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

> Fix RPC client creation bug from HDFS-2459
> ------------------------------------------
>
> Key: HDFS-2499
> URL: https://issues.apache.org/jira/browse/HDFS-2499
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Fix For: 0.24.0
> Attachments: HDFS-2499.txt, HDFS-2499.txt
>
> HDFS-2459 incorrectly implemented the RPC getProxy for the JournalProtocol client side. It sets retry policies and other policies that are not necessary.
[jira] [Commented] (HDFS-2499) Fix RPC client creation bug from HDFS-2459
[ https://issues.apache.org/jira/browse/HDFS-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183882#comment-13183882 ]

Hudson commented on HDFS-2499:
------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1598 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1598/])
Add HDFS-2499 to CHANGES.txt.

szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229897
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

> Fix RPC client creation bug from HDFS-2459
> ------------------------------------------
>
> Key: HDFS-2499
> URL: https://issues.apache.org/jira/browse/HDFS-2499
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Fix For: 0.24.0
> Attachments: HDFS-2499.txt, HDFS-2499.txt
>
> HDFS-2459 incorrectly implemented the RPC getProxy for the JournalProtocol client side. It sets retry policies and other policies that are not necessary.
[jira] [Updated] (HDFS-2737) HA: Automatically trigger log rolls periodically on the active NN
[ https://issues.apache.org/jira/browse/HDFS-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2737:
------------------------------

    Attachment: hdfs-2737-prelim.txt

Attaching a preliminary patch, since ATM wanted to do some cluster testing that depends on this. I still need to finish some work on it; I forget whether it's actually working or not :)

> HA: Automatically trigger log rolls periodically on the active NN
> -----------------------------------------------------------------
>
> Key: HDFS-2737
> URL: https://issues.apache.org/jira/browse/HDFS-2737
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-2737-prelim.txt
>
> Currently, the edit log tailing process can only read finalized log segments. So, if the active NN is not rolling its logs periodically, the SBN will lag a lot. This also causes many datanode messages to be queued up in the PendingDatanodeMessage structure.
> To combat this, the active NN needs to roll its logs periodically - perhaps based on a time threshold, or perhaps based on a number of transactions. I'm not sure yet whether it's better to have the NN roll on its own or to have the SBN ask the active NN to roll its logs.
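The two triggers floated in the description (a time threshold or a transaction-count threshold) combine naturally into one policy check. A minimal sketch follows; `LogRollPolicy` and its thresholds are invented for illustration, not taken from the eventual patch:

```java
/** Hypothetical roll-trigger policy; names and defaults are illustrative only. */
class LogRollPolicy {
    private final long rollIntervalMs;    // roll at least this often
    private final long rollTxnThreshold;  // ...or once this many txns accumulate

    LogRollPolicy(long rollIntervalMs, long rollTxnThreshold) {
        this.rollIntervalMs = rollIntervalMs;
        this.rollTxnThreshold = rollTxnThreshold;
    }

    /** Either trigger firing is enough to request a roll on the active NN. */
    boolean shouldRoll(long msSinceLastRoll, long txnsSinceLastRoll) {
        return msSinceLastRoll >= rollIntervalMs
            || txnsSinceLastRoll >= rollTxnThreshold;
    }

    public static void main(String[] args) {
        LogRollPolicy policy = new LogRollPolicy(120_000L, 10_000L);
        System.out.println(policy.shouldRoll(130_000L, 50));     // time threshold hit
        System.out.println(policy.shouldRoll(5_000L, 20_000L));  // txn threshold hit
        System.out.println(policy.shouldRoll(5_000L, 50));       // neither yet
    }
}
```

The same check works for either ownership choice discussed above: the active NN can evaluate it in its own timer thread, or the SBN can evaluate it in the tailer and ask the active to roll.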
[jira] [Commented] (HDFS-2499) Fix RPC client creation bug from HDFS-2459
[ https://issues.apache.org/jira/browse/HDFS-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183880#comment-13183880 ]

Tsz Wo (Nicholas), SZE commented on HDFS-2499:
----------------------------------------------

Forgot to add an entry in CHANGES.txt? Let me update it now.

> Fix RPC client creation bug from HDFS-2459
> ------------------------------------------
>
> Key: HDFS-2499
> URL: https://issues.apache.org/jira/browse/HDFS-2499
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Fix For: 0.24.0
> Attachments: HDFS-2499.txt, HDFS-2499.txt
>
> HDFS-2459 incorrectly implemented the RPC getProxy for the JournalProtocol client side. It sets retry policies and other policies that are not necessary.
[jira] [Commented] (HDFS-2592) HA: Balancer support for HA namenodes
[ https://issues.apache.org/jira/browse/HDFS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183876#comment-13183876 ]

Todd Lipcon commented on HDFS-2592:
-----------------------------------

This looks fairly reasonable. A few items:
- Is it possible to move that new code out of the NameNodeConnector constructor into a static method in DFSUtil or even DFSClient?
- Rather than duplicating the code to parse the maxFailoverAttempts, failoverBaseSleepMillis, etc, can we reuse some of the code that's in DFSClient? If we move the connection code into a static method in DFSClient, then we can instantiate a DFSClient.Conf and pull out the variables from there, for example.
- Some too-long lines in the new test code
- The new test is mostly dup code from TestBalancer. Is it possible to reuse more of the code by refactoring into static methods, etc?
- Similarly, much of the setup code is duplicated from HAUtil.configureFailoverFs. Can you just call that function, then grab the conf from the resulting filesystem, or refactor that method so you can reuse the configuration-generating code?

> HA: Balancer support for HA namenodes
> -------------------------------------
>
> Key: HDFS-2592
> URL: https://issues.apache.org/jira/browse/HDFS-2592
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer, ha
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2592.patch, HDFS-2592.patch
>
> The balancer currently interacts directly with namenode InetSocketAddresses and makes its own IPC proxies. We need to integrate it with HA so that it uses the same client failover infrastructure.
[jira] [Commented] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183872#comment-13183872 ] Uma Maheswara Rao G commented on HDFS-2767: --- Thanks a lot Todd, for the comments. I will check and update accordingly. Meanwhile, can you please take a look at the Balancer issue also? > HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol > --- > > Key: HDFS-2767 > URL: https://issues.apache.org/jira/browse/HDFS-2767 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, hdfs client >Affects Versions: HA branch (HDFS-1623) >Reporter: Uma Maheswara Rao G >Assignee: Todd Lipcon >Priority: Blocker > Attachments: HDFS-2767.patch, hdfs-2767-what-todd-had.txt > > > Presently ConfiguredFailoverProxyProvider supports ClientProtocol. > It should support NameNodeProtocol also, because the Balancer uses > NameNodeProtocol for getting blocks.
[jira] [Commented] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183867#comment-13183867 ] Todd Lipcon commented on HDFS-2767: --- Hi Uma. I had started working on this before you posted your patch, but looks like we went a similar direction. The only suggestion I have is to make the interface an argument of the constructor rather than calling a setter after it's instantiated. I'll upload what I have - do you think you could make that change in your patch? Also, regarding this section:
{code}
+// TODO(HA): Need other way to create the proxy instance based on
+// protocol here.
+if (protocol != null && NamenodeProtocol.class.equals(protocol)) {
+  current.namenode = DFSUtil.createNamenodeWithNNProtocol(
+      current.address, conf);
+} else {
{code}
I think you can remove the TODO and change the {{else}} to an {{else if}} to check for ClientProtocol, with a final {{else}} clause that throws an AssertionError or IllegalStateException. Lastly, I think we do need to wire the {{ugi}} parameter in to {{createNamenodeWithNNProtocol}} or else after a failover the user accessing HDFS might accidentally switch!
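The restructuring suggested in the comment above can be sketched in plain Java. This is a hypothetical, self-contained stand-in, not the real provider: the nested marker interfaces replace Hadoop's ClientProtocol/NamenodeProtocol, and a string label replaces the actual proxy creation. It shows only the two points being made — the protocol arrives via the constructor rather than a setter, and the dispatch is an if/else-if chain with a final else that fails loudly on an unsupported protocol.

```java
// Simplified sketch of a protocol-aware failover proxy provider.
class FailoverProxySketch {
    // Stand-ins for the real Hadoop protocol interfaces.
    interface ClientProtocol {}
    interface NamenodeProtocol {}

    private final Class<?> protocol;

    // The interface is an argument of the constructor, not set later.
    FailoverProxySketch(Class<?> protocol) {
        this.protocol = protocol;
    }

    // Returns a label describing which proxy factory would be used;
    // the real code would call the appropriate DFSUtil factory here.
    String createProxyKind() {
        if (NamenodeProtocol.class.equals(protocol)) {
            return "namenode-protocol-proxy";
        } else if (ClientProtocol.class.equals(protocol)) {
            return "client-protocol-proxy";
        } else {
            // Final else: an unknown protocol is a programming error.
            throw new AssertionError("Unsupported protocol: " + protocol);
        }
    }
}
```

The final else means a mis-wired caller fails immediately at proxy creation instead of silently getting a half-configured provider.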
[jira] [Updated] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2767: -- Attachment: hdfs-2767-what-todd-had.txt
[jira] [Commented] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183838#comment-13183838 ] Hudson commented on HDFS-2739: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1543 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1543/]) HDFS-2739. SecondaryNameNode doesn't start up. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229877 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/NamenodeProtocol.proto > SecondaryNameNode doesn't start up > -- > > Key: HDFS-2739 > URL: https://issues.apache.org/jira/browse/HDFS-2739 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.24.0 >Reporter: Sho Shimauchi >Assignee: Jitendra Nath Pandey >Priority: Critical > Attachments: HDFS-2739.trunk.patch > > > Built a 0.24-SNAPSHOT tar from today, used a general config, started NN/DN, > but SNN won't come up with following error: > {code} > 11/12/31 12:13:14 ERROR namenode.SecondaryNameNode: Throwable Exception in > doCheckpoint > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) 
> at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 9 more > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) > at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 
9 more > 11/12/31 12:13:14 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down SecondaryNameNode at sho-mba.local/192.168.11.2 > / > {code} > full error log: http://pastebin.com/mSaVbS34
[jira] [Commented] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183825#comment-13183825 ] Hudson commented on HDFS-2739: -- Integrated in Hadoop-Common-trunk-Commit #1524 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1524/]) HDFS-2739. SecondaryNameNode doesn't start up. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229877 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/NamenodeProtocol.proto
[jira] [Commented] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183824#comment-13183824 ] Hudson commented on HDFS-2739: -- Integrated in Hadoop-Hdfs-trunk-Commit #1597 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1597/]) HDFS-2739. SecondaryNameNode doesn't start up. jitendra : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229877 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/NamenodeProtocol.proto
[jira] [Updated] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-2739: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed.
[jira] [Commented] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183818#comment-13183818 ] Jitendra Nath Pandey commented on HDFS-2739: TestDistributedUpgrade failure is unrelated and it actually passes on my machine. The findbugs, release audit and javadoc warnings also seem to exist in trunk for a while now. I tested the patch manually because secondary namenode was required to be run as a separate process to reproduce the problem.
[jira] [Commented] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183806#comment-13183806 ] Hadoop QA commented on HDFS-2739: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510134/HDFS-2739.trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 21 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1771//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1771//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1771//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1771//console This message is automatically generated. 
[jira] [Commented] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183804#comment-13183804 ] Uma Maheswara Rao G commented on HDFS-2767: --- If you agree with this approach, I can just file an issue in Common for the FailoverProxyProvider interface method. If you have any other approach that works better, please suggest it and I can make the changes.
[jira] [Commented] (HDFS-2592) HA: Balancer support for HA namenodes
[ https://issues.apache.org/jira/browse/HDFS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183800#comment-13183800 ] Uma Maheswara Rao G commented on HDFS-2592: --- This patch expects HDFS-2767 to be applied first.
[jira] [Commented] (HDFS-2772) HA: On transition to active, standby should not swallow ELIE
[ https://issues.apache.org/jira/browse/HDFS-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183799#comment-13183799 ] Aaron T. Myers commented on HDFS-2772: -- I'm still working on writing a test case for it, but I can confirm that the present code will not in fact allow the standby to "silently fail to load all the edits before becoming active." That is, the current code does not presently have any correctness issue. Writing a test for this is a little annoying because of the way exceptions are propagated, but I'll post a patch for it shortly. > HA: On transition to active, standby should not swallow ELIE > > > Key: HDFS-2772 > URL: https://issues.apache.org/jira/browse/HDFS-2772 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > > EditLogTailer#doTailEdits currently catches, logs, and swallows > EditLogInputException. This is fine in the case when the standby is sitting > idly behind tailing logs. However, when the standby is transitioning to > active, swallowing this exception is incorrect, since it could cause the > standby to silently fail to load all the edits before becoming active. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
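The distinction being tested above can be sketched in isolation. This is a minimal, hypothetical illustration of the behavior HDFS-2772 calls for (the class, field, and method names below are invented for the sketch, not the actual EditLogTailer implementation): a read failure while idly tailing is swallowed and retried, but during a transition to active it must propagate so the node cannot go active having silently missed edits.

```java
public class EditLogTailerSketch {
    static class EditLogInputException extends Exception {
        EditLogInputException(String msg) { super(msg); }
    }

    boolean transitioningToActive = false;
    int retriesScheduled = 0;

    // While idly tailing, a read failure is logged and retried on the next
    // cycle. During a transition to active it is rethrown instead.
    void doTailEdits(boolean readFails) throws EditLogInputException {
        try {
            if (readFails) {
                throw new EditLogInputException("error replaying edit log");
            }
        } catch (EditLogInputException e) {
            if (transitioningToActive) {
                throw e; // caller must see the failure before going active
            }
            retriesScheduled++; // idle tailing: swallow and retry later
        }
    }
}
```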
[jira] [Commented] (HDFS-2584) Out of the box, visiting /jmx on the NN gives a whole lot of errors in logs.
[ https://issues.apache.org/jira/browse/HDFS-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183782#comment-13183782 ] Chris Leroy commented on HDFS-2584: --- I think the exception being thrown is not a problem. We're getting it because, in exploring the JMX bean the way we are, we're effectively trying to get the usage threshold of a memory pool where isUsageThresholdSupported is false. This leads to the UnsupportedOperationException being thrown, and then we spew. Isn't it reasonable to just catch the exception and not log when this happens? Something like:
{code}
diff --git a/src/core/org/apache/hadoop/jmx/JMXJsonServlet.java b/src/core/org/apache/hadoop/jmx/JMXJsonServlet.java
index 2c8f797..e9d1f9e 100644
--- a/src/core/org/apache/hadoop/jmx/JMXJsonServlet.java
+++ b/src/core/org/apache/hadoop/jmx/JMXJsonServlet.java
@@ -34,6 +34,7 @@
 import javax.management.MalformedObjectNameException;
 import javax.management.ObjectName;
 import javax.management.ReflectionException;
+import javax.management.RuntimeMBeanException;
 import javax.management.openmbean.CompositeData;
 import javax.management.openmbean.CompositeType;
 import javax.management.openmbean.TabularData;
@@ -239,6 +240,15 @@ public class JMXJsonServlet extends HttpServlet {
       // and fall back on the class name
       LOG.error("getting attribute " + prs + " of " + oname
           + " threw an exception", e);
+    } catch (RuntimeMBeanException e) {
+      // The code inside the attribute getter threw an exception, so we
+      // skip outputting the attribute. We will log the exception in
+      // certain cases, but suppress the log message in others.
+      if (!(e.getCause() instanceof UnsupportedOperationException)) {
+        LOG.error("getting attribute " + attName + " of " + oname
+            + " threw an exception", e);
+      }
+      return;
     } catch (RuntimeException e) {
       // For some reason even with an MBeanException available to them
       // Runtime exceptions can still find their way through, so treat them
{code}
> Out of the box, visiting /jmx on the NN gives a whole lot of errors in logs. > > > Key: HDFS-2584 > URL: https://issues.apache.org/jira/browse/HDFS-2584 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: Harsh J >Priority: Minor > > Logs that follow a {{/jmx}} servlet visit: > {code} > 11/11/22 12:09:52 ERROR jmx.JMXJsonServlet: getting attribute UsageThreshold > of java.lang:type=MemoryPool,name=Par Eden Space threw an exception > javax.management.RuntimeMBeanException: > java.lang.UnsupportedOperationException: Usage threshold is not supported > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:856) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:869) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:670) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638) > at > org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:314) > at > org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:292) > at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:192) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) > at >
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:940) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollecti
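The unwrap-and-filter check in Chris's patch can be exercised in isolation. This is a minimal, hypothetical sketch (JmxAttributeFilter is not a real Hadoop class); it only demonstrates that RuntimeMBeanException exposes the wrapped exception via getCause(), which is what the patch keys off:

```java
import javax.management.RuntimeMBeanException;

public class JmxAttributeFilter {
    // Log the failure unless the attribute getter threw an
    // UnsupportedOperationException (e.g. asking for the usage threshold of
    // a memory pool where isUsageThresholdSupported() is false).
    static boolean shouldLog(RuntimeMBeanException e) {
        return !(e.getCause() instanceof UnsupportedOperationException);
    }

    public static void main(String[] args) {
        RuntimeMBeanException expected = new RuntimeMBeanException(
            new UnsupportedOperationException("Usage threshold is not supported"));
        RuntimeMBeanException unexpected = new RuntimeMBeanException(
            new IllegalStateException("attribute getter failed"));
        System.out.println(shouldLog(expected));   // prints false: suppressed
        System.out.println(shouldLog(unexpected)); // prints true: logged
    }
}
```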
[jira] [Commented] (HDFS-2592) HA: Balancer support for HA namenodes
[ https://issues.apache.org/jira/browse/HDFS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183777#comment-13183777 ] Uma Maheswara Rao G commented on HDFS-2592: --- Hi Todd, Thanks for the care on this issue. Actually I started work on HDFS-2767 as part of this issue, and have just updated my work there. With that change, the Balancer can now work with failover.
{noformat}
2012-01-11 06:48:43,791 INFO balancer.Balancer (Balancer.java:run(1390)) - p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
2012-01-11 06:48:43,891 WARN retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(105)) - Exception while invoking create of class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB after 0 fail over attempts. Trying to fail over immediately.
...
2012-01-11 06:48:58,857 WARN retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(105)) - Exception while invoking getBlocks of class org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB after 0 fail over attempts. Trying to fail over immediately.
{noformat}
> HA: Balancer support for HA namenodes > - > > Key: HDFS-2592 > URL: https://issues.apache.org/jira/browse/HDFS-2592 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, ha >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2592.patch, HDFS-2592.patch > > > The balancer currently interacts directly with namenode InetSocketAddresses > and makes its own IPC proxies. We need to integrate it with HA so that it > uses the same client failover infrastructure. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2592) HA: Balancer support for HA namenodes
[ https://issues.apache.org/jira/browse/HDFS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2592: -- Attachment: HDFS-2592.patch > HA: Balancer support for HA namenodes > - > > Key: HDFS-2592 > URL: https://issues.apache.org/jira/browse/HDFS-2592 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, ha >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2592.patch, HDFS-2592.patch > > > The balancer currently interacts directly with namenode InetSocketAddresses > and makes its own IPC proxies. We need to integrate it with HA so that it > uses the same client failover infrastructure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183772#comment-13183772 ] Uma Maheswara Rao G commented on HDFS-2767: --- Hi Todd, I worked on this as part of the Balancer issue, and included the change to the common FailoverProxyProvider interface in this one as well. I added a setter method for setting the protocol and created the corresponding proxy instance based on that protocol. Thanks, Uma > HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol > --- > > Key: HDFS-2767 > URL: https://issues.apache.org/jira/browse/HDFS-2767 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, hdfs client >Affects Versions: HA branch (HDFS-1623) >Reporter: Uma Maheswara Rao G >Assignee: Todd Lipcon >Priority: Blocker > Attachments: HDFS-2767.patch > > > Presently ConfiguredFailoverProxyProvider supports ClientProtocol. > It should support NameNodeProtocol also, because the Balancer uses > NameNodeProtocol for getting blocks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-2739: --- Assignee: Jitendra Nath Pandey (was: Suresh Srinivas) Hadoop Flags: Reviewed Status: Patch Available (was: Open) > SecondaryNameNode doesn't start up > -- > > Key: HDFS-2739 > URL: https://issues.apache.org/jira/browse/HDFS-2739 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.24.0 >Reporter: Sho Shimauchi >Assignee: Jitendra Nath Pandey >Priority: Critical > Attachments: HDFS-2739.trunk.patch > > > Built a 0.24-SNAPSHOT tar from today, used a general config, started NN/DN, > but SNN won't come up with following error: > {code} > 11/12/31 12:13:14 ERROR namenode.SecondaryNameNode: Throwable Exception in > doCheckpoint > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) > at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 
9 more > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) > at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 9 more > 11/12/31 12:13:14 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down SecondaryNameNode at sho-mba.local/192.168.11.2 > / > {code} > full error log: http://pastebin.com/mSaVbS34 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2767: -- Attachment: HDFS-2767.patch > HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol > --- > > Key: HDFS-2767 > URL: https://issues.apache.org/jira/browse/HDFS-2767 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, hdfs client >Affects Versions: HA branch (HDFS-1623) >Reporter: Uma Maheswara Rao G >Assignee: Todd Lipcon >Priority: Blocker > Attachments: HDFS-2767.patch > > > Presently ConfiguredFailoverProxyProvider supports ClientProtocol. > It should support NameNodeProtocol also, because the Balancer uses > NameNodeProtocol for getting blocks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183758#comment-13183758 ] Suresh Srinivas commented on HDFS-2739: --- +1 for the patch. > SecondaryNameNode doesn't start up > -- > > Key: HDFS-2739 > URL: https://issues.apache.org/jira/browse/HDFS-2739 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.24.0 >Reporter: Sho Shimauchi >Assignee: Suresh Srinivas >Priority: Critical > Attachments: HDFS-2739.trunk.patch > > > Built a 0.24-SNAPSHOT tar from today, used a general config, started NN/DN, > but SNN won't come up with following error: > {code} > 11/12/31 12:13:14 ERROR namenode.SecondaryNameNode: Throwable Exception in > doCheckpoint > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) > at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 
9 more > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) > at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 9 more > 11/12/31 12:13:14 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down SecondaryNameNode at sho-mba.local/192.168.11.2 > / > {code} > full error log: http://pastebin.com/mSaVbS34 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2739) SecondaryNameNode doesn't start up
[ https://issues.apache.org/jira/browse/HDFS-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-2739: --- Attachment: HDFS-2739.trunk.patch The problem is that WritableRpc proxy is getting created in the PB translator. We don't see this problem in unit tests because rpc engines are globally configured. I have tested the patch on a single node installation. Also fixed the typo pointed out by Harsh. > SecondaryNameNode doesn't start up > -- > > Key: HDFS-2739 > URL: https://issues.apache.org/jira/browse/HDFS-2739 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.24.0 >Reporter: Sho Shimauchi >Assignee: Suresh Srinivas >Priority: Critical > Attachments: HDFS-2739.trunk.patch > > > Built a 0.24-SNAPSHOT tar from today, used a general config, started NN/DN, > but SNN won't come up with following error: > {code} > 11/12/31 12:13:14 ERROR namenode.SecondaryNameNode: Throwable Exception in > doCheckpoint > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) > at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at 
java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 9 more > java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:154) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invocation.(WritableRpcEngine.java:112) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:226) > at $Proxy9.getTransationId(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.getTransactionID(NamenodeProtocolTranslatorPB.java:185) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.countUncheckpointedTxns(SecondaryNameNode.java:625) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.shouldCheckpointBasedOnCount(SecondaryNameNode.java:633) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:386) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:356) > at java.lang.Thread.run(Thread.java:680) > Caused by: java.lang.NoSuchFieldException: versionID > at java.lang.Class.getField(Class.java:1520) > at org.apache.hadoop.ipc.RPC.getProtocolVersion(RPC.java:150) > ... 9 more > 11/12/31 12:13:14 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down SecondaryNameNode at sho-mba.local/192.168.11.2 > / > {code} > full error log: http://pastebin.com/mSaVbS34 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
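The root cause in the stack trace above is WritableRpcEngine resolving the protocol version reflectively: it looks for a public static versionID field on the protocol interface, which the protobuf translator interfaces do not declare. A self-contained sketch of that failure mode (the two interfaces below are hypothetical stand-ins, not Hadoop classes):

```java
public class VersionIdLookup {
    // Writable-style protocol interfaces declare a version constant...
    interface WritableStyleProtocol { long versionID = 1L; }
    // ...protobuf translator interfaces do not.
    interface PbStyleProtocol { }

    // Mirrors the reflective lookup RPC.getProtocolVersion performs,
    // including wrapping the failure in a RuntimeException.
    static long getProtocolVersion(Class<?> protocol) {
        try {
            return protocol.getField("versionID").getLong(null);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getProtocolVersion(WritableStyleProtocol.class)); // prints 1
        try {
            getProtocolVersion(PbStyleProtocol.class);
        } catch (RuntimeException e) {
            // Same shape as the SNN failure:
            // java.lang.RuntimeException: java.lang.NoSuchFieldException: versionID
            System.out.println(e.getCause() instanceof NoSuchFieldException); // prints true
        }
    }
}
```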
[jira] [Commented] (HDFS-2753) Standby namenode stuck in safemode during a failover
[ https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183747#comment-13183747 ] Hari Mankude commented on HDFS-2753: Sounds good. > Standby namenode stuck in safemode during a failover > > > Key: HDFS-2753 > URL: https://issues.apache.org/jira/browse/HDFS-2753 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude >Assignee: Hari Mankude > Attachments: HDFS-2753.patch, hdfs-2753.txt > > > Write traffic initiated from the client. Manual failover is done by killing > NN and converting a different standby to active. NN is restarted as standby. > The restarted standby stays in safemode forever. More information in the > description. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2753) Standby namenode stuck in safemode during a failover
[ https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183743#comment-13183743 ] Aaron T. Myers commented on HDFS-2753: -- Hari, does Todd's explanation address your concerns? I verified that the test fails consistently without the fix (ran it 10 times) and that it passes with the fix. Todd, my only suggestion would be to add the above explanation about why this test case elicits the desired behavior as a comment, given that it's not at all obvious. +1 once that issue is addressed. > Standby namenode stuck in safemode during a failover > > > Key: HDFS-2753 > URL: https://issues.apache.org/jira/browse/HDFS-2753 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Hari Mankude >Assignee: Hari Mankude > Attachments: HDFS-2753.patch, hdfs-2753.txt > > > Write traffic initiated from the client. Manual failover is done by killing > NN and converting a different standby to active. NN is restarted as standby. > The restarted standby stays in safemode forever. More information in the > description. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2773) HA: reading edit logs from an earlier version leaves blocks in under-construction state
[ https://issues.apache.org/jira/browse/HDFS-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183732#comment-13183732 ] Aaron T. Myers commented on HDFS-2773: -- Looking good, Todd. I verified that the test passes with this fix, and fails without it. My only suggestion would be to add a comment in the OP_CLOSE case to mention that this code will only be reached when loading old edits log versions. +1 once the above is addressed. > HA: reading edit logs from an earlier version leaves blocks in > under-construction state > --- > > Key: HDFS-2773 > URL: https://issues.apache.org/jira/browse/HDFS-2773 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Blocker > Attachments: hadoop-1.0-multiblock-file.tgz, hdfs-2773.txt > > > In HDFS-2602, the code for applying OP_ADD and OP_CLOSE was changed a bit, > and the new code has the following problem: if an OP_CLOSE includes new > blocks (ie not previously seen in an OP_ADD) then those blocks will remain in > the "under construction" state rather than being marked "complete". This is > because {{updateBlocks}} always creates {{BlockInfoUnderConstruction}} > regardless of the opcode. This bug only affects the upgrade path, since in > trunk we always persist blocks with OP_ADDs before we call OP_CLOSE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
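The HDFS-2773 fix boils down to one decision: what state a block introduced while replaying an edit op should end up in. A hedged sketch of that decision (the enum and method names are invented for illustration; the real logic lives in the updateBlocks path the issue description mentions):

```java
public class ReplayBlockState {
    enum Op { ADD, CLOSE }
    enum State { UNDER_CONSTRUCTION, COMPLETE }

    // An OP_CLOSE means the file is finalized, so any block it introduces
    // (possible when reading logs from versions that did not persist blocks
    // in OP_ADD first) must be marked COMPLETE. Only the last block of an
    // OP_ADD may stay under construction.
    static State stateForNewBlock(Op op, boolean isLastBlock) {
        if (op == Op.CLOSE || !isLastBlock) {
            return State.COMPLETE;
        }
        return State.UNDER_CONSTRUCTION;
    }
}
```

The bug was that the replay code behaved like the OP_ADD branch regardless of opcode, leaving OP_CLOSE-introduced blocks under construction.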
[jira] [Commented] (HDFS-2771) Move Federation and WebHDFS documentation into HDFS project
[ https://issues.apache.org/jira/browse/HDFS-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183730#comment-13183730 ] Suresh Srinivas commented on HDFS-2771: --- Before this can happen, the HDFS docs should move to maven Doxia apt. > Move Federation and WebHDFS documentation into HDFS project > --- > > Key: HDFS-2771 > URL: https://issues.apache.org/jira/browse/HDFS-2771 > Project: Hadoop HDFS > Issue Type: Task > Components: documentation >Affects Versions: 0.23.0 >Reporter: Todd Lipcon > > For some strange reason, the WebHDFS and Federation documentation is > currently in the hadoop-yarn site. This is counter-intuitive. We should move > these documents to an hdfs site, or if we think that all documentation should > go on one site, it should go into the hadoop-common project somewhere. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2738) FSEditLog.selectinputStreams is reading through in-progress streams even when non-in-progress are requested
[ https://issues.apache.org/jira/browse/HDFS-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2738: - Attachment: HDFS-2738-HDFS-1623.patch Thanks a lot for the review, Todd. Here's an updated patch which should address your concerns. And yes, I did manually verify that those exceptions as described no longer show up in the logs when the SBN is tailing an NN making continuous edits. > FSEditLog.selectinputStreams is reading through in-progress streams even when > non-in-progress are requested > --- > > Key: HDFS-2738 > URL: https://issues.apache.org/jira/browse/HDFS-2738 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, name-node >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Aaron T. Myers >Priority: Blocker > Attachments: HDFS-2738-HDFS-1623.patch, HDFS-2738-HDFS-1623.patch, > HDFS-2738-HDFS-1623.patch > > > The new code in HDFS-1580 is causing an issue with selectInputStreams in the > HA context. When the active is writing to the shared edits, > selectInputStreams is called on the standby. This ends up calling > {{journalSet.getInputStream}} but doesn't pass the {{inProgressOk=false}} > flag. So, {{getInputStream}} ends up reading and validating the in-progress > stream unnecessarily. Since the validation results are no longer properly > cached, {{findMaxTransaction}} also re-validates the in-progress stream, and > then breaks the corruption check in this code. The end result is a lot of > errors like: > 2011-12-30 16:45:02,521 ERROR namenode.FileJournalManager > (FileJournalManager.java:getNumberOfTransactions(266)) - Gap in transactions, > max txnid is 579, 0 txns from 578 > 2011-12-30 16:45:02,521 INFO ha.EditLogTailer (EditLogTailer.java:run(163)) > - Got error, will try again. 
> java.io.IOException: No non-corrupt logs for txid 578 > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.getInputStream(JournalSet.java:229) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1081) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:115) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$0(EditLogTailer.java:100) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:154) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
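The shape of the HDFS-2738 fix can be sketched with hypothetical types (these are not the actual JournalSet/FSEditLog signatures): the segment-selection call must forward an inProgressOk=false flag so the standby never reads or validates the segment the active is still writing.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentSelector {
    static class Segment {
        final long firstTxId;
        final boolean inProgress;
        Segment(long firstTxId, boolean inProgress) {
            this.firstTxId = firstTxId;
            this.inProgress = inProgress;
        }
    }

    // When inProgressOk is false, only finalized segments are returned;
    // the in-progress segment being written by the active is skipped
    // instead of being validated (and misreported as a transaction gap).
    static List<Segment> selectInputStreams(List<Segment> all, long fromTxId,
                                            boolean inProgressOk) {
        List<Segment> out = new ArrayList<>();
        for (Segment s : all) {
            if (s.firstTxId >= fromTxId && (inProgressOk || !s.inProgress)) {
                out.add(s);
            }
        }
        return out;
    }
}
```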
[jira] [Commented] (HDFS-2775) HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183702#comment-13183702 ] Aaron T. Myers commented on HDFS-2775: -- +1, the patch looks good to me. Should {{FSImage#getMostRecentCheckpointTxId}} perhaps be marked {{@VisibleForTesting}}? > HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently > --- > > Key: HDFS-2775 > URL: https://issues.apache.org/jira/browse/HDFS-2775 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, test >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-2775.txt > > > This test is failing periodically on this assertion: > {code} > assertEquals(12, nn0.getNamesystem().getFSImage().getStorage() > .getMostRecentCheckpointTxId()); > {code} > My guess is it's a test race. Investigating... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-2776) Missing interface annotation on JournalSet
[ https://issues.apache.org/jira/browse/HDFS-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li reassigned HDFS-2776:
--------------------------------

    Assignee: Brandon Li

> Missing interface annotation on JournalSet
> ------------------------------------------
>
> Key: HDFS-2776
> URL: https://issues.apache.org/jira/browse/HDFS-2776
> Project: Hadoop HDFS
> Issue Type: Task
> Components: name-node
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
> Assignee: Brandon Li
> Priority: Trivial
> Labels: newbie
>
> This public class is missing an annotation indicating that it is for private usage.
[jira] [Updated] (HDFS-2766) HA: test for case where standby partially reads log and then performs checkpoint
[ https://issues.apache.org/jira/browse/HDFS-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-2766:
---------------------------------

    Attachment: HDFS-2766-HDFS-1623.patch

Thanks a lot for the review, Todd. Here's an updated patch which adds the requested comment.

> HA: test for case where standby partially reads log and then performs checkpoint
> --------------------------------------------------------------------------------
>
> Key: HDFS-2766
> URL: https://issues.apache.org/jira/browse/HDFS-2766
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Aaron T. Myers
> Attachments: HDFS-2766-HDFS-1623.patch, HDFS-2766-HDFS-1623.patch
>
> Here's a potential bug case that we don't currently test for:
> - SBN is reading a finalized edits file when NFS disappears halfway through (or some intermittent error happens)
> - SBN performs a checkpoint and uploads it to the NN
> - NN receives a checkpoint that doesn't correspond to the end of any log segment
> - Both NN and SBN should be able to restart at this point.
[jira] [Commented] (HDFS-2742) HA: observed dataloss in replication stress test
[ https://issues.apache.org/jira/browse/HDFS-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183658#comment-13183658 ]

Suresh Srinivas commented on HDFS-2742:
---------------------------------------

Todd,

A couple of questions:
# What is the implication of ignoring RBW altogether at the standby?
# If the editlog has a finalized record, can we just ignore the RBW from the block report?

> HA: observed dataloss in replication stress test
> ------------------------------------------------
>
> Key: HDFS-2742
> URL: https://issues.apache.org/jira/browse/HDFS-2742
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Blocker
> Attachments: hdfs-2742.txt, log-colorized.txt
>
> The replication stress test case failed over the weekend since one of the replicas went missing. Still diagnosing the issue, but it seems like the chain of events was something like:
> - a block report was generated on one of the nodes while the block was being written - thus the block report listed the block as RBW
> - when the standby replayed this queued message, it was replayed after the file was marked complete. Thus it marked this replica as corrupt
> - it asked the DN holding the corrupt replica to delete it. And, I think, removed it from the block map at this time.
> - That DN then did another block report before receiving the deletion. This caused it to be re-added to the block map, since it was "FINALIZED" now.
> - Replication was lowered on the file, and it counted the above replica as non-corrupt, and asked for the other replicas to be deleted.
> - All replicas were lost.
[jira] [Created] (HDFS-2777) When copying a file out of HDFS, modifying it, and uploading it back into HDFS, the put fails due to a CRC mismatch
When copying a file out of HDFS, modifying it, and uploading it back into HDFS, the put fails due to a CRC mismatch
-------------------------------------------------------------------------------------------------------------------

Key: HDFS-2777
URL: https://issues.apache.org/jira/browse/HDFS-2777
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.1
Environment: KR at Yahoo
Reporter: Kevin J. Price

Performing an hdfs -get on a file, modifying the file, and using hdfs -put to place the file back into HDFS results in a checksum error. It seems that the problem is a .crc file being generated locally from the -get command even though the -crc option was NOT specified.
[jira] [Commented] (HDFS-2753) Standby namenode stuck in safemode during a failover
[ https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183626#comment-13183626 ]

Todd Lipcon commented on HDFS-2753:
-----------------------------------

The test adds blocks while the SBN is down. This makes them get queued up in the block received list of that BPServiceActor. When it restarts, the DN calls register(), followed by reportReceivedDeletedBlocks(), followed by blockReport(). So the received blocks always show up first. If you comment out the fix, the test case reliably fails with the error you described (stuck in safemode).

> Standby namenode stuck in safemode during a failover
> ----------------------------------------------------
>
> Key: HDFS-2753
> URL: https://issues.apache.org/jira/browse/HDFS-2753
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Hari Mankude
> Assignee: Hari Mankude
> Attachments: HDFS-2753.patch, hdfs-2753.txt
>
> Write traffic initiated from the client. Manual failover is done by killing NN and converting a different standby to active. NN is restarted as standby. The restarted standby stays in safemode forever. More information in the description.
[jira] [Commented] (HDFS-2753) Standby namenode stuck in safemode during a failover
[ https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183612#comment-13183612 ]

Hari Mankude commented on HDFS-2753:
------------------------------------

Todd,

Looking at the new test in the patch, I am not sure how it is going to bring out the race. The race happens when a blockReceived arrives before the first block report while the NN is in safemode. I don't see the test guarantee this sequence of operations.
[jira] [Updated] (HDFS-2775) HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2775:
------------------------------

    Attachment: hdfs-2775.txt

Simple fix - makes the call to {{getMostRecentCheckpointTxId}} go through FSImage, where it can be synchronized against the saveNamespace call. I verified the fix by adding a sleep before setting {{mostRecentCheckpointTxId}} - it used to make the test fail reliably, but now the test passes regardless.
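The shape of the fix above can be sketched in a minimal, self-contained form (illustrative class and method names only, not the actual FSImage code): the txid field is assigned only after the image reaches storage, so reads must go through a getter synchronized on the same monitor as the save, rather than reading the storage object directly.

```java
// Minimal sketch of the synchronization fix (hypothetical names, not the
// real FSImage API): mostRecentCheckpointTxId is updated only after the
// save completes, and the getter holds the same lock as the save, so a
// reader can never observe a half-finished checkpoint.
class CheckpointTracker {
    private long mostRecentCheckpointTxId = -1;

    synchronized void saveNamespace(long txid) {
        // ... write the image to storage (elided in this sketch) ...
        mostRecentCheckpointTxId = txid; // set only after the save completes
    }

    synchronized long getMostRecentCheckpointTxId() {
        return mostRecentCheckpointTxId;
    }
}
```

Without the shared lock, a test thread could see the checkpoint file on disk while the field still held the previous value, which is exactly the intermittent assertion failure described in this issue.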
[jira] [Created] (HDFS-2776) Missing interface annotation on JournalSet
Missing interface annotation on JournalSet
------------------------------------------

Key: HDFS-2776
URL: https://issues.apache.org/jira/browse/HDFS-2776
Project: Hadoop HDFS
Issue Type: Task
Components: name-node
Affects Versions: 0.24.0
Reporter: Todd Lipcon
Priority: Trivial

This public class is missing an annotation indicating that it is for private usage.
[jira] [Commented] (HDFS-2775) HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183502#comment-13183502 ]

Todd Lipcon commented on HDFS-2775:
-----------------------------------

Yes, this is just a test race. The issue is that the checkpoint is saved to storage, and only after that is {{mostRecentCheckpointTxId}} updated. So the test sees the checkpoint before the txid is updated, and the assert fails. We should probably fix this with some simple synchronization - but it's only a test problem and not a code issue.
[jira] [Created] (HDFS-2775) HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
HA: TestStandbyCheckpoints.testBothNodesInStandbyState fails intermittently
---------------------------------------------------------------------------

Key: HDFS-2775
URL: https://issues.apache.org/jira/browse/HDFS-2775
Project: Hadoop HDFS
Issue Type: Sub-task
Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Todd Lipcon
Assignee: Todd Lipcon

This test is failing periodically on this assertion:
{code}
assertEquals(12, nn0.getNamesystem().getFSImage().getStorage()
    .getMostRecentCheckpointTxId());
{code}
My guess is it's a test race. Investigating...
[jira] [Commented] (HDFS-2592) HA: Balancer support for HA namenodes
[ https://issues.apache.org/jira/browse/HDFS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183486#comment-13183486 ]

Todd Lipcon commented on HDFS-2592:
-----------------------------------

Uma, do you mind if I take this over to finish up your patch? I was planning on working on HDFS-2767, which is closely related.

> HA: Balancer support for HA namenodes
> -------------------------------------
>
> Key: HDFS-2592
> URL: https://issues.apache.org/jira/browse/HDFS-2592
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer, ha
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-2592.patch
>
> The balancer currently interacts directly with namenode InetSocketAddresses and makes its own IPC proxies. We need to integrate it with HA so that it uses the same client failover infrastructure.
[jira] [Assigned] (HDFS-2767) HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
[ https://issues.apache.org/jira/browse/HDFS-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HDFS-2767:
---------------------------------

    Assignee: Todd Lipcon

> HA: ConfiguredFailoverProxyProvider should support NameNodeProtocol
> -------------------------------------------------------------------
>
> Key: HDFS-2767
> URL: https://issues.apache.org/jira/browse/HDFS-2767
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, hdfs client
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Uma Maheswara Rao G
> Assignee: Todd Lipcon
> Priority: Blocker
>
> Presently, ConfiguredFailoverProxyProvider supports ClientProtocol. It should also support NameNodeProtocol, because the Balancer uses NameNodeProtocol for getting blocks.
[jira] [Commented] (HDFS-2740) Enable the trash feature by default
[ https://issues.apache.org/jira/browse/HDFS-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183454#comment-13183454 ]

Harsh J commented on HDFS-2740:
-------------------------------

[~eli] - The FsShell does log that the file was moved to trash and not completely removed. If we can solve this with more info/doc efforts, I am up for doing that. I do think a lot of users miss out on the trash feature until they run into a situation that makes them search for whether there is one.

Stuff we can document more explicitly about, to help:
- How do I disable Trash?
- How do I clear out Trash?
- How do I force-delete a file (skipping trash)?
- How do I tweak the checkpoint periods?

And maybe some dev documentation on trash policies, as I think that is now pluggable (evolving API)?

> Enable the trash feature by default
> -----------------------------------
>
> Key: HDFS-2740
> URL: https://issues.apache.org/jira/browse/HDFS-2740
> Project: Hadoop HDFS
> Issue Type: Wish
> Components: hdfs client, name-node
> Affects Versions: 0.23.0
> Reporter: Harsh J
> Labels: newbie
> Attachments: hdfs-2740.patch, hdfs-2740.patch
>
> Currently trash is disabled out of the box. I do not think it'd be of high surprise to anyone (but surely a relief when *hit happens) to have trash enabled by default, with the usually recommended period of 1 day.
> Thoughts?
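For reference on the questions above: trash retention is controlled by the {{fs.trash.interval}} property, in minutes, with 0 disabling the feature entirely. The 1-day default being proposed would look something like this in core-site.xml:

```xml
<!-- core-site.xml: enable trash, keeping deleted files for 1 day
     (1440 minutes). A value of 0 disables the trash feature. -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
```

Force-deleting a file while bypassing trash is done with the {{-skipTrash}} option to {{hadoop fs -rm}}.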
[jira] [Commented] (HDFS-2740) Enable the trash feature by default
[ https://issues.apache.org/jira/browse/HDFS-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183419#comment-13183419 ]

Eli Collins commented on HDFS-2740:
-----------------------------------

I'm not sold that we should ship with Trash enabled out of the box. Equally confusing is that users would expect deleting files to actually free up space, right?
[jira] [Commented] (HDFS-2740) Enable the trash feature by default
[ https://issues.apache.org/jira/browse/HDFS-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183330#comment-13183330 ]

Hadoop QA commented on HDFS-2740:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12510051/hdfs-2740.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1770//console

This message is automatically generated.
[jira] [Updated] (HDFS-2740) Enable the trash feature by default
[ https://issues.apache.org/jira/browse/HDFS-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Meyarivan updated HDFS-2740:
------------------------------

    Attachment: hdfs-2740.patch

Includes changes to docs.
[jira] [Commented] (HDFS-2740) Enable the trash feature by default
[ https://issues.apache.org/jira/browse/HDFS-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183262#comment-13183262 ]

Harsh J commented on HDFS-2740:
-------------------------------

Meyarivan,

Your changeset is fine. The HDFS buildbot failed to apply it because it's under the hadoop-common directory, but I think the JIRA should stick here, on HDFS, as it's relevant to this component.

Could you also update the docs regarding this behavior change, and incorporate [~daryn]'s and [~revans2]'s comments above into the docs? The trash feature is currently documented at {{hadoop-hdfs-project/hadoop-hdfs/src/main/docs/src/documentation/content/xdocs/hdfs_design.xml}}, but if you feel it fits better in the filesystem shell guide itself, feel free to move it there.
[jira] [Commented] (HDFS-2724) NN web UI can throw NPE after startup, before standby state is entered
[ https://issues.apache.org/jira/browse/HDFS-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183227#comment-13183227 ]

Hudson commented on HDFS-2724:
------------------------------

Integrated in Hadoop-Hdfs-HAbranch-build #43 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/43/])

HDFS-2724. NN web UI can throw NPE after startup, before standby state is entered. Contributed by Todd Lipcon.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229466
Files :
* /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAServiceProtocol.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.jsp

> NN web UI can throw NPE after startup, before standby state is entered
> ----------------------------------------------------------------------
>
> Key: HDFS-2724
> URL: https://issues.apache.org/jira/browse/HDFS-2724
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
> Attachments: hdfs-2724.txt
>
> There's a brief period of time (a few seconds) after the NN web server has been initialized, but before the NN's HA state is initialized. If {{dfshealth.jsp}} is hit during this time, a {{NullPointerException}} will be thrown.
[jira] [Commented] (HDFS-2762) TestCheckpoint is timing out
[ https://issues.apache.org/jira/browse/HDFS-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183226#comment-13183226 ]

Hudson commented on HDFS-2762:
------------------------------

Integrated in Hadoop-Hdfs-HAbranch-build #43 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/43/])

HDFS-2762. Fix TestCheckpoint timing out on HA branch. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1229464
Files :
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt
* /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java

> TestCheckpoint is timing out
> ----------------------------
>
> Key: HDFS-2762
> URL: https://issues.apache.org/jira/browse/HDFS-2762
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Uma Maheswara Rao G
> Fix For: HA branch (HDFS-1623)
> Attachments: HDFS-2762.patch
>
> TestCheckpoint is timing out on the HA branch, and has been for a few days.