[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407742#comment-13407742 ] Hadoop QA commented on HDFS-3577: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535316/h3577_20120705.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.TestHDFSTrash +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2747//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2747//console This message is automatically generated. > webHdfsFileSystem fails to read files with chunked transfer encoding > > > Key: HDFS-3577 > URL: https://issues.apache.org/jira/browse/HDFS-3577 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: h3577_20120705.patch > > > If reading a file large enough for which the httpserver running > webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of > webhdfs), then the WebHdfsFileSystem client fails with an IOException with > message *Content-Length header is missing*. 
> It looks like WebHdfsFileSystem is delegating opening of the inputstream to > *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* > header, but when using chunked transfer encoding the *Content-Length* header > is not present and the *URLOpener.openInputStream()* method throws an > exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407740#comment-13407740 ] Hadoop QA commented on HDFS-3577: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535316/h3577_20120705.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2746//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2746//console This message is automatically generated. > webHdfsFileSystem fails to read files with chunked transfer encoding > > > Key: HDFS-3577 > URL: https://issues.apache.org/jira/browse/HDFS-3577 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: h3577_20120705.patch > > > If reading a file large enough for which the httpserver running > webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of > webhdfs), then the WebHdfsFileSystem client fails with an IOException with > message *Content-Length header is missing*. 
> It looks like WebHdfsFileSystem is delegating opening of the inputstream to > *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* > header, but when using chunked transfer encoding the *Content-Length* header > is not present and the *URLOpener.openInputStream()* method throws an > exception.
[jira] [Comment Edited] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407709#comment-13407709 ] Suresh Srinivas edited comment on HDFS-3077 at 7/6/12 5:14 AM: --- bq. What do you mean by "paxos-style". How does it relate to ZAB? Saw the updated design doc {{update for paxos-y recovery protocol}}. was (Author: sureshms): bq. What do you mean by "paxos-style". How does it relate to ZAB? Saw the updated design doc {{update for paxos-y recovery protocol}} along with ZAB}}. > Quorum-based protocol for reading and writing edit logs > --- > > Key: HDFS-3077 > URL: https://issues.apache.org/jira/browse/HDFS-3077 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha, name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, > qjournal-design.pdf, qjournal-design.pdf > > > Currently, one of the weak points of the HA design is that it relies on > shared storage such as an NFS filer for the shared edit log. One alternative > that has been proposed is to depend on BookKeeper, a ZooKeeper subproject > which provides a highly available replicated edit log on commodity hardware. > This JIRA is to implement another alternative, based on a quorum commit > protocol, integrated more tightly in HDFS and with the requirements driven > only by HDFS's needs rather than more generic use cases. More details to > follow.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407709#comment-13407709 ] Suresh Srinivas commented on HDFS-3077: --- bq. What do you mean by "paxos-style". How does it relate to ZAB? Saw the updated design doc {{update for paxos-y recovery protocol}} along with ZAB}}. > Quorum-based protocol for reading and writing edit logs > --- > > Key: HDFS-3077 > URL: https://issues.apache.org/jira/browse/HDFS-3077 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha, name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, > qjournal-design.pdf, qjournal-design.pdf > > > Currently, one of the weak points of the HA design is that it relies on > shared storage such as an NFS filer for the shared edit log. One alternative > that has been proposed is to depend on BookKeeper, a ZooKeeper subproject > which provides a highly available replicated edit log on commodity hardware. > This JIRA is to implement another alternative, based on a quorum commit > protocol, integrated more tightly in HDFS and with the requirements driven > only by HDFS's needs rather than more generic use cases. More details to > follow.
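The "quorum commit protocol" in the description can be illustrated with a small Java sketch. It assumes a simple majority rule (an edit counts as committed once more than half of the journal nodes acknowledge it) and models each remote write as a `CompletableFuture`. The class and method names are hypothetical and are not taken from the HDFS-3077 patch; the sketch needs Java 9 or later for `List.of` and `CompletableFuture.failedFuture`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a majority-quorum commit; not the HDFS-3077 code. */
public class QuorumCommitSketch {

  /** Smallest number of nodes that forms a strict majority of n (3 -> 2, 5 -> 3). */
  static int majority(int n) {
    return n / 2 + 1;
  }

  /**
   * Returns true once a majority of the per-journal writes succeed, and false
   * once a majority can no longer be reached or the timeout expires.
   */
  static boolean awaitQuorum(List<CompletableFuture<Void>> writes, long timeoutMs)
      throws InterruptedException {
    int n = writes.size();
    int needed = majority(n);
    AtomicInteger acks = new AtomicInteger();
    AtomicInteger failures = new AtomicInteger();
    CompletableFuture<Boolean> outcome = new CompletableFuture<>();
    for (CompletableFuture<Void> write : writes) {
      write.whenComplete((ignored, error) -> {
        if (error == null) {
          if (acks.incrementAndGet() >= needed) {
            outcome.complete(true); // a majority has acknowledged: the edit is committed
          }
        } else if (failures.incrementAndGet() > n - needed) {
          outcome.complete(false); // too many failures: a majority is now impossible
        }
      });
    }
    try {
      return outcome.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (ExecutionException | TimeoutException e) {
      return false;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Three journal nodes: two acknowledge the write, one is unreachable.
    List<CompletableFuture<Void>> writes = List.of(
        CompletableFuture.<Void>completedFuture(null),
        CompletableFuture.<Void>failedFuture(new RuntimeException("jn3 unreachable")),
        CompletableFuture.<Void>completedFuture(null));
    System.out.println(awaitQuorum(writes, 1000));
  }
}
```

The point of the majority rule is that any two majorities of the same node set overlap, so a later writer that reads from a majority is guaranteed to see at least one copy of every committed edit.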
[jira] [Updated] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3577: - Attachment: h3577_20120705.patch h3577_20120705.patch: do not throw exceptions when Content-Length is missing. > webHdfsFileSystem fails to read files with chunked transfer encoding > > > Key: HDFS-3577 > URL: https://issues.apache.org/jira/browse/HDFS-3577 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: h3577_20120705.patch > > > If reading a file large enough for which the httpserver running > webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of > webhdfs), then the WebHdfsFileSystem client fails with an IOException with > message *Content-Length header is missing*. > It looks like WebHdfsFileSystem is delegating opening of the inputstream to > *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* > header, but when using chunked transfer encoding the *Content-Length* header > is not present and the *URLOpener.openInputStream()* method throws an > exception.
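The idea behind the patch note ("do not throw exceptions when Content-Length is missing") can be sketched as follows. This is a hypothetical, standalone illustration, not the actual ByteRangeInputStream change. It treats an absent header as "length unknown" and reads until EOF, which is how a chunked body ends once the HTTP client has stripped the chunk framing (java.net.HttpURLConnection does this transparently, and its `getHeaderField("Content-Length")` returns null when the header is absent).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a missing Content-Length means "unknown length", not an error. */
public class ContentLengthSketch {

  /** Returns the declared length, or -1 when the header is absent (e.g. chunked encoding). */
  static long parseContentLength(String header) throws IOException {
    if (header == null) {
      return -1L; // chunked transfer encoding: no Content-Length, length unknown
    }
    try {
      return Long.parseLong(header.trim());
    } catch (NumberFormatException e) {
      throw new IOException("Invalid Content-Length: " + header, e);
    }
  }

  /** Reads the body: exactly the declared length if known, otherwise until EOF. */
  static byte[] readBody(InputStream in, long contentLength) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    long remaining = contentLength;
    while (contentLength < 0 || remaining > 0) {
      int want = contentLength < 0 ? buf.length : (int) Math.min(buf.length, remaining);
      int n = in.read(buf, 0, want);
      if (n < 0) {
        if (contentLength >= 0) {
          // A declared length that is not satisfied is still an error.
          throw new IOException("Premature EOF: " + remaining + " bytes missing");
        }
        break; // unknown length: EOF marks the end of the body
      }
      out.write(buf, 0, n);
      remaining -= n;
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = "hello webhdfs".getBytes(StandardCharsets.UTF_8);
    // Simulates a de-chunked response body that arrived without a Content-Length header.
    byte[] read = readBody(new ByteArrayInputStream(body), parseContentLength(null));
    System.out.println(new String(read, StandardCharsets.UTF_8));
  }
}
```

Keeping the length check when the header is present preserves the ability to detect truncated responses; only the "header absent" case changes behavior.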
[jira] [Updated] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding
[ https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3577: - Status: Patch Available (was: Open) > webHdfsFileSystem fails to read files with chunked transfer encoding > > > Key: HDFS-3577 > URL: https://issues.apache.org/jira/browse/HDFS-3577 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Tsz Wo (Nicholas), SZE >Priority: Blocker > Attachments: h3577_20120705.patch > > > If reading a file large enough for which the httpserver running > webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of > webhdfs), then the WebHdfsFileSystem client fails with an IOException with > message *Content-Length header is missing*. > It looks like WebHdfsFileSystem is delegating opening of the inputstream to > *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* > header, but when using chunked transfer encoding the *Content-Length* header > is not present and the *URLOpener.openInputStream()* method throws an > exception.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407684#comment-13407684 ] Suresh Srinivas commented on HDFS-3077: --- Todd, I have not had time to look into the comments or the patch. Will try to get this done in the next few days. As I said earlier, keeping JournalProtocol free of quorum semantics allows writers with different policies. Perhaps the protocols should be different, and maybe JournalProtocol from 3092 can remain as is. Again, this is an early thought; will spend time on this in the next few days. Quick comment: bq. I disagree with this statement. The commit protocol is strongly intertwined with the way in which the server has to behave. For example, the "new epoch" command needs to provide back certain information about the current state of the journals and previous paxos-style 'accepted' decisions. Trying to shoehorn it into a generic protocol doesn't make much sense to me. What do you mean by "paxos-style". How does it relate to ZAB? > Quorum-based protocol for reading and writing edit logs > --- > > Key: HDFS-3077 > URL: https://issues.apache.org/jira/browse/HDFS-3077 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha, name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, > qjournal-design.pdf, qjournal-design.pdf > > > Currently, one of the weak points of the HA design is that it relies on > shared storage such as an NFS filer for the shared edit log. One alternative > that has been proposed is to depend on BookKeeper, a ZooKeeper subproject > which provides a highly available replicated edit log on commodity hardware. > This JIRA is to implement another alternative, based on a quorum commit > protocol, integrated more tightly in HDFS and with the requirements driven > only by HDFS's needs rather than more generic use cases. More details to > follow. 
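The quoted point about the "new epoch" command can be illustrated with a simplified, Paxos-like promise: each journal remembers the highest epoch it has promised, rejects any lower or equal epoch, and hands back the state a new writer needs for recovery. The sketch below is a hypothetical model (every class, field, and method name is invented here), and it deliberately leaves out the previously 'accepted' recovery values that the comment mentions.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of epoch-based writer fencing; names are illustrative only. */
public class EpochFencingSketch {

  /** Simplified state of one journal; a real journal would persist these values durably. */
  static final class JournalState {
    long lastPromisedEpoch;
    long lastWrittenTxId;

    JournalState(long lastPromisedEpoch, long lastWrittenTxId) {
      this.lastPromisedEpoch = lastPromisedEpoch;
      this.lastWrittenTxId = lastWrittenTxId;
    }

    /** Promises to reject older epochs and reports the state the new writer needs for recovery. */
    synchronized long newEpoch(long epoch) throws IOException {
      if (epoch <= lastPromisedEpoch) {
        throw new IOException("epoch " + epoch + " is not newer than promised epoch " + lastPromisedEpoch);
      }
      lastPromisedEpoch = epoch;
      return lastWrittenTxId;
    }

    /** Accepts an edit only from the writer holding the current promise. */
    synchronized void journal(long writerEpoch, long txId) throws IOException {
      if (writerEpoch != lastPromisedEpoch) {
        throw new IOException("writer epoch " + writerEpoch + " has been fenced by epoch " + lastPromisedEpoch);
      }
      lastWrittenTxId = txId;
    }
  }

  /** A new writer proposes an epoch higher than any promise it has observed (single-threaded here). */
  static long proposeEpoch(List<JournalState> journals) {
    long max = 0;
    for (JournalState j : journals) {
      max = Math.max(max, j.lastPromisedEpoch);
    }
    return max + 1;
  }

  public static void main(String[] args) throws IOException {
    JournalState jn = new JournalState(1, 100); // the old writer holds epoch 1
    long epoch = proposeEpoch(List.of(jn));     // the new writer picks epoch 2
    System.out.println("last txid reported to new writer: " + jn.newEpoch(epoch));
    try {
      jn.journal(1, 101); // the old writer is now rejected
    } catch (IOException e) {
      System.out.println("fenced: " + e.getMessage());
    }
    jn.journal(epoch, 101); // the new writer proceeds
  }
}
```

In a quorum setting the new writer would need a majority of journals to accept its epoch before writing; this single-journal sketch only shows the per-node promise and fencing rule.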
[jira] [Updated] (HDFS-3604) Add dfs.webhdfs.enabled to hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3604: - Hadoop Flags: Reviewed +1 patch looks good. > Add dfs.webhdfs.enabled to hdfs-default.xml > --- > > Key: HDFS-3604 > URL: https://issues.apache.org/jira/browse/HDFS-3604 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Minor > Attachments: hdfs-3604.txt > > > Let's add {{dfs.webhdfs.enabled}} to hdfs-default.xml.
[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407671#comment-13407671 ] Aaron T. Myers commented on HDFS-3077: -- I just finished a review of the latest patch. Overall it looks really good. Great test coverage, too. Some comments: # If the following is supposed to be a list of host:port pairs, I suggest we call it something other than "*.edits.dir". Also, if the default is just a path, is it really supposed to be a list of host:port pairs? Or is this comment supposed to be referring to DFS_JOURNALNODE_RPC_ADDRESS_KEY? {code} + // This is a comma separated host:port list of addresses hosting the journal service + public static final String DFS_JOURNALNODE_EDITS_DIR_KEY = "dfs.journalnode.edits.dir"; + public static final String DFS_JOURNALNODE_EDITS_DIR_DEFAULT = "/tmp/hadoop/dfs/journalnode/"; {code} # Could use a class comment and method comments in AsyncLogger. # Missing an @param comment for AsyncLoggerSet#createNewUniqueEpoch. # I think this won't substitute in the correct hostname in a multi-node setup with host-based principal names: {code} +SecurityUtil.getServerPrincipal(conf +.get(DFSConfigKeys.DFS_JOURNALNODE_USER_NAME_KEY), +NameNode.getAddress(conf).getHostName()) }; {code} # In IPCLoggerChannel, I wonder if you also shouldn't ensure that httpPort is not yet set here: {code} // Fill in HTTP port. TODO: is there a more elegant place to put this? httpPort = ret.getHttpPort(); {code} # Is there no need for IPCLoggerChannel to have a way of closing its associated proxy? # Could use some comments in JNStorage. # Seems a little odd that JNStorage relies on a few static functions of NNStorage. Is there some better place those functions could live? # I don't understand why JNStorage#analyzeStorage locks the storage directory after formatting it. What, if anything, relies on that behavior? Where is it unlocked? Might want to add a comment explaining it. 
# Patch needs to be rebased on trunk, e.g. PersistentLong was renamed to PersistentLongFile. # This line kind of creeps me out in the constructor of the Journal class. Maybe make a no-args version of Storage#getStorageDir that asserts there's only one dir? {code} File currentDir = storage.getStorageDir(0).getCurrentDir(); {code} # In general this patch seems to be mixing in protobufs in a few places where non-proto classes seem more appropriate, notably in the Journal and JournalNodeRpcServer classes. Perhaps we should create non-proto analogs for these protos and add translator methods? # This seems really goofy. Just make another non-proto class and use a translator? {code} // Return the partial builder instead of the proto, since {code} # I notice that there are a few TODOs left in this patch. It would be useful to know which of these you think need to be fixed before we commit this for real, versus those you'd like to leave in and do as follow-ups. # Instead of putting all of these classes in the o.a.h.hdfs.qjournal packages, I recommend you try to separate these out into o.a.h.hdfs.qjournal.client, which implements the NN side of things, and o.a.h.hdfs.qjournal.server, which implements the JN side of things. I think doing so would make it easier to navigate the code. # Could definitely use some method comments in the Journal class. # Recommend renaming Journal#journal to something like Journal#logEdits or Journal#writeEdits. # In JournalNode#getOrCreateJournal, this log message could be more helpful: LOG.info("logDir: " + logDir); # Seems like all of the timeouts in QuorumJournalManager should be configurable. # I think you already have the config key to address this TODO in QJournalProtocolPB: // TODO: need to add a new principal for loggers # s/BackupNode/JournalNode/g: {code} + * Protocol used to journal edits to a remote node. Currently, + * this is used to publish edits from the NameNode to a BackupNode.
{code} # Use an HTML comment in journalstatus.jsp, instead of Java comments within a code block. # Could use some more content for the journalstatus.jsp page. :) # A few spots in the tests you catch expected IOEs, but don't verify that you received the IOE you actually expect. # Really solid tests overall, but how about one that actually works with HA? You currently have a test for two entirely separate NNs, but not one that uses an HA mini cluster. > Quorum-based protocol for reading and writing edit logs > --- > > Key: HDFS-3077 > URL: https://issues.apache.org/jira/browse/HDFS-3077 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha, name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, > qjo
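Review point 4 hinges on how the `_HOST` placeholder in a configured principal (for example `jn/_HOST@EXAMPLE.COM`) gets replaced. The sketch below is a hypothetical, self-contained reimplementation of that convention, not Hadoop's `SecurityUtil`; the hostnames and realm are made up. It shows that filling in the NameNode's hostname produces a principal that will not match a JournalNode running on a different host.

```java
import java.util.Locale;

/** Hypothetical stand-in for _HOST substitution in Kerberos principal names. */
public class HostPrincipalSketch {

  static final String HOST_PLACEHOLDER = "_HOST";

  /** Replaces a "_HOST" instance component (service/_HOST@REALM) with the given hostname. */
  static String serverPrincipal(String configured, String hostname) {
    String[] parts = configured.split("[/@]");
    if (parts.length != 3 || !HOST_PLACEHOLDER.equals(parts[1])) {
      return configured; // no placeholder to substitute
    }
    return parts[0] + "/" + hostname.toLowerCase(Locale.ROOT) + "@" + parts[2];
  }

  public static void main(String[] args) {
    String configured = "jn/_HOST@EXAMPLE.COM";
    // Using the NameNode's hostname yields the wrong principal for a JournalNode on another host.
    System.out.println(serverPrincipal(configured, "nn1.example.com"));
    // Each remote JournalNode's principal should be built from that node's own hostname.
    System.out.println(serverPrincipal(configured, "JN1.example.com"));
  }
}
```

In a multi-JournalNode setup this means the substitution has to happen per remote address, once for each logger channel, rather than once from the local NameNode address.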
[jira] [Commented] (HDFS-3584) Blocks are getting marked as corrupt with append operation under high load.
[ https://issues.apache.org/jira/browse/HDFS-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407665#comment-13407665 ] Uma Maheswara Rao G commented on HDFS-3584: --- Hi all, do you have any comments on this issue? > Blocks are getting marked as corrupt with append operation under high load. > --- > > Key: HDFS-3584 > URL: https://issues.apache.org/jira/browse/HDFS-3584 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.1-alpha >Reporter: Brahma Reddy Battula > > Scenario: > = > 1. There are two clients, cli1 and cli2; cli1 writes file F1 and does not close it. > 2. cli2 calls append on the unclosed file, which triggers lease recovery. > 3. cli1 closes the file. > 4. Lease recovery completes with an updated generation stamp (GS) on the DN; when the NN receives the block report, the GS mismatch causes the block to be marked corrupt. > 5. The subsequent commitBlockSynchronization call also fails, because cli1 has already closed the file and its state in the NN is finalized.
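Steps 3 and 4 come down to a generation-stamp comparison. The simplified model below assumes that a replica reported for a block the NameNode already considers complete is treated as corrupt when its generation stamp differs from the stored one, and it replays the scenario's ordering with made-up stamp values. It illustrates the race only; the types are hypothetical and this is not the NameNode's block-report code.

```java
/** Hypothetical model of the append/lease-recovery race described in HDFS-3584. */
public class GenStampRaceSketch {

  enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

  /** Simplified NameNode-side record of one block. */
  static final class StoredBlock {
    BlockState state = BlockState.UNDER_CONSTRUCTION;
    long genStamp;

    StoredBlock(long genStamp) {
      this.genStamp = genStamp;
    }
  }

  /** A complete block whose reported generation stamp differs from the stored one looks corrupt. */
  static boolean isCorruptReport(StoredBlock stored, long reportedGenStamp) {
    return stored.state == BlockState.COMPLETE && stored.genStamp != reportedGenStamp;
  }

  public static void main(String[] args) {
    StoredBlock block = new StoredBlock(1000); // 1. cli1 writes F1 with GS 1000
    long datanodeGenStamp = 1001;               // 2. recovery triggered by cli2's append bumps the GS on the DN
    block.state = BlockState.COMPLETE;          // 3. cli1's close finalizes the block with the old GS
    // 4. The next block report carries GS 1001, which no longer matches the stored GS.
    System.out.println("marked corrupt: " + isCorruptReport(block, datanodeGenStamp));
  }
}
```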
[jira] [Updated] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-799: -- Status: Patch Available (was: Open) > libhdfs must call DetachCurrentThread when a thread is destroyed > > > Key: HDFS-799 > URL: https://issues.apache.org/jira/browse/HDFS-799 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Christian Kunz >Assignee: Colin Patrick McCabe > Attachments: HDFS-799.001.patch > > > Threads that call AttachCurrentThread in libhdfs and disappear without > calling DetachCurrentThread cause a memory leak. > Libhdfs should detach the current thread when this thread exits.
[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407650#comment-13407650 ] Colin Patrick McCabe commented on HDFS-799: --- Note that the other nice thing about this solution is that it should speed things up a little by eliminating the need to take a mutex in GetVM. > libhdfs must call DetachCurrentThread when a thread is destroyed > > > Key: HDFS-799 > URL: https://issues.apache.org/jira/browse/HDFS-799 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Christian Kunz >Assignee: Colin Patrick McCabe > Attachments: HDFS-799.001.patch > > > Threads that call AttachCurrentThread in libhdfs and disappear without > calling DetachCurrentThread cause a memory leak. > Libhdfs should detach the current thread when this thread exits.
[jira] [Updated] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-799: -- Attachment: HDFS-799.001.patch > libhdfs must call DetachCurrentThread when a thread is destroyed > > > Key: HDFS-799 > URL: https://issues.apache.org/jira/browse/HDFS-799 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Christian Kunz >Assignee: Colin Patrick McCabe > Attachments: HDFS-799.001.patch > > > Threads that call AttachCurrentThread in libhdfs and disappear without > calling DetachCurrentThread cause a memory leak. > Libhdfs should detach the current thread when this thread exits.
[jira] [Commented] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407630#comment-13407630 ] Andy Isaacson commented on HDFS-3597: - bq. The 2NN can be configured with multiple directories. Thanks for the explanation, that's very enlightening. Looking at the results now. > SNN can fail to start on upgrade > > > Key: HDFS-3597 > URL: https://issues.apache.org/jira/browse/HDFS-3597 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-3597-2.txt, hdfs-3597.txt > > > When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up: > {code} > 2012-06-16 09:52:33,812 ERROR > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in > doCheckpoint > java.io.IOException: Inconsistent checkpoint fields. > LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = > CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = > BP-1792677198-172.29.121.67-1339813967723. > Expecting respectively: -19; 64415959; 0; ; . > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297) > at java.lang.Thread.run(Thread.java:662) > {code} > The error check we're hitting came from HDFS-1073, and it's intended to > verify that we're connecting to the correct NN. 
But the check is too strict > and considers "different metadata version" to be the same as "different > clusterID". > I believe the check in {{doCheckpoint}} simply needs to explicitly check for > and handle the upgrade case.
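One way to express the reporter's suggestion is to compare namespace identity first and to treat a layout-version difference within the same namespace as an upgrade rather than a connection to the wrong NameNode. The sketch below is hypothetical: the types, verdict names, and exact rules are assumptions, not the committed fix. Its example values are copied from the log message in the description.

```java
/** Hypothetical sketch of a checkpoint storage check that tolerates upgrades. */
public class CheckpointCheckSketch {

  enum Verdict { MATCH, UPGRADE, MISMATCH }

  /** Subset of the fields printed in the "Inconsistent checkpoint fields" error. */
  static final class StorageFields {
    final int layoutVersion;
    final int namespaceId;
    final long cTime;
    final String clusterId;
    final String blockpoolId;

    StorageFields(int layoutVersion, int namespaceId, long cTime, String clusterId, String blockpoolId) {
      this.layoutVersion = layoutVersion;
      this.namespaceId = namespaceId;
      this.cTime = cTime;
      this.clusterId = clusterId;
      this.blockpoolId = blockpoolId;
    }
  }

  /** Compares the NameNode's checkpoint fields with the checkpointer's local storage. */
  static Verdict compare(StorageFields remote, StorageFields local) {
    if (remote.namespaceId != local.namespaceId) {
      return Verdict.MISMATCH; // genuinely a different namespace
    }
    if (remote.layoutVersion != local.layoutVersion) {
      return Verdict.UPGRADE; // same namespace, different metadata version: handle the upgrade
    }
    boolean same = remote.cTime == local.cTime
        && remote.clusterId.equals(local.clusterId)
        && remote.blockpoolId.equals(local.blockpoolId);
    return same ? Verdict.MATCH : Verdict.MISMATCH;
  }

  public static void main(String[] args) {
    // Values taken from the log message in the issue description.
    StorageFields nn = new StorageFields(-40, 64415959, 1339813974990L,
        "CID-07a82b97-8d04-4fdd-b3a1-f40650163245", "BP-1792677198-172.29.121.67-1339813967723");
    StorageFields secondary = new StorageFields(-19, 64415959, 0L, "", "");
    System.out.println(compare(nn, secondary));
  }
}
```

With these inputs the sketch reports `UPGRADE`, whereas the strict check in the stack trace rejects the same pair outright.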
[jira] [Updated] (HDFS-3604) Add dfs.webhdfs.enabled to hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3604: -- Attachment: hdfs-3604.txt Patch attached. > Add dfs.webhdfs.enabled to hdfs-default.xml > --- > > Key: HDFS-3604 > URL: https://issues.apache.org/jira/browse/HDFS-3604 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Minor > Attachments: hdfs-3604.txt > > > Let's add {{dfs.webhdfs.enabled}} to hdfs-default.xml.
[jira] [Created] (HDFS-3604) Add dfs.webhdfs.enabled to hdfs-default.xml
Eli Collins created HDFS-3604: - Summary: Add dfs.webhdfs.enabled to hdfs-default.xml Key: HDFS-3604 URL: https://issues.apache.org/jira/browse/HDFS-3604 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha, 1.0.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Let's add {{dfs.webhdfs.enabled}} to hdfs-default.xml.
[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407604#comment-13407604 ] Colin Patrick McCabe commented on HDFS-799: --- This can be accomplished by using the pthread thread-local-storage interface coupled with the "optional destructor function." See http://www.manpagez.com/man/3/pthread_key_create/ > libhdfs must call DetachCurrentThread when a thread is destroyed > > > Key: HDFS-799 > URL: https://issues.apache.org/jira/browse/HDFS-799 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Christian Kunz >Assignee: Colin Patrick McCabe > > Threads that call AttachCurrentThread in libhdfs and disappear without > calling DetachCurrentThread cause a memory leak. > Libhdfs should detach the current thread when this thread exits.
[jira] [Assigned] (HDFS-799) libhdfs needs an API function that calls DetachCurrentThread
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-799: - Assignee: Colin Patrick McCabe > libhdfs needs an API function that calls DetachCurrentThread > > > Key: HDFS-799 > URL: https://issues.apache.org/jira/browse/HDFS-799 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Christian Kunz >Assignee: Colin Patrick McCabe > > Threads that call AttachCurrentThread in libhdfs and disappear without > calling DetachCurrentThread cause a memory leak. > Libhdfs should provide an interface function for detaching the current > thread.
[jira] [Updated] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed
[ https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-799: -- Description: Threads that call AttachCurrentThread in libhdfs and disappear without calling DetachCurrentThread cause a memory leak. Libhdfs should detach the current thread when this thread exits. was: Threads that call AttachCurrentThread in libhdfs and disappear without calling DetachCurrentThread cause a memory leak. Libhdfs should provide an interface function for detaching the current thread. Summary: libhdfs must call DetachCurrentThread when a thread is destroyed (was: libhdfs needs an API function that calls DetachCurrentThread) > libhdfs must call DetachCurrentThread when a thread is destroyed > > > Key: HDFS-799 > URL: https://issues.apache.org/jira/browse/HDFS-799 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Christian Kunz >Assignee: Colin Patrick McCabe > > Threads that call AttachCurrentThread in libhdfs and disappear without > calling DetachCurrentThread cause a memory leak. > Libhdfs should detach the current thread when this thread exits.
[jira] [Assigned] (HDFS-3015) NamenodeFsck and JspHelper duplicate DFSInputStream#copyBlock and bestNode
[ https://issues.apache.org/jira/browse/HDFS-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-3015: -- Assignee: Colin Patrick McCabe > NamenodeFsck and JspHelper duplicate DFSInputStream#copyBlock and bestNode > -- > > Key: HDFS-3015 > URL: https://issues.apache.org/jira/browse/HDFS-3015 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Eli Collins >Assignee: Colin Patrick McCabe >Priority: Minor > Labels: newbie > > Both NamenodeFsck and JspHelper duplicate DFSInputStream#copyBlock and > bestNode. There should be one shared implementation. > {code} > /* >* XXX (ab) Bulk of this method is copied verbatim from {@link DFSClient}, > which is >* bad. Both places should be refactored to provide a method to copy blocks >* around. >*/ > {code}
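A minimal sketch of what one shared helper could look like, assuming the duplicated `bestNode` logic reduces to "pick the first replica location that is not already known to be dead". The class name `BlockLocationUtil`, the method signature and the use of plain `String` node names are hypothetical stand-ins for the real `DatanodeInfo`-based code, which may apply further rules.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.List;

// Hypothetical shared helper; node names are plain strings here instead of
// Hadoop's DatanodeInfo so the sketch stays self-contained.
final class BlockLocationUtil {
  private BlockLocationUtil() {}

  /**
   * Returns the first candidate node that is not in deadNodes.
   * Throws IOException when every candidate has been ruled out, so every
   * caller (fsck, JSP helpers, the client) sees the same failure behavior.
   */
  static String bestNode(List<String> candidates, Collection<String> deadNodes)
      throws IOException {
    if (candidates != null) {
      for (String node : candidates) {
        if (!deadNodes.contains(node)) {
          return node;
        }
      }
    }
    throw new IOException("No live nodes contain the current block");
  }
}
```

For example, with candidates `dn1`, `dn2`, `dn3` and `dn1` marked dead, `bestNode` returns `dn2`. Throwing instead of returning null keeps the "no live replica" case in one place for all callers.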
[jira] [Updated] (HDFS-3548) NamenodeFsck.copyBlock fails to create a Block Reader
[ https://issues.apache.org/jira/browse/HDFS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3548: --- Attachment: HDFS-3548.002.patch * fix style issues and rebase > NamenodeFsck.copyBlock fails to create a Block Reader > - > > Key: HDFS-3548 > URL: https://issues.apache.org/jira/browse/HDFS-3548 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.1, 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-3548.001.patch, HDFS-3548.002.patch > > > NamenodeFsck.copyBlock creates a Socket using {{new Socket()}}, and thus that > socket doesn't have an associated Channel. Then, it fails to create a > BlockReader since RemoteBlockReader2 needs a socket channel. > (thanks to Hiroshi Yokoi for reporting)
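The root cause can be reproduced with the JDK alone: a `java.net.Socket` built with `new Socket()` has no associated `SocketChannel`, while the socket returned by `SocketChannel.open().socket()` does. The sketch below (class and method names are hypothetical; no network connection is opened) shows the difference. How Hadoop itself should obtain a channel-backed socket, for example through its configured socket factory, is left to the patch; this only uses the JDK.

```java
import java.io.IOException;
import java.net.Socket;
import java.nio.channels.SocketChannel;

final class SocketChannelCheck {
  private SocketChannelCheck() {}

  /** True if the socket is backed by an NIO channel, as channel-based readers require. */
  static boolean hasChannel(Socket s) {
    return s.getChannel() != null;
  }

  /** Creates an unconnected socket that, unlike new Socket(), carries a SocketChannel. */
  static Socket newChannelBackedSocket() throws IOException {
    return SocketChannel.open().socket();
  }

  public static void main(String[] args) throws IOException {
    try (Socket plain = new Socket(); Socket nio = newChannelBackedSocket()) {
      System.out.println("new Socket():         channel=" + hasChannel(plain));
      System.out.println("SocketChannel.open(): channel=" + hasChannel(nio));
    }
  }
}
```

Running `main` reports `channel=false` for the plain socket and `channel=true` for the channel-backed one; per the JDK documentation, a socket has a channel only if the channel itself was created via `SocketChannel.open` or `ServerSocketChannel.accept`.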
[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency
[ https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407596#comment-13407596 ] Hudson commented on HDFS-3170: -- Integrated in Hadoop-Mapreduce-trunk-Commit #2445 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2445/]) HDFS-3170. Add more useful metrics for write latency. Contributed by Matthew Jacobs. (Revision 1357970) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357970 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > Add more useful metrics for write latency > - > > Key: HDFS-3170 > URL: https://issues.apache.org/jira/browse/HDFS-3170 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Matthew Jacobs > Fix For: 2.0.1-alpha > > Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt > > > Currently, the only write-latency related metric we expose is the total > amount of time taken by opWriteBlock. This is practically useless, since (a) > different blocks may be wildly different sizes, and (b) if the writer is only > generating data slowly, it will make a block write take longer by no fault of > the DN. 
I would like to propose two new metrics: > 1) *flush-to-disk time*: count how long it takes for each call to flush an > incoming packet to disk (including the checksums). In most cases this will be > close to 0, as it only flushes to buffer cache, but if the backing block > device enters congested writeback, it can take much longer, which provides an > interesting metric. > 2) *round trip to downstream pipeline node*: track the round trip latency for > the part of the pipeline between the local node and its downstream neighbors. > When we add a new packet to the ack queue, save the current timestamp. When > we receive an ack, update the metric based on how long since we sent the > original packet. This gives a metric of the total RTT through the pipeline. > If we also include this metric in the ack to upstream, we can subtract the > amount of time due to the later stages in the pipeline and have an accurate > count of this particular link.
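The two proposed measurements can be sketched for a single pipeline stage as follows. All names (`PipelineLatencyTracker`, `timeFlush`, `onPacketEnqueued`, `onAckReceived`, and the `downstreamNanos` value assumed to travel in the ack) are hypothetical; the committed patch touches `BlockReceiver`, `PipelineAck` and `DataNodeMetrics`, and its actual fields may differ. The sketch times each flush, records a timestamp when a packet enters the ack queue, computes the total RTT when its ack arrives, and subtracts the time the downstream node reports for the later stages to isolate this link.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical per-stage latency bookkeeping for a write pipeline (single-threaded). */
final class PipelineLatencyTracker {
  private final LongSupplier clock;                          // nanosecond clock, injectable for tests
  private final Map<Long, Long> sendTimes = new HashMap<>(); // seqno -> enqueue time
  private long flushNanosTotal;
  private long flushCount;

  PipelineLatencyTracker(LongSupplier clock) {
    this.clock = clock;
  }

  /** Metric 1: time one flush of a packet (data plus checksums) to disk. */
  void timeFlush(Runnable flush) {
    long start = clock.getAsLong();
    flush.run();
    flushNanosTotal += clock.getAsLong() - start;
    flushCount++;
  }

  double averageFlushNanos() {
    return flushCount == 0 ? 0.0 : (double) flushNanosTotal / flushCount;
  }

  /** Metric 2, step 1: remember when the packet was added to the ack queue. */
  void onPacketEnqueued(long seqno) {
    sendTimes.put(seqno, clock.getAsLong());
  }

  /**
   * Metric 2, step 2: on ack, compute the total RTT through the rest of the
   * pipeline and subtract the time the downstream node reported for its own
   * downstream stages, leaving the latency of this link. Returns -1 if the
   * seqno was never recorded (or was already acked).
   */
  long onAckReceived(long seqno, long downstreamNanos) {
    Long sent = sendTimes.remove(seqno);
    if (sent == null) {
      return -1;
    }
    long totalRtt = clock.getAsLong() - sent;
    return Math.max(0, totalRtt - downstreamNanos);
  }
}
```

In this sketch a node would forward its own `totalRtt` upstream so the previous node can subtract it in turn; a real DataNode would feed the values into its metrics objects rather than return them.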
[jira] [Commented] (HDFS-3596) Improve FSEditLog pre-allocation in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407587#comment-13407587 ] Hadoop QA commented on HDFS-3596: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535273/HDFS-3596-b1.001.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2745//console This message is automatically generated. > Improve FSEditLog pre-allocation in branch-1 > > > Key: HDFS-3596 > URL: https://issues.apache.org/jira/browse/HDFS-3596 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.1.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: 1.1.0 > > Attachments: HDFS-3596-b1.001.patch > > > Implement HDFS-3510 in branch-1. This will improve FSEditLog preallocation > to decrease the incidence of corrupted logs after disk full conditions. (See > HDFS-3510 for a longer description.)
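The general idea behind log pre-allocation can be sketched with plain NIO. Before records are appended near the end of the allocated region, the file is extended by writing real fill bytes, so that a full disk shows up while preallocating rather than halfway through a record. This is an illustration under assumptions, not the HDFS-3510 code: the class name, the 1 MB chunk size and the zero fill byte are hypothetical, and it assumes that writing bytes (rather than only extending the file length, which can leave a sparse file) is what makes the filesystem reserve space.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical preallocator for an append-only log file. */
final class LogPreallocator {
  static final int CHUNK = 1024 * 1024;                       // assumed preallocation unit (1 MB)
  private static final ByteBuffer FILL = ByteBuffer.allocate(CHUNK); // zero-initialized

  private LogPreallocator() {}

  /**
   * Ensures at least `needed` bytes exist past `position`, appending whole
   * chunks of fill bytes at the current end of file. Returns the new size.
   * An IOException such as "no space left on device" surfaces here, before
   * any record is written into the region.
   */
  static long preallocate(FileChannel fc, long position, int needed) throws IOException {
    long size = fc.size();
    while (size - position < needed) {
      ByteBuffer buf = FILL.duplicate();   // own position/limit, shared zero content
      while (buf.hasRemaining()) {
        size += fc.write(buf, size);       // positional write: fc.position() is unchanged
      }
    }
    return size;
  }
}
```

Which fill value to use depends on how the log reader recognizes the end of valid data; zero is only a placeholder here.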
[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency
[ https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407582#comment-13407582 ] Hudson commented on HDFS-3170: -- Integrated in Hadoop-Hdfs-trunk-Commit #2495 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2495/]) HDFS-3170. Add more useful metrics for write latency. Contributed by Matthew Jacobs. (Revision 1357970) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357970 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > Add more useful metrics for write latency > - > > Key: HDFS-3170 > URL: https://issues.apache.org/jira/browse/HDFS-3170 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Matthew Jacobs > Fix For: 2.0.1-alpha > > Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt > > > Currently, the only write-latency related metric we expose is the total > amount of time taken by opWriteBlock. This is practically useless, since (a) > different blocks may be wildly different sizes, and (b) if the writer is only > generating data slowly, it will make a block write take longer by no fault of > the DN. 
I would like to propose two new metrics: > 1) *flush-to-disk time*: count how long it takes for each call to flush an > incoming packet to disk (including the checksums). In most cases this will be > close to 0, as it only flushes to buffer cache, but if the backing block > device enters congested writeback, it can take much longer, which provides an > interesting metric. > 2) *round trip to downstream pipeline node*: track the round trip latency for > the part of the pipeline between the local node and its downstream neighbors. > When we add a new packet to the ack queue, save the current timestamp. When > we receive an ack, update the metric based on how long since we sent the > original packet. This gives a metric of the total RTT through the pipeline. > If we also include this metric in the ack to upstream, we can subtract the > amount of time due to the later stages in the pipeline and have an accurate > count of this particular link.
[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency
[ https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407579#comment-13407579 ] Hudson commented on HDFS-3170: -- Integrated in Hadoop-Common-trunk-Commit #2427 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2427/]) HDFS-3170. Add more useful metrics for write latency. Contributed by Matthew Jacobs. (Revision 1357970) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357970 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > Add more useful metrics for write latency > - > > Key: HDFS-3170 > URL: https://issues.apache.org/jira/browse/HDFS-3170 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Matthew Jacobs > Fix For: 2.0.1-alpha > > Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt > > > Currently, the only write-latency related metric we expose is the total > amount of time taken by opWriteBlock. This is practically useless, since (a) > different blocks may be wildly different sizes, and (b) if the writer is only > generating data slowly, it will make a block write take longer by no fault of > the DN. 
I would like to propose two new metrics: > 1) *flush-to-disk time*: count how long it takes for each call to flush an > incoming packet to disk (including the checksums). In most cases this will be > close to 0, as it only flushes to buffer cache, but if the backing block > device enters congested writeback, it can take much longer, which provides an > interesting metric. > 2) *round trip to downstream pipeline node*: track the round trip latency for > the part of the pipeline between the local node and its downstream neighbors. > When we add a new packet to the ack queue, save the current timestamp. When > we receive an ack, update the metric based on how long since we sent the > original packet. This gives a metric of the total RTT through the pipeline. > If we also include this metric in the ack to upstream, we can subtract the > amount of time due to the later stages in the pipeline and have an accurate > count of this particular link.
[jira] [Commented] (HDFS-3603) TestHDFSTrash is failing
[ https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407576#comment-13407576 ] Hadoop QA commented on HDFS-3603: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535260/HDFS-3603.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2744//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2744//console This message is automatically generated. > TestHDFSTrash is failing > > > Key: HDFS-3603 > URL: https://issues.apache.org/jira/browse/HDFS-3603 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: HDFS-3603.patch > > > TestHDFSTrash is failing pretty regularly during test builds.
[jira] [Commented] (HDFS-3548) NamenodeFsck.copyBlock fails to create a Block Reader
[ https://issues.apache.org/jira/browse/HDFS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407573#comment-13407573 ] Colin Patrick McCabe commented on HDFS-3548: bq. Looks good. I'd go one step further to prevent similar situations (we have duplicate methods, see HDFS-3015, and only copy gets the fix) and (1) nuke the bestNode method here and use the version from jspHelper, and then (2) move copyBlock here to a util class and structure it similarly to streamBlockInAscii (eg bestNode already handles the connect timeout so we don't need to duplicate that logic in copyBlock). Yeah, there is definitely some refactoring we should do here to avoid the duplication. Let's do that as part of HDFS-3015, once the immediate bug is fixed here. I'll re-issue this patch with style nits fixed... > NamenodeFsck.copyBlock fails to create a Block Reader > - > > Key: HDFS-3548 > URL: https://issues.apache.org/jira/browse/HDFS-3548 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.1, 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-3548.001.patch > > > NamenodeFsck.copyBlock creates a Socket using {{new Socket()}}, and thus that > socket doesn't have an associated Channel. Then, it fails to create a > BlockReader since RemoteBlockReader2 needs a socket channel. > (thanks to Hiroshi Yokoi for reporting)
[jira] [Commented] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407560#comment-13407560 ] Todd Lipcon commented on HDFS-3597: --- bq. That's an issue I was confused about too. I don't understand why the test has multiple checkpoint dirs, nor why my 2NN is running in snn.getCheckpointDirs().get(1) rather than .get(0). (If I corrupt the first checkpointdir, there is no perceptible effect on the testcase.) The println is a leftover from when I was still attempting to exercise the upgrade code. The 2NN can be configured with multiple directories. Our tests make use of that feature: {code} conf.set(DFS_NAMENODE_CHECKPOINT_DIR_KEY, fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 1)))+","+ fileAsURI(new File(base_dir, "namesecondary" + (2*nnIndex + 2)))); {code} (from MiniDFSCluster source) I bet we have some bug/feature whereby if only one of the two is corrupted, the behavior depends on which of the two it was. My guess is we iterate over each of the dirs during startup, and load the properties from each, so it's the last one which takes precedence by the time we get to the version checking code. Might be worth fixing this in a separate JIRA (out of scope for this one). Given the above, I think it makes sense to edit the VERSION file in both of those directories, though, since you're basically depending on some other bug in this test case currently. Will look at your new patch later this afternoon.
> SNN can fail to start on upgrade > > > Key: HDFS-3597 > URL: https://issues.apache.org/jira/browse/HDFS-3597 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-3597-2.txt, hdfs-3597.txt > > > When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up: > {code} > 2012-06-16 09:52:33,812 ERROR > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in > doCheckpoint > java.io.IOException: Inconsistent checkpoint fields. > LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = > CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = > BP-1792677198-172.29.121.67-1339813967723. > Expecting respectively: -19; 64415959; 0; ; . > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297) > at java.lang.Thread.run(Thread.java:662) > {code} > The error check we're hitting came from HDFS-1073, and it's intended to > verify that we're connecting to the correct NN. But the check is too strict > and considers "different metadata version" to be the same as "different > clusterID". > I believe the check in {{doCheckpoint}} simply needs to explicitly check for > and handle the update case.
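Todd's guess about precedence, and his suggestion to edit the VERSION file in both checkpoint directories, can be illustrated with `java.util.Properties` alone. This is a hypothetical test helper, not Hadoop's `FSImageTestUtil.corruptVersionFile`. `setInAll` rewrites one key in every given VERSION file. `loadInOrder` models the *suspected* startup behavior, in which each directory's properties are loaded in turn so the last one wins. The key name `layoutVersion` in the usage is an assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.Properties;

/** Hypothetical helpers for storage directories that each hold a VERSION properties file. */
final class VersionFiles {
  private VersionFiles() {}

  /** Rewrites `key` in every given VERSION file, so no directory keeps the old value. */
  static void setInAll(List<File> versionFiles, String key, String value) throws IOException {
    for (File f : versionFiles) {
      Properties props = read(f);
      props.setProperty(key, value);
      try (OutputStream out = new FileOutputStream(f)) {
        props.store(out, null);
      }
    }
  }

  /**
   * Models the suspected behavior: files are loaded in order into one
   * Properties object, so a key present in several files ends up with the
   * value from the last file.
   */
  static Properties loadInOrder(List<File> versionFiles) throws IOException {
    Properties merged = new Properties();
    for (File f : versionFiles) {
      merged.putAll(read(f));
    }
    return merged;
  }

  private static Properties read(File f) throws IOException {
    Properties props = new Properties();
    try (InputStream in = new FileInputStream(f)) {
      props.load(in);
    }
    return props;
  }
}
```

Under this model, changing the key only in the first file is masked by the second file, which is consistent with the observation in this thread that corrupting the first checkpoint directory had no perceptible effect; changing it in every file is visible regardless of load order.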
[jira] [Updated] (HDFS-3596) Improve FSEditLog pre-allocation in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3596: --- Attachment: HDFS-3596-b1.001.patch Patch for branch-1. Tested with: TestCheckpoint, TestEditLog, TestNameNodeRecovery, TestEditLogLoading, TestNameNodeMXBean, TestSaveNamespace, TestSecurityTokenEditLog, TestStorageDirectoryFailure, TestEditLogToleration, TestStorageRestore > Improve FSEditLog pre-allocation in branch-1 > > > Key: HDFS-3596 > URL: https://issues.apache.org/jira/browse/HDFS-3596 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.1.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: 1.1.0 > > Attachments: HDFS-3596-b1.001.patch > > > Implement HDFS-3510 in branch-1. This will improve FSEditLog preallocation > to decrease the incidence of corrupted logs after disk full conditions. (See > HDFS-3510 for a longer description.)
[jira] [Updated] (HDFS-3596) Improve FSEditLog pre-allocation in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3596: --- Status: Patch Available (was: Open) > Improve FSEditLog pre-allocation in branch-1 > > > Key: HDFS-3596 > URL: https://issues.apache.org/jira/browse/HDFS-3596 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.1.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Fix For: 1.1.0 > > Attachments: HDFS-3596-b1.001.patch > > > Implement HDFS-3510 in branch-1. This will improve FSEditLog preallocation > to decrease the incidence of corrupted logs after disk full conditions. (See > HDFS-3510 for a longer description.)
[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-3597: Attachment: hdfs-3597-2.txt Attaching a new version of the patch that addresses review comments. Please check the {{doCheckpoint}} logic specifically. I'm happy with this refactoring but am open to better suggestions. Running a full set of tests locally to verify no breakage. > SNN can fail to start on upgrade > > > Key: HDFS-3597 > URL: https://issues.apache.org/jira/browse/HDFS-3597 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-3597-2.txt, hdfs-3597.txt > > > When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up: > {code} > 2012-06-16 09:52:33,812 ERROR > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in > doCheckpoint > java.io.IOException: Inconsistent checkpoint fields. > LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = > CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = > BP-1792677198-172.29.121.67-1339813967723. > Expecting respectively: -19; 64415959; 0; ; . > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297) > at java.lang.Thread.run(Thread.java:662) > {code} > The error check we're hitting came from HDFS-1073, and it's intended to > verify that we're connecting to the correct NN.
But the check is too strict > and considers "different metadata version" to be the same as "different > clusterID". > I believe the check in {{doCheckpoint}} simply needs to explicitly check for > and handle the update case.
[jira] [Updated] (HDFS-3170) Add more useful metrics for write latency
[ https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-3170: -- Resolution: Fixed Fix Version/s: 2.0.1-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to branch-2 and trunk. Thanks, Matt! > Add more useful metrics for write latency > - > > Key: HDFS-3170 > URL: https://issues.apache.org/jira/browse/HDFS-3170 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Matthew Jacobs > Fix For: 2.0.1-alpha > > Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt > > > Currently, the only write-latency related metric we expose is the total > amount of time taken by opWriteBlock. This is practically useless, since (a) > different blocks may be wildly different sizes, and (b) if the writer is only > generating data slowly, it will make a block write take longer by no fault of > the DN. I would like to propose two new metrics: > 1) *flush-to-disk time*: count how long it takes for each call to flush an > incoming packet to disk (including the checksums). In most cases this will be > close to 0, as it only flushes to buffer cache, but if the backing block > device enters congested writeback, it can take much longer, which provides an > interesting metric. > 2) *round trip to downstream pipeline node*: track the round trip latency for > the part of the pipeline between the local node and its downstream neighbors. > When we add a new packet to the ack queue, save the current timestamp. When > we receive an ack, update the metric based on how long since we sent the > original packet. This gives a metric of the total RTT through the pipeline. > If we also include this metric in the ack to upstream, we can subtract the > amount of time due to the later stages in the pipeline and have an accurate > count of this particular link.
[jira] [Commented] (HDFS-3597) SNN can fail to start on upgrade
[ https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407544#comment-13407544 ] Andy Isaacson commented on HDFS-3597: - {quote} I think this should take {{StorageInfo}} as a parameter instead, and you would pass {{image.getStorage()}} in. {quote} Sounds good, thanks. {quote} I'm not 100% convinced of the logic. I think we should always verify that it's the same NN – but just loosen the validateStorageInfo check here to not check the versioning info. For example, if I accidentally point my 2NN at the wrong NN, it won't start, even if that NN happens to be from a different version. It should only blow its local storage away if it's the same NN (namespace/cluster) but a different version. {quote} Fair enough, but we don't want to loosen the check in {{validateStorageInfo}} because it's used in a half dozen other places that want full checking I think. I'll refactor the checks. bq. Instead, can you use {{FSImageTestUtil.corruptVersionFile}} here? Great, didn't know about that! bq. No need for these...? Indeed, leftover from a previous test design. bq. Can you change this test to not need any datanodes? ... mkdir A fine plan, done. bq. It seems odd that you print out all of the checkpoint dirs, but then only corrupt the property in one of them. Shouldn't you be corrupting it in all of them? That's an issue I was confused about too. I don't understand why the test has multiple checkpoint dirs, nor why my 2NN is running in snn.getCheckpointDirs().get(1) rather than .get(0). (If I corrupt the first checkpointdir, there is no perceptible effect on the testcase.) The println is a leftover from when I was still attempting to exercise the upgrade code. bq. The spelling fix in NNStorage is unrelated. Cleanup's good, but try not to do so in files that aren't otherwise touched by your patch. Dropped. At some point during development my fix touched NNStorage.
> SNN can fail to start on upgrade > > > Key: HDFS-3597 > URL: https://issues.apache.org/jira/browse/HDFS-3597 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-3597.txt > > > When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up: > {code} > 2012-06-16 09:52:33,812 ERROR > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in > doCheckpoint > java.io.IOException: Inconsistent checkpoint fields. > LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = > CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = > BP-1792677198-172.29.121.67-1339813967723. > Expecting respectively: -19; 64415959; 0; ; . > at > org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297) > at java.lang.Thread.run(Thread.java:662) > {code} > The error check we're hitting came from HDFS-1073, and it's intended to > verify that we're connecting to the correct NN. But the check is too strict > and considers "different metadata version" to be the same as "different > clusterID". > I believe the check in {{doCheckpoint}} simply needs to explicitly check for > and handle the update case. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
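The distinction settled on for HDFS-3597 above (identity fields must always match; version fields may differ and then trigger a reload of the 2NN's local storage) can be sketched in plain Java. This is a hypothetical illustration, not the code in hdfs-3597.txt: the class, enum, and field names are invented, the block pool ID is left out for brevity, and treating an empty local cluster ID as "not recorded yet" is an assumption based on the `Expecting respectively: -19; 64415959; 0; ; .` line in the log.

```java
/**
 * Hypothetical sketch: identity fields decide whether the 2NN talks to the
 * same namespace; version fields decide whether its local storage merely
 * needs to be refreshed. Not the actual Hadoop implementation.
 */
final class CheckpointStorageCheck {
  enum Decision { PROCEED, RELOAD_LOCAL_STORAGE, REJECT_WRONG_NAMENODE }

  /** Hypothetical subset of the fields printed in the "Inconsistent checkpoint fields" error. */
  static final class Fields {
    final int layoutVersion;
    final int namespaceId;
    final long cTime;
    final String clusterId; // empty when the local storage predates cluster IDs (assumption)

    Fields(int layoutVersion, int namespaceId, long cTime, String clusterId) {
      this.layoutVersion = layoutVersion;
      this.namespaceId = namespaceId;
      this.cTime = cTime;
      this.clusterId = clusterId;
    }
  }

  static Decision decide(Fields local, Fields remote) {
    // Identity: the namespace ID must always match; the cluster ID only
    // when the local copy actually recorded one.
    boolean sameNamespace = local.namespaceId == remote.namespaceId
        && (local.clusterId.isEmpty() || local.clusterId.equals(remote.clusterId));
    if (!sameNamespace) {
      return Decision.REJECT_WRONG_NAMENODE;
    }
    // Version: a different layout version or cTime means "same NN, newer
    // metadata", so the local storage is discarded and reloaded from the NN.
    boolean sameVersion = local.layoutVersion == remote.layoutVersion
        && local.cTime == remote.cTime;
    return sameVersion ? Decision.PROCEED : Decision.RELOAD_LOCAL_STORAGE;
  }
}
```

Fed the values from the log above (local LV -19, cTime 0, empty cluster ID; remote LV -40, same namespaceID 64415959), `decide` returns `RELOAD_LOCAL_STORAGE`, whereas comparing every field, as the failing check does, rejects the NN outright.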
[jira] [Commented] (HDFS-3170) Add more useful metrics for write latency
[ https://issues.apache.org/jira/browse/HDFS-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407537#comment-13407537 ] Todd Lipcon commented on HDFS-3170: --- +1, the patch looks good to me. Thanks for these nice new metrics, Matt. > Add more useful metrics for write latency > - > > Key: HDFS-3170 > URL: https://issues.apache.org/jira/browse/HDFS-3170 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Matthew Jacobs > Attachments: hdfs-3170.txt, hdfs-3170.txt, hdfs-3170.txt > > > Currently, the only write-latency related metric we expose is the total > amount of time taken by opWriteBlock. This is practically useless, since (a) > different blocks may be wildly different sizes, and (b) if the writer is only > generating data slowly, it will make a block write take longer by no fault of > the DN. I would like to propose two new metrics: > 1) *flush-to-disk time*: count how long it takes for each call to flush an > incoming packet to disk (including the checksums). In most cases this will be > close to 0, as it only flushes to buffer cache, but if the backing block > device enters congested writeback, it can take much longer, which provides an > interesting metric. > 2) *round trip to downstream pipeline node*: track the round trip latency for > the part of the pipeline between the local node and its downstream neighbors. > When we add a new packet to the ack queue, save the current timestamp. When > we receive an ack, update the metric based on how long since we sent the > original packet. This gives a metric of the total RTT through the pipeline. > If we also include this metric in the ack to upstream, we can subtract the > amount of time due to the later stages in the pipeline and have an accurate > count of this particular link.
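The second metric proposed in HDFS-3170 (timestamp each packet as it enters the ack queue, then subtract the downstream round trip carried in the ack) can be sketched as below. The class and method names are hypothetical, timestamps are passed in rather than read from a clock so the arithmetic is easy to check, and the flush-to-disk metric is not covered.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the "round trip to downstream pipeline node" metric:
 * remember when each packet was queued, and on its ack subtract the round trip
 * the downstream node reported for the rest of the pipeline.
 */
final class PipelineRttTracker {
  private final Map<Long, Long> sentAtNanos = new HashMap<>(); // seqno -> enqueue time
  private long samples;
  private long linkNanosTotal;

  /** Called when a packet is added to the ack queue. */
  synchronized void onPacketSent(long seqno, long nowNanos) {
    sentAtNanos.put(seqno, nowNanos);
  }

  /**
   * Called when the ack for seqno arrives. downstreamRttNanos is the round trip
   * the next node reported in its ack (0 at the tail of the pipeline).
   * Returns the total round trip observed by this node.
   */
  synchronized long onAckReceived(long seqno, long nowNanos, long downstreamRttNanos) {
    Long sentAt = sentAtNanos.remove(seqno);
    if (sentAt == null) {
      throw new IllegalStateException("ack for unknown seqno " + seqno);
    }
    long totalRtt = nowNanos - sentAt;
    // Time attributable to this link alone; clamped at 0 in case the reported
    // downstream time exceeds the local measurement.
    long linkNanos = Math.max(0L, totalRtt - downstreamRttNanos);
    samples++;
    linkNanosTotal += linkNanos;
    return totalRtt;
  }

  /** Mean latency of the local-to-downstream link over all acked packets. */
  synchronized double meanLinkNanos() {
    return samples == 0 ? 0.0 : (double) linkNanosTotal / samples;
  }
}
```

For example, a packet queued at t=1,000 ns and acked at t=5,000 ns, with the downstream node reporting 3,000 ns, has a total round trip of 4,000 ns, of which 1,000 ns is attributed to the local link.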
[jira] [Assigned] (HDFS-3603) TestHDFSTrash is failing
[ https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned HDFS-3603: Assignee: Jason Lowe > TestHDFSTrash is failing > > > Key: HDFS-3603 > URL: https://issues.apache.org/jira/browse/HDFS-3603 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: HDFS-3603.patch > > > TestHDFSTrash is failing pretty regularly during test builds.
[jira] [Updated] (HDFS-3603) TestHDFSTrash is failing
[ https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HDFS-3603: - Target Version/s: 0.23.3, 2.0.1-alpha, 3.0.0 Status: Patch Available (was: Open) > TestHDFSTrash is failing > > > Key: HDFS-3603 > URL: https://issues.apache.org/jira/browse/HDFS-3603 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HDFS-3603.patch > > > TestHDFSTrash is failing pretty regularly during test builds.
[jira] [Updated] (HDFS-3603) TestHDFSTrash is failing
[ https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HDFS-3603: - Attachment: HDFS-3603.patch Patch to update TestHDFSTrash to JUnit 4 and only execute the two test cases that TestHDFSTrash provides. > TestHDFSTrash is failing > > > Key: HDFS-3603 > URL: https://issues.apache.org/jira/browse/HDFS-3603 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HDFS-3603.patch > > > TestHDFSTrash is failing pretty regularly during test builds.
[jira] [Commented] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments
[ https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407483#comment-13407483 ] Hadoop QA commented on HDFS-3482: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535248/HDFS-3482-4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2743//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2743//console This message is automatically generated. > hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified > without arguments > > > Key: HDFS-3482 > URL: https://issues.apache.org/jira/browse/HDFS-3482 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.0.0-alpha >Reporter: Stephen Chu >Assignee: madhukara phatak >Priority: Minor > Labels: newbie > Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, > HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch > > > When running the hdfs balancer with an option but no argument, we run into an > ArrayIndexOutOfBoundsException. It's preferable to print the usage. 
> {noformat}
> bash-3.2$ hdfs balancer -threshold
> Usage: java Balancer
> [-policy ] the balancing policy: datanode or blockpool
> [-threshold ] Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> bash-3.2$ hdfs balancer -policy
> Usage: java Balancer
> [-policy ] the balancing policy: datanode or blockpool
> [-threshold ] Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> {noformat}
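The fix requested in HDFS-3482 amounts to a bounds check before reading an option's value, with bad input reported as a usage message rather than an `ArrayIndexOutOfBoundsException`. A minimal sketch, assuming a hypothetical `BalancerOptions` class (not Hadoop's `Balancer.Cli`), illustrative defaults, an illustrative usage string, and exit code 2 for usage errors:

```java
import java.io.PrintStream;

/**
 * Hypothetical option parser for a balancer-style CLI. A missing or malformed
 * value raises IllegalArgumentException, which run() turns into a usage message.
 */
final class BalancerOptions {
  static final String USAGE =
      "Usage: java Balancer [-policy datanode|blockpool] [-threshold <percent>]"; // illustrative

  final double threshold;
  final String policy;

  private BalancerOptions(double threshold, String policy) {
    this.threshold = threshold;
    this.policy = policy;
  }

  static BalancerOptions parse(String[] args) {
    double threshold = 10.0;    // illustrative default (assumption)
    String policy = "datanode"; // illustrative default (assumption)
    for (int i = 0; i < args.length; i++) {
      String opt = args[i];
      if ("-threshold".equalsIgnoreCase(opt)) {
        // NumberFormatException is itself an IllegalArgumentException.
        threshold = Double.parseDouble(valueAfter(args, ++i, opt));
      } else if ("-policy".equalsIgnoreCase(opt)) {
        policy = valueAfter(args, ++i, opt);
        if (!policy.equals("datanode") && !policy.equals("blockpool")) {
          throw new IllegalArgumentException("Unknown policy: " + policy);
        }
      } else {
        throw new IllegalArgumentException("Unknown option: " + opt);
      }
    }
    return new BalancerOptions(threshold, policy);
  }

  /** The bounds check: the option must actually be followed by a value. */
  private static String valueAfter(String[] args, int i, String opt) {
    if (i >= args.length) {
      throw new IllegalArgumentException("Missing value for " + opt);
    }
    return args[i];
  }

  /** Returns a process exit code; prints usage instead of a stack trace on bad input. */
  static int run(String[] args, PrintStream err) {
    try {
      parse(args); // a real tool would go on to balance with the parsed options
      return 0;
    } catch (IllegalArgumentException e) {
      err.println(e.getMessage());
      err.println(USAGE);
      return 2; // conventional "usage error" exit code (assumption)
    }
  }
}
```

Because `NumberFormatException` extends `IllegalArgumentException`, a non-numeric `-threshold` value takes the same usage path as a missing one.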
[jira] [Commented] (HDFS-3603) TestHDFSTrash is failing
[ https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407461#comment-13407461 ] Jason Lowe commented on HDFS-3603: --
Failure is:
{noformat}
testTrashEmptier(org.apache.hadoop.hdfs.TestHDFSTrash) Time elapsed: 0.025 sec <<< FAILURE!
junit.framework.AssertionFailedError: null
	at junit.framework.Assert.fail(Assert.java:47)
	at junit.framework.Assert.assertTrue(Assert.java:20)
	at junit.framework.Assert.assertTrue(Assert.java:27)
	at org.apache.hadoop.fs.TestTrash.testTrashEmptier(TestTrash.java:536)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:168)
	at junit.framework.TestCase.runBare(TestCase.java:134)
	at junit.framework.TestResult$1.protect(TestResult.java:110)
	at junit.framework.TestResult.runProtected(TestResult.java:128)
	at junit.framework.TestResult.run(TestResult.java:113)
	at junit.framework.TestCase.run(TestCase.java:124)
	at junit.framework.TestSuite.runTest(TestSuite.java:243)
	at junit.framework.TestSuite.run(TestSuite.java:238)
	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
	at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
	at junit.framework.TestResult.runProtected(TestResult.java:128)
	at junit.extensions.TestSetup.run(TestSetup.java:27)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
{noformat}
The problem seems to have been triggered when HADOOP-8110 was integrated, although that change appears to have uncovered an existing issue rather than caused it. Here's what's happening:
* TestViewFSTrash runs and can end up leaving 4 things in the trash, like:
{noformat}
$ ls ~/.Trash
120705182754 120705182754-1 120705182754-2 Current
{noformat}
* TestHDFSTrash runs testTrashEmptier and sees 4 things in the trash. Since it has found 4 checkpoints, it immediately asserts that the current trash directory listing has fewer than 4 entries; 4 < 4 is false, so the assertion fails the test.
* If there are fewer than 4 things in the trash when testTrashEmptier starts, the test will pass. If there are more than 4, the test can hang; see HADOOP-7326.
The saddest part is that TestHDFSTrash isn't even testing HDFS when it runs testTrashEmptier, because that test simply uses a local filesystem config. TestHDFSTrash picks it up only because it inherits from TestTrash, which contains that test case.
> TestHDFSTrash is failing > > > Key: HDFS-3603 > URL: https://issues.apache.org/jira/browse/HDFS-3603 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0 >Reporter: Jason Lowe >Priority: Blocker > > TestHDFSTrash is failing pretty regularly during test builds.
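Jason's last point, that TestHDFSTrash runs testTrashEmptier only because it inherits from TestTrash, follows from how JUnit 3 builds a suite: it collects every public, no-argument, `void` method whose name starts with `test`, including those declared in superclasses. The reflection sketch below approximates that rule with hypothetical stand-in classes (`BaseTrashTests`, `HdfsTrashTests`); it is not JUnit's own code.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a JUnit 3 base class such as TestTrash. */
class BaseTrashTests {
  public void testTrashEmptier() { /* would run against a local-filesystem config */ }
}

/** Hypothetical stand-in for a subclass such as TestHDFSTrash. */
class HdfsTrashTests extends BaseTrashTests {
  public void testTrashOnHdfs() { }
  public void testNonDefaultFsOnHdfs() { }
}

final class Junit3StyleDiscovery {
  /**
   * Collects public, no-argument, void methods named test*, walking up the
   * superclass chain, so inherited test methods are included. The TreeSet
   * removes duplicates when a subclass overrides a test method.
   */
  static List<String> testMethodNames(Class<?> testClass) {
    TreeSet<String> names = new TreeSet<>();
    for (Class<?> c = testClass; c != null && c != Object.class; c = c.getSuperclass()) {
      for (Method m : c.getDeclaredMethods()) {
        if (Modifier.isPublic(m.getModifiers())
            && m.getParameterCount() == 0
            && m.getReturnType() == void.class
            && m.getName().startsWith("test")) {
          names.add(m.getName());
        }
      }
    }
    return new ArrayList<>(names);
  }
}
```

`testMethodNames(HdfsTrashTests.class)` includes `testTrashEmptier` even though the subclass never declares it, which is how an HDFS-named test class ends up exercising a local-filesystem case.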
[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407458#comment-13407458 ] Uma Maheswara Rao G commented on HDFS-3541: --- +1 Patch looks good to me as well. I will commit this patch in some time. > Deadlock between recovery, xceiver and packet responder > --- > > Key: HDFS-3541 > URL: https://issues.apache.org/jira/browse/HDFS-3541 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: suja s >Assignee: Vinay > Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch > > > Block Recovery initiated while write in progress at Datanode side. Found a > lock between recovery, xceiver and packet responder.
[jira] [Created] (HDFS-3603) TestHDFSTrash is failing
Jason Lowe created HDFS-3603: Summary: TestHDFSTrash is failing Key: HDFS-3603 URL: https://issues.apache.org/jira/browse/HDFS-3603 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0 Reporter: Jason Lowe Priority: Blocker TestHDFSTrash is failing pretty regularly during test builds.
[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407436#comment-13407436 ] Kihwal Lee commented on HDFS-3541: -- The new patch looks good. I ran the new test case without the fix. It successfully deadlocked and failed. It passed with the actual fix. > Deadlock between recovery, xceiver and packet responder > --- > > Key: HDFS-3541 > URL: https://issues.apache.org/jira/browse/HDFS-3541 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: suja s >Assignee: Vinay > Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch > > > Block Recovery initiated while write in progress at Datanode side. Found a > lock between recovery, xceiver and packet responder.
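Kihwal's check (the new test deadlocks without the fix and passes with it) is easier to automate when a deadlock shows up as a failed assertion rather than a hung build. One generic option, sketched below and not the HDFS-3541 test itself, is to poll `ThreadMXBean.findDeadlockedThreads()`. The two lock objects are hypothetical stand-ins for whatever the recovery and packet-responder paths actually lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockProbe {
  /** Polls the JVM's deadlock detector; true if a lock cycle appears before the timeout. */
  static boolean deadlockDetectedWithin(long timeoutMillis) {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (System.nanoTime() < deadline) {
      long[] ids = threads.findDeadlockedThreads(); // null when no cycle exists
      if (ids != null && ids.length > 0) {
        return true;
      }
      try {
        Thread.sleep(20);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  /** Starts two daemon threads that take two locks in opposite order: a guaranteed deadlock. */
  static void startOppositeOrderLockers() {
    Object lockA = new Object(); // hypothetical stand-in, e.g. a recovery-side lock
    Object lockB = new Object(); // hypothetical stand-in, e.g. a responder-side lock
    CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
    Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirstLock), "locker-1");
    Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirstLock), "locker-2");
    t1.setDaemon(true); // daemon threads, so the stuck pair does not keep the JVM alive
    t2.setDaemon(true);
    t1.start();
    t2.start();
  }

  private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
    synchronized (first) {
      latch.countDown();
      try {
        latch.await(); // wait until the other thread holds its first lock too
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      synchronized (second) {
        // Never reached: each thread now waits for the lock the other holds.
      }
    }
  }
}
```

In a real regression test the deadlock-prone code path would replace `startOppositeOrderLockers`, and the assertion would be that `deadlockDetectedWithin` returns false.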
[jira] [Updated] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments
[ https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-3482: -- Attachment: HDFS-3482-4.patch Attached the same patch as Madhu. Let's see the Jenkins results before committing.
> hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments
>
> Key: HDFS-3482
> URL: https://issues.apache.org/jira/browse/HDFS-3482
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Affects Versions: 2.0.0-alpha
> Reporter: Stephen Chu
> Assignee: madhukara phatak
> Priority: Minor
> Labels: newbie
> Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch
>
> When running the hdfs balancer with an option but no argument, we run into an ArrayIndexOutOfBoundsException. It's preferable to print the usage.
> {noformat}
> bash-3.2$ hdfs balancer -threshold
> Usage: java Balancer
> [-policy ] the balancing policy: datanode or blockpool
> [-threshold ] Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> bash-3.2$ hdfs balancer -policy
> Usage: java Balancer
> [-policy ] the balancing policy: datanode or blockpool
> [-threshold ] Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> {noformat}
[jira] [Assigned] (HDFS-3586) Blocks are not getting replicate even DN's are availble.
[ https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G reassigned HDFS-3586: - Assignee: amith > Blocks are not getting replicate even DN's are availble. > > > Key: HDFS-3586 > URL: https://issues.apache.org/jira/browse/HDFS-3586 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, name-node >Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 3.0.0 >Reporter: Brahma Reddy Battula >Assignee: amith > Attachments: HDFS-3586-analysis.txt > > > Scenario: > = > Started four DNs (say DN1, DN2, DN3 and DN4). > Writing files with RF=3. > Formed a pipeline DN1->DN2->DN3. > Since DN3's network is very slow, it's not able to send acks. > The pipeline is formed again with DN1->DN2->DN4. > Here DN4's network is also slow. > So finally commitblocksync happened to DN1 and DN2 successfully. > The block is present on all four DNs (finalized state on two DNs and RBW state on the other two). > Here the NN asks for the block to be replicated to DN3 and DN4, but this fails since the replicas are already present in their RBW dirs.
[jira] [Commented] (HDFS-3586) Blocks are not getting replicate even DN's are availble.
[ https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407410#comment-13407410 ] Uma Maheswara Rao G commented on HDFS-3586: --- Thanks, Konstantin. Assigning it to Amith, as he has started working on this change. > Blocks are not getting replicate even DN's are availble. > > > Key: HDFS-3586 > URL: https://issues.apache.org/jira/browse/HDFS-3586 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, name-node >Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 3.0.0 >Reporter: Brahma Reddy Battula > Attachments: HDFS-3586-analysis.txt > > > Scenario: > = > Started four DNs (say DN1, DN2, DN3 and DN4). > Writing files with RF=3. > Formed a pipeline DN1->DN2->DN3. > Since DN3's network is very slow, it's not able to send acks. > The pipeline is formed again with DN1->DN2->DN4. > Here DN4's network is also slow. > So finally commitblocksync happened to DN1 and DN2 successfully. > The block is present on all four DNs (finalized state on two DNs and RBW state on the other two). > Here the NN asks for the block to be replicated to DN3 and DN4, but this fails since the replicas are already present in their RBW dirs.
[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407336#comment-13407336 ] Hadoop QA commented on HDFS-3541: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535219/HDFS-3541-2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2742//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2742//console This message is automatically generated. > Deadlock between recovery, xceiver and packet responder > --- > > Key: HDFS-3541 > URL: https://issues.apache.org/jira/browse/HDFS-3541 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: suja s >Assignee: Vinay > Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch > > > Block Recovery initiated while write in progress at Datanode side. Found a > lock between recovery, xceiver and packet responder.
[jira] [Created] (HDFS-3602) Enhancements to HDFS for Windows Server and Windows Azure development and runtime environments
Bikas Saha created HDFS-3602: Summary: Enhancements to HDFS for Windows Server and Windows Azure development and runtime environments Key: HDFS-3602 URL: https://issues.apache.org/jira/browse/HDFS-3602 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Bikas Saha Assignee: Bikas Saha This JIRA tracks the work that needs to be done on trunk to enable Hadoop to run on Windows Server and Azure environments. This incorporates porting relevant work from the similar effort on branch 1 tracked via HADOOP-8079.
[jira] [Updated] (HDFS-3541) Deadlock between recovery, xceiver and packet responder
[ https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-3541: Attachment: HDFS-3541-2.patch Attaching a patch that addresses the above comments. Thanks, Lee, for the hint to write a test that reproduces the same case. > Deadlock between recovery, xceiver and packet responder > --- > > Key: HDFS-3541 > URL: https://issues.apache.org/jira/browse/HDFS-3541 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.3, 2.0.1-alpha >Reporter: suja s >Assignee: Vinay > Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch > > > Block Recovery initiated while write in progress at Datanode side. Found a > lock between recovery, xceiver and packet responder.
[jira] [Commented] (HDFS-3581) FSPermissionChecker#checkPermission sticky bit check missing range check
[ https://issues.apache.org/jira/browse/HDFS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407143#comment-13407143 ] Hudson commented on HDFS-3581: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3581. FSPermissionChecker#checkPermission sticky bit check missing range check. Contributed by Eli Collins (Revision 1356971) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356971 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java > FSPermissionChecker#checkPermission sticky bit check missing range check > - > > Key: HDFS-3581 > URL: https://issues.apache.org/jira/browse/HDFS-3581 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 2.0.1-alpha > > Attachments: hdfs-3581.txt > > > The checkStickyBit call in FSPermissionChecker#checkPermission is missing a > range check which results in an index out of bounds when accessing root.
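The failure described in HDFS-3581 is the usual off-by-one around a path's parent: for `/` there is only one inode, so indexing the parent at `length - 2` goes out of bounds. The sketch below shows the kind of range check involved. The `Inode` type and `stickyBitAllows` method are hypothetical, superuser handling is omitted, and it is not the committed hdfs-3581.txt change.

```java
/**
 * Hypothetical sticky-bit check over the inodes resolved for a path:
 * index 0 is the root, the last element is the target.
 */
final class StickyBitCheck {
  static final class Inode {
    final String owner;
    final boolean sticky;

    Inode(String owner, boolean sticky) {
      this.owner = owner;
      this.sticky = sticky;
    }
  }

  /** Returns whether user may delete or rename the last inode, considering only the sticky bit. */
  static boolean stickyBitAllows(Inode[] inodes, String user) {
    // Range check: "/" resolves to a single inode with no parent, so reading
    // inodes[inodes.length - 2] would throw ArrayIndexOutOfBoundsException.
    if (inodes.length < 2) {
      return true; // no parent directory, so no sticky bit applies
    }
    Inode parent = inodes[inodes.length - 2];
    Inode target = inodes[inodes.length - 1];
    if (!parent.sticky) {
      return true;
    }
    // With the sticky bit set on the parent, only the directory owner or the
    // file owner may remove the entry (superuser bypass omitted here).
    return user.equals(parent.owner) || user.equals(target.owner);
  }
}
```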
[jira] [Commented] (HDFS-3343) Improve metrics for DN read latency
[ https://issues.apache.org/jira/browse/HDFS-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407140#comment-13407140 ] Hudson commented on HDFS-3343: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > Improve metrics for DN read latency > --- > > Key: HDFS-3343 > URL: https://issues.apache.org/jira/browse/HDFS-3343 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: Todd Lipcon >Assignee: Andrew Wang > Fix For: 2.0.1-alpha > > Attachments: hdfs-3343-2.patch, hdfs-3343-3.patch, hdfs-3343-4.patch, > hdfs-3343.patch > > > Similar to HDFS-3170 on the write side, we should improve the metrics that > are generated on the DN for read latency. We should have separate metrics for > the time spent in {{transferTo}} vs {{waitWritable}} so that it's easy to > distinguish slow local disks from slow readers on the other end of the socket.
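HDFS-3343 asks for the time spent in `transferTo` and in `waitWritable` to be recorded separately. A minimal sketch of that split, using hypothetical names (not the `DataNodeMetrics` fields added by the patch) and an injectable clock so it can be checked without any I/O:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: keep the time spent waiting for the socket to become
 * writable separate from the time spent moving bytes from disk to socket.
 */
final class ReadLatencySplit {
  final AtomicLong waitWritableNanos = new AtomicLong();
  final AtomicLong transferToNanos = new AtomicLong();
  private final LongSupplier clock; // injected so the sketch is testable

  ReadLatencySplit(LongSupplier clock) {
    this.clock = clock;
  }

  /** Runs one phase and adds its elapsed time to the given counter, even if it throws. */
  void time(AtomicLong counter, Runnable phase) {
    long start = clock.getAsLong();
    try {
      phase.run();
    } finally {
      counter.addAndGet(clock.getAsLong() - start);
    }
  }
}
```

In production the clock would be `System::nanoTime`; a growing `waitWritableNanos` then points at slow readers, while a growing `transferToNanos` points at slow local disks.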
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407131#comment-13407131 ] Hudson commented on HDFS-3190: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3190. Simple refactors in existing NN code to assist QuorumJournalManager extension. Contributed by Todd Lipcon. (Revision 1356525) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356525 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageErrorReporter.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/AtomicFileOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/PersistentLongFile.java > Simple refactors in existing NN code to assist QuorumJournalManager extension > - > > Key: HDFS-3190 > URL: https://issues.apache.org/jira/browse/HDFS-3190 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > Fix For: 3.0.0 > > Attachments: hdfs-3190.txt, 
hdfs-3190.txt, hdfs-3190.txt, > hdfs-3190.txt, hdfs-3190.txt > > > This JIRA is for some simple refactors in the NN: > - refactor the code which writes the seen_txid file in NNStorage into a new > "LongContainingFile" utility class. This is useful for the JournalNode to > atomically/durably record its last promised epoch > - refactor the interface from FileJournalManager back to StorageDirectory to > use a StorageErrorReport interface. This allows FileJournalManager to be used > in isolation of a full StorageDirectory.
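The "atomically/durably record" requirement in the HDFS-3190 description is commonly met by writing to a temporary file, forcing it to disk, and renaming it over the target. The sketch below shows that pattern for a single `long`. `DurableLongFile` is a hypothetical name, not the API of the `PersistentLongFile` class listed in the commit above, and for brevity it does not fsync the parent directory after the rename.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: persist one long so readers see either the old or the new value. */
final class DurableLongFile {
  static void write(Path file, long value) throws IOException {
    Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
    byte[] bytes = (Long.toString(value) + "\n").getBytes(StandardCharsets.UTF_8);
    try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
      ByteBuffer buf = ByteBuffer.wrap(bytes);
      while (buf.hasRemaining()) {
        ch.write(buf);
      }
      ch.force(true); // flush contents to the device before the rename makes them visible
    }
    // Atomic rename over the old file; may throw AtomicMoveNotSupportedException
    // on file systems that cannot rename atomically.
    Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Returns the stored value, or defaultValue if the file does not exist yet. */
  static long read(Path file, long defaultValue) throws IOException {
    if (!Files.exists(file)) {
      return defaultValue;
    }
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    return Long.parseLong(text);
  }
}
```

A crash between `force` and `move` leaves only a stray `.tmp` file, so the previously recorded value is never observed half-written.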
[jira] [Commented] (HDFS-3157) Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened
[ https://issues.apache.org/jira/browse/HDFS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407133#comment-13407133 ] Hudson commented on HDFS-3157: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3157. Fix a bug in the case that the generation stamps of the stored block in a namenode and the reported block from a datanode do not match. Contributed by Ashish Singhi (Revision 1356086) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356086 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestRBWBlockInvalidation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java > Error in deleting block is keep on coming from DN even after the block report > and directory scanning has happened > - > > Key: HDFS-3157 > URL: https://issues.apache.org/jira/browse/HDFS-3157 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0, 0.24.0 >Reporter: J.Andreina >Assignee: Ashish Singhi > Fix For: 2.0.1-alpha > > Attachments: HDFS-3157-1.patch, HDFS-3157-1.patch, HDFS-3157-2.patch, > HDFS-3157-3.patch, HDFS-3157-3.patch, HDFS-3157-4.patch, HDFS-3157-5.patch, > HDFS-3157.patch, HDFS-3157.patch, HDFS-3157.patch, h3157_20120618.patch > > > Cluster setup: > 1NN,Three DN(DN1,DN2,DN3),replication factor-2,"dfs.blockreport.intervalMsec" > 300,"dfs.datanode.directoryscan.interval" 1 > step 1: write one file 
"a.txt" with sync(not closed) > step 2: Delete the blocks in one of the datanode say DN1(from rbw) to which > replication happened. > step 3: close the file. > Since the replication factor is 2 the blocks are replicated to the other > datanode. > Then at the NN side the following cmd is issued to DN from which the block is > deleted > - > {noformat} > 2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK > NameSystem.addToCorruptReplicasMap: duplicate requested for > blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX > because reported RBW replica with genstamp 1002 does not match COMPLETE > block's genstamp in block map 1003 > 2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > Removing block blk_2903555284838653156_1003 from neededReplications as it has > enough replicas. > {noformat} > From the datanode side in which the block is deleted the following exception > occurred > {noformat} > 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Unexpected error trying to delete block blk_2903555284838653156_1003. > BlockInfo not found in volumeMap. > 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Error processing datanode Command > java.io.IOException: Error in deleting blocks.
> at > org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662) > at java.lang.Thread.run(Thread.java:619) > {noformat}
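The two logs above disagree on the generation stamp: the namenode refers to the COMPLETE block as blk_2903555284838653156_1003, while the datanode had reported an RBW replica with genstamp 1002. The toy model below reproduces the "BlockInfo not found in volumeMap" outcome. It is a sketch that assumes the datanode only finds a replica when both the block ID and the generation stamp match. ToyReplicaMap is hypothetical and far simpler than the real datanode replica map. It does not show what the committed fix changed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of a datanode's replica map; not the HDFS class. */
final class ToyReplicaMap {
  // block ID -> generation stamp of the replica this datanode holds
  private final Map<Long, Long> genStampById = new HashMap<>();

  void add(long blockId, long genStamp) {
    genStampById.put(blockId, genStamp);
  }

  /** A lookup only succeeds when both the block ID and the generation stamp match. */
  boolean contains(long blockId, long genStamp) {
    Long held = genStampById.get(blockId);
    return held != null && held == genStamp;
  }

  /** Mimics processing an invalidate command for blk_<id>_<genstamp>. */
  void invalidate(long blockId, long genStamp) {
    if (!contains(blockId, genStamp)) {
      throw new IllegalStateException("Unexpected error trying to delete block blk_"
          + blockId + "_" + genStamp + ". BlockInfo not found in volumeMap.");
    }
    genStampById.remove(blockId);
  }
}
```

In this model, a delete command naming genstamp 1003 misses the replica held at 1002, while one naming 1002 succeeds.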
[jira] [Commented] (HDFS-3573) Supply NamespaceInfo when instantiating JournalManagers
[ https://issues.apache.org/jira/browse/HDFS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407130#comment-13407130 ] Hudson commented on HDFS-3573: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3573. Supply NamespaceInfo when instantiating JournalManagers. Contributed by Todd Lipcon. (Revision 1356388) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356388 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java > Supply NamespaceInfo when instantiating JournalManagers > --- > > Key: HDFS-3573 > URL: https://issues.apache.org/jira/browse/HDFS-3573 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > Fix For: 3.0.0 > > Attachments: hdfs-3573.txt, 
hdfs-3573.txt, hdfs-3573.txt, > hdfs-3573.txt > > > Currently, the JournalManagers are instantiated before the NamespaceInfo is > loaded from local storage directories. This is problematic since the JM may > want to verify that the storage info associated with the journal matches the > NN which is starting up (eg to prevent an operator accidentally configuring > two clusters against the same remote journal storage). This JIRA rejiggers > the initialization sequence so that the JMs receive NamespaceInfo as a > constructor argument.
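Passing the namespace info to the constructor lets a journal manager reject foreign storage before it is ever used. A minimal sketch of that check follows. NsInfo, VerifyingJournalManager, and the choice to compare a cluster ID plus a namespace ID are assumptions for illustration. They are not the committed JournalManager API.

```java
import java.util.Objects;

/** Hypothetical stand-in for the namespace info an NN loads from its storage dirs. */
final class NsInfo {
  final String clusterId;
  final int namespaceId;

  NsInfo(String clusterId, int namespaceId) {
    this.clusterId = Objects.requireNonNull(clusterId, "clusterId");
    this.namespaceId = namespaceId;
  }

  boolean sameNamespace(NsInfo other) {
    return clusterId.equals(other.clusterId) && namespaceId == other.namespaceId;
  }
}

/**
 * Hypothetical journal manager that receives the NN's namespace info as a
 * constructor argument and refuses journal storage formatted for another namespace.
 */
final class VerifyingJournalManager {
  private final NsInfo nsInfo;

  /** formattedFor is the namespace recorded in the journal storage, or null if unformatted. */
  VerifyingJournalManager(NsInfo nsInfo, NsInfo formattedFor) {
    this.nsInfo = Objects.requireNonNull(nsInfo, "nsInfo");
    if (formattedFor != null && !formattedFor.sameNamespace(nsInfo)) {
      throw new IllegalStateException("Journal belongs to cluster " + formattedFor.clusterId
          + " (namespace " + formattedFor.namespaceId + "), but this NN is "
          + nsInfo.clusterId + " (namespace " + nsInfo.namespaceId + ")");
    }
  }

  NsInfo namespace() {
    return nsInfo;
  }
}
```

Doing the check in the constructor means a misconfigured NN fails at startup instead of writing edits into another cluster's journal.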
[jira] [Commented] (HDFS-3575) HttpFS does not log Exception Stacktraces
[ https://issues.apache.org/jira/browse/HDFS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407126#comment-13407126 ] Hudson commented on HDFS-3575: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3575. HttpFS does not log Exception Stacktraces (brocknoland via tucu) (Revision 1356330) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356330 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSExceptionProvider.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > HttpFS does not log Exception Stacktraces > - > > Key: HDFS-3575 > URL: https://issues.apache.org/jira/browse/HDFS-3575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Brock Noland >Assignee: Brock Noland >Priority: Minor > Labels: newbie > Fix For: 2.0.1-alpha > > Attachments: HDFS-3575-1.patch > > > In the 'log' method of the HttpFSExceptionProvider we log exceptions as > "warn" but the stacktrace itself is not logged: > LOG.warn("[{}:{}] response [{}] {}", new Object[]{method, path, status, > message, throwable}); > We should log the exception here.
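A throwable passed as one of the format arguments behaves differently from one passed as the logger's dedicated exception argument. The difference can be shown with java.util.logging, used here only so the sketch has no dependencies. HttpFS uses a different logger, so this is an analogy, not the patch. In the first call the exception is just one of the message parameters, so handlers receive no stack trace. In the second call it is attached to the LogRecord. StackTraceLoggingSketch and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class StackTraceLoggingSketch {
  /** Collects records in memory so callers can inspect what was logged. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();
    @Override public void publish(LogRecord record) { records.add(record); }
    @Override public void flush() {}
    @Override public void close() {}
  }

  static Logger newLogger(Handler handler) {
    Logger logger = Logger.getAnonymousLogger();
    logger.setUseParentHandlers(false); // keep console output quiet
    logger.addHandler(handler);
    return logger;
  }

  /** Throwable passed only as a message parameter: no stack trace reaches the handler. */
  static void logWithoutTrace(Logger log, String method, String path, int status, Throwable t) {
    log.log(Level.WARNING, "[{0}:{1}] response [{2}] {3}",
        new Object[] {method, path, status, t.getMessage(), t});
  }

  /** Throwable passed in the dedicated parameter: handlers receive the full stack trace. */
  static void logWithTrace(Logger log, String method, String path, int status, Throwable t) {
    String msg = String.format("[%s:%s] response [%s] %s", method, path, status, t.getMessage());
    log.log(Level.WARNING, msg, t);
  }
}
```

Whatever the logging API, the design point is the same: the exception must travel in the slot the framework reserves for it, not as a leftover format argument.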
[jira] [Commented] (HDFS-3601) Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network topology
[ https://issues.apache.org/jira/browse/HDFS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407136#comment-13407136 ] Hudson commented on HDFS-3601: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3601. Add BlockPlacementPolicyWithNodeGroup to support block placement with 4-layer network topology. Contributed by Junping Du (Revision 1357442) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357442 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java > Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network > topology > - > > Key: HDFS-3601 > URL: https://issues.apache.org/jira/browse/HDFS-3601 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Junping Du >Assignee: Junping Du > Fix For: 3.0.0 > > Attachments: > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v2.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v3.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v4.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v5.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v6.patch, > 
HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl.patch > > > A subclass of ReplicaPlacementPolicyDefault, ReplicaPlacementPolicyNodeGroup > was developed along with unit tests to support the four-layer hierarchical > topology. > The replica placement strategy used in ReplicaPlacementPolicyNodeGroup > virtualization is almost the same as the original one. The differences are: > 1. The 3rd replica will be off node group of the 2nd replica > 2. If there is no local node available, the 1st replica will be placed on a > node in the local node group.
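The two differences listed above can be sketched as simple filters over candidate nodes. The sketch assumes each node's network location has the form /rack/nodegroup/host. It also assumes that, as in the default policy's usual case, the 3rd replica goes on the 2nd replica's rack. Node, chooseFirst and chooseThird are hypothetical names. The real policy also weighs load, excluded nodes and remote-rack selection, all of which are omitted here.

```java
import java.util.List;
import java.util.Optional;

final class NodeGroupPlacementSketch {
  /** Hypothetical leaf of a 4-layer topology with location /rack/nodegroup/host. */
  static final class Node {
    final String host;
    final String rack;      // e.g. "/r1"
    final String nodeGroup; // full path, e.g. "/r1/ng1", so unique across racks
    final boolean available;

    Node(String host, String rack, String nodeGroup, boolean available) {
      this.host = host;
      this.rack = rack;
      this.nodeGroup = nodeGroup;
      this.available = available;
    }
  }

  /** 1st replica: the writer's node if usable, else another node in the writer's node group. */
  static Optional<Node> chooseFirst(Node writer, List<Node> cluster) {
    if (writer.available) {
      return Optional.of(writer);
    }
    return cluster.stream()
        .filter(n -> n != writer && n.available && n.nodeGroup.equals(writer.nodeGroup))
        .findFirst();
  }

  /** 3rd replica: on the 2nd replica's rack, but outside the 2nd replica's node group. */
  static Optional<Node> chooseThird(Node second, List<Node> cluster) {
    return cluster.stream()
        .filter(n -> n.available && n.rack.equals(second.rack)
            && !n.nodeGroup.equals(second.nodeGroup))
        .findFirst();
  }
}
```

Keeping the 3rd replica out of the 2nd replica's node group matters when a node group models VMs on one physical host: two replicas there would share a single point of failure.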
[jira] [Commented] (HDFS-3574) Fix small race and do some cleanup in GetImageServlet
[ https://issues.apache.org/jira/browse/HDFS-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407128#comment-13407128 ] Hudson commented on HDFS-3574: -- Integrated in Hadoop-Mapreduce-trunk #1127 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/]) HDFS-3574. Fix small race and do some cleanup in GetImageServlet. Contributed by Todd Lipcon. (Revision 1356939) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356939 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ServletUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java > Fix small race and do some cleanup in GetImageServlet > - > > Key: HDFS-3574 > URL: https://issues.apache.org/jira/browse/HDFS-3574 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > Fix For: 2.0.1-alpha > > Attachments: hdfs-3574.txt, hdfs-3574.txt, hdfs-3574.txt, > hdfs-3574.txt > > > There's a very small race window in GetImageServlet, if the following > interleaving occurs: > - The Storage object returns some local file in the storage directory (eg an > edits file or image file) > - *Race*: some other process removes the file > - GetImageServlet calls file.length() which returns 0, since it doesn't > exist. It thus faithfully sets the Content-Length header to 0 > - getFileClient() throws FileNotFoundException when trying to open the file. > But, since we call response.getOutputStream() before this, the headers have > already been sent, so we fail to send the "404" or "500" response that we > should. 
> Thus, the client sees a 0-length Content-Length followed by 0 lengths of > content, and thinks it successfully has downloaded the target file, where in > fact it downloads an empty one. > I saw this in practice during the "edits synchronization" phase of recovery > while working on HDFS-3077, though it could apply on existing code paths, as > well, I believe.
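One way to close the race described above is to open the file before touching any response header. Content-Length is then taken from the opened stream rather than from File.length(). If the file has already vanished, the open fails while an error status can still be sent. The sketch below assumes a minimal response interface, MiniResponse, which is hypothetical and stands in for the servlet response. It is not necessarily what the committed patch does.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical minimal stand-in for a servlet response. */
interface MiniResponse {
  void setContentLength(long length);
  void sendError(int status, String message);
  OutputStream getOutputStream(); // assume headers are committed once this is called
}

final class SafeFileSender {
  static void send(File file, MiniResponse response) throws IOException {
    FileInputStream in;
    try {
      in = new FileInputStream(file); // open before any header is set
    } catch (FileNotFoundException e) {
      response.sendError(404, "File not found: " + file);
      return; // nothing was committed yet, so the error status still reaches the client
    }
    try (FileInputStream stream = in) {
      // Size of the file actually held open, not of whatever the path names now.
      long length = stream.getChannel().size();
      response.setContentLength(length);
      OutputStream out = response.getOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = stream.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
  }
}

/** Test double that records what the sender did. */
final class RecordingResponse implements MiniResponse {
  long contentLength = -1;
  int errorStatus = 0;
  final ByteArrayOutputStream body = new ByteArrayOutputStream();

  @Override public void setContentLength(long length) { contentLength = length; }
  @Override public void sendError(int status, String message) { errorStatus = status; }
  @Override public OutputStream getOutputStream() { return body; }
}
```

On POSIX filesystems an already-open file stays readable after it is unlinked, so the length read from the open channel matches the bytes that are actually streamed.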
[jira] [Commented] (HDFS-3442) Incorrect count for Missing Replicas in FSCK report
[ https://issues.apache.org/jira/browse/HDFS-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407058#comment-13407058 ] Hudson commented on HDFS-3442: -- Integrated in Hadoop-Hdfs-0.23-Build #304 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/304/]) svn merge -c 1345408 FIXES: HDFS-3442. Incorrect count for Missing Replicas in FSCK report. Contributed by Andrew Wang. (Revision 1356828) Result = SUCCESS daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356828 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java > Incorrect count for Missing Replicas in FSCK report > --- > > Key: HDFS-3442 > URL: https://issues.apache.org/jira/browse/HDFS-3442 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: suja s >Assignee: Andrew Wang >Priority: Minor > Fix For: 0.23.3 > > Attachments: HDFS-3442-2.patch, HDFS-3442-3.patch, HDFS-3442.patch > > > Scenario: > Cluster running in HA mode with 2 DNs. Files are written with replication > factor as 3. > There are 7 blocks in cluster. > FSCK report is including all blocks in UnderReplicated Blocks as well as > Missing Replicas. > HOST-XX-XX-XX-102:/home/Apr4/hadoop-2.0.0-SNAPSHOT/bin # ./hdfs fsck / > Connecting to namenode via http://XX.XX.XX.55:50070 > FSCK started by root (auth:SIMPLE) from /XX.XX.XX.102 for path / at Wed Apr > 04 17:28:37 IST 2012 > . > /1: Under replicated > BP-534619337-XX.XX.XX.55-1333526344705:blk_2551710840802340037_1002. Target > Replicas is 3 but found 2 replica(s). > . > /2: Under replicated > BP-534619337-XX.XX.XX.55-1333526344705:blk_-3851276776144500288_1004. 
Target > Replicas is 3 but found 2 replica(s). > . > /3: Under replicated > BP-534619337-XX.XX.XX.55-1333526344705:blk_-3210606555285049524_1006. Target > Replicas is 3 but found 2 replica(s). > . > /4: Under replicated > BP-534619337-XX.XX.XX.55-1333526344705:blk_4028835120510075310_1008. Target > Replicas is 3 but found 2 replica(s). > . > /5: Under replicated > BP-534619337-XX.XX.XX.55-1333526344705:blk_-5238093749956876969_1010. Target > Replicas is 3 but found 2 replica(s). > . > /testrenamed/file1renamed: Under replicated > BP-534619337-XX.XX.XX.55-1333526344705:blk_-5669194716756513504_1012. Target > Replicas is 3 but found 2 replica(s). > . > /testrenamed/file2: Under replicated > BP-534619337-XX.XX.XX.55-1333526344705:blk_8510284478280941311_1014. Target > Replicas is 3 but found 2 replica(s). > Status: HEALTHY > Total size:33215 B > Total dirs:3 > Total files: 7 (Files currently being written: 1) > Total blocks (validated): 7 (avg. block size 4745 B) > Minimally replicated blocks: 7 (100.0 %) > Over-replicated blocks:0 (0.0 %) > Under-replicated blocks: 7 (100.0 %) > Mis-replicated blocks: 0 (0.0 %) > Default replication factor:3 > Average block replication: 2.0 > Corrupt blocks:0 > Missing replicas: 7 (50.0 %) > Number of data-nodes: 2 > Number of racks: 1 > FSCK ended at Wed Apr 04 17:28:37 IST 2012 in 2 milliseconds > The filesystem under path '/' is HEALTHY > Also it indicates a measure as 50% in brackets (There are only 7 blocks in > cluster and so if all 7 are included as Missing replicas it should be 100%)
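The figures in the report are consistent with the percentage being taken against replicas found rather than replicas expected. 7 blocks at an average replication of 2.0 give 14 replicas found. With a target of 3 there are 7 * 3 = 21 replicas expected, so 7 are missing. 7 / 14 is the 50.0 % shown, whereas 7 / 21 would be about 33.3 %. The sketch below only reproduces that arithmetic. Which denominator the committed fix uses is not stated here, and the helper names are hypothetical.

```java
/** Hypothetical helpers reproducing the arithmetic behind the fsck percentages. */
final class FsckMissingReplicasSketch {
  /** Replicas missing when each of `blocks` blocks has `found` of `target` replicas. */
  static long missingReplicas(long blocks, long found, long target) {
    return blocks * Math.max(0L, target - found);
  }

  /** part as a percentage of whole, or 0 when whole is 0. */
  static double percentOf(long part, long whole) {
    return whole == 0 ? 0.0 : part * 100.0 / whole;
  }
}
```

For example, percentOf(missingReplicas(7, 2, 3), 7 * 2) gives 50.0, matching the report, while percentOf(7, 7 * 3) gives roughly 33.3.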
[jira] [Commented] (HDFS-3343) Improve metrics for DN read latency
[ https://issues.apache.org/jira/browse/HDFS-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407014#comment-13407014 ] Hudson commented on HDFS-3343: -- Integrated in Hadoop-Hdfs-trunk #1094 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/]) HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > Improve metrics for DN read latency > --- > > Key: HDFS-3343 > URL: https://issues.apache.org/jira/browse/HDFS-3343 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Reporter: Todd Lipcon >Assignee: Andrew Wang > Fix For: 2.0.1-alpha > > Attachments: hdfs-3343-2.patch, hdfs-3343-3.patch, hdfs-3343-4.patch, > hdfs-3343.patch > > > Similar to HDFS-3170 on the write side, we should improve the metrics that > are generated on the DN for read latency. We should have separate metrics for > the time spent in {{transferTo}} vs {{waitWritable}} so that it's easy to > distinguish slow local disks from slow readers on the other end of the socket.
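To tell slow local disks apart from slow readers, the two phases named above can be timed into separate accumulators. The following is a minimal single-threaded sketch that assumes an injectable nanosecond clock so it can be tested deterministically. ReadLatencySketch and its methods are hypothetical and are not the DataNodeMetrics API.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical per-phase latency accumulator: time spent blocked waiting for the
 * client socket to become writable is kept apart from time spent in transferTo.
 * Not thread-safe; a real metrics system would use concurrent counters.
 */
final class ReadLatencySketch {
  private final LongSupplier nanoClock;
  private long waitWritableNanos;
  private long waitWritableOps;
  private long transferToNanos;
  private long transferToOps;

  ReadLatencySketch(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
  }

  /** Times a wait for the socket; consistently large values suggest a slow reader. */
  void timeWaitWritable(Runnable waitForSocket) {
    long start = nanoClock.getAsLong();
    waitForSocket.run();
    waitWritableNanos += nanoClock.getAsLong() - start;
    waitWritableOps++;
  }

  /** Times a disk-to-socket transfer; consistently large values suggest a slow disk. */
  void timeTransferTo(Runnable transfer) {
    long start = nanoClock.getAsLong();
    transfer.run();
    transferToNanos += nanoClock.getAsLong() - start;
    transferToOps++;
  }

  double avgWaitWritableNanos() {
    return waitWritableOps == 0 ? 0.0 : (double) waitWritableNanos / waitWritableOps;
  }

  double avgTransferToNanos() {
    return transferToOps == 0 ? 0.0 : (double) transferToNanos / transferToOps;
  }
}
```

In production code the clock would simply be System::nanoTime. Injecting it keeps the accounting logic testable without sleeping.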
[jira] [Commented] (HDFS-3190) Simple refactors in existing NN code to assist QuorumJournalManager extension
[ https://issues.apache.org/jira/browse/HDFS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407008#comment-13407008 ] Hudson commented on HDFS-3190: -- Integrated in Hadoop-Hdfs-trunk #1094 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/]) HDFS-3190. Simple refactors in existing NN code to assist QuorumJournalManager extension. Contributed by Todd Lipcon. (Revision 1356525) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356525 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/StorageErrorReporter.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/AtomicFileOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/PersistentLongFile.java > Simple refactors in existing NN code to assist QuorumJournalManager extension > - > > Key: HDFS-3190 > URL: https://issues.apache.org/jira/browse/HDFS-3190 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > Fix For: 3.0.0 > > Attachments: hdfs-3190.txt, 
hdfs-3190.txt, hdfs-3190.txt, > hdfs-3190.txt, hdfs-3190.txt > > > This JIRA is for some simple refactors in the NN: > - refactor the code which writes the seen_txid file in NNStorage into a new > "LongContainingFile" utility class. This is useful for the JournalNode to > atomically/durably record its last promised epoch > - refactor the interface from FileJournalManager back to StorageDirectory to > use a StorageErrorReport interface. This allows FileJournalManager to be used > in isolation of a full StorageDirectory.
[jira] [Commented] (HDFS-3573) Supply NamespaceInfo when instantiating JournalManagers
[ https://issues.apache.org/jira/browse/HDFS-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407007#comment-13407007 ] Hudson commented on HDFS-3573: -- Integrated in Hadoop-Hdfs-trunk #1094 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/]) HDFS-3573. Supply NamespaceInfo when instantiating JournalManagers. Contributed by Todd Lipcon. (Revision 1356388) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356388 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorage.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java > Supply NamespaceInfo when instantiating JournalManagers > --- > > Key: HDFS-3573 > URL: https://issues.apache.org/jira/browse/HDFS-3573 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > Fix For: 3.0.0 > > Attachments: hdfs-3573.txt, hdfs-3573.txt, 
hdfs-3573.txt, > hdfs-3573.txt > > > Currently, the JournalManagers are instantiated before the NamespaceInfo is > loaded from local storage directories. This is problematic since the JM may > want to verify that the storage info associated with the journal matches the > NN which is starting up (eg to prevent an operator accidentally configuring > two clusters against the same remote journal storage). This JIRA rejiggers > the initialization sequence so that the JMs receive NamespaceInfo as a > constructor argument.
[jira] [Commented] (HDFS-3601) Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network topology
[ https://issues.apache.org/jira/browse/HDFS-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407012#comment-13407012 ] Hudson commented on HDFS-3601: -- Integrated in Hadoop-Hdfs-trunk #1094 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/]) HDFS-3601. Add BlockPlacementPolicyWithNodeGroup to support block placement with 4-layer network topology. Contributed by Junping Du (Revision 1357442) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1357442 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java > Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network > topology > - > > Key: HDFS-3601 > URL: https://issues.apache.org/jira/browse/HDFS-3601 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node >Reporter: Junping Du >Assignee: Junping Du > Fix For: 3.0.0 > > Attachments: > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v2.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v3.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v4.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v5.patch, > HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl-v6.patch, > 
HADOOP-8472-BlockPlacementPolicyWithNodeGroup-impl.patch > > > A subclass of ReplicaPlacementPolicyDefault, ReplicaPlacementPolicyNodeGroup > was developed along with unit tests to support the four-layer hierarchical > topology. > The replica placement strategy used in ReplicaPlacementPolicyNodeGroup > virtualization is almost the same as the original one. The differences are: > 1. The 3rd replica will be off node group of the 2nd replica > 2. If there is no local node available, the 1st replica will be placed on a > node in the local node group.
[jira] [Commented] (HDFS-3581) FSPermissionChecker#checkPermission sticky bit check missing range check
[ https://issues.apache.org/jira/browse/HDFS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407016#comment-13407016 ] Hudson commented on HDFS-3581: -- Integrated in Hadoop-Hdfs-trunk #1094 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/]) HDFS-3581. FSPermissionChecker#checkPermission sticky bit check missing range check. Contributed by Eli Collins (Revision 1356971) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356971 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsFileSystemContract.java > FSPermissionChecker#checkPermission sticky bit check missing range check > - > > Key: HDFS-3581 > URL: https://issues.apache.org/jira/browse/HDFS-3581 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 2.0.1-alpha > > Attachments: hdfs-3581.txt > > > The checkStickyBit call in FSPermissionChecker#checkPermission is missing a > range check which results in an index out of bounds when accessing root.
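For a path of just "/", the resolved inode array has a single element, so looking up the parent at index length - 2 goes out of bounds. Below is a sketch of a sticky-bit check with the missing guard. It assumes inodes are ordered from root to target and follows the usual sticky-bit rule: only the owner of the entry or of the sticky parent directory may remove it. The superuser check is assumed to happen elsewhere. MiniInode and StickyBitSketch are hypothetical.

```java
/** Hypothetical, simplified inode: just an owner and the sticky bit. */
final class MiniInode {
  final String owner;
  final boolean stickyBit;

  MiniInode(String owner, boolean stickyBit) {
    this.owner = owner;
    this.stickyBit = stickyBit;
  }
}

final class StickyBitSketch {
  /**
   * inodes are ordered from the root to the target. Returns false only when the
   * target's parent has the sticky bit set and user owns neither the parent
   * nor the target.
   */
  static boolean stickyBitAllows(MiniInode[] inodes, String user) {
    // The range check: "/" resolves to a single inode with no parent, so
    // inodes[inodes.length - 2] would be out of bounds.
    if (inodes.length < 2) {
      return true;
    }
    MiniInode parent = inodes[inodes.length - 2];
    MiniInode target = inodes[inodes.length - 1];
    if (!parent.stickyBit) {
      return true;
    }
    return user.equals(parent.owner) || user.equals(target.owner);
  }
}
```

Without the length check, a call for "/" would throw ArrayIndexOutOfBoundsException instead of simply skipping the sticky-bit rule.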
[jira] [Commented] (HDFS-3574) Fix small race and do some cleanup in GetImageServlet
[ https://issues.apache.org/jira/browse/HDFS-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407005#comment-13407005 ] Hudson commented on HDFS-3574: -- Integrated in Hadoop-Hdfs-trunk #1094 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/]) HDFS-3574. Fix small race and do some cleanup in GetImageServlet. Contributed by Todd Lipcon. (Revision 1356939) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356939 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ServletUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/GetImageServlet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/TransferFsImage.java > Fix small race and do some cleanup in GetImageServlet > - > > Key: HDFS-3574 > URL: https://issues.apache.org/jira/browse/HDFS-3574 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > Fix For: 2.0.1-alpha > > Attachments: hdfs-3574.txt, hdfs-3574.txt, hdfs-3574.txt, > hdfs-3574.txt > > > There's a very small race window in GetImageServlet, if the following > interleaving occurs: > - The Storage object returns some local file in the storage directory (eg an > edits file or image file) > - *Race*: some other process removes the file > - GetImageServlet calls file.length() which returns 0, since it doesn't > exist. It thus faithfully sets the Content-Length header to 0 > - getFileClient() throws FileNotFoundException when trying to open the file. > But, since we call response.getOutputStream() before this, the headers have > already been sent, so we fail to send the "404" or "500" response that we > should. 
> Thus, the client sees a Content-Length of 0 followed by 0 bytes of > content, and thinks it has successfully downloaded the target file, when in > fact it has downloaded an empty one. > I saw this in practice during the "edits synchronization" phase of recovery > while working on HDFS-3077, though I believe it could apply to existing code paths > as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
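A minimal Java sketch of the ordering the description implies. It assumes a hypothetical `RecordingResponse` class standing in for `HttpServletResponse`; this is neither the servlet API nor the committed HDFS-3574 patch. The idea is to open the file before setting any header or obtaining the output stream, and to take the length from the stream that is actually open rather than from a separate `file.length()` call. On POSIX systems a file that is already open stays readable after another process unlinks it, so the race then ends either in a complete transfer or in a 404 that can still be sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;

class SafeImageServe {
  /** Hypothetical stand-in for HttpServletResponse that records what would be sent. */
  static class RecordingResponse {
    int status = 200;
    long contentLength = -1;
    boolean committed = false;
    final ByteArrayOutputStream body = new ByteArrayOutputStream();

    void setContentLength(long len) { contentLength = len; }

    void sendError(int code) {
      // Like a real servlet response, an error cannot be sent once headers are out.
      if (committed) throw new IllegalStateException("response already committed");
      status = code;
    }

    OutputStream getOutputStream() {
      committed = true; // obtaining the stream is treated as committing the headers
      return body;
    }
  }

  /** Open first, then set headers, then stream (hypothetical helper, not Hadoop code). */
  static void serve(File file, RecordingResponse resp) throws IOException {
    FileInputStream in;
    try {
      in = new FileInputStream(file); // fails here if the file was removed concurrently
    } catch (FileNotFoundException e) {
      resp.sendError(404); // nothing committed yet, so the error still reaches the client
      return;
    }
    try (FileInputStream fis = in) {
      // Length of the file we actually hold open, not a separate stat of the path.
      resp.setContentLength(fis.getChannel().size());
      OutputStream out = resp.getOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = fis.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
  }
}
```

With this ordering, a file deleted between lookup and open produces a `FileNotFoundException` while the response is still uncommitted, which is the case the report says was being turned into a silent 0-byte "success".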
[jira] [Commented] (HDFS-3575) HttpFS does not log Exception Stacktraces
[ https://issues.apache.org/jira/browse/HDFS-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407004#comment-13407004 ] Hudson commented on HDFS-3575: -- Integrated in Hadoop-Hdfs-trunk #1094 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/]) HDFS-3575. HttpFS does not log Exception Stacktraces (brocknoland via tucu) (Revision 1356330) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356330 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSExceptionProvider.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > HttpFS does not log Exception Stacktraces > - > > Key: HDFS-3575 > URL: https://issues.apache.org/jira/browse/HDFS-3575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Brock Noland >Assignee: Brock Noland >Priority: Minor > Labels: newbie > Fix For: 2.0.1-alpha > > Attachments: HDFS-3575-1.patch > > > In the 'log' method of the HttpFSExceptionProvider we log exceptions at > "warn" level, but the stack trace itself is not logged: > LOG.warn("[{}:{}] response [{}] {}", new Object[]{method, path, status, > message, throwable}); > We should log the exception's stack trace here.
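HttpFS logs through SLF4J, which is not part of the JDK. The self-contained sketch below therefore swaps in `java.util.logging` to illustrate the general rule behind this report: a stack trace is emitted only when the `Throwable` reaches the logger as its dedicated exception argument. Formatting the exception's message into the log text records just one line. The class and method names are hypothetical. For reference, SLF4J 1.6.0 and later treat a trailing `Throwable` that no `{}` placeholder consumes as that exception argument.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration using JDK logging, not HttpFS code. */
class ExceptionLogging {
  private static final Logger LOG = Logger.getLogger(ExceptionLogging.class.getName());

  /** One-line summary only: handlers receive no Throwable, so no stack trace is printed. */
  static void logSummaryOnly(String method, String path, int status, Throwable t) {
    LOG.warning(String.format("[%s:%s] response [%d] %s", method, path, status, t.getMessage()));
  }

  /** Same message, but the Throwable is attached to the LogRecord, so handlers print its trace. */
  static void logWithStackTrace(String method, String path, int status, Throwable t) {
    LOG.log(Level.WARNING,
        String.format("[%s:%s] response [%d] %s", method, path, status, t.getMessage()), t);
  }
}
```

Keeping the one-line message and attaching the exception separately preserves a greppable summary line while still giving operators the full trace.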