[jira] [Updated] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
[ https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-554:
-------------------------
    Status: Open  (was: Patch Available)

> BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
> ------------------------------------------------------------------
>
>                 Key: HDFS-554
>                 URL: https://issues.apache.org/jira/browse/HDFS-554
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.21.0
>            Reporter: Steve Loughran
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 0.24.0
>
>         Attachments: HDFS-554.patch, HDFS-554.txt
>
> BlockInfo.ensureCapacity() uses a for() loop to copy the old array data into
> the expanded array. {{System.arraycopy()}} is generally much faster for
> this, as it can do a bulk memory copy. There is also the typesafe Java6
> {{Arrays.copyOf()}} to consider, though here it offers no tangible benefit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
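The three copy strategies the issue mentions can be sketched as follows. This is a standalone illustration, not the real BlockInfo code: the class name, the `triplets` field, and the growth arithmetic are stand-ins for whatever BlockInfo actually does.

```java
import java.util.Arrays;

// Illustrative sketch of the HDFS-554 change; BlockInfoSketch is a stand-in,
// not the actual org.apache.hadoop.hdfs BlockInfo class.
class BlockInfoSketch {
    private Object[] triplets = new Object[3];

    // Before: element-by-element copy in a for() loop.
    void ensureCapacityWithLoop(int toAdd) {
        Object[] old = triplets;
        triplets = new Object[old.length + toAdd * 3];
        for (int i = 0; i < old.length; i++) {
            triplets[i] = old[i];
        }
    }

    // After: bulk copy via System.arraycopy(), which the JVM can service
    // with an intrinsic memory copy instead of a per-element loop.
    void ensureCapacityWithArraycopy(int toAdd) {
        Object[] old = triplets;
        triplets = new Object[old.length + toAdd * 3];
        System.arraycopy(old, 0, triplets, 0, old.length);
    }

    // Java 6 alternative: Arrays.copyOf allocates and copies in one call.
    void ensureCapacityWithCopyOf(int toAdd) {
        triplets = Arrays.copyOf(triplets, triplets.length + toAdd * 3);
    }

    int capacity() {
        return triplets.length;
    }
}
```

All three are functionally equivalent; the issue's point is that the latter two delegate the copy to the runtime.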
[jira] [Updated] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
[ https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-554:
-------------------------
    Status: Patch Available  (was: Open)

> BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171758#comment-13171758 ]

Hudson commented on HDFS-2553:
------------------------------
Integrated in Hadoop-Mapreduce-0.23-Commit #315 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/315/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

> BlockPoolSliceScanner spinning in loop
> --------------------------------------
>
>                 Key: HDFS-2553
>                 URL: https://issues.apache.org/jira/browse/HDFS-2553
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Uma Maheswara Rao G
>            Priority: Critical
>             Fix For: 0.24.0, 0.23.1
>
>         Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch
>
> Playing with trunk, I managed to get a DataNode in a situation where the
> BlockPoolSliceScanner is spinning in the following loop, using 100% CPU:
>   at org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962)
>   at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625)
>   at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614)
>   at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95)
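The stack trace above is the classic shape of a busy-wait: a loop that re-checks a liveness condition with no sleep or wait in between pins a core at 100%. The sketch below only illustrates that general pattern with a bounded backoff loop; it is not the actual HDFS-2553 fix, whose details live in BlockPoolSliceScanner.scan().

```java
import java.util.function.BooleanSupplier;

// Generic illustration of spin vs. backoff (not the real HDFS-2553 patch).
class SpinSketch {
    // Polls keepGoing until it reports false or maxIters passes elapse,
    // yielding the CPU between checks instead of spinning flat out.
    static int pollWithBackoff(BooleanSupplier keepGoing, int maxIters) {
        int iters = 0;
        while (keepGoing.getAsBoolean() && iters < maxIters) {
            iters++;
            try {
                Thread.sleep(1); // back off instead of burning 100% CPU
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return iters;
    }
}
```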
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171752#comment-13171752 ]

Hudson commented on HDFS-2700:
------------------------------
Integrated in Hadoop-Mapreduce-trunk-Commit #1473 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1473/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java

> TestDataNodeMultipleRegistrations is failing in trunk
> -----------------------------------------------------
>
>                 Key: HDFS-2700
>                 URL: https://issues.apache.org/jira/browse/HDFS-2700
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>             Fix For: 0.24.0
>
>         Attachments: HDFS-2700.patch
>
> TestDataNodeMultipleRegistrations has been failing for the last couple of builds:
> https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171753#comment-13171753 ]

Hudson commented on HDFS-2553:
------------------------------
Integrated in Hadoop-Mapreduce-trunk-Commit #1473 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1473/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

> BlockPoolSliceScanner spinning in loop
[jira] [Commented] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir
[ https://issues.apache.org/jira/browse/HDFS-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171748#comment-13171748 ]

Todd Lipcon commented on HDFS-2703:
-----------------------------------
Looks good, but see the comment about a test plan in my comment on HDFS-2701.

> removedStorageDirs is not updated everywhere we remove a storage dir
> ---------------------------------------------------------------------
>
>                 Key: HDFS-2703
>                 URL: https://issues.apache.org/jira/browse/HDFS-2703
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: hdfs-2703.txt
>
> There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog)
> where we remove a storage directory but don't add it to the
> removedStorageDirs list. This means a storage dir may have been removed but
> we don't see it in the log or Web UI. This doesn't affect trunk/23 since the
> code there is totally different.
[jira] [Commented] (HDFS-2701) Cleanup FS* processIOError methods
[ https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171747#comment-13171747 ]

Todd Lipcon commented on HDFS-2701:
-----------------------------------
- The behavior of exitIfInvalidStreams is extremely counter-intuitive... why don't you change it to check for an empty list, and just change the call site to call _after_ the errored dir is removed?

In removeEditsAndStorageDir:
{code}
+editStreams.remove(idx);
+fsimage.removeStorageDir(getStorageDirForStream(idx));
{code}
I don't think this is correct -- because getStorageDirForStream is called after the stream is removed from editStreams, it will remove the one that came _after_ the stream in the storage dir list (or throw an ArrayIndexOutOfBounds if it was the last stream).

In {{removeEditsStreamsAndStorageDirs}}, you can use a foreach loop instead of indexed iteration:
{code}
+for (int i = 0; i < errorStreams.size(); i++) {
+  int idx = editStreams.indexOf(errorStreams.get(i));
{code}

{code}
+FSNamesystem.LOG.error("Unable to sync edit log");
{code}
We should probably include the path of the failed stream here.

{code}
+ throw new IOException(
+ "Inconsistent existance of edits.new " + editsNew);
{code}
Spelling error - should be "existence".

What's the test plan for this, HDFS-2702, and HDFS-2703? I agree it's buggy, but we should articulate a way to make sure we fixed all the issues and didn't introduce new ones.

> Cleanup FS* processIOError methods
> ----------------------------------
>
>                 Key: HDFS-2701
>                 URL: https://issues.apache.org/jira/browse/HDFS-2701
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt
>
> Let's rename the various "processIOError" methods to be more descriptive. The
> current code makes it difficult to identify and reason about bug fixes. While
> we're at it, let's remove "Fatal" from the "Unable to sync the edit log" log
> message, since it's not actually a fatal error (this is confusing to users). And the
> 2NN's "Checkpoint done" message should be logged at info, not warning (also
> confusing to users). Thanks to HDFS-1073 these issues don't exist on trunk or 23.
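The ordering bug called out in that review can be shown with a minimal stand-in (the class, list, and method names below are illustrative, not the real FSEditLog/FSImage code): the index must be resolved to its storage directory *before* the stream is removed, or the lookup hits the element that shifted into that slot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal model of the remove-then-lookup bug; not the real HDFS code.
class RemovalOrderSketch {
    List<String> editStreams = new ArrayList<>();
    Map<String, String> dirForStream = new HashMap<>();

    String getStorageDirForStream(int idx) {
        return dirForStream.get(editStreams.get(idx));
    }

    // Buggy order (as in the patch): after remove(idx), index idx names the
    // stream that came *after* the removed one, or is out of bounds if the
    // removed stream was the last element.
    String removeBuggy(int idx) {
        editStreams.remove(idx);
        return getStorageDirForStream(idx); // wrong dir, or IndexOutOfBounds
    }

    // Fixed order: resolve the storage dir while the index is still valid,
    // then remove the stream.
    String removeFixed(int idx) {
        String dir = getStorageDirForStream(idx);
        editStreams.remove(idx);
        return dir;
    }
}
```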
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171745#comment-13171745 ]

Hudson commented on HDFS-2553:
------------------------------
Integrated in Hadoop-Common-trunk-Commit #1450 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1450/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

> BlockPoolSliceScanner spinning in loop
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171744#comment-13171744 ]

Hudson commented on HDFS-2700:
------------------------------
Integrated in Hadoop-Common-trunk-Commit #1450 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1450/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java

> TestDataNodeMultipleRegistrations is failing in trunk
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171741#comment-13171741 ]

Hudson commented on HDFS-2553:
------------------------------
Integrated in Hadoop-Hdfs-trunk-Commit #1523 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1523/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220317
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

> BlockPoolSliceScanner spinning in loop
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171740#comment-13171740 ]

Hudson commented on HDFS-2700:
------------------------------
Integrated in Hadoop-Hdfs-trunk-Commit #1523 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1523/])
HDFS-2700. Fix failing TestDataNodeMultipleRegistrations in trunk. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220315
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java

> TestDataNodeMultipleRegistrations is failing in trunk
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171743#comment-13171743 ]

Hudson commented on HDFS-2553:
------------------------------
Integrated in Hadoop-Common-0.23-Commit #304 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/304/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

> BlockPoolSliceScanner spinning in loop
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171738#comment-13171738 ]

Hudson commented on HDFS-2553:
------------------------------
Integrated in Hadoop-Hdfs-0.23-Commit #293 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/293/])
HDFS-2553. Fix BlockPoolSliceScanner spinning in a tight loop. Contributed by Uma Maheswara Rao G.
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1220316
Files :
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java

> BlockPoolSliceScanner spinning in loop
[jira] [Resolved] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HDFS-2553.
-------------------------------
          Resolution: Fixed
       Fix Version/s: 0.23.1
                      0.24.0
    Target Version/s: 0.24.0, 0.23.1  (was: 0.23.1, 0.24.0)
        Hadoop Flags: Reviewed

Committed to trunk and 23. Thanks, Uma!

> BlockPoolSliceScanner spinning in loop
[jira] [Updated] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-2553:
--------------------------------------
    Target Version/s: 0.24.0, 0.23.1  (was: 0.23.1, 0.24.0)
              Status: Open  (was: Patch Available)

> BlockPoolSliceScanner spinning in loop
[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2700:
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.24.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Committed to trunk, thx Uma.

> TestDataNodeMultipleRegistrations is failing in trunk
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171732#comment-13171732 ]

Todd Lipcon commented on HDFS-2700:
-----------------------------------
+1, thanks for taking care of this Uma. Will commit momentarily.

> TestDataNodeMultipleRegistrations is failing in trunk
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171712#comment-13171712 ]

Scott Carey commented on HDFS-2699:
-----------------------------------
That brings up a related question: why a 4 byte CRC per 512 bytes and not per 4096 bytes?

512 aligns with the old hard drive block size; the physical media had ECC at 512 byte blocks and could not read or write in smaller chunks than that. New hard drives all have 4096 byte blocks and ECC at that granularity -- no smaller chunk can be read or written. SSDs use 4096 or 8192 byte blocks these days. If the physical media is corrupting blocks, these will most likely be corrupted in 4k chunks.

A CRC per 4k decreases the checksum overhead by a factor of 8, increasing the likelihood of finding it in the OS cache if it is in a side file. Now that CRC is accelerated by the processor and very fast, I don't think the overhead of the larger-block CRC for reads smaller than 4k will matter either.

Inlining the CRC could decrease seek and OS pagecache overhead a lot. Since most file systems and OSes work on 4k blocks, HDFS could store a 4 byte CRC and 4092 bytes of data in a single OS/disk page (or eight 4-byte CRCs and 4064 bytes in a page). This has big advantages: if your data is in the OS pagecache, the CRC will be too -- one will never be written to disk without the other, nor evicted from cache without the other.

> Store data and checksums together in block file
> ------------------------------------------------
>
>                 Key: HDFS-2699
>                 URL: https://issues.apache.org/jira/browse/HDFS-2699
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The current implementation of HDFS stores the data in one block file and the
> metadata (checksum) in another block file. This means that every read from
> HDFS actually consumes two disk iops, one to the data file and one to the
> checksum file. This is a major problem for scaling HBase, because HBase is
> usually bottlenecked on the number of random disk iops that the
> storage-hardware offers.
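The factor-of-8 claim above is straightforward arithmetic. The sketch below models the metadata-size trade-off using the JDK's java.util.zip.CRC32 for the per-chunk checksum; HDFS has its own checksum machinery (e.g. the DataChecksum class), so this is only an illustration of the granularity math, not the HDFS implementation.

```java
import java.util.zip.CRC32;

// Model of the 512-byte vs 4096-byte checksum granularity trade-off.
class CrcOverheadSketch {
    // Bytes of CRC metadata for a block, at 4 bytes per checksummed chunk.
    static long crcBytes(long blockBytes, int bytesPerChecksum) {
        long chunks = (blockBytes + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * 4L;
    }

    // One 4-byte CRC over a whole chunk, regardless of the chunk size.
    static long crcOf(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return crc.getValue();
    }
}
```

For a 64MB block, checksumming every 512 bytes costs 512KB of metadata, while every 4096 bytes costs 64KB: the eightfold reduction Scott describes.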
[jira] [Commented] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2
[ https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171704#comment-13171704 ]

Todd Lipcon commented on HDFS-2654:
-----------------------------------
Whoops, sorry to have missed that. IOUtils.closeStream is better.

> Make BlockReaderLocal not extend RemoteBlockReader2
> ----------------------------------------------------
>
>                 Key: HDFS-2654
>                 URL: https://issues.apache.org/jira/browse/HDFS-2654
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.23.1, 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch,
>                      hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch,
>                      hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch,
>                      hdfs-2654-b1-4.patch
>
> The BlockReaderLocal code paths are easier to understand (especially true on
> branch-1, where BlockReaderLocal inherits code from BlockReader and
> FSInputChecker) if the local and remote block reader implementations are
> independent, and they're not really sharing much code anyway. If for some
> reason they start to share significant code we can make the BlockReader
> interface an abstract class.
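What IOUtils.closeStream (from org.apache.hadoop.io.IOUtils) buys you can be sketched with plain JDK types: a null-safe close that swallows the IOException, which is the behavior you want in cleanup and error paths. Note the Hadoop method returns void; this sketch returns a boolean purely so the behavior is observable, and is illustrative, not the Hadoop source.

```java
import java.io.Closeable;
import java.io.IOException;

// JDK-only sketch of the quiet-close pattern behind IOUtils.closeStream.
class CloseQuietlySketch {
    static boolean closeStream(Closeable c) {
        if (c == null) {
            return false; // null-safe: closing a null stream is a no-op
        }
        try {
            c.close();
            return true;
        } catch (IOException ignored) {
            return false; // swallow: we are already on a cleanup path
        }
    }
}
```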
[jira] [Updated] (HDFS-2335) DataNodeCluster and NNStorage always pull fresh entropy
[ https://issues.apache.org/jira/browse/HDFS-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2335: -- Affects Version/s: 1.0.0 0.24.0 0.23.0 > DataNodeCluster and NNStorage always pull fresh entropy > --- > > Key: HDFS-2335 > URL: https://issues.apache.org/jira/browse/HDFS-2335 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, name-node >Affects Versions: 0.23.0, 0.24.0, 1.0.0 >Reporter: Eli Collins >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2335.patch, HDFS-2335.patch > > > Jira for giving DataNodeCluster and NNStorage the same treatment as > HDFS-1835. They're not truly cryptographic uses either. We should also > factor this out to a utility method; it seems like the three uses are slightly > different, eg one uses DFSUtil.getRandom and the other creates a new Random > object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171698#comment-13171698 ] Todd Lipcon commented on HDFS-2699: --- +1 on considering putting them in the same file. Block files already have a metadata header so we could backward-compatibly support the earlier format without requiring a data rewrite on upgrade (prohibitively expensive) Regarding the other ideas, like caching checksums in buffer cache or on SSD, I think the issue here is that the 0.78% overhead (4/512) still makes for fairly large checksum size on a big DN. For example, if the application has a dataset of 4TB per node, then even caching just the checksums is 31GB of RAM. If you're mostly missing HBase's data cache, then you'll probably be missing the checksum cache too (are you really going to devote 30G to it?) > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
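Todd's sizing figure above checks out. A quick back-of-the-envelope in Java, assuming decimal units; the 4 TB dataset and the 4-bytes-per-512-byte-chunk ratio are taken from the comment:

```java
public class ChecksumCacheSize {
    public static void main(String[] args) {
        long datasetBytes = 4_000_000_000_000L; // 4 TB of block data on one DN
        int bytesPerChecksum = 512;             // one checksum chunk per 512 bytes
        int checksumBytes = 4;                  // 4-byte CRC per chunk
        long cacheBytes = datasetBytes / bytesPerChecksum * checksumBytes;
        // 4e12 / 512 * 4 = 31.25e9 bytes, i.e. the ~31 GB quoted above
        System.out.println(cacheBytes / 1_000_000_000L + " GB of checksums");
    }
}
```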
[jira] [Updated] (HDFS-2654) Make BlockReaderLocal not extend RemoteBlockReader2
[ https://issues.apache.org/jira/browse/HDFS-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2654: -- Attachment: hdfs-2654-b1-4-fix.patch In hdfs-2654-b1-4.patch I used reader.close rather than IOUtils.closeStream, which is incorrect (reader may be null). Fixing that. > Make BlockReaderLocal not extend RemoteBlockReader2 > --- > > Key: HDFS-2654 > URL: https://issues.apache.org/jira/browse/HDFS-2654 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.23.1, 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2654-1.patch, hdfs-2654-2.patch, hdfs-2654-2.patch, > hdfs-2654-2.patch, hdfs-2654-3.patch, hdfs-2654-b1-1.patch, > hdfs-2654-b1-2.patch, hdfs-2654-b1-3.patch, hdfs-2654-b1-4-fix.patch, > hdfs-2654-b1-4.patch > > > The BlockReaderLocal code paths are easier to understand (especially true on > branch-1 where BlockReaderLocal inherits code from BlockReader and > FSInputChecker) if the local and remote block reader implementations are > independent, and they're not really sharing much code anyway. If for some > reason they start to share significant code we can make the BlockReader > interface an abstract class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
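For context, the hazard being fixed is the usual null-unsafe close on an error path. Below is a minimal null-safe helper sketched from the documented behavior of Hadoop's IOUtils.closeStream (a null argument is a no-op, and close-time IOExceptions are swallowed); `CloseUtil` is a hypothetical stand-in, not the Hadoop class:

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseUtil {
    // Null-safe, exception-safe close: calling this in a finally/cleanup
    // block cannot NPE (the stream may never have been opened) and cannot
    // mask the original exception with one thrown by close().
    static void closeStream(Closeable c) {
        if (c == null) return; // e.g. reader was never successfully created
        try {
            c.close();
        } catch (IOException ignored) {
            // best-effort close on an error path
        }
    }

    public static void main(String[] args) {
        closeStream(null); // safe; a bare reader.close() here would NPE
        closeStream(() -> { throw new IOException("boom"); }); // also safe
        System.out.println("ok");
    }
}
```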
[jira] [Updated] (HDFS-2679) Add interface to query current state to HAServiceProtocol
[ https://issues.apache.org/jira/browse/HDFS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2679: -- Attachment: hdfs-2679.txt Updated patch to reflect HADOOP-7925. > Add interface to query current state to HAServiceProtocol > -- > > Key: HDFS-2679 > URL: https://issues.apache.org/jira/browse/HDFS-2679 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HA branch (HDFS-1623) >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2679.txt, hdfs-2679.txt, hdfs-2679.txt, > hdfs-2679.txt, hdfs-2679.txt > > > Let's add an interface to HAServiceProtocol to query the current state of a > NameNode for use by the CLI (HAAdmin) and Web UI (HDFS-2677). This > essentially makes the names "active" and "standby" from ACTIVE_STATE and > STANDBY_STATE public interfaces, which IMO seems reasonable. Unlike the other > APIs we should be able to use the interface even when HA is not enabled (as > by default a non-HA NN is active). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2677) HA: Web UI should indicate the NN state
[ https://issues.apache.org/jira/browse/HDFS-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2677: -- Attachment: hdfs-2677.txt Updated patch to reflect HADOOP-7925. > HA: Web UI should indicate the NN state > --- > > Key: HDFS-2677 > URL: https://issues.apache.org/jira/browse/HDFS-2677 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: HA branch (HDFS-1623) >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2677.txt, hdfs-2677.txt, hdfs-2677.txt, > hdfs-2677.txt > > > The DFS web UI should indicate whether it's an active or standby. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods
[ https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2701: -- Attachment: hdfs-2701.txt Minor update. > Cleanup FS* processIOError methods > -- > > Key: HDFS-2701 > URL: https://issues.apache.org/jira/browse/HDFS-2701 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2701.txt, hdfs-2701.txt, hdfs-2701.txt > > > Let's rename the various "processIOError" methods to be more descriptive. The > current code makes it difficult to identify and reason about bug fixes. While > we're at it let's remove "Fatal" from the "Unable to sync the edit log" log > since it's not actually a fatal error (this is confusing to users). And 2NN > "Checkpoint done" should be info, not a warning (also confusing to users). > Thanks to HDFS-1073 these issues don't exist on trunk or 23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171685#comment-13171685 ] Eli Collins commented on HDFS-2702: --- And verified the NN still exits after all storage directories have failed. > A single failed name dir can cause the NN to exit > -- > > Key: HDFS-2702 > URL: https://issues.apache.org/jira/browse/HDFS-2702 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Critical > Attachments: hdfs-2702.txt > > > There's a bug in FSEditLog#rollEditLog which results in the NN process > exiting if a single name dir has failed. Here's the relevant code: > {code} > close() // So editStreams.size() is 0 > foreach edits dir { > .. > eStream = new ... // Might get an IOE here > editStreams.add(eStream); > } catch (IOException ioe) { > removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 > } > {code} > If we get an IOException before we've added two edits streams to the list > we'll exit, eg if there's an error processing the 1st name dir we'll exit > even if there are 4 valid name dirs. The fix is to move the checking out of > removeEditsForStorageDir (nee processIOError) or modify it so it can be > disabled in some cases, eg here where we don't yet know how many streams are > valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2702: -- Attachment: hdfs-2702.txt Patch attached. Depends on HDFS-2701 and HDFS-2703. I verified the NN no longer exits after checkpointing when a single dir failed. I'm working on a unit test for this jira and 2703 but thought I'd put up a fix for review in the meantime. > A single failed name dir can cause the NN to exit > -- > > Key: HDFS-2702 > URL: https://issues.apache.org/jira/browse/HDFS-2702 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Critical > Attachments: hdfs-2702.txt > > > There's a bug in FSEditLog#rollEditLog which results in the NN process > exiting if a single name dir has failed. Here's the relevant code: > {code} > close() // So editStreams.size() is 0 > foreach edits dir { > .. > eStream = new ... // Might get an IOE here > editStreams.add(eStream); > } catch (IOException ioe) { > removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 > } > {code} > If we get an IOException before we've added two edits streams to the list > we'll exit, eg if there's an error processing the 1st name dir we'll exit > even if there are 4 valid name dirs. The fix is to move the checking out of > removeEditsForStorageDir (nee processIOError) or modify it so it can be > disabled in some cases, eg here where we don't yet know how many streams are > valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
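The shape of the fix described in the issue (move the "too few streams" check out of the per-directory error handler, and only judge survivor count once every edits dir has been tried) can be sketched as follows. All names here are illustrative stand-ins, not the actual FSEditLog code or the attached patch:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RollSketch {
    // Hypothetical stand-in for opening an edits stream in a storage dir.
    interface StreamFactory { String open(String dir) throws IOException; }

    // Try every edits dir first; an IOE on the first dir no longer looks
    // like "last remaining stream failed" because the size check is
    // deferred until the whole list has been rebuilt.
    static List<String> reopenAll(List<String> dirs, StreamFactory f) {
        List<String> streams = new ArrayList<>();
        for (String dir : dirs) {
            try {
                streams.add(f.open(dir));
            } catch (IOException ioe) {
                // record the failed dir; decide about exiting later
            }
        }
        if (streams.isEmpty()) {
            // only now is "no usable streams" actually fatal
            throw new IllegalStateException("no usable edits directories");
        }
        return streams;
    }

    public static void main(String[] args) throws Exception {
        // 1st dir fails, yet the NN keeps running on the 3 healthy dirs.
        List<String> ok = reopenAll(List.of("bad", "d1", "d2", "d3"),
            d -> { if (d.equals("bad")) throw new IOException(d); return d; });
        System.out.println(ok.size() + " of 4 streams reopened");
    }
}
```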
[jira] [Commented] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir
[ https://issues.apache.org/jira/browse/HDFS-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171679#comment-13171679 ] Eli Collins commented on HDFS-2703: --- This applies atop HDFS-2701. > removedStorageDirs is not updated everywhere we remove a storage dir > > > Key: HDFS-2703 > URL: https://issues.apache.org/jira/browse/HDFS-2703 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2703.txt > > > There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) > where we remove a storage directory but don't add it to the > removedStorageDirs list. This means a storage dir may have been removed but > we don't see it in the log or Web UI. This doesn't affect trunk/23 since the > code there is totally different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171678#comment-13171678 ] Eli Collins commented on HDFS-2702: --- This bug exists in FSEditLog#close as well. > A single failed name dir can cause the NN to exit > -- > > Key: HDFS-2702 > URL: https://issues.apache.org/jira/browse/HDFS-2702 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Critical > > There's a bug in FSEditLog#rollEditLog which results in the NN process > exiting if a single name dir has failed. Here's the relevant code: > {code} > close() // So editStreams.size() is 0 > foreach edits dir { > .. > eStream = new ... // Might get an IOE here > editStreams.add(eStream); > } catch (IOException ioe) { > removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 > } > {code} > If we get an IOException before we've added two edits streams to the list > we'll exit, eg if there's an error processing the 1st name dir we'll exit > even if there are 4 valid name dirs. The fix is to move the checking out of > removeEditsForStorageDir (nee processIOError) or modify it so it can be > disabled in some cases, eg here where we don't yet know how many streams are > valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir
[ https://issues.apache.org/jira/browse/HDFS-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2703: -- Attachment: hdfs-2703.txt Patch attached. Verified that when the 2NN triggers a roll and a storage directory fails a warning now shows up in the log and the Web UI lists this storage directory as failed. > removedStorageDirs is not updated everywhere we remove a storage dir > > > Key: HDFS-2703 > URL: https://issues.apache.org/jira/browse/HDFS-2703 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2703.txt > > > There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) > where we remove a storage directory but don't add it to the > removedStorageDirs list. This means a storage dir may have been removed but > we don't see it in the log or Web UI. This doesn't affect trunk/23 since the > code there is totally different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods
[ https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2701: -- Attachment: hdfs-2701.txt test-patch results. Just cleanup so existing tests suffice. Will do a full test run before committing. {noformat} [exec] [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] {noformat} > Cleanup FS* processIOError methods > -- > > Key: HDFS-2701 > URL: https://issues.apache.org/jira/browse/HDFS-2701 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2701.txt, hdfs-2701.txt > > > Let's rename the various "processIOError" methods to be more descriptive. The > current code makes it difficult to identify and reason about bug fixes. While > we're at it let's remove "Fatal" from the "Unable to sync the edit log" log > since it's not actually a fatal error (this is confusing to users). And 2NN > "Checkpoint done" should be info, not a warning (also confusing to users). > Thanks to HDFS-1073 these issues don't exist on trunk or 23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2704) NameNodeResourceChecker#checkAvailableResources should check for inodes
NameNodeResourceChecker#checkAvailableResources should check for inodes -- Key: HDFS-2704 URL: https://issues.apache.org/jira/browse/HDFS-2704 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.24.0 Reporter: Eli Collins NameNodeResourceChecker#checkAvailableResources currently just checks for free space. However, inodes are also a file system resource that needs to be available (you can run out of inodes but still have free space). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171660#comment-13171660 ] Allen Wittenauer commented on HDFS-2699: Would it make sense to make the equivalent of a logging device instead? In other words, put the meta files on a dedicated fast/small disk to segregate them from the actual data blocks? Besides just being able to pick a better storage medium, it might potentially allow for better caching strategies at the OS level (depending upon the OS of course). > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2702: -- Target Version/s: 1.1.0 Affects Version/s: (was: 0.20.205.0) 1.1.0 Fix Version/s: (was: 1.1.0) > A single failed name dir can cause the NN to exit > -- > > Key: HDFS-2702 > URL: https://issues.apache.org/jira/browse/HDFS-2702 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Critical > > There's a bug in FSEditLog#rollEditLog which results in the NN process > exiting if a single name dir has failed. Here's the relevant code: > {code} > close() // So editStreams.size() is 0 > foreach edits dir { > .. > eStream = new ... // Might get an IOE here > editStreams.add(eStream); > } catch (IOException ioe) { > removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 > } > {code} > If we get an IOException before we've added two edits streams to the list > we'll exit, eg if there's an error processing the 1st name dir we'll exit > even if there are 4 valid name dirs. The fix is to move the checking out of > removeEditsForStorageDir (nee processIOError) or modify it so it can be > disabled in some cases, eg here where we don't yet know how many streams are > valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods
[ https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2701: -- Target Version/s: 1.1.0 Affects Version/s: (was: 0.20.205.0) 1.0.0 Fix Version/s: (was: 1.1.0) > Cleanup FS* processIOError methods > -- > > Key: HDFS-2701 > URL: https://issues.apache.org/jira/browse/HDFS-2701 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-2701.txt > > > Let's rename the various "processIOError" methods to be more descriptive. The > current code makes it difficult to identify and reason about bug fixes. While > we're at it let's remove "Fatal" from the "Unable to sync the edit log" log > since it's not actually a fatal error (this is confusing to users). And 2NN > "Checkpoint done" should be info, not a warning (also confusing to users). > Thanks to HDFS-1073 these issues don't exist on trunk or 23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2703) removedStorageDirs is not updated everywhere we remove a storage dir
removedStorageDirs is not updated everywhere we remove a storage dir Key: HDFS-2703 URL: https://issues.apache.org/jira/browse/HDFS-2703 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.0.0 Reporter: Eli Collins Assignee: Eli Collins There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) where we remove a storage directory but don't add it to the removedStorageDirs list. This means a storage dir may have been removed but we don't see it in the log or Web UI. This doesn't affect trunk/23 since the code there is totally different. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2702) A single failed name dir can cause the NN to exit
[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2702: -- Affects Version/s: (was: 1.1.0) 1.0.0 > A single failed name dir can cause the NN to exit > -- > > Key: HDFS-2702 > URL: https://issues.apache.org/jira/browse/HDFS-2702 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Eli Collins >Assignee: Eli Collins >Priority: Critical > > There's a bug in FSEditLog#rollEditLog which results in the NN process > exiting if a single name dir has failed. Here's the relevant code: > {code} > close() // So editStreams.size() is 0 > foreach edits dir { > .. > eStream = new ... // Might get an IOE here > editStreams.add(eStream); > } catch (IOException ioe) { > removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 > } > {code} > If we get an IOException before we've added two edits streams to the list > we'll exit, eg if there's an error processing the 1st name dir we'll exit > even if there are 4 valid name dirs. The fix is to move the checking out of > removeEditsForStorageDir (nee processIOError) or modify it so it can be > disabled in some cases, eg here where we don't yet know how many streams are > valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2702) A single failed name dir can cause the NN to exit
A single failed name dir can cause the NN to exit -- Key: HDFS-2702 URL: https://issues.apache.org/jira/browse/HDFS-2702 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.205.0 Reporter: Eli Collins Assignee: Eli Collins Priority: Critical Fix For: 1.1.0 There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code: {code} close() // So editStreams.size() is 0 foreach edits dir { .. eStream = new ... // Might get an IOE here editStreams.add(eStream); } catch (IOException ioe) { removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1 } {code} If we get an IOException before we've added two edits streams to the list we'll exit, eg if there's an error processing the 1st name dir we'll exit even if there are 4 valid name dirs. The fix is to move the checking out of removeEditsForStorageDir (nee processIOError) or modify it so it can be disabled in some cases, eg here where we don't yet know how many streams are valid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171657#comment-13171657 ] Andrew Purtell commented on HDFS-2699: -- @Dhruba, yes I agree with you fully. From the HBase point of view optimizing IOPS in HDFS is very important. > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2701) Cleanup FS* processIOError methods
[ https://issues.apache.org/jira/browse/HDFS-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-2701: -- Attachment: hdfs-2701.txt Patch attached. Running test-patch now. > Cleanup FS* processIOError methods > -- > > Key: HDFS-2701 > URL: https://issues.apache.org/jira/browse/HDFS-2701 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.20.205.0 >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 1.1.0 > > Attachments: hdfs-2701.txt > > > Let's rename the various "processIOError" methods to be more descriptive. The > current code makes it difficult to identify and reason about bug fixes. While > we're at it let's remove "Fatal" from the "Unable to sync the edit log" log > since it's not actually a fatal error (this is confusing to users). And 2NN > "Checkpoint done" should be info, not a warning (also confusing to users). > Thanks to HDFS-1073 these issues don't exist on trunk or 23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171653#comment-13171653 ] dhruba borthakur commented on HDFS-2699: Hi Andrew, all of the points you mentioned are valid points that could decrease the amount of iops needed for a particular workload. But my point is that if we keep the other pieces constant (amount of RAM, amount of flash, etc), then what can we do to reduce iops for the same workload? If the machine has more RAM, I would rather give all of it to the HBase block cache, because accesses from the HBase block cache are more optimal than accessing the file system cache. The HBase block cache can do better caching policies (because it is closer to the application) than the OS file cache (I am making the same arguments for why databases typically do unbuffered IO from the filesystem). Most disks are getting larger and larger in size (4TB disks coming next year), but the iops per spindle has not changed much. Given that, an efficient storage system should strive to optimize on iops, should it not? > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata (checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. 
[jira] [Created] (HDFS-2701) Cleanup FS* processIOError methods
Cleanup FS* processIOError methods -- Key: HDFS-2701 URL: https://issues.apache.org/jira/browse/HDFS-2701 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.205.0 Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.1.0 Let's rename the various "processIOError" methods to be more descriptive. The current code makes it difficult to identify and reason about bug fixes. While we're at it let's remove "Fatal" from the "Unable to sync the edit log" log since it's not actually a fatal error (this is confusing to users). And 2NN "Checkpoint done" should be info, not a warning (also confusing to users). Thanks to HDFS-1073 these issues don't exist on trunk or 23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171643#comment-13171643 ] Andrew Purtell commented on HDFS-2699: -- IMHO, this is a design evolution question for HDFS. Is pread a first-class use case? How many clients are there beyond HBase? If so, I think it makes sense to consider changes to DN storage that reduce IOPS. If not, and/or if changes to DN storage are too radical by consensus, then a means to optionally fadvise away data file pages seems worthwhile to try. Other considerations suggest deployments should use a reasonable amount of RAM; part of this will be available for the OS block cache. There are various other alternatives: application-level checksums, mixed device deployment (flash + disk), etc. Given the above two options, it may be a distraction to consider more options unless there is a compelling reason. (For example, optimizing IOPS for disk provides the same benefit for flash devices.) > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171638#comment-13171638 ] Uma Maheswara Rao G commented on HDFS-2700: --- Test failures are unrelated to this patch; HDFS-2657 has already been raised for those failures. The Findbugs and javadoc comments are also unrelated. > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2700.patch > > > TestDataNodeMultipleRegistrations is failing from last couple of builds > https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171635#comment-13171635 ] Hadoop QA commented on HDFS-2700: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12507785/HDFS-2700.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 90 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.http.server.TestHttpFSServer org.apache.hadoop.lib.servlet.TestServerWebApp +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1722//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1722//console This message is automatically generated. > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2700.patch > > > TestDataNodeMultipleRegistrations is failing from last couple of builds > https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
[ https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171622#comment-13171622 ] Aaron T. Myers commented on HDFS-554: - The patch looks good to me. +1 pending clean Jenkins results. > BlockInfo.ensureCapacity may get a speedup from System.arraycopy() > -- > > Key: HDFS-554 > URL: https://issues.apache.org/jira/browse/HDFS-554 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.21.0 >Reporter: Steve Loughran >Assignee: Harsh J >Priority: Minor > Fix For: 0.24.0 > > Attachments: HDFS-554.patch, HDFS-554.txt > > > BlockInfo.ensureCapacity() uses a for() loop to copy the old array data into > the expanded array. {{System.arraycopy()}} is generally much faster for > this, as it can do a bulk memory copy. There is also the typesafe Java6 > {{Arrays.copyOf()}} to consider, though here it offers no tangible benefit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
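The three copy strategies discussed on HDFS-554 can be illustrated outside of Hadoop. The sketch below is a minimal stand-alone comparison, not the actual `BlockInfo.ensureCapacity()` code; the `grow*` method names are hypothetical:

```java
import java.util.Arrays;

public class EnsureCapacityDemo {
    // Element-by-element loop, as in the original ensureCapacity():
    static Object[] growWithLoop(Object[] old, int extra) {
        Object[] bigger = new Object[old.length + extra];
        for (int i = 0; i < old.length; i++) bigger[i] = old[i];
        return bigger;
    }

    // Bulk copy; generally faster because the JVM can use an intrinsic memory move:
    static Object[] growWithArraycopy(Object[] old, int extra) {
        Object[] bigger = new Object[old.length + extra];
        System.arraycopy(old, 0, bigger, 0, old.length);
        return bigger;
    }

    // Typesafe Java 6 alternative mentioned in the issue description:
    static Object[] growWithCopyOf(Object[] old, int extra) {
        return Arrays.copyOf(old, old.length + extra);
    }

    public static void main(String[] args) {
        Object[] old = {"b1", "b2", "b3"};
        // All three produce the same expanded array (extra slots are null):
        System.out.println(Arrays.equals(growWithLoop(old, 2), growWithArraycopy(old, 2)));
        System.out.println(Arrays.equals(growWithArraycopy(old, 2), growWithCopyOf(old, 2)));
    }
}
```

All three are functionally equivalent; the choice is between the intrinsic bulk copy and the loop the JIT may or may not optimize equally well.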
[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2700: -- Issue Type: Bug (was: Test) > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2700.patch > > > TestDataNodeMultipleRegistrations is failing from last couple of builds > https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2700: -- Assignee: Uma Maheswara Rao G Status: Patch Available (was: Open) > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-2700.patch > > > TestDataNodeMultipleRegistrations is failing from last couple of builds > https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2700: -- Attachment: HDFS-2700.patch > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Uma Maheswara Rao G > Attachments: HDFS-2700.patch > > > TestDataNodeMultipleRegistrations is failing from last couple of builds > https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171616#comment-13171616 ] Uma Maheswara Rao G commented on HDFS-2700: --- The problem here is that DatanodeProtocolClientSideTranslatorPB is not itself a proxy instance; it is a wrapper around the proxy instance of DatanodeProtocolPB. So, when the previous cluster is shut down, the clients are not cleared and remain cached. When a new cluster starts, it may pick up the old, invalid clients and get EOFExceptions. So, we should just call close() on DatanodeProtocolClientSideTranslatorPB, which will call RPC.stopProxy with the real proxy instance (DatanodeProtocolPB). Attached a patch that closes the DatanodeProtocolClientSideTranslatorPB. Thanks Uma > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Uma Maheswara Rao G > Attachments: HDFS-2700.patch > > > TestDataNodeMultipleRegistrations is failing from last couple of builds > https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
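The "not a proxy instance" failure described in this issue can be reproduced with plain JDK dynamic proxies. The sketch below is a hedged stand-in, not the actual Hadoop classes: `Translator` plays the role of `DatanodeProtocolClientSideTranslatorPB`, and `canStop` mimics the check that `RPC.stopProxy` performs via `Proxy.getInvocationHandler`:

```java
import java.lang.reflect.Proxy;

public class StopProxyDemo {
    interface DatanodeProtocol { }

    // Stand-in for the translator wrapper: an ordinary object holding the real proxy.
    // Its close() must stop the *wrapped* proxy, e.g. RPC.stopProxy(realProxy) in Hadoop.
    static class Translator implements DatanodeProtocol {
        final DatanodeProtocol realProxy;
        Translator(DatanodeProtocol realProxy) { this.realProxy = realProxy; }
    }

    static DatanodeProtocol newRealProxy() {
        return (DatanodeProtocol) Proxy.newProxyInstance(
                StopProxyDemo.class.getClassLoader(),
                new Class<?>[] { DatanodeProtocol.class },
                (proxy, method, args) -> null);
    }

    // Proxy.getInvocationHandler throws IllegalArgumentException ("not a proxy
    // instance") when handed the wrapper instead of the underlying proxy.
    static boolean canStop(Object obj) {
        try { Proxy.getInvocationHandler(obj); return true; }
        catch (IllegalArgumentException e) { return false; }
    }

    public static void main(String[] args) {
        Translator t = new Translator(newRealProxy());
        System.out.println(canStop(t));            // the wrapper is not a proxy
        System.out.println(canStop(t.realProxy));  // the wrapped instance is
    }
}
```

This is why stopping the wrapper directly logs the IllegalArgumentException seen in the earlier comment, while closing the translator (which unwraps and stops the real proxy) cleans up correctly.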
[jira] [Updated] (HDFS-554) BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
[ https://issues.apache.org/jira/browse/HDFS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-554: - Attachment: HDFS-554.txt Thanks for that catch Todd, you're right :) > BlockInfo.ensureCapacity may get a speedup from System.arraycopy() > -- > > Key: HDFS-554 > URL: https://issues.apache.org/jira/browse/HDFS-554 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.21.0 >Reporter: Steve Loughran >Assignee: Harsh J >Priority: Minor > Fix For: 0.24.0 > > Attachments: HDFS-554.patch, HDFS-554.txt > > > BlockInfo.ensureCapacity() uses a for() loop to copy the old array data into > the expanded array. {{System.arraycopy()}} is generally much faster for > this, as it can do a bulk memory copy. There is also the typesafe Java6 > {{Arrays.copyOf()}} to consider, though here it offers no tangible benefit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171607#comment-13171607 ] Uma Maheswara Rao G commented on HDFS-2700: --- Some more information: it looks like some of the proxy instances are not being cleaned up. BP-239265342-67.195.138.31-1324137379885 (storage id DS-47228547-67.195.138.31-49285-1324137385405) registered with localhost/127.0.0.1:9930 2011-12-17 15:56:26,248 ERROR ipc.RPC (RPC.java:stopProxy(559)) - Tried to call RPC.stopProxy on an object that is not a proxy. java.lang.IllegalArgumentException: not a proxy instance at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637) at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:557) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.cleanUp(BPOfferService.java:450) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.run(BPOfferService.java:639) at java.lang.Thread.run(Thread.java:662) 2011-12-17 15:56:26,248 ERROR ipc.RPC (RPC.java:stopProxy(559)) - Tried to call RPC.stopProxy on an object that is not a proxy. java.lang.IllegalArgumentException: not a proxy instance at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637) at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:557) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.cleanUp(BPOfferService.java:450) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.run(BPOfferService.java:639) at java.lang.Thread.run(Thread.java:662) > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Uma Maheswara Rao G > > TestDataNodeMultipleRegistrations is failing from last couple of builds > https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. 
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171604#comment-13171604 ] dhruba borthakur commented on HDFS-2699: Another option that I am going to try is to fadvise away pages from data files (because those are anyway cached in the HBase cache) so that more file system cache is available for data from checksum files. Do people think this is a good idea? > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171601#comment-13171601 ] dhruba borthakur commented on HDFS-2699: There are various alternatives like the ones you proposed. An application-level checksum (at the HBase block level) sounds easy to do. The disadvantage is that HDFS still has to generate/store checksums to periodically validate data that is not accessed for a long time. > by supporting two block format simultaneously at the expense of code > complexity Are you saying that the same data is stored in two places? One is the current format and the other is the format with inline checksums? > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
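A rough sketch of what an inline-checksum layout could look like follows. This is illustrative only, not the HDFS on-disk format or the patch under discussion; the 512-byte chunk size and CRC32 are assumptions modeled on HDFS's default checksumming, and the method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class InlineChecksumDemo {
    static final int CHUNK = 512; // assumed data bytes per checksum, as in HDFS defaults

    // Interleave each data chunk with its CRC32 in ONE stream, so a single
    // sequential read returns data and checksum together (one iop), instead
    // of a second seek into a separate checksum file (two iops).
    static byte[] writeInline(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += CHUNK) {
            int len = Math.min(CHUNK, data.length - off);
            out.write(data, off, len);                // data chunk
            crc.reset();
            crc.update(data, off, len);
            out.write(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array(), 0, 4); // inline CRC
        }
        return out.toByteArray();
    }

    // Read the interleaved stream back, verifying each chunk against its CRC.
    static byte[] readInline(byte[] stored, int dataLen) {
        ByteBuffer in = ByteBuffer.wrap(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CRC32 crc = new CRC32();
        while (out.size() < dataLen) {
            byte[] chunk = new byte[Math.min(CHUNK, dataLen - out.size())];
            in.get(chunk);
            crc.reset();
            crc.update(chunk, 0, chunk.length);
            if ((int) crc.getValue() != in.getInt())
                throw new IllegalStateException("checksum mismatch");
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "hello hdfs".getBytes();
        byte[] stored = writeInline(data);
        System.out.println(stored.length);                        // 10 data bytes + one 4-byte CRC = 14
        System.out.println(new String(readInline(stored, data.length)));
    }
}
```

Supporting both this layout and the current two-file layout simultaneously is where the code-complexity concern in the comment above comes from.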
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171599#comment-13171599 ] Uma Maheswara Rao G commented on HDFS-2553: --- {quote} -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations {quote} Test failure is unrelated to this patch. Filed separate JIRA for it HDFS-2700 > BlockPoolSliceScanner spinning in loop > -- > > Key: HDFS-2553 > URL: https://issues.apache.org/jira/browse/HDFS-2553 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.0, 0.24.0 >Reporter: Todd Lipcon >Assignee: Uma Maheswara Rao G >Priority: Critical > Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch > > > Playing with trunk, I managed to get a DataNode in a situation where the > BlockPoolSliceScanner is spinning in the following loop, using 100% CPU: > at > org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614) > at > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171598#comment-13171598 ] Hadoop QA commented on HDFS-2553: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12507782/HDFS-2553.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 90 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1721//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1721//console This message is automatically generated. 
> BlockPoolSliceScanner spinning in loop > -- > > Key: HDFS-2553 > URL: https://issues.apache.org/jira/browse/HDFS-2553 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.0, 0.24.0 >Reporter: Todd Lipcon >Assignee: Uma Maheswara Rao G >Priority: Critical > Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch > > > Playing with trunk, I managed to get a DataNode in a situation where the > BlockPoolSliceScanner is spinning in the following loop, using 100% CPU: > at > org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614) > at > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
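The failure mode in the stack trace above is a scan loop that re-enters immediately when there is no work, burning 100% CPU. The generic pattern and the usual back-off fix can be sketched as below; this is not the BlockPoolSliceScanner code, and all names are hypothetical:

```java
import java.util.function.BooleanSupplier;

public class ScannerLoopDemo {
    // A bare while (alive) { scanOnePass(); } spins when scanOnePass()
    // returns immediately with nothing to verify. Sleeping when a pass
    // did no work is the standard way to avoid the busy-wait.
    static int runScanner(BooleanSupplier alive, BooleanSupplier scanOnePass) {
        int passes = 0;
        while (alive.getAsBoolean()) {
            boolean didWork = scanOnePass.getAsBoolean();
            passes++;
            if (!didWork) {
                try { Thread.sleep(10); }               // back off instead of spinning
                catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return passes;
                }
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        int[] remaining = {3}; // simulate three blocks left to scan
        int passes = runScanner(() -> remaining[0] > 0,
                                () -> { remaining[0]--; return true; });
        System.out.println(passes); // 3
    }
}
```

A wait/notify or a bounded sleep keyed to the scan period would work equally well; the point is only that the loop must block when `isAlive()` stays true but no progress is possible.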
[jira] [Created] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
TestDataNodeMultipleRegistrations is failing in trunk - Key: HDFS-2700 URL: https://issues.apache.org/jira/browse/HDFS-2700 Project: Hadoop HDFS Issue Type: Test Reporter: Uma Maheswara Rao G TestDataNodeMultipleRegistrations is failing from last couple of builds https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2700) TestDataNodeMultipleRegistrations is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171596#comment-13171596 ] Uma Maheswara Rao G commented on HDFS-2700: --- more info: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "asf001.sp2.ygridcore.net/67.195.138.31"; destination host is: ""localhost":9929; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1140) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:169) at $Proxy14.getDatanodeReport(Unknown Source) at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:127) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:81) at $Proxy14.getDatanodeReport(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDatanodeReport(ClientNamenodeProtocolTranslatorPB.java:555) at org.apache.hadoop.hdfs.DFSClient.datanodeReport(DFSClient.java:1443) at org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:1486) at org.apache.hadoop.hdfs.MiniDFSCluster.addNameNode(MiniDFSCluster.java:1904) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations.testMiniDFSClusterWithMultipleNN(TestDataNodeMultipleRegistrations.java:237) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > TestDataNodeMultipleRegistrations is failing in trunk > - > > Key: HDFS-2700 > URL: https://issues.apache.org/jira/browse/HDFS-2700 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Uma Maheswara Rao G > > TestDataNodeMultipleRegistrations is failing from last couple of builds > 
https://builds.apache.org/job/PreCommit-HDFS-Build/lastCompletedBuild/testReport/ -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2553: -- Attachment: HDFS-2553.patch Attached the same patch again to trigger the Jenkins > BlockPoolSliceScanner spinning in loop > -- > > Key: HDFS-2553 > URL: https://issues.apache.org/jira/browse/HDFS-2553 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.0, 0.24.0 >Reporter: Todd Lipcon >Assignee: Uma Maheswara Rao G >Priority: Critical > Attachments: CPUUsage.jpg, HDFS-2553.patch, HDFS-2553.patch > > > Playing with trunk, I managed to get a DataNode in a situation where the > BlockPoolSliceScanner is spinning in the following loop, using 100% CPU: > at > org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614) > at > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2553) BlockPoolSliceScanner spinning in loop
[ https://issues.apache.org/jira/browse/HDFS-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2553: -- Attachment: (was: HDFS-2553.patch) > BlockPoolSliceScanner spinning in loop > -- > > Key: HDFS-2553 > URL: https://issues.apache.org/jira/browse/HDFS-2553 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.0, 0.24.0 >Reporter: Todd Lipcon >Assignee: Uma Maheswara Rao G >Priority: Critical > Attachments: CPUUsage.jpg, HDFS-2553.patch > > > Playing with trunk, I managed to get a DataNode in a situation where the > BlockPoolSliceScanner is spinning in the following loop, using 100% CPU: > at > org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.isAlive(DataNode.java:820) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.isBPServiceAlive(DataNode.java:2962) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:625) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:614) > at > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:95) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171586#comment-13171586 ] Hadoop QA commented on HDFS-2486: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12507779/HDFS-2486.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 90 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1720//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1720//console This message is automatically generated. > Review issues with UnderReplicatedBlocks > > > Key: HDFS-2486 > URL: https://issues.apache.org/jira/browse/HDFS-2486 > Project: Hadoop HDFS > Issue Type: Task > Components: name-node >Affects Versions: 0.23.0 >Reporter: Steve Loughran >Assignee: Uma Maheswara Rao G >Priority: Minor > Attachments: HDFS-2486.patch > > > Here are some things I've noted in the UnderReplicatedBlocks class that > someone else should review and consider if the code is correct. If not, they > are easy to fix. 
> remove(Block block, int priLevel) is not synchronized, and as the inner > classes are not, there is a risk of race conditions there. > some of the code assumes that getPriority can return the value LEVEL, and if > so does not attempt to queue the blocks. As this return value is not > currently possible, those checks can be removed. > The queue gives priority to blocks whose replication count is less than a > third of its expected count over those that are "normally under replicated". > While this is good for ensuring that files scheduled for large replication > are replicated fast, it may not be the best strategy for maintaining data > integrity. For that it may be better to give whichever blocks have only two > replicas priority over blocks that may, for example, already have 3 out of 10 > copies in the filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
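The prioritisation questioned in the last point above can be made concrete with a toy version. This is not the actual UnderReplicatedBlocks code, just a sketch of the policy as described: blocks with fewer than a third of their expected replicas jump ahead of "normally" under-replicated blocks, regardless of absolute replica count:

```java
public class ReplicationPriorityDemo {
    // Lower number = scheduled for re-replication sooner (assumption for this sketch).
    static int getPriority(int curReplicas, int expectedReplicas) {
        if (curReplicas <= 0) return 0;                    // no live replica: most urgent
        if (curReplicas * 3 < expectedReplicas) return 1;  // under a third of expected
        return 2;                                          // normally under-replicated
    }

    public static void main(String[] args) {
        // The concern raised above: 3 surviving copies of an expected 10
        // outrank 2 surviving copies of an expected 3, even though the
        // latter is closer to data loss.
        System.out.println(getPriority(3, 10)); // 1
        System.out.println(getPriority(2, 3));  // 2
    }
}
```

An integrity-first alternative would key the priority on the absolute number of remaining replicas rather than the ratio to the expected count.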
[jira] [Commented] (HDFS-2362) More Improvements on NameNode Scalability
[ https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171582#comment-13171582 ] Uma Maheswara Rao G commented on HDFS-2362: --- Can someone please merge these improvements to the 0.23 branch as well? They introduced a good amount of delta between trunk and 0.23, so we are not able to do direct merges of some other improvements, such as HDFS-1765. > More Improvements on NameNode Scalability > - > > Key: HDFS-2362 > URL: https://issues.apache.org/jira/browse/HDFS-2362 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Hairong Kuang > > This jira acts as an umbrella jira to track all the improvements we've done > recently to improve Namenode's performance, responsiveness, and hence > scalability. Those improvements include: > 1. Incremental block reports (HDFS-395) > 2. BlockManager.reportDiff optimization for processing block reports > (HDFS-2477) > 3. Upgradable lock to allow simultaneous read operations while reportDiff is in > progress in processing block reports (HDFS-2490) > 4. More CPU efficient data structure for > under-replicated/over-replicated/invalidate blocks (HDFS-2476) > 5. Increase granularity of write operations in ReplicationMonitor thus > reducing contention for write lock (HDFS-2495) > 6. Support variable block sizes > 7. Release RPC handlers while waiting for the edit log to be synced to disk > 8. Reduce network traffic pressure to the master rack where NN is located by > lowering read priority of the replicas on the rack > 9. A standalone KeepAlive heartbeat thread > 10. Reduce multiple traversals of the path directory to one for most namespace > manipulations > 11. Move logging out of write lock section. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171577#comment-13171577 ] Uma Maheswara Rao G commented on HDFS-2486: --- This patch addresses the second comment. Once the last comment is concluded, it can be handled as a separate JIRA (sub-task) if required, depending on the code change. > Review issues with UnderReplicatedBlocks > > > Key: HDFS-2486 > URL: https://issues.apache.org/jira/browse/HDFS-2486 > Project: Hadoop HDFS > Issue Type: Task > Components: name-node >Affects Versions: 0.23.0 >Reporter: Steve Loughran >Assignee: Uma Maheswara Rao G >Priority: Minor > Attachments: HDFS-2486.patch > > > Here are some things I've noted in the UnderReplicatedBlocks class that > someone else should review and consider if the code is correct. If not, they > are easy to fix. > remove(Block block, int priLevel) is not synchronized, and as the inner > classes are not, there is a risk of race conditions there. > some of the code assumes that getPriority can return the value LEVEL, and if > so does not attempt to queue the blocks. As this return value is not > currently possible, those checks can be removed. > The queue gives priority to blocks whose replication count is less than a > third of its expected count over those that are "normally under replicated". > While this is good for ensuring that files scheduled for large replication > are replicated fast, it may not be the best strategy for maintaining data > integrity. For that it may be better to give whichever blocks have only two > replicas priority over blocks that may, for example, already have 3 out of 10 > copies in the filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2486: -- Assignee: Uma Maheswara Rao G Target Version/s: 0.24.0 Status: Patch Available (was: Open) Removed the unnecessary priLevel != LEVEL checks from the UnderReplicatedBlocks class, as the getPriority API will never return LEVEL (5). The patch was mainly tested with TestReplication (verifies the update API) and TestReplicationPolicy (verifies the add functionality). Other tests are not impacted by this change. > Review issues with UnderReplicatedBlocks > > > Key: HDFS-2486 > URL: https://issues.apache.org/jira/browse/HDFS-2486 > Project: Hadoop HDFS > Issue Type: Task > Components: name-node >Affects Versions: 0.23.0 >Reporter: Steve Loughran >Assignee: Uma Maheswara Rao G >Priority: Minor > Attachments: HDFS-2486.patch > > > Here are some things I've noted in the UnderReplicatedBlocks class that > someone else should review and consider if the code is correct. If not, they > are easy to fix. > remove(Block block, int priLevel) is not synchronized, and as the inner > classes are not, there is a risk of race conditions there. > some of the code assumes that getPriority can return the value LEVEL, and if > so does not attempt to queue the blocks. As this return value is not > currently possible, those checks can be removed. > The queue gives priority to blocks whose replication count is less than a > third of its expected count over those that are "normally under replicated". > While this is good for ensuring that files scheduled for large replication > are replicated fast, it may not be the best strategy for maintaining data > integrity. For that it may be better to give whichever blocks have only two > replicas priority over blocks that may, for example, already have 3 out of 10 > copies in the filesystem. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2486: -- Attachment: HDFS-2486.patch > Review issues with UnderReplicatedBlocks > > > Key: HDFS-2486 > URL: https://issues.apache.org/jira/browse/HDFS-2486 > Project: Hadoop HDFS > Issue Type: Task > Components: name-node >Affects Versions: 0.23.0 >Reporter: Steve Loughran >Priority: Minor > Attachments: HDFS-2486.patch > > > Here are some things I've noted in the UnderReplicatedBlocks class that > someone else should review and consider if the code is correct. If not, they > are easy to fix. > remove(Block block, int priLevel) is not synchronized, and as the inner > classes are not, there is a risk of race conditions there. > some of the code assumes that getPriority can return the value LEVEL, and if > so does not attempt to queue the blocks. As this return value is not > currently possible, those checks can be removed. > The queue gives priority to blocks whose replication count is less than a > third of its expected count over those that are "normally under replicated". > While this is good for ensuring that files scheduled for large replication > are replicated fast, it may not be the best strategy for maintaining data > integrity. For that it may be better to give whichever blocks have only two > replicas priority over blocks that may, for example, already have 3 out of 10 > copies in the filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2640) Javadoc generation hangs
[ https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171569#comment-13171569 ] Hudson commented on HDFS-2640: -- Integrated in Hadoop-Mapreduce-trunk #930 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/930/]) HDFS-2640. Javadoc generation hangs. tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215354 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml > Javadoc generation hangs > > > Key: HDFS-2640 > URL: https://issues.apache.org/jira/browse/HDFS-2640 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Tom White >Assignee: Tom White > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2640.patch > > > Typing 'mvn javadoc:javadoc' causes the build to hang. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2694) Removal of Avro broke non-PB NN services
[ https://issues.apache.org/jira/browse/HDFS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171565#comment-13171565 ] Hudson commented on HDFS-2694: -- Integrated in Hadoop-Mapreduce-trunk #930 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/930/]) HDFS-2694. Removal of Avro broke non-PB NN services. Contributed by Aaron T. Myers. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215364 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java > Removal of Avro broke non-PB NN services > > > Key: HDFS-2694 > URL: https://issues.apache.org/jira/browse/HDFS-2694 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 0.24.0 > > Attachments: HDFS-2694.patch, HDFS-2694.txt, HDFS-2694.txt > > > RpcEngine implementations have to register themselves associated with an > RpcKind. Both WritableRpcEngine and ProtobufRpcEngine do this registration in > static initialization blocks. It used to be that the static initializer block > for WritableRpcEngine was triggered when AvroRpcEngine was initialized, since > this instantiated a WritableRpcEngine object. With AvroRpcEngine gone, > there's nothing in the NN to trigger the WritableRpcEngine static > initialization block. Therefore, those RPC services which still use Writable > and not PB no longer work. > For example, if I try to run `hdfs groups' on trunk, which uses the > GetUserMappingsProtocol, this error gets spit out: > {noformat} > $ hdfs groups > log4j:ERROR Could not find value for key log4j.appender.NullAppender > log4j:ERROR Could not instantiate appender named "NullAppender". 
> Exception in thread "main" java.io.IOException: Unknown rpc kind RPC_WRITABLE > at org.apache.hadoop.ipc.Client.call(Client.java:1136) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:205) > at $Proxy6.getGroupsForUser(Unknown Source) > at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) > at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:56) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
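The failure mode described in HDFS-2694 is a general JVM behavior: a static initializer block runs only when the class is first actively used, so a registry populated from static blocks silently loses entries when nothing references the class anymore. A minimal sketch (the class and registry names are stand-ins, not the Hadoop IPC API):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the RpcKind -> engine registry; not the real Hadoop classes.
class RpcRegistry {
    static final Map<String, String> ENGINES = new HashMap<String, String>();
}

class WritableEngine {
    // Self-registration in a static block: runs only on first active use.
    static { RpcRegistry.ENGINES.put("RPC_WRITABLE", "WritableEngine"); }
}

class ProtobufEngine {
    static { RpcRegistry.ENGINES.put("RPC_PROTOCOL_BUFFER", "ProtobufEngine"); }
}

class StaticInitDemo {
    public static void main(String[] args) {
        new ProtobufEngine();  // instantiating the class triggers its static block
        // Nothing in the program touches WritableEngine, so its static block
        // never runs -- analogous to removing the AvroRpcEngine reference:
        System.out.println(RpcRegistry.ENGINES.containsKey("RPC_WRITABLE"));  // false
        // The fix is to trigger class initialization explicitly:
        new WritableEngine();  // or Class.forName(...) on the engine class
        System.out.println(RpcRegistry.ENGINES.containsKey("RPC_WRITABLE"));  // true
    }
}
```

Before the fix, only the protobuf engine registers, and any incoming RPC_WRITABLE call fails with "Unknown rpc kind" exactly as in the stack trace above.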
[jira] [Commented] (HDFS-2687) Tests are failing with ClassCastException, due to new protocol changes
[ https://issues.apache.org/jira/browse/HDFS-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171566#comment-13171566 ] Hudson commented on HDFS-2687: -- Integrated in Hadoop-Mapreduce-trunk #930 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/930/]) HDFS-2687. Tests failing with ClassCastException post protobuf RPC changes. Contributed by Suresh Srinivas. suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215366 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java > Tests are failing with ClassCastException, due to new protocol changes > --- > > Key: HDFS-2687 > URL: https://issues.apache.org/jira/browse/HDFS-2687 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Uma Maheswara Rao G >Assignee: Suresh Srinivas > Attachments: HDFS-2687.txt > > > https://builds.apache.org/job/Hadoop-Hdfs-trunk/lastCompletedBuild/testReport/ > java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.HdfsFileStatus > cannot be cast to org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus > at > org.apache.hadoop.hdfs.DistributedFileSystem$1.hasNext(DistributedFileSystem.java:452) > at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:1551) > at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1581) > at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1541) > at > org.apache.hadoop.fs.TestListFiles.testDirectory(TestListFiles.java:146) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
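The ClassCastException in HDFS-2687 arises because a conversion helper rebuilt the response as the base status type where the located subtype was expected; the commit message indicates the fix landed in PBHelper. A minimal illustration of the pattern, using hypothetical stand-in types rather than the real HDFS classes:

```java
// Stand-in types, not the real HdfsFileStatus/HdfsLocatedFileStatus classes.
class HdfsStatus {}
class HdfsLocatedStatus extends HdfsStatus {
    int locations() { return 1; }
}

class CastDemo {
    // Buggy converter: always builds the base type, even when block
    // locations were requested -- the downcast at the call site then throws.
    static HdfsStatus convertBuggy() { return new HdfsStatus(); }

    // Fixed converter: constructs and returns the located subtype,
    // so the caller's downcast succeeds.
    static HdfsStatus convertFixed() { return new HdfsLocatedStatus(); }
}
```

Usage mirrors the failing iterator in DistributedFileSystem: `(HdfsLocatedStatus) CastDemo.convertBuggy()` throws ClassCastException at runtime, while the same cast on `convertFixed()` succeeds.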
[jira] [Commented] (HDFS-2640) Javadoc generation hangs
[ https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171557#comment-13171557 ] Hudson commented on HDFS-2640: -- Integrated in Hadoop-Mapreduce-0.23-Build #130 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/130/]) Merge -r 1215353:1215354 from trunk to branch-0.23. Fixes: HDFS-2640. tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215355 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/pom.xml > Javadoc generation hangs > > > Key: HDFS-2640 > URL: https://issues.apache.org/jira/browse/HDFS-2640 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Tom White >Assignee: Tom White > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2640.patch > > > Typing 'mvn javadoc:javadoc' causes the build to hang. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2640) Javadoc generation hangs
[ https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171550#comment-13171550 ] Hudson commented on HDFS-2640: -- Integrated in Hadoop-Hdfs-0.23-Build #110 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/110/]) Merge -r 1215353:1215354 from trunk to branch-0.23. Fixes: HDFS-2640. tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215355 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/pom.xml > Javadoc generation hangs > > > Key: HDFS-2640 > URL: https://issues.apache.org/jira/browse/HDFS-2640 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Tom White >Assignee: Tom White > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2640.patch > > > Typing 'mvn javadoc:javadoc' causes the build to hang. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2684) Fix up some failing unit tests on HA branch
[ https://issues.apache.org/jira/browse/HDFS-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171547#comment-13171547 ] Hudson commented on HDFS-2684: -- Integrated in Hadoop-Hdfs-HAbranch-build #19 (See [https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/19/]) HDFS-2684. Fix up some failing unit tests on HA branch. Contributed by Todd Lipcon. todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215241 Files : * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-1623.txt * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestHeartbeatHandling.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBackupNode.java * /hadoop/common/branches/HDFS-1623/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java > Fix up some failing unit tests on HA branch > --- > > Key: HDFS-2684 > URL: https://issues.apache.org/jira/browse/HDFS-2684 > Project: Hadoop HDFS > Issue Type: Sub-task > 
Components: ha, test >Affects Versions: HA branch (HDFS-1623) >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: HA branch (HDFS-1623) > > Attachments: hdfs-2684.txt, hdfs-2684.txt, hdfs-2684.txt > > > To keep moving quickly on the HA branch, we've committed some stuff even > though some unit tests are failing. This JIRA is to take a pass through the > failing unit tests and get back to green (or close to it). If anything turns > out to be a major amount of work I'll file separate JIRAs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2640) Javadoc generation hangs
[ https://issues.apache.org/jira/browse/HDFS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171545#comment-13171545 ] Hudson commented on HDFS-2640: -- Integrated in Hadoop-Hdfs-trunk #897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/897/]) HDFS-2640. Javadoc generation hangs. tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215354 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml > Javadoc generation hangs > > > Key: HDFS-2640 > URL: https://issues.apache.org/jira/browse/HDFS-2640 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Tom White >Assignee: Tom White > Fix For: 0.24.0, 0.23.1 > > Attachments: HDFS-2640.patch > > > Typing 'mvn javadoc:javadoc' causes the build to hang. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2687) Tests are failing with ClassCastException, due to new protocol changes
[ https://issues.apache.org/jira/browse/HDFS-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171542#comment-13171542 ] Hudson commented on HDFS-2687: -- Integrated in Hadoop-Hdfs-trunk #897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/897/]) HDFS-2687. Tests failing with ClassCastException post protobuf RPC changes. Contributed by Suresh Srinivas. suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215366 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java > Tests are failing with ClassCastException, due to new protocol changes > --- > > Key: HDFS-2687 > URL: https://issues.apache.org/jira/browse/HDFS-2687 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Uma Maheswara Rao G >Assignee: Suresh Srinivas > Attachments: HDFS-2687.txt > > > https://builds.apache.org/job/Hadoop-Hdfs-trunk/lastCompletedBuild/testReport/ > java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.HdfsFileStatus > cannot be cast to org.apache.hadoop.hdfs.protocol.HdfsLocatedFileStatus > at > org.apache.hadoop.hdfs.DistributedFileSystem$1.hasNext(DistributedFileSystem.java:452) > at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:1551) > at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1581) > at org.apache.hadoop.fs.FileSystem$5.next(FileSystem.java:1541) > at > org.apache.hadoop.fs.TestListFiles.testDirectory(TestListFiles.java:146) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2694) Removal of Avro broke non-PB NN services
[ https://issues.apache.org/jira/browse/HDFS-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171541#comment-13171541 ] Hudson commented on HDFS-2694: -- Integrated in Hadoop-Hdfs-trunk #897 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/897/]) HDFS-2694. Removal of Avro broke non-PB NN services. Contributed by Aaron T. Myers. atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1215364 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java > Removal of Avro broke non-PB NN services > > > Key: HDFS-2694 > URL: https://issues.apache.org/jira/browse/HDFS-2694 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.24.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 0.24.0 > > Attachments: HDFS-2694.patch, HDFS-2694.txt, HDFS-2694.txt > > > RpcEngine implementations have to register themselves associated with an > RpcKind. Both WritableRpcEngine and ProtobufRpcEngine do this registration in > static initialization blocks. It used to be that the static initializer block > for WritableRpcEngine was triggered when AvroRpcEngine was initialized, since > this instantiated a WritableRpcEngine object. With AvroRpcEngine gone, > there's nothing in the NN to trigger the WritableRpcEngine static > initialization block. Therefore, those RPC services which still use Writable > and not PB no longer work. > For example, if I try to run `hdfs groups' on trunk, which uses the > GetUserMappingsProtocol, this error gets spit out: > {noformat} > $ hdfs groups > log4j:ERROR Could not find value for key log4j.appender.NullAppender > log4j:ERROR Could not instantiate appender named "NullAppender". 
> Exception in thread "main" java.io.IOException: Unknown rpc kind RPC_WRITABLE > at org.apache.hadoop.ipc.Client.call(Client.java:1136) > at > org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:205) > at $Proxy6.getGroupsForUser(Unknown Source) > at org.apache.hadoop.tools.GetGroupsBase.run(GetGroupsBase.java:71) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83) > at org.apache.hadoop.hdfs.tools.GetGroups.main(GetGroups.java:56) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171539#comment-13171539 ] Luke Lu commented on HDFS-2699: --- bq. You'd need HFile v3 for this Or come up with new compression codecs for the codecs (including none) that don't have checksums. > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171531#comment-13171531 ] Luke Lu commented on HDFS-2699: --- bq. The number of random reads issued by HBase is almost twice the iops shown via iostat. Each hbase random io translates to a position read (pread) to HDFS. As I mentioned in our last conversation, you can embed an application-level checksum in the HBase block (a la Hypertable) and turn off verifyChecksum in preads. You'd need HFile v3 for this, of course :) bq. Any thoughts on how we can put data and checksums together on the same block file? As discussed in HADOOP-1134, inline checksums not only make the code more complex, but also make in-place upgrade a lot more expensive (you have to copy the content). We can solve the latter by supporting two block formats simultaneously, at the expense of code complexity. > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2699) Store data and checksums together in block file
[ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171493#comment-13171493 ] dhruba borthakur commented on HDFS-2699: The number of random reads issued by HBase is almost twice the iops shown via iostat. Each hbase random io translates to a position read (pread) to HDFS. In my workload, hbase is issuing 300 pread/sec. The iostat on the machine shows 600 reads/sec. I switched off "verifyChecksum" in the pread calls, and that reduces the iops (via iostats) to about 350/sec, thus validating the claim that storing data and checksum in two different files is very costly for an iops bound workload. Any thoughts on how we can put data and checksums together on the same block file? > Store data and checksums together in block file > --- > > Key: HDFS-2699 > URL: https://issues.apache.org/jira/browse/HDFS-2699 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The current implementation of HDFS stores the data in one block file and the > metadata(checksum) in another block file. This means that every read from > HDFS actually consumes two disk iops, one to the datafile and one to the > checksum file. This is a major problem for scaling HBase, because HBase is > usually bottlenecked on the number of random disk iops that the > storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
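The "data and checksums together" idea under discussion can be sketched as an interleaved layout: a checksum is written immediately after each data chunk in the same file, so one positional read fetches both. This is a hypothetical format for illustration only, not the HDFS on-disk layout or any patch attached to this issue; the 512-byte chunk size matches the HDFS default bytes-per-checksum.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Sketch of an inline-checksum block layout: [chunk][crc64][chunk][crc64]...
class InlineChecksumSketch {
    static final int CHUNK = 512;  // bytes covered by each checksum
    static final int CRC_LEN = 8;  // CRC32 value stored as a long

    // Writes each data chunk followed immediately by its CRC32.
    static byte[] writeBlock(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += CHUNK) {
            int len = Math.min(CHUNK, data.length - off);
            out.write(data, off, len);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            byte[] sum = ByteBuffer.allocate(CRC_LEN).putLong(crc.getValue()).array();
            out.write(sum, 0, CRC_LEN);
        }
        return out.toByteArray();
    }

    // Reads chunk i and verifies it against the adjacent checksum --
    // a single contiguous region, hence a single disk iop.
    static byte[] readChunk(byte[] block, int i, int chunkLen) {
        int base = i * (CHUNK + CRC_LEN);
        byte[] chunk = new byte[chunkLen];
        System.arraycopy(block, base, chunk, 0, chunkLen);
        long stored = ByteBuffer.wrap(block, base + chunkLen, CRC_LEN).getLong();
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunkLen);
        if (crc.getValue() != stored) {
            throw new IllegalStateException("checksum mismatch in chunk " + i);
        }
        return chunk;
    }
}
```

This layout trades the iops saving against the upgrade and complexity costs Luke raises above: existing two-file blocks would have to be rewritten, or both formats supported at once.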
[jira] [Created] (HDFS-2699) Store data and checksums together in block file
Store data and checksums together in block file --- Key: HDFS-2699 URL: https://issues.apache.org/jira/browse/HDFS-2699 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur Assignee: dhruba borthakur The current implementation of HDFS stores the data in one block file and the metadata(checksum) in another block file. This means that every read from HDFS actually consumes two disk iops, one to the datafile and one to the checksum file. This is a major problem for scaling HBase, because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware offers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira