[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047039#comment-13047039 ] Nigel Daley commented on HDFS-941: -- +1 for 0.22. > Datanode xceiver protocol should allow reuse of a connection > > > Key: HDFS-941 > URL: https://issues.apache.org/jira/browse/HDFS-941 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, hdfs client >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: bc Wong > Attachments: 941.22.txt, 941.22.txt, HDFS-941-1.patch, > HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, HDFS-941-4.patch, > HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, HDFS-941-6.patch, > HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, hdfs-941.txt, > hdfs-941.txt, hdfs-941.txt, hdfs941-1.png > > > Right now each connection into the datanode xceiver only processes one > operation. > In the case that an operation leaves the stream in a well-defined state (eg a > client reads to the end of a block successfully) the same connection could be > reused for a second operation. This should improve random read performance > significantly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
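The reuse idea in the description can be sketched as a minimal, self-contained Java model (not the actual DataXceiver code; all names here are illustrative): a client whose previous read left the stream in a well-defined state needs only one connection for a whole run of sequential reads.

```java
// Minimal model of datanode connection reuse: count how many connections a
// client opens for a run of sequential block reads, with and without reuse.
public class XceiverReuseSketch {

    /** Returns the number of connections opened for `reads` sequential reads. */
    static int connectionsOpened(int reads, boolean reuseEnabled) {
        int opened = 0;
        boolean streamReusable = false; // true after a clean end-of-block read
        for (int i = 0; i < reads; i++) {
            if (!reuseEnabled || !streamReusable) {
                opened++;               // must dial the datanode again
            }
            streamReusable = true;      // read left the stream in a good state
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println("without reuse: " + connectionsOpened(100, false) + " connections");
        System.out.println("with reuse:    " + connectionsOpened(100, true) + " connections");
    }
}
```

With reuse, connection setup cost is paid once instead of per-operation, which is why the issue expects random read performance to improve significantly.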
[jira] [Commented] (HDFS-1409) The "register" method of the BackupNode class should be "UnsupportedActionException("register")"
[ https://issues.apache.org/jira/browse/HDFS-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047016#comment-13047016 ] Konstantin Shvachko commented on HDFS-1409: --- +1 Looks good > The "register" method of the BackupNode class should be > "UnsupportedActionException("register")" > > > Key: HDFS-1409 > URL: https://issues.apache.org/jira/browse/HDFS-1409 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Ching-Shen Chen >Priority: Trivial > Fix For: 0.21.1 > > Attachments: HDFS-1409.patch, HDFS-1409.patch > > > The register method of the BackupNode class should be > "UnsupportedActionException("register")" rather than > "UnsupportedActionException("journal")". -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
[ https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046987#comment-13046987 ] Hari A V commented on HDFS-1973: Hi Aaron, Thanks for the answer. I will watch these issues to get more information :-) -Hari > HA: HDFS clients must handle namenode failover and switch over to the new > active namenode. > -- > > Key: HDFS-1973 > URL: https://issues.apache.org/jira/browse/HDFS-1973 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Suresh Srinivas >Assignee: Aaron T. Myers > > During failover, a client must detect the current active namenode failure and > switch over to the new active namenode. The switch over might make use of IP > failover or something more elaborate such as ZooKeeper to discover the new > active. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
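A rough sketch of the client-side switch-over being discussed (hypothetical names; the real mechanism is still being designed in this sub-task): probe the configured namenodes in order and route RPCs to the first one that reports itself active.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical failover sketch, not the real HDFS client API.
public class FailoverSketch {

    /** Returns the index of the first namenode that answers as active,
     *  or -1 if none is reachable. */
    static int switchToActive(List<String> namenodes, Predicate<String> isActive) {
        for (int i = 0; i < namenodes.size(); i++) {
            if (isActive.test(namenodes.get(i))) {
                return i;   // new active found: route future RPCs here
            }
        }
        return -1;          // total outage: caller should back off and retry
    }

    public static void main(String[] args) {
        List<String> nns = List.of("nn1.example.com:8020", "nn2.example.com:8020");
        // Simulate nn1 having failed over to nn2.
        int active = switchToActive(nns, nn -> nn.startsWith("nn2"));
        System.out.println("active namenode: " + nns.get(active));
    }
}
```

The `isActive` probe is the part the linked issues have to define, whether that ends up being IP failover, a ZooKeeper lookup, or an RPC that the standby rejects.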
[jira] [Commented] (HDFS-2055) Add hflush support to libhdfs
[ https://issues.apache.org/jira/browse/HDFS-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046979#comment-13046979 ] Hadoop QA commented on HDFS-2055: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482015/HDFS-2055.patch against trunk revision 1134170. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/759//console This message is automatically generated. > Add hflush support to libhdfs > - > > Key: HDFS-2055 > URL: https://issues.apache.org/jira/browse/HDFS-2055 > Project: Hadoop HDFS > Issue Type: New Feature > Components: libhdfs >Reporter: Travis Crawford > Attachments: HDFS-2055.patch > > > libhdfs would be improved by adding support for hflush. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2055) Add hflush support to libhdfs
[ https://issues.apache.org/jira/browse/HDFS-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Crawford updated HDFS-2055: -- Release Note: Add hdfsHFlush to libhdfs. Status: Patch Available (was: Open) > Add hflush support to libhdfs > - > > Key: HDFS-2055 > URL: https://issues.apache.org/jira/browse/HDFS-2055 > Project: Hadoop HDFS > Issue Type: New Feature > Components: libhdfs >Reporter: Travis Crawford > Attachments: HDFS-2055.patch > > > libhdfs would be improved by adding support for hflush. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2055) Add hflush support to libhdfs
[ https://issues.apache.org/jira/browse/HDFS-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Crawford updated HDFS-2055: -- Attachment: HDFS-2055.patch Add {{hdfsHFlush}} to libhdfs. It's also viewable here, which might be easier to read: https://github.com/traviscrawford/hadoop-hdfs/compare/apache:trunk...HDFS-2055_Add_hflush_support_to_libhdfs > Add hflush support to libhdfs > - > > Key: HDFS-2055 > URL: https://issues.apache.org/jira/browse/HDFS-2055 > Project: Hadoop HDFS > Issue Type: New Feature > Components: libhdfs >Reporter: Travis Crawford > Attachments: HDFS-2055.patch > > > libhdfs would be improved by adding support for hflush. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2056) Update fetchdt usage
[ https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-2056: --- Resolution: Fixed Status: Resolved (was: Patch Available) I have committed this. Thanks to Tanping! > Update fetchdt usage > > > Key: HDFS-2056 > URL: https://issues.apache.org/jira/browse/HDFS-2056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, tools >Affects Versions: 0.23.0 >Reporter: Tanping Wang >Assignee: Tanping Wang >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2056.patch > > > Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes
[ https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1295: - Attachment: HDFS-1295_for_ymerge_v2.patch Turns out HDFS-1295 is dependent on HDFS-900. Merged HDFS-900 to yahoo-merge, but now need a slightly modified port of HDFS-1295. Attached. > Improve namenode restart times by short-circuiting the first block reports > from datanodes > - > > Key: HDFS-1295 > URL: https://issues.apache.org/jira/browse/HDFS-1295 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: dhruba borthakur >Assignee: Matt Foley > Fix For: 0.23.0 > > Attachments: HDFS-1295_delta_for_trunk.patch, > HDFS-1295_for_ymerge.patch, HDFS-1295_for_ymerge_v2.patch, > IBR_shortcut_v2a.patch, IBR_shortcut_v3atrunk.patch, > IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v4atrunk.patch, IBR_shortcut_v6atrunk.patch, > IBR_shortcut_v7atrunk.patch, shortCircuitBlockReport_1.txt > > > The namenode restart is dominated by the performance of processing block > reports. On a 2000 node cluster with 90 million blocks, block report > processing takes 30 to 40 minutes. The namenode "diffs" the contents of the > incoming block report with the contents of the blocks map, and then applies > these diffs to the blocksMap, but in reality there is no need to compute the > "diff" because this is the first block report from the datanode. > This code change improves block report processing time by 300%. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
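The short-circuit can be illustrated with a small, self-contained sketch (illustrative only, and far simpler than the real BlockManager code, which tracks replicas rather than a single owner per block): when the report is the first one from a datanode, skip the diff and populate the blocks map directly.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy blocks map: block id -> datanode that holds it.
public class FirstBlockReportSketch {

    static void processReport(Map<Long, String> blocksMap, String node,
                              List<Long> reported, boolean firstReport) {
        if (firstReport) {
            // Short circuit: nothing to diff against, insert directly.
            for (long b : reported) blocksMap.put(b, node);
            return;
        }
        // Later reports: reconcile against current state (the expensive path).
        Set<Long> seen = new HashSet<>(reported);
        blocksMap.entrySet().removeIf(
            e -> e.getValue().equals(node) && !seen.contains(e.getKey()));
        for (long b : reported) blocksMap.put(b, node);
    }

    public static void main(String[] args) {
        Map<Long, String> map = new HashMap<>();
        processReport(map, "dn1", List.of(1L, 2L, 3L), true);   // fast path
        processReport(map, "dn1", List.of(2L, 3L, 4L), false);  // diff path
        System.out.println(new TreeMap<>(map));
    }
}
```

At namenode restart every datanode's report is a first report, so the fast path dominates, which is where the claimed speedup in block report processing comes from.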
[jira] [Commented] (HDFS-2056) Update fetchdt usage
[ https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046955#comment-13046955 ] Tanping Wang commented on HDFS-2056: # The two new Findbugs warnings are not related to this patch. This is a simple usage update change. # No tests are included, as this is a usage update change. # For the same reason, the core test failures are not related to this change. > Update fetchdt usage > > > Key: HDFS-2056 > URL: https://issues.apache.org/jira/browse/HDFS-2056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, tools >Affects Versions: 0.23.0 >Reporter: Tanping Wang >Assignee: Tanping Wang >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2056.patch > > > Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time
[ https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-2057. --- Resolution: Fixed Hadoop Flags: [Reviewed] I committed the patch to 204, 205 and branch-0.20-security. Thank you Bharath. > Wait time to terminate the threads causing unit tests to take longer time > - > > Key: HDFS-2057 > URL: https://issues.apache.org/jira/browse/HDFS-2057 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20.204.0, 0.20.205.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi > Fix For: 0.20.205.0 > > Attachments: HDFS-2057-1.patch > > > As part of the datanode process hang fix, this code was introduced in > 0.20.204 to clean up all the waiting threads:
> - try {
> -   readPool.awaitTermination(10, TimeUnit.SECONDS);
> - } catch (InterruptedException e) {
> -   LOG.info("Exception occured in doStop:" + e.getMessage());
> - }
> - readPool.shutdownNow();
> This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster, whose shutdown waits on this part of the code. Due to this, we saw an increase in unit test run times, so this code is being removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2030: -- Resolution: Fixed Status: Resolved (was: Patch Available) I committed the patch. Thank you Bharath. > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. > If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
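The intended behavior can be sketched in a few lines (a hypothetical helper, not the actual NameNode code): auto-generate a clusterid only when the operator did not supply one.

```java
import java.util.UUID;

public class ClusterIdSketch {

    /** Mirrors the described -upgrade behavior: honor a supplied clusterid,
     *  otherwise generate one (as namenode -format does). */
    static String resolveClusterId(String requested) {
        if (requested == null || requested.isEmpty()) {
            return "CID-" + UUID.randomUUID();  // auto-generated
        }
        return requested;                       // operator-supplied: honored
    }

    public static void main(String[] args) {
        System.out.println(resolveClusterId(null));
        System.out.println(resolveClusterId("CID-my-cluster"));
    }
}
```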
[jira] [Commented] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time
[ https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046935#comment-13046935 ] Suresh Srinivas commented on HDFS-2057: --- This is reverting to the previous code. +1 for the patch. > Wait time to terminate the threads causing unit tests to take longer time > - > > Key: HDFS-2057 > URL: https://issues.apache.org/jira/browse/HDFS-2057 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20.204.0, 0.20.205.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi > Fix For: 0.20.205.0 > > Attachments: HDFS-2057-1.patch > > > As part of the datanode process hang fix, this code was introduced in > 0.20.204 to clean up all the waiting threads:
> - try {
> -   readPool.awaitTermination(10, TimeUnit.SECONDS);
> - } catch (InterruptedException e) {
> -   LOG.info("Exception occured in doStop:" + e.getMessage());
> - }
> - readPool.shutdownNow();
> This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster, whose shutdown waits on this part of the code. Due to this, we saw an increase in unit test run times, so this code is being removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time
[ https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2057: - Attachment: HDFS-2057-1.patch Attaching the patch. > Wait time to terminate the threads causing unit tests to take longer time > - > > Key: HDFS-2057 > URL: https://issues.apache.org/jira/browse/HDFS-2057 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.20.204.0, 0.20.205.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi > Fix For: 0.20.205.0 > > Attachments: HDFS-2057-1.patch > > > As part of the datanode process hang fix, this code was introduced in > 0.20.204 to clean up all the waiting threads:
> - try {
> -   readPool.awaitTermination(10, TimeUnit.SECONDS);
> - } catch (InterruptedException e) {
> -   LOG.info("Exception occured in doStop:" + e.getMessage());
> - }
> - readPool.shutdownNow();
> This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster, whose shutdown waits on this part of the code. Due to this, we saw an increase in unit test run times, so this code is being removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time
Wait time to terminate the threads causing unit tests to take longer time - Key: HDFS-2057 URL: https://issues.apache.org/jira/browse/HDFS-2057 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.20.204.0, 0.20.205.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.20.205.0 As part of the datanode process hang fix, this code was introduced in 0.20.204 to clean up all the waiting threads:
- try {
-   readPool.awaitTermination(10, TimeUnit.SECONDS);
- } catch (InterruptedException e) {
-   LOG.info("Exception occured in doStop:" + e.getMessage());
- }
- readPool.shutdownNow();
This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster, whose shutdown waits on this part of the code. Due to this, we saw an increase in unit test run times, so this code is being removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
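The cost of the removed wait is easy to demonstrate with a self-contained sketch (plain java.util.concurrent, not the datanode code): a pool whose task never finishes makes awaitTermination block for its full timeout before shutdownNow() interrupts the worker.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownWaitSketch {

    /** Stops the pool; when waitFirst is true, mirrors the removed code by
     *  blocking up to timeoutMs before forcing shutdown. Returns elapsed ms. */
    static long stop(ExecutorService pool, boolean waitFirst, long timeoutMs) {
        long start = System.nanoTime();
        pool.shutdown();
        if (waitFirst) {
            try {
                pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ignored) { }
        }
        pool.shutdownNow();  // interrupts the stuck worker either way
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** A pool running one task that only ends when interrupted. */
    static ExecutorService poolWithStuckTask() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        });
        return pool;
    }

    public static void main(String[] args) {
        System.out.println("with wait:    ~" + stop(poolWithStuckTask(), true, 200) + " ms");
        System.out.println("without wait: ~" + stop(poolWithStuckTask(), false, 200) + " ms");
    }
}
```

With the production value of 10 seconds, every MiniDFSCluster/MiniMRCluster shutdown in the test suite pays that delay, which is the run-time increase the issue describes.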
[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046929#comment-13046929 ] Suresh Srinivas commented on HDFS-2030: --- Findbugs warning and TestHDFSCLI is unrelated to this patch. > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. > If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2041) Some mtimes and atimes are lost when edit logs are replayed
[ https://issues.apache.org/jira/browse/HDFS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046931#comment-13046931 ] Hadoop QA commented on HDFS-2041: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481995/hdfs-2041.txt against trunk revision 1134124. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/758//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/758//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/758//console This message is automatically generated. > Some mtimes and atimes are lost when edit logs are replayed > --- > > Key: HDFS-2041 > URL: https://issues.apache.org/jira/browse/HDFS-2041 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.23.0 > > Attachments: hdfs-2041.txt, hdfs-2041.txt > > > The refactoring in HDFS-2003 allowed findbugs to expose two potential bugs: > - the atime field logged with OP_MKDIR is unused > - the timestamp field logged with OP_CONCAT_DELETE is unused > The concat issue is definitely real. 
The atime for MKDIR might always be > identical to mtime, in which case it could be ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2056) Update fetchdt usage
[ https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046925#comment-13046925 ] Hadoop QA commented on HDFS-2056: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481988/HDFS-2056.patch against trunk revision 1134031. +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/757//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/757//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/757//console This message is automatically generated. > Update fetchdt usage > > > Key: HDFS-2056 > URL: https://issues.apache.org/jira/browse/HDFS-2056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, tools >Affects Versions: 0.23.0 >Reporter: Tanping Wang >Assignee: Tanping Wang >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2056.patch > > > Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046910#comment-13046910 ] Todd Lipcon commented on HDFS-2054: --- hrm... that's a pain. I guess our options are (a) parsing exception messages, or (b) passing the Socket object itself to BlockSender such that it can determine whether it's still open. Any other good ideas? > BlockSender.sendChunk() prints ERROR for connection closures encountered > during transferToFully() > -- > > Key: HDFS-2054 > URL: https://issues.apache.org/jira/browse/HDFS-2054 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Minor > Attachments: HDFS-2054.patch > > > The addition of ERROR was part of HDFS-1527. In environments where clients > tear down FSInputStream/connection before reaching the end of stream, this > error message often pops up. Since these are not really errors and especially > not the fault of data node, the message should be toned down at least. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
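Option (a) can be sketched directly (a hedged example; the exact strings to match would need to be verified against the JDK/OS error messages actually seen from transferToFully()):

```java
import java.io.IOException;

public class DisconnectSketch {

    /** Heuristic from option (a): treat these exception texts as an ordinary
     *  client disconnect so BlockSender can log at INFO instead of ERROR. */
    static boolean isClientDisconnect(IOException e) {
        String msg = e.getMessage();
        if (msg == null) return false;
        return msg.contains("Broken pipe") || msg.contains("Connection reset");
    }

    public static void main(String[] args) {
        System.out.println("EPIPE -> disconnect? "
            + isClientDisconnect(new IOException("Broken pipe")));
        System.out.println("EIO   -> disconnect? "
            + isClientDisconnect(new IOException("Input/output error")));
    }
}
```

Message parsing is fragile (the strings are platform- and locale-dependent), which is why option (b), giving BlockSender the Socket so it can check the connection state itself, is on the table despite isOpen()'s unreliability noted below.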
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046907#comment-13046907 ] Kihwal Lee commented on HDFS-2054: -- I tried SocketOutputStream.isOpen() in BlockSender.sendChunk(), but it seems even after EPIPE, isOpen() is not guaranteed to return false. > BlockSender.sendChunk() prints ERROR for connection closures encountered > during transferToFully() > -- > > Key: HDFS-2054 > URL: https://issues.apache.org/jira/browse/HDFS-2054 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Minor > Attachments: HDFS-2054.patch > > > The addition of ERROR was part of HDFS-1527. In environments where clients > tear down FSInputStream/connection before reaching the end of stream, this > error message often pops up. Since these are not really errors and especially > not the fault of data node, the message should be toned down at least. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic
[ https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2003: -- Affects Version/s: 0.23.0 Fix Version/s: 0.23.0 Updating fix versions, this was done in trunk as well. > Separate FSEditLog reading logic from editLog memory state building logic > - > > Key: HDFS-2003 > URL: https://issues.apache.org/jira/browse/HDFS-2003 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: Edit log branch (HDFS-1073), 0.23.0 >Reporter: Ivan Kelly >Assignee: Ivan Kelly > Fix For: Edit log branch (HDFS-1073), 0.23.0 > > Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, > HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, > HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, > hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt > > > Currently FSEditLogLoader has code for reading from an InputStream > interleaved with code which updates the FSNameSystem and FSDirectory. This > makes it difficult to read an edit log without having a whole load of other > object initialised, which is problematic if you want to do things like count > how many transactions are in a file etc. > This patch separates the reading of the stream and the building of the memory > state. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2041) Some mtimes and atimes are lost when edit logs are replayed
[ https://issues.apache.org/jira/browse/HDFS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2041: -- Status: Patch Available (was: Open) > Some mtimes and atimes are lost when edit logs are replayed > --- > > Key: HDFS-2041 > URL: https://issues.apache.org/jira/browse/HDFS-2041 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.23.0 > > Attachments: hdfs-2041.txt, hdfs-2041.txt > > > The refactoring in HDFS-2003 allowed findbugs to expose two potential bugs: > - the atime field logged with OP_MKDIR is unused > - the timestamp field logged with OP_CONCAT_DELETE is unused > The concat issue is definitely real. The atime for MKDIR might always be > identical to mtime in that case, in which case it could be ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes
[ https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046903#comment-13046903 ] Matt Foley commented on HDFS-1295: -- Response to test-patch: -1 core tests: TestHDFSCLI failure is unrelated. -1 tests included: This is simply a completion of the previously approved patch. Committed HDFS-1295_delta_for_trunk.patch to trunk. > Improve namenode restart times by short-circuiting the first block reports > from datanodes > - > > Key: HDFS-1295 > URL: https://issues.apache.org/jira/browse/HDFS-1295 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: dhruba borthakur >Assignee: Matt Foley > Fix For: 0.23.0 > > Attachments: HDFS-1295_delta_for_trunk.patch, > HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, > IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, > shortCircuitBlockReport_1.txt > > > The namenode restart is dominated by the performance of processing block > reports. On a 2000 node cluster with 90 million blocks, block report > processing takes 30 to 40 minutes. The namenode "diffs" the contents of the > incoming block report with the contents of the blocks map, and then applies > these diffs to the blocksMap, but in reality there is no need to compute the > "diff" because this is the first block report from the datanode. > This code change improves block report processing time by 300%. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2041) Some mtimes and atimes are lost when edit logs are replayed
[ https://issues.apache.org/jira/browse/HDFS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2041: -- Attachment: hdfs-2041.txt Rebased on trunk > Some mtimes and atimes are lost when edit logs are replayed > --- > > Key: HDFS-2041 > URL: https://issues.apache.org/jira/browse/HDFS-2041 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.23.0 > > Attachments: hdfs-2041.txt, hdfs-2041.txt > > > The refactoring in HDFS-2003 allowed findbugs to expose two potential bugs: > - the atime field logged with OP_MKDIR is unused > - the timestamp field logged with OP_CONCAT_DELETE is unused > The concat issue is definitely real. The atime for MKDIR might always be > identical to mtime in that case, in which case it could be ignored. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046896#comment-13046896 ] Hadoop QA commented on HDFS-2030: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481981/HDFS-2030-3.patch against trunk revision 1134031. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/756//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/756//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/756//console This message is automatically generated. > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. 
> If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2027) 1073: Image inspector should return finalized logs before unfinalized logs
[ https://issues.apache.org/jira/browse/HDFS-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2027. --- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to branch, thanks Eli! > 1073: Image inspector should return finalized logs before unfinalized logs > -- > > Key: HDFS-2027 > URL: https://issues.apache.org/jira/browse/HDFS-2027 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: Edit log branch (HDFS-1073) > > Attachments: hdfs-2027.txt > > > Found this small bug while testing multiple NNs under failure conditions on > the 1073 branch. When the 2NN calls getEditLogManifest(), it expects a list > of finalized logs. In the case that one of the edit log directories had > failed and recovered, there would be some txid for which there was an > edit_N_inprogress and an edits_N-M (finalized). The edit log manifest needs > to see the finalized one when it exists. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2048) 1073: Improve upgrade tests from 0.22
[ https://issues.apache.org/jira/browse/HDFS-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2048. --- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to branch, thanks for review, Eli! > 1073: Improve upgrade tests from 0.22 > - > > Key: HDFS-2048 > URL: https://issues.apache.org/jira/browse/HDFS-2048 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: Edit log branch (HDFS-1073) > > Attachments: hdfs-2048.txt > > > TestDFSUpgradeFromImage currently tests an upgrade from 0.22, but doesn't > test that the image checksum field is properly respected during the upgrade. > This JIRA is to improve those tests by also testing the case where the image > has been corrupted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2047) Improve TestNamespace and TestEditLog in 1073 branch
[ https://issues.apache.org/jira/browse/HDFS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2047. --- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to branch, thanks for review, Eli. > Improve TestNamespace and TestEditLog in 1073 branch > > > Key: HDFS-2047 > URL: https://issues.apache.org/jira/browse/HDFS-2047 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: Edit log branch (HDFS-1073) > > Attachments: hdfs-2047.txt > > > These tests currently have some test cases that don't make sense after > HDFS-1073. This JIRA is to update these tests to do the equivalent things on > 1073. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2056) Update fetchdt usage
[ https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-2056: --- Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) > Update fetchdt usage > > > Key: HDFS-2056 > URL: https://issues.apache.org/jira/browse/HDFS-2056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, tools >Affects Versions: 0.23.0 >Reporter: Tanping Wang >Assignee: Tanping Wang >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2056.patch > > > Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2056) Update fetchdt usage
[ https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046873#comment-13046873 ] Jitendra Nath Pandey commented on HDFS-2056: +1. > Update fetchdt usage > > > Key: HDFS-2056 > URL: https://issues.apache.org/jira/browse/HDFS-2056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, tools >Affects Versions: 0.23.0 >Reporter: Tanping Wang >Assignee: Tanping Wang >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2056.patch > > > Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2056) Update fetchdt usage
[ https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanping Wang updated HDFS-2056: --- Attachment: HDFS-2056.patch > Update fetchdt usage > > > Key: HDFS-2056 > URL: https://issues.apache.org/jira/browse/HDFS-2056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, tools >Affects Versions: 0.23.0 >Reporter: Tanping Wang >Assignee: Tanping Wang >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2056.patch > > > Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2056) Update fetchdt usage
[ https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanping Wang updated HDFS-2056: --- Component/s: tools documentation Affects Version/s: 0.23.0 Fix Version/s: 0.23.0 > Update fetchdt usage > > > Key: HDFS-2056 > URL: https://issues.apache.org/jira/browse/HDFS-2056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, tools >Affects Versions: 0.23.0 >Reporter: Tanping Wang >Assignee: Tanping Wang >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2056.patch > > > Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2056) Update fetchdt usage
Update fetchdt usage Key: HDFS-2056 URL: https://issues.apache.org/jira/browse/HDFS-2056 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tanping Wang Assignee: Tanping Wang Priority: Minor Update the usage of fetchdt. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2027) 1073: Image inspector should return finalized logs before unfinalized logs
[ https://issues.apache.org/jira/browse/HDFS-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046848#comment-13046848 ] Eli Collins commented on HDFS-2027: --- +1 lgtm > 1073: Image inspector should return finalized logs before unfinalized logs > -- > > Key: HDFS-2027 > URL: https://issues.apache.org/jira/browse/HDFS-2027 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: Edit log branch (HDFS-1073) > > Attachments: hdfs-2027.txt > > > Found this small bug while testing multiple NNs under failure conditions on > the 1073 branch. When the 2NN calls getEditLogManifest(), it expects a list > of finalized logs. In the case that one of the edit log directories had > failed and recovered, there would be some txid for which there was an > edit_N_inprogress and an edits_N-M (finalized). The edit log manifest needs > to see the finalized one when it exists. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2030: -- Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. > If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046826#comment-13046826 ] Suresh Srinivas commented on HDFS-2030: --- +1 for the change > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. > If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2048) 1073: Improve upgrade tests from 0.22
[ https://issues.apache.org/jira/browse/HDFS-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046819#comment-13046819 ] Eli Collins commented on HDFS-2048: --- +1 looks great > 1073: Improve upgrade tests from 0.22 > - > > Key: HDFS-2048 > URL: https://issues.apache.org/jira/browse/HDFS-2048 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: Edit log branch (HDFS-1073) > > Attachments: hdfs-2048.txt > > > TestDFSUpgradeFromImage currently tests an upgrade from 0.22, but doesn't > test that the image checksum field is properly respected during the upgrade. > This JIRA is to improve those tests by also testing the case where the image > has been corrupted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2030: - Attachment: HDFS-2030-3.patch Done some more minor cleanup related to comments and adding more description to test class. Please find the attached patch. > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. > If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2047) Improve TestNamespace and TestEditLog in 1073 branch
[ https://issues.apache.org/jira/browse/HDFS-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046811#comment-13046811 ] Eli Collins commented on HDFS-2047: --- +1 looks great. The TODO in the WRITE_STORAGE_ONE case in TestSaveNamespace is out of scope for 1073; file a new jira (Save namespace should succeed as long as there's at least one valid storage dir)? Seems like we could/should fix that in parallel. > Improve TestNamespace and TestEditLog in 1073 branch > > > Key: HDFS-2047 > URL: https://issues.apache.org/jira/browse/HDFS-2047 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: Edit log branch (HDFS-1073) > > Attachments: hdfs-2047.txt > > > These tests currently have some test cases that don't make sense after > HDFS-1073. This JIRA is to update these tests to do the equivalent things on > 1073. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046808#comment-13046808 ] Kihwal Lee commented on HDFS-2054: -- >I'm not in favor of parsing the exception text for behavior-altering >things. But for deciding whether to log at debug vs warn level, it >seems OK to me. This sounds reasonable. >Another thought is to check something like socket.isInputShutdown() >or socket.isConnected()? Maybe we can assume that any case where we >get an IOE but the socket was then found to be disconnected is OK. >If we had a local IOE with the transferto, the socket would still be up. This is even better, IMO. > BlockSender.sendChunk() prints ERROR for connection closures encountered > during transferToFully() > -- > > Key: HDFS-2054 > URL: https://issues.apache.org/jira/browse/HDFS-2054 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Minor > Attachments: HDFS-2054.patch > > > The addition of ERROR was part of HDFS-1527. In environments where clients > tear down FSInputStream/connection before reaching the end of stream, this > error message often pops up. Since these are not really errors and especially > not the fault of data node, the message should be toned down at least. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046804#comment-13046804 ] Todd Lipcon commented on HDFS-2054: --- Yea, it sucks that Java doesn't give us a way to get at the underlying errno in these cases. For the IOEs thrown by the hadoop-native code in common, we actually have an Errno enum that makes life easy. I'm not in favor of parsing the exception text for behavior-altering things. But for deciding whether to log at debug vs warn level, it seems OK to me. Another thought is to check something like socket.isInputShutdown() or socket.isConnected()? Maybe we can assume that any case where we get an IOE but the socket was then found to be disconnected is OK. If we had a local IOE with the transferTo(), the socket would still be up. > BlockSender.sendChunk() prints ERROR for connection closures encountered > during transferToFully() > -- > > Key: HDFS-2054 > URL: https://issues.apache.org/jira/browse/HDFS-2054 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Minor > Attachments: HDFS-2054.patch > > > The addition of ERROR was part of HDFS-1527. In environments where clients > tear down FSInputStream/connection before reaching the end of stream, this > error message often pops up. Since these are not really errors and especially > not the fault of data node, the message should be toned down at least. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-2030: - Attachment: HDFS-2030-2.patch Attached the patch. > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. > If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046801#comment-13046801 ] Kihwal Lee commented on HDFS-2054: -- Last time I tried what you said with EAGAIN in transferTo() in an attempt to avoid doing epoll() every time even before sending anything. Some folks were not thrilled about parsing the text. If it can be done in a portable/i18n-friendly way and people do not object to the idea itself... > BlockSender.sendChunk() prints ERROR for connection closures encountered > during transferToFully() > -- > > Key: HDFS-2054 > URL: https://issues.apache.org/jira/browse/HDFS-2054 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Minor > Attachments: HDFS-2054.patch > > > The addition of ERROR was part of HDFS-1527. In environments where clients > tear down FSInputStream/connection before reaching the end of stream, this > error message often pops up. Since these are not really errors and especially > not the fault of data node, the message should be toned down at least. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command
[ https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046800#comment-13046800 ] Bharath Mundlapudi commented on HDFS-2030: -- Thanks for the review, Suresh. My comments are inline. 1.1 Missing banner - done. 1.2 This method is package-protected; this unit test just tests this function instead of using the time-consuming MiniDFSCluster. 1.3 Removed the null and empty checks. 1.4 BlockPoolID is auto-generated. I have now modified the tests to not mock this. 1.5 Added assertEquals where necessary. 1.6 Made multiple tests. 2.1 Since setBlockPoolID() and setClusterID() are in NNStorage, I moved this function to that class, which solves this problem. 2.2 Renamed the function. 2.3 Moved the comments outside the function and moved the if condition inside the method. Attaching the patch with these changes. > Fix the usability of namenode upgrade command > - > > Key: HDFS-2030 > URL: https://issues.apache.org/jira/browse/HDFS-2030 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Bharath Mundlapudi >Assignee: Bharath Mundlapudi >Priority: Minor > Fix For: 0.23.0 > > Attachments: HDFS-2030-1.patch > > > Fixing the Namenode upgrade option along the same line as Namenode format > option. > If clusterid is not given then clusterid will be automatically generated for > the upgrade but if clusterid is given then it will be honored. > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046795#comment-13046795 ] Todd Lipcon commented on HDFS-2054: --- Maybe we can check the exception type and message, and only log warning for unexpected ones? EG "Connection reset by peer" and "Broken pipe" are expected exceptions, but anything else should be logged at WARN level. > BlockSender.sendChunk() prints ERROR for connection closures encountered > during transferToFully() > -- > > Key: HDFS-2054 > URL: https://issues.apache.org/jira/browse/HDFS-2054 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Minor > Attachments: HDFS-2054.patch > > > The addition of ERROR was part of HDFS-1527. In environments where clients > tear down FSInputStream/connection before reaching the end of stream, this > error message often pops up. Since these are not really errors and especially > not the fault of data node, the message should be toned down at least. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
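The message-based log-level idea above can be sketched as follows. This is a minimal sketch, not the actual BlockSender code; `isExpectedClientClosure()` is a hypothetical helper, and the assumption is that "Connection reset by peer" and "Broken pipe" texts indicate an expected client-side closure.

```java
import java.io.IOException;

public class SendChunkLogging {
    // Exception texts that indicate the peer closed the connection;
    // these are expected in normal operation, not a datanode fault.
    private static final String[] EXPECTED = {
        "Connection reset by peer", "Broken pipe"
    };

    // Decide whether an IOException from transferTo() deserves a WARN
    // line or only DEBUG, based on the exception text (used here only
    // for log-level selection, not for behavior-altering decisions).
    static boolean isExpectedClientClosure(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        for (String s : EXPECTED) {
            if (msg.contains(s)) {
                return true;
            }
        }
        return false;
    }
}
```

Any other IOException text would still surface at WARN level, which keeps genuine local I/O errors visible.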
[jira] [Created] (HDFS-2055) Add hflush support to libhdfs
Add hflush support to libhdfs - Key: HDFS-2055 URL: https://issues.apache.org/jira/browse/HDFS-2055 Project: Hadoop HDFS Issue Type: New Feature Components: libhdfs Reporter: Travis Crawford libhdfs would be improved by adding support for hflush. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes
[ https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046790#comment-13046790 ] Suresh Srinivas commented on HDFS-1295: --- +1 for the yahoo-merge patch also > Improve namenode restart times by short-circuiting the first block reports > from datanodes > - > > Key: HDFS-1295 > URL: https://issues.apache.org/jira/browse/HDFS-1295 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: dhruba borthakur >Assignee: Matt Foley > Fix For: 0.23.0 > > Attachments: HDFS-1295_delta_for_trunk.patch, > HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, > IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, > shortCircuitBlockReport_1.txt > > > The namenode restart is dominated by the performance of processing block > reports. On a 2000 node cluster with 90 million blocks, block report > processing takes 30 to 40 minutes. The namenode "diffs" the contents of the > incoming block report with the contents of the blocks map, and then applies > these diffs to the blocksMap, but in reality there is no need to compute the > "diff" because this is the first block report from the datanode. > This code change improves block report processing time by 300%. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes
[ https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046787#comment-13046787 ] Suresh Srinivas commented on HDFS-1295: --- +1 for the trunk patch > Improve namenode restart times by short-circuiting the first block reports > from datanodes > - > > Key: HDFS-1295 > URL: https://issues.apache.org/jira/browse/HDFS-1295 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: dhruba borthakur >Assignee: Matt Foley > Fix For: 0.23.0 > > Attachments: HDFS-1295_delta_for_trunk.patch, > HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, > IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, > shortCircuitBlockReport_1.txt > > > The namenode restart is dominated by the performance of processing block > reports. On a 2000 node cluster with 90 million blocks, block report > processing takes 30 to 40 minutes. The namenode "diffs" the contents of the > incoming block report with the contents of the blocks map, and then applies > these diffs to the blocksMap, but in reality there is no need to compute the > "diff" because this is the first block report from the datanode. > This code change improves block report processing time by 300%. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)
[ https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins reassigned HDFS-2053: - Assignee: Michael Noll Hey Michael - thank you for the excellent report! In summary, the condition used to warn in FSDirectory#computeContentSummary has a bug: it compares the cached value for the directory not to a computed value for that directory but to a computed value that includes the directory and its siblings. The bug results in a spurious warning; it doesn't impact e.g. the correctness of quotas. Given this I think two things are reasonable: # Remove the warning (which removes the bug) # Compute the correct summary for just that directory (your patch) The latter sounds good to me. Allocating a 4-long array for each level in the directory hierarchy isn't bad and this method isn't on a hot path. Nit, I'd change the array allocation to the following, since we assume summary has length 4 and it should be faster. {noformat} assert 4 == summary.length; long[] subtreeSummary = new long[]{0,0,0,0} {noformat} Wrt testing, how about adding the following right after space is calculated: {noformat} assert -1 == node.getDsQuota() || space == subtreeSummary[3]; {noformat} Asserts are enabled by default when the tests are run; if TestQuota doesn't trigger this assert, then add a test similar to what you did manually which will trigger it. Also, please generate a patch against trunk (HDFS-2053_v2.txt doesn't apply for me). Thanks! > NameNode detects "Inconsistent diskspace" for directories with quota-enabled > subdirectories (introduced by HDFS-1377) > - > > Key: HDFS-2053 > URL: https://issues.apache.org/jira/browse/HDFS-2053 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0 > Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch > applied. 
> My impression is that the same issue exists also in the other branches where > the HDFS-1377 patch has been applied to (see description). >Reporter: Michael Noll >Assignee: Michael Noll >Priority: Minor > Fix For: 0.20.3, 0.20.204.0, 0.20.205.0 > > Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt > > > *How to reproduce* > {code} > # create test directories > $ hadoop fs -mkdir /hdfs-1377/A > $ hadoop fs -mkdir /hdfs-1377/B > $ hadoop fs -mkdir /hdfs-1377/C > # ...add some test data (few kB or MB) to all three dirs... > # set space quota for subdir C only > $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C > # the following two commands _on the parent dir_ trigger the warning > $ hadoop fs -dus /hdfs-1377 > $ hadoop fs -count -q /hdfs-1377 > {code} > Warning message in the namenode logs: > {code} > 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: > Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355 > {code} > Note that the commands are run on the _parent directory_ but the warning is > shown for the _subdirectory_ with space quota. > *Background* > The bug was introduced by the HDFS-1377 patch, which is currently committed > to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, > branch-0.20-security-205 and release-0.20.3-rc2. In the patch, > {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was > updated to trigger the warning above if the cached and computed diskspace > values are not the same for a directory with quota. > The warning is written by {{computecontentSummary(long[] summary)}} in > {{INodeDirectory}}. In the method an inode's children are recursively walked > through while the {{summary}} parameter is passed and updated along the way. 
> {code} > /** {@inheritDoc} */ > long[] computeContentSummary(long[] summary) { > if (children != null) { > for (INode child : children) { > child.computeContentSummary(summary); > } > } > {code} > The condition that triggers the warning message compares the current node's > cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding > field in {{summary}}. > {code} > if (-1 != node.getDsQuota() && space != summary[3]) { > NameNode.LOG.warn("Inconsistent diskspace for directory " > +getLocalName()+". Cached: "+space+" Computed: "+summary[3]); > {code} > However {{summary}} may already include diskspace information from other > inodes at this point (i.e. from different subtrees than the subtree of the > node for which the warning message is shown; in our example for the tree at > {{/hdfs-1377}}, {{summary}} can already contain information from > {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode > {{/hdfs-1377/C}
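The fix under discussion can be illustrated with a toy model: each directory computes its subtree into a fresh 4-long array, so a quota'd directory's consistency check compares its cached diskspace against its own computed subtree only, not against sibling contributions already accumulated in the caller's array. Names below are illustrative; this is not the real INodeDirectory code.

```java
import java.util.ArrayList;
import java.util.List;

public class SubtreeSummary {
    static final List<String> warnings = new ArrayList<>();

    static class Dir {
        final String name;
        final long ownFilesDiskspace;  // diskspace of files directly in this dir
        final long cachedDiskspace;    // namenode's cached value for the subtree
        final boolean hasDsQuota;
        final List<Dir> children = new ArrayList<>();

        Dir(String name, long ownFiles, long cached, boolean quota) {
            this.name = name;
            this.ownFilesDiskspace = ownFiles;
            this.cachedDiskspace = cached;
            this.hasDsQuota = quota;
        }

        // summary layout (as in the real code): [length, fileCount, dirCount, diskspace]
        long[] computeContentSummary(long[] summary) {
            assert 4 == summary.length;
            long[] subtreeSummary = new long[]{0, 0, 0, 0};  // fresh array per level
            for (Dir child : children) {
                child.computeContentSummary(subtreeSummary);
            }
            subtreeSummary[3] += ownFilesDiskspace;
            subtreeSummary[2]++;   // count this directory itself
            if (hasDsQuota && cachedDiskspace != subtreeSummary[3]) {
                // the real code would LOG.warn("Inconsistent diskspace ...") here
                warnings.add("Inconsistent diskspace for directory " + name);
            }
            for (int i = 0; i < 4; i++) {
                summary[i] += subtreeSummary[i];  // merge into caller's array
            }
            return summary;
        }
    }
}
```

With the original accumulate-in-place code, a quota'd directory checked last would see its siblings' diskspace in `summary[3]` and warn spuriously; with the fresh per-level array, the check sees only its own subtree.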
[jira] [Updated] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes
[ https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1295: - Attachment: HDFS-1295_for_ymerge.patch Attaching patch ported to yahoo-merge branch. Turning off "Patch Available" so Hudson doesn't try to run test-patch on non-trunk patch. > Improve namenode restart times by short-circuiting the first block reports > from datanodes > - > > Key: HDFS-1295 > URL: https://issues.apache.org/jira/browse/HDFS-1295 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: dhruba borthakur >Assignee: Matt Foley > Fix For: 0.23.0 > > Attachments: HDFS-1295_delta_for_trunk.patch, > HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, > IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, > shortCircuitBlockReport_1.txt > > > The namenode restart is dominated by the performance of processing block > reports. On a 2000 node cluster with 90 million blocks, block report > processing takes 30 to 40 minutes. The namenode "diffs" the contents of the > incoming block report with the contents of the blocks map, and then applies > these diffs to the blocksMap, but in reality there is no need to compute the > "diff" because this is the first block report from the datanode. > This code change improves block report processing time by 300%. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes
[ https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated HDFS-1295: - Status: Open (was: Patch Available) > Improve namenode restart times by short-circuiting the first block reports > from datanodes > - > > Key: HDFS-1295 > URL: https://issues.apache.org/jira/browse/HDFS-1295 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0 >Reporter: dhruba borthakur >Assignee: Matt Foley > Fix For: 0.23.0 > > Attachments: HDFS-1295_delta_for_trunk.patch, > HDFS-1295_for_ymerge.patch, IBR_shortcut_v2a.patch, > IBR_shortcut_v3atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, > IBR_shortcut_v6atrunk.patch, IBR_shortcut_v7atrunk.patch, > shortCircuitBlockReport_1.txt > > > The namenode restart is dominated by the performance of processing block > reports. On a 2000 node cluster with 90 million blocks, block report > processing takes 30 to 40 minutes. The namenode "diffs" the contents of the > incoming block report with the contents of the blocks map, and then applies > these diffs to the blocksMap, but in reality there is no need to compute the > "diff" because this is the first block report from the datanode. > This code change improves block report processing time by 300%. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.
[ https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046767#comment-13046767 ] Aaron T. Myers commented on HDFS-1973: -- Hi Hari, bq. Can you please elaborate a little bit on your area of interest with ZOOKEEPER-1080? As noted in Sanjay's design doc, one proposal for detecting NN failure would be to use an external ZK service. The HDFS proposal doesn't go into great detail on this, but it suggests using ZK with a heartbeat mechanism to see if the NN is still alive. I personally like the ZK recipe better (i.e. using ephemeral + sequence nodes). Another possible use for ZK in the implementation of NN HA would be to use ZK as the source of truth for clients to determine the active NN. This would seem to flow naturally from the part of the ZK recipe which says "Applications may consider creating a separate to znode to acknowledge that the leader has executed the leader procedure." If NN HA were to utilize an implementation of the ZK leader election recipe, then perhaps this "leader-procedure-complete znode" could store the IP or hostname of the active NN which clients could use. I haven't read the design doc posted on ZOOKEEPER-1080 yet. I'll go ahead and do that and post my comments there. I should also mention that we have not settled upon what strategy we'll take to do NN failure detection or client failover. As noted in Sanjay's design doc, we're also strongly considering using virtual IPs for client failover. > HA: HDFS clients must handle namenode failover and switch over to the new > active namenode. > -- > > Key: HDFS-1973 > URL: https://issues.apache.org/jira/browse/HDFS-1973 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Suresh Srinivas >Assignee: Aaron T. Myers > > During failover, a client must detect the current active namenode failure and > switch over to the new active namenode. 
The switch over might make use of IP > failover or something more elaborate such as zookeeper to discover the new > active.
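For reference, the core decision in the ZK leader election recipe Aaron mentions is: each candidate creates an ephemeral sequential znode under the election node, and the candidate holding the lowest sequence number is the leader. A minimal sketch of just that decision in plain Java (the actual ZooKeeper client calls and watches are omitted, and znode names of the form `n_<seq>` are an assumption for illustration):

```java
import java.util.List;

public class LeaderElection {
    /** Extracts the sequence number from a name like "n_0000000007". */
    public static long sequenceOf(String znode) {
        int idx = znode.lastIndexOf('_');
        return Long.parseLong(znode.substring(idx + 1));
    }

    /**
     * The recipe's decision rule: among the children of the election znode,
     * the candidate with the smallest sequence number is the leader.
     */
    public static String electLeader(List<String> children) {
        String leader = null;
        for (String child : children) {
            if (leader == null || sequenceOf(child) < sequenceOf(leader)) {
                leader = child;
            }
        }
        return leader;
    }
}
```

In a full implementation each non-leader watches the znode immediately below its own (to avoid a herd effect on leader death), and the winning NN could then publish its IP or hostname in the acknowledgement znode for clients to read.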
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046765#comment-13046765 ] Kihwal Lee commented on HDFS-2054: -- At minimum, it will get rid of the annoying stack trace. transferTo() is not exactly making it easy to deal with different exceptions differently. I believe things like EAGAIN were fixed now, but to deal with others you have to parse the error message itself, which is rather gross. Ideally we want to deal with EAGAIN, EPIPE, etc. separately and if something else happens print an error message. > BlockSender.sendChunk() prints ERROR for connection closures encountered > during transferToFully() > -- > > Key: HDFS-2054 > URL: https://issues.apache.org/jira/browse/HDFS-2054 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node >Affects Versions: 0.22.0, 0.23.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Minor > Attachments: HDFS-2054.patch > > > The addition of ERROR was part of HDFS-1527. In environments where clients > tear down FSInputStream/connection before reaching the end of stream, this > error message often pops up. Since these are not really errors and especially > not the fault of data node, the message should be toned down at least.
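The message-parsing workaround Kihwal describes might look roughly like this (a hedged sketch, not the committed patch; the exact strings the JDK surfaces for EPIPE and ECONNRESET are platform-dependent, which is precisely why parsing them is "rather gross"):

```java
import java.io.IOException;

public class SendChunkLogging {
    public enum Level { ERROR, WARN }

    /**
     * Decide how loudly to log an IOException thrown by transferToFully().
     * Client-initiated connection closures (EPIPE, ECONNRESET) are expected
     * and not the datanode's fault, so they are demoted to WARN; anything
     * unexpected still deserves ERROR.
     */
    public static Level levelFor(IOException ioe) {
        String msg = ioe.getMessage();
        if (msg != null
                && (msg.contains("Broken pipe")             // EPIPE
                    || msg.contains("Connection reset"))) { // ECONNRESET
            return Level.WARN;
        }
        return Level.ERROR;
    }
}
```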
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046760#comment-13046760 ] stack commented on HDFS-2054: - @Kihwal Do we think this is enough to address this issue? I see loads of it running hbase loadings on 0.22.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046756#comment-13046756 ] stack commented on HDFS-941: Yeah, my 0.22 version fails against trunk (trunk already has guava, etc.)
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046755#comment-13046755 ] stack commented on HDFS-941: So, that would leave 48 beers that I need to buy (And Nigel probably wants two) -- I can get a keg?
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046754#comment-13046754 ] Kihwal Lee commented on HDFS-2054: -- It may reveal interesting errors in the future, so the log level is being lowered to warn.
[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-2054: - Attachment: HDFS-2054.patch
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046749#comment-13046749 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481963/941.22.txt against trunk revision 1134031. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/755//console This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046744#comment-13046744 ] Eli Collins commented on HDFS-941: -- Make that two beers (52/48?). I reviewed an earlier version of this patch but if Nigel is game I think it's suitable for 22 as well.
[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-2054: - Priority: Minor (was: Major) Issue Type: Improvement (was: Bug)
[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-941: --- Attachment: 941.22.txt Forgot --no-prefix.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046739#comment-13046739 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481962/941.22.txt against trunk revision 1134031. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/754//console This message is automatically generated.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046738#comment-13046738 ] stack commented on HDFS-941: Todd, I'll buy you a beer to go 51/49 in favor of 0.22 commit. If Nigel wants me to make a case, I could do it here or in another issue?
[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-941: --- Attachment: 941.22.txt Here is my backport of Todd's final patch. Main differences are adding in guava and removal of TestDataXceiver (util works differently in TRUNK).
[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-941: - Hadoop Flags: [Reviewed]
[jira] [Updated] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic
[ https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2003: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk, thanks Ivan! I will try to merge this into the 1073 branch this afternoon or evening. > Separate FSEditLog reading logic from editLog memory state building logic > - > > Key: HDFS-2003 > URL: https://issues.apache.org/jira/browse/HDFS-2003 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Ivan Kelly >Assignee: Ivan Kelly > Fix For: Edit log branch (HDFS-1073) > > Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, > HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, > HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, > hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt > > > Currently FSEditLogLoader has code for reading from an InputStream > interleaved with code which updates the FSNameSystem and FSDirectory. This > makes it difficult to read an edit log without having a whole load of other > objects initialised, which is problematic if you want to do things like count > how many transactions are in a file etc. > This patch separates the reading of the stream and the building of the memory > state.
[jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic
[ https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046727#comment-13046727 ] Todd Lipcon commented on HDFS-2003: --- +1, good stuff! I'll commit momentarily
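The separation HDFS-2003 aims for (reading the stream vs. building memory state) can be sketched as a visitor-style split. These are hypothetical types for illustration, not the patch's actual class names: one component decodes and iterates ops, and pluggable consumers either apply them to state or merely count transactions with no other objects initialised.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EditLogDemo {
    /** A decoded edit-log operation (simplified stand-in for an op record). */
    public static class Op {
        public final String type;
        public final String path;
        public Op(String type, String path) { this.type = type; this.path = path; }
    }

    /** Consumers decide what happens to each decoded op. */
    public interface OpVisitor { void visit(Op op); }

    /** Reading logic: walks the op stream, knows nothing about namesystem state. */
    public static void readLog(List<Op> stream, OpVisitor visitor) {
        for (Op op : stream) {
            visitor.visit(op);
        }
    }

    /** One consumer builds the in-memory state (the FSDirectory role)... */
    public static class StateBuilder implements OpVisitor {
        public final Map<String, Boolean> files = new HashMap<String, Boolean>();
        public void visit(Op op) {
            if ("ADD".equals(op.type)) {
                files.put(op.path, Boolean.TRUE);
            } else if ("DELETE".equals(op.type)) {
                files.remove(op.path);
            }
        }
    }

    /** ...another just counts transactions, needing no namesystem at all. */
    public static class TxCounter implements OpVisitor {
        public long count;
        public void visit(Op op) { count++; }
    }
}
```

The payoff is exactly the use case from the description: counting how many transactions are in a file requires only `readLog` plus a `TxCounter`, not a fully initialised namesystem.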
[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-2054: - Description: The addition of ERROR was part of HDFS-1527. In environments where clients tear down FSInputStream/connection before reaching the end of stream, this error message often pops up. Since these are not really errors and especially not the fault of data node, the message should be toned down at least. (was: The addition of ERROR was part of HDFS-1527. In environments where clients tear down FSInputStream/connection before reaching the end of stream, this error message often pops up. Since these are not really errors and especially not the fault of data node, the message should be toned down at least. Assigning to the author of HDFS-1527.) Assignee: Kihwal Lee (was: Patrick Kling)
[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
[ https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046725#comment-13046725 ] Patrick Kling commented on HDFS-2054: - If I remember correctly, the bug fixed by HDFS-1527 was causing the affected transfers to fail silently. That's why I added this message. If it is polluting the log file, I have no objection to downgrading this to a warning.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046724#comment-13046724 ] Todd Lipcon commented on HDFS-941: -- Also, big thanks to: bc for authoring the majority of the patch and test cases, Sam Rash for reviews, and Stack and Kihwal for both code review and cluster testing. Great team effort spanning 4 companies!
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046722#comment-13046722 ] Todd Lipcon commented on HDFS-941: -- Committed to trunk. I'm 50/50 on whether this should go into the 0.22 branch as well. Like Stack said, it's a nice carrot to help convince HBase users to try out 0.22. But, it's purely an optimization and on the riskier side as far as these things go. I guess I'll ping Nigel?
[jira] [Commented] (HDFS-236) Random read benchmark for DFS
[ https://issues.apache.org/jira/browse/HDFS-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046718#comment-13046718 ] stack commented on HDFS-236: Reread Raghu's comments above. It's (still) great. > Random read benchmark for DFS > - > > Key: HDFS-236 > URL: https://issues.apache.org/jira/browse/HDFS-236 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Raghu Angadi >Assignee: Raghu Angadi > Attachments: HDFS-236.patch, RndRead-TestDFSIO.patch > > > We should have at least one random read benchmark that can be run with rest > of Hadoop benchmarks regularly. > Please provide benchmark ideas or requirements.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046716#comment-13046716 ] Kihwal Lee commented on HDFS-941: - They were pure readers and didn't write/report anything until the end. I just filed HDFS-2054 for the error message. If you find the other JIRA that was already filed, please dupe one to the other. +1 for commit.
[jira] [Created] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully() -- Key: HDFS-2054 URL: https://issues.apache.org/jira/browse/HDFS-2054 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.22.0, 0.23.0 Reporter: Kihwal Lee Assignee: Patrick Kling The addition of ERROR was part of HDFS-1527. In environments where clients tear down the FSInputStream/connection before reaching the end of stream, this error message often pops up. Since these are not really errors and especially not the fault of the datanode, the message should at least be toned down. Assigning to the author of HDFS-1527.
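The demotion proposed here could look roughly like the sketch below. This is a self-contained illustration, not the actual BlockSender code: the helper name and the message-based matching are assumptions, and the real fix may classify the exceptions differently.

```java
import java.io.IOException;

// Sketch: classify IOExceptions raised while sending chunks so that ordinary
// client-side connection teardowns can be logged at INFO/DEBUG rather than
// ERROR. Matching on the message text is an assumption for illustration.
class BenignClosure {
    static boolean isBenignClosure(IOException e) {
        String msg = e.getMessage();
        if (msg == null) {
            return false;
        }
        // Typical messages seen when a client closes its stream early.
        return msg.contains("Broken pipe")
            || msg.contains("Connection reset");
    }

    public static void main(String[] args) {
        IOException brokenPipe = new IOException("Broken pipe");
        IOException diskError = new IOException("Input/output error");
        System.out.println(isBenignClosure(brokenPipe));  // benign: demote log level
        System.out.println(isBenignClosure(diskError));   // real error: keep ERROR
    }
}
```

A datanode-side send loop would then branch on this check before choosing the log level, keeping ERROR for genuine I/O failures.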
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046706#comment-13046706 ] Todd Lipcon commented on HDFS-941: -- Regarding duplicate connections: also keep in mind that the caching only applies at the read side. So, assuming there's some output as well, there will be a socket for each of those streams. I agree we should fix the "sendChunks" error messages separately. I think JD might have filed a JIRA about this a few weeks ago. I'll see if I can dig it up. Kihwal: are you +1 on commit now as well?
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046700#comment-13046700 ] stack commented on HDFS-941: +1 on commit for latest version of patch. I've been running over the last few hours. I no longer see "Client /10.4.9.34did not send a valid status code after reading" (fix the space on commit) nor do I see the "Got error for OP_READ_BLOCK" exceptions. I still have the BlockSender.sendChunks exceptions, but they are something else (that we need to fix). Nice test you have over there Kihwal! My test was a 5 node cluster running hbase on a 451 patched 0.22. The loading was random reads running in MR and then another random-read test being done via a bunch of clients. Cache was disabled so went to FS for all data. I also had random writing going on concurrently.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046661#comment-13046661 ] Kihwal Lee commented on HDFS-941: - OK, I see it's from BlockSender.java:407. It really shouldn't say ERROR since clients can close connections any time, but I agree that this needs to be addressed in a separate work.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046650#comment-13046650 ] stack commented on HDFS-941: @Kihwal I see lots of those sendChunks exceptions but don't think related. Testing latest addition to patch...
[jira] [Commented] (HDFS-1409) The "register" method of the BackupNode class should be "UnsupportedActionException("register")"
[ https://issues.apache.org/jira/browse/HDFS-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046649#comment-13046649 ] Hadoop QA commented on HDFS-1409: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12455111/HDFS-1409.patch against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestBackupNode +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/753//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/753//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/753//console This message is automatically generated. 
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046643#comment-13046643 ] Kihwal Lee commented on HDFS-941: - I am retesting with Todd's patch and I don't see the messages anymore. Instead, I see more of "BlockSender.sendChunks() exception: java.io.IOException: Broken pipe" from DNs.
[jira] [Commented] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()
[ https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046626#comment-13046626 ] Hadoop QA commented on HDFS-2002: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481935/hdfs-2002.patch against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.server.namenode.TestSafeMode +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/752//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/752//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/752//console This message is automatically generated. 
> Incorrect computation of needed blocks in getTurnOffTip() > - > > Key: HDFS-2002 > URL: https://issues.apache.org/jira/browse/HDFS-2002 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Matthias Eckert > Labels: newbie > Fix For: 0.22.0 > > Attachments: hdfs-2002.patch > > > {{SafeModeInfo.getTurnOffTip()}} under-reports the number of blocks needed to > reach the safemode threshold.
[jira] [Updated] (HDFS-1409) The "register" method of the BackupNode class should be "UnsupportedActionException("register")"
[ https://issues.apache.org/jira/browse/HDFS-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1409: -- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()
[ https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-2002: -- Assignee: Matthias Eckert
[jira] [Updated] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()
[ https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Eckert updated HDFS-2002: -- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046583#comment-13046583 ] Kihwal Lee commented on HDFS-941: - Regarding duplicate connections, it makes sense because the inputstream cache is per file and it is quite possible that the clients read blocks belonging to two files that are on the same DN within the window of 3 reads. I will look at the one happening during task initialization. Maybe they just stop reading in the middle of the stream by design. Since one message will show up for every new map task, how about changing the message to DEBUG after we are done with testing?
[jira] [Updated] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()
[ https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Eckert updated HDFS-2002: -- Attachment: hdfs-2002.patch Log a warning if the threshold is larger than 1. Correct the number of remaining nodes and blocks.
[jira] [Commented] (HDFS-236) Random read benchmark for DFS
[ https://issues.apache.org/jira/browse/HDFS-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046572#comment-13046572 ] Kihwal Lee commented on HDFS-236: - * Some test.io.randomread.* settings seem to deserve a spot in the command line args. * The buffer size can be used as the read size in random reads. I see no reason to separate the two in the random read mode. * The default behavior is, one random reader operates on just one file out of N files. Since it already has the ability to limit the number of files that each reader can access, it might be better to make it work on all N files by default.
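For illustration, the random-read access pattern under discussion (seek to a random offset, read one buffer-sized chunk, hop between files, using the buffer size as the read size as suggested above) can be sketched against local files. This is not the attached TestDFSIO patch, which drives HDFS streams; every name and parameter here is made up.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

// Sketch of a random-read workload: each iteration picks a random file and a
// random offset, then reads one bufferSize chunk. A real benchmark would keep
// the input streams open and cached rather than reopening per read.
class RandomReadSketch {
    static long randomRead(File[] files, int bufferSize, int reads, long seed)
            throws IOException {
        Random rnd = new Random(seed);
        byte[] buf = new byte[bufferSize];
        long bytesRead = 0;
        for (int i = 0; i < reads; i++) {
            File f = files[rnd.nextInt(files.length)];  // hop between files
            try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
                // Pick an offset that leaves at least one full buffer to read.
                long maxOff = Math.max(0L, raf.length() - bufferSize);
                long off = (long) (rnd.nextDouble() * maxOff);
                raf.seek(off);
                int n = raf.read(buf, 0, bufferSize);
                if (n > 0) {
                    bytesRead += n;
                }
            }
        }
        return bytesRead;
    }
}
```

Against HDFS the same loop would use positioned reads on an open stream, which is exactly the path the connection-reuse work in HDFS-941 speeds up.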
[jira] [Resolved] (HDFS-1410) The doCheckpoint() method should be invoked every hour
[ https://issues.apache.org/jira/browse/HDFS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan resolved HDFS-1410. --- Resolution: Duplicate This was fixed in HDFS-1572. Resolving. > The doCheckpoint() method should be invoked every hour > -- > > Key: HDFS-1410 > URL: https://issues.apache.org/jira/browse/HDFS-1410 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.21.0 >Reporter: Ching-Shen Chen > Fix For: 0.21.1 > > Attachments: HDFS-1410.patch, HDFS-1410.patch > > > The doCheckpoint() method should be invoked every hour rather than every five > minutes.
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046526#comment-13046526 ] Kihwal Lee commented on HDFS-941: - Good catch and fix! I took a close look at the open connections each reader has and sometimes saw more than one connection to the same DN. I will see if that is fixed with Todd's fix. Otherwise I will look further to determine if it is an issue. The test I did was primarily for exercising the socket cache itself. To make it more interesting, the socket cache size was lowered to 3 and dfs.replication to 1. I used the random read test (work in progress) in HDFS-236 on a cluster with 8 data nodes. 200 X 170MB files were created. 200 readers (25 on each DN) read 200 files randomly 64K at a time, jumping among files, for about 6 hours last night. Each reader caches a DFSInputStream to all 200 files during its lifetime. Checked the client/server logs afterward. ** I saw 25 of the "did not send a valid status code after reading. Will close connection" warnings at around task initialization (readers are map tasks) on each data node. They all look local, so they are likely accessing the job conf/jar files that are replicated and available on all eight data nodes, unlike regular data files. Or accessing the local DN for some other reason during this time period. Need to check whether this needs to be fixed. ** While running, there were 3 ESTABLISHED connections per process and some number of sockets in TIME_WAIT all the time. It means the socket cache is not leaking anything, clients are not denied new connections, and eviction is working. ** The only thing I find a bit odd is the symptom I mentioned above: duplicate connections in the socket cache. I will try to reproduce with Todd's latest fix.
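The eviction behavior this test exercises (fixed capacity, least-recently-used eviction, evicted connections closed so they show up in TIME_WAIT rather than leaking) can be sketched with a LinkedHashMap. This is an illustration only, not the actual DFSClient cache: the class and method names are made up, and the real cache may hold several sockets per datanode, which is one possible source of the duplicate connections noted above.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded, LRU-evicting cache of reusable connections keyed by
// datanode address. When the capacity (3 in the test above) is exceeded, the
// least-recently-used entry is closed and dropped.
class ConnectionCache<C extends Closeable> {
    private final LinkedHashMap<String, C> cache;

    ConnectionCache(final int capacity) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, C> eldest) {
                if (size() > capacity) {
                    try {
                        eldest.getValue().close();  // evict and close the socket
                    } catch (IOException ignored) {
                        // best-effort close on eviction
                    }
                    return true;
                }
                return false;
            }
        };
    }

    synchronized C get(String datanodeAddr) {
        return cache.get(datanodeAddr);
    }

    synchronized void put(String datanodeAddr, C conn) {
        cache.put(datanodeAddr, conn);
    }

    synchronized int size() {
        return cache.size();
    }
}
```

With a capacity of 3, a steady state of 3 ESTABLISHED connections plus a trickle of TIME_WAIT sockets, as observed in the test, is exactly what this eviction policy produces.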
[jira] [Commented] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)
[ https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046499#comment-13046499 ] Hadoop QA commented on HDFS-2053: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481918/HDFS-2053_v2.txt against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/751//console This message is automatically generated. > NameNode detects "Inconsistent diskspace" for directories with quota-enabled > subdirectories (introduced by HDFS-1377) > - > > Key: HDFS-2053 > URL: https://issues.apache.org/jira/browse/HDFS-2053 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0 > Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch > applied. > My impression is that the same issue exists also in the other branches where > the HDFS-1377 patch has been applied to (see description). >Reporter: Michael Noll >Priority: Minor > Fix For: 0.20.3, 0.20.204.0, 0.20.205.0 > > Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt > > > *How to reproduce* > {code} > # create test directories > $ hadoop fs -mkdir /hdfs-1377/A > $ hadoop fs -mkdir /hdfs-1377/B > $ hadoop fs -mkdir /hdfs-1377/C > # ...add some test data (few kB or MB) to all three dirs... 
> # set space quota for subdir C only > $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C > # the following two commands _on the parent dir_ trigger the warning > $ hadoop fs -dus /hdfs-1377 > $ hadoop fs -count -q /hdfs-1377 > {code} > Warning message in the namenode logs: > {code} > 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: > Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355 > {code} > Note that the commands are run on the _parent directory_ but the warning is > shown for the _subdirectory_ with space quota. > *Background* > The bug was introduced by the HDFS-1377 patch, which is currently committed > to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, > branch-0.20-security-205 and release-0.20.3-rc2. In the patch, > {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was > updated to trigger the warning above if the cached and computed diskspace > values are not the same for a directory with quota. > The warning is written by {{computecontentSummary(long[] summary)}} in > {{INodeDirectory}}. In the method an inode's children are recursively walked > through while the {{summary}} parameter is passed and updated along the way. > {code} > /** {@inheritDoc} */ > long[] computeContentSummary(long[] summary) { > if (children != null) { > for (INode child : children) { > child.computeContentSummary(summary); > } > } > {code} > The condition that triggers the warning message compares the current node's > cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding > field in {{summary}}. > {code} > if (-1 != node.getDsQuota() && space != summary[3]) { > NameNode.LOG.warn("Inconsistent diskspace for directory " > +getLocalName()+". Cached: "+space+" Computed: "+summary[3]); > {code} > However {{summary}} may already include diskspace information from other > inodes at this point (i.e. 
from different subtrees than the subtree of the > node for which the warning message is shown; in our example for the tree at > {{/hdfs-1377}}, {{summary}} can already contain information from > {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode > {{/hdfs-1377/C}}). Hence the cached value for {{C}} can incorrectly be > different from the computed value. > *How to fix* > The supplied patch creates a fresh summary array for the subtree of the > current node. The walk through the children passes and updates this > {{subtreeSummary}} array, and the condition is checked against > {{subtreeSummary}} instead of the original {{summary}}. The original > {{summary}} is updated with the values of {{subtreeSummary}} before it > returns. > *Unit Tests* > I have run "ant test" on my patched build without any errors*. However the > existing unit tests did not catch this issue for the original HDFS-1377 > patch, so this might not mean anything. ;-) > That said I am unsure what the most appropriate way to unit test this issue > would be. A straight-forward approach would be to automate the steps in the > _How to reproduce section_ above and check whether the NN logs an incorrect > warning message. But I'm not sure how this check could be implemented. Feel > free to provide some pointers if you have some ideas. > *Note about Fix Version/s* > The patch _should_ apply to all branches where the HDFS-1377 patch has > committed to. In my environment, the build was Hadoop 0.20.203.0 release > with a (trivial) backport of
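The fix described under *How to fix* can be modeled with simplified stand-ins for the INode classes. This is a sketch of the mechanism, not the patch itself: only the {{summary[3]}} diskspace slot is modeled, and the class and field names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the HDFS-2053 fix: give each directory a fresh
// subtreeSummary so the cached-vs-computed quota check only sees its own
// subtree, then fold the subtree totals back into the caller's summary.
class Dir {
    final String name;
    final long ownDiskspace;      // space consumed directly under this dir
    final boolean hasQuota;       // stand-in for node.getDsQuota() != -1
    final long cachedDiskspace;   // stand-in for node.diskspaceConsumed()
    final List<Dir> children = new ArrayList<>();

    Dir(String name, long own, boolean hasQuota, long cached) {
        this.name = name;
        this.ownDiskspace = own;
        this.hasQuota = hasQuota;
        this.cachedDiskspace = cached;
    }

    // summary[3] accumulates diskspace, mirroring INodeDirectory's layout.
    long[] computeContentSummary(long[] summary) {
        long[] subtreeSummary = new long[summary.length];  // the fix: fresh array
        for (Dir child : children) {
            child.computeContentSummary(subtreeSummary);
        }
        subtreeSummary[3] += ownDiskspace;
        // The check now compares the cache against this subtree only, so
        // siblings' totals can no longer trigger a spurious warning.
        if (hasQuota && cachedDiskspace != subtreeSummary[3]) {
            System.out.println("Inconsistent diskspace for directory " + name
                + ". Cached: " + cachedDiskspace
                + " Computed: " + subtreeSummary[3]);
        }
        for (int i = 0; i < summary.length; i++) {
            summary[i] += subtreeSummary[i];  // fold subtree into parent's totals
        }
        return summary;
    }
}
```

Running this over the /hdfs-1377 example, the quota directory C sees only its own 50 units rather than A's and B's as well, so the cached and computed values agree and no warning is printed.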
[jira] [Updated] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)
[ https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Noll updated HDFS-2053: --- Attachment: HDFS-2053_v2.txt New patch version, no properly using 'git diff --no-prefix' to generate it. Doh! > NameNode detects "Inconsistent diskspace" for directories with quota-enabled > subdirectories (introduced by HDFS-1377) > - > > Key: HDFS-2053 > URL: https://issues.apache.org/jira/browse/HDFS-2053 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0 > Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch > applied. > My impression is that the same issue exists also in the other branches where > the HDFS-1377 patch has been applied to (see description). >Reporter: Michael Noll >Priority: Minor > Fix For: 0.20.3, 0.20.204.0, 0.20.205.0 > > Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt > > > *How to reproduce* > {code} > # create test directories > $ hadoop fs -mkdir /hdfs-1377/A > $ hadoop fs -mkdir /hdfs-1377/B > $ hadoop fs -mkdir /hdfs-1377/C > # ...add some test data (few kB or MB) to all three dirs... > # set space quota for subdir C only > $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C > # the following two commands _on the parent dir_ trigger the warning > $ hadoop fs -dus /hdfs-1377 > $ hadoop fs -count -q /hdfs-1377 > {code} > Warning message in the namenode logs: > {code} > 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: > Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355 > {code} > Note that the commands are run on the _parent directory_ but the warning is > shown for the _subdirectory_ with space quota. > *Background* > The bug was introduced by the HDFS-1377 patch, which is currently committed > to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, > branch-0.20-security-205 and release-0.20.3-rc2. 
In the patch, > {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was > updated to trigger the warning above if the cached and computed diskspace > values are not the same for a directory with quota. > The warning is written by {{computeContentSummary(long[] summary)}} in > {{INodeDirectory}}. In the method an inode's children are recursively walked > through while the {{summary}} parameter is passed and updated along the way. > {code} > /** {@inheritDoc} */ > long[] computeContentSummary(long[] summary) { > if (children != null) { > for (INode child : children) { > child.computeContentSummary(summary); > } > } > {code} > The condition that triggers the warning message compares the current node's > cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding > field in {{summary}}. > {code} > if (-1 != node.getDsQuota() && space != summary[3]) { > NameNode.LOG.warn("Inconsistent diskspace for directory " > +getLocalName()+". Cached: "+space+" Computed: "+summary[3]); > {code} > However {{summary}} may already include diskspace information from other > inodes at this point (i.e. from different subtrees than the subtree of the > node for which the warning message is shown; in our example for the tree at > {{/hdfs-1377}}, {{summary}} can already contain information from > {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode > {{/hdfs-1377/C}}). Hence the cached value for {{C}} can incorrectly be > different from the computed value. > *How to fix* > The supplied patch creates a fresh summary array for the subtree of the > current node. The walk through the children passes and updates this > {{subtreeSummary}} array, and the condition is checked against > {{subtreeSummary}} instead of the original {{summary}}. The original > {{summary}} is updated with the values of {{subtreeSummary}} before it > returns. > *Unit Tests* > I have run "ant test" on my patched build without any errors*.
However the > existing unit tests did not catch this issue for the original HDFS-1377 > patch, so this might not mean anything. ;-) > That said I am unsure what the most appropriate way to unit test this issue > would be. A straight-forward approach would be to automate the steps in the > _How to reproduce section_ above and check whether the NN logs an incorrect > warning message. But I'm not sure how this check could be implemented. Feel > free to provide some pointers if you have some ideas. > *Note about Fix Version/s* > The patch _should_ apply to all branches where the HDFS-1377 patch has > committed to. In my environment, the build was Hadoop 0.20.203.0 release > with a (trivial) backport of
[jira] [Commented] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)
[ https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046490#comment-13046490 ] Hadoop QA commented on HDFS-2053: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481915/HDFS-2053_v1.txt against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/750//console This message is automatically generated.
[jira] [Updated] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)
[ https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Noll updated HDFS-2053: --- Fix Version/s: (was: 0.20.203.0) 0.20.205.0 0.20.204.0 Affects Version/s: (was: 0.20.203.0) 0.20.205.0 0.20.204.0 Status: Patch Available (was: Open) Again, I am not sure how to properly identify the correct names of the versions. For instance, the patch successfully applies to branch-0.20-security-204 but I am not sure whether this translates to version "0.20.204.0" in the dropdown list.
[jira] [Updated] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)
[ https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Noll updated HDFS-2053: --- Attachment: HDFS-2053_v1.txt Patch version 1 for HDFS-2053. The patch should apply to all branches to which the original HDFS-1377 patch has been applied. See the ticket description for more details regarding "Fix Version/s".
[jira] [Created] (HDFS-2053) NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377)
NameNode detects "Inconsistent diskspace" for directories with quota-enabled subdirectories (introduced by HDFS-1377) - Key: HDFS-2053 URL: https://issues.apache.org/jira/browse/HDFS-2053 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.3, 0.20.203.0 Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch applied. My impression is that the same issue exists also in the other branches where the HDFS-1377 patch has been applied to (see description). Reporter: Michael Noll Priority: Minor Fix For: 0.20.3, 0.20.203.0 *How to reproduce* {code} # create test directories $ hadoop fs -mkdir /hdfs-1377/A $ hadoop fs -mkdir /hdfs-1377/B $ hadoop fs -mkdir /hdfs-1377/C # ...add some test data (few kB or MB) to all three dirs... # set space quota for subdir C only $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C # the following two commands _on the parent dir_ trigger the warning $ hadoop fs -dus /hdfs-1377 $ hadoop fs -count -q /hdfs-1377 {code} Warning message in the namenode logs: {code} 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355 {code} Note that the commands are run on the _parent directory_ but the warning is shown for the _subdirectory_ with space quota. *Background* The bug was introduced by the HDFS-1377 patch, which is currently committed to at least branch-0.20, branch-0.20-security, branch-0.20-security-204, branch-0.20-security-205 and release-0.20.3-rc2. In the patch, {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was updated to trigger the warning above if the cached and computed diskspace values are not the same for a directory with quota. The warning is written by {{computeContentSummary(long[] summary)}} in {{INodeDirectory}}. In the method an inode's children are recursively walked through while the {{summary}} parameter is passed and updated along the way.
{code} /** {@inheritDoc} */ long[] computeContentSummary(long[] summary) { if (children != null) { for (INode child : children) { child.computeContentSummary(summary); } } {code} The condition that triggers the warning message compares the current node's cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding field in {{summary}}. {code} if (-1 != node.getDsQuota() && space != summary[3]) { NameNode.LOG.warn("Inconsistent diskspace for directory " +getLocalName()+". Cached: "+space+" Computed: "+summary[3]); {code} However {{summary}} may already include diskspace information from other inodes at this point (i.e. from different subtrees than the subtree of the node for which the warning message is shown; in our example for the tree at {{/hdfs-1377}}, {{summary}} can already contain information from {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode {{/hdfs-1377/C}}). Hence the cached value for {{C}} can incorrectly be different from the computed value. *How to fix* The supplied patch creates a fresh summary array for the subtree of the current node. The walk through the children passes and updates this {{subtreeSummary}} array, and the condition is checked against {{subtreeSummary}} instead of the original {{summary}}. The original {{summary}} is updated with the values of {{subtreeSummary}} before it returns. *Unit Tests* I have run "ant test" on my patched build without any errors*. However the existing unit tests did not catch this issue for the original HDFS-1377 patch, so this might not mean anything. ;-) That said I am unsure what the most appropriate way to unit test this issue would be. A straight-forward approach would be to automate the steps in the _How to reproduce section_ above and check whether the NN logs an incorrect warning message. But I'm not sure how this check could be implemented. Feel free to provide some pointers if you have some ideas. 
*Note about Fix Version/s* The patch _should_ apply to all branches to which the HDFS-1377 patch has been committed. In my environment, the build was Hadoop 0.20.203.0 release with a (trivial) backport of HDFS-1377 (0.20.203.0 release does not ship with the HDFS-1377 fix). I could apply the patch successfully to {{branch-0.20-security}}, {{branch-0.20-security-204}} and {{release-0.20.3-rc2}}, for instance. Since I'm a bit confused regarding the upcoming 0.20.x release versions (0.20.x vs. 0.20.20x.y) I have been so bold as to add 0.20.203.0 to the list of affected versions even though it is actually only affected when HDFS-1377 is applied to it... Best, Michael *Well, I get one error for {{TestRumenJobTraces}} but first this see
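The inconsistency described above can be sketched in a few lines of self-contained Java. All names below ({{INode}}, {{demoWarnings}}, the sizes, and the single-element summary array) are simplifications for illustration, not the real {{INodeDirectory}} code: with a shared running total, the quota check for {{/hdfs-1377/C}} compares C's cached value against A+B+C; with a fresh per-subtree array (the shape of the supplied patch) it compares against C alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuotaCheckSketch {
    // Minimal stand-in for a namespace inode (illustrative fields only).
    static class INode {
        final String name;
        final long fileSpace;    // space consumed directly by this node
        final boolean hasQuota;
        final long cachedSpace;  // what the quota bookkeeping has cached
        final List<INode> children = new ArrayList<>();
        INode(String name, long fileSpace, boolean hasQuota, long cachedSpace) {
            this.name = name; this.fileSpace = fileSpace;
            this.hasQuota = hasQuota; this.cachedSpace = cachedSpace;
        }
    }

    // Buggy shape: every child folds its space into the caller's running
    // total, so when the quota check fires for subdirectory C, summary[0]
    // already includes siblings A and B.
    static void buggy(INode node, long[] summary, List<String> warnings) {
        for (INode child : node.children) buggy(child, summary, warnings);
        summary[0] += node.fileSpace;
        if (node.hasQuota && node.cachedSpace != summary[0]) warnings.add(node.name);
    }

    // Fixed shape: compute each subtree into a fresh array, check the quota
    // against that, then fold the subtree total into the caller's array.
    static void fixed(INode node, long[] summary, List<String> warnings) {
        long[] subtree = new long[1];
        for (INode child : node.children) fixed(child, subtree, warnings);
        subtree[0] += node.fileSpace;
        if (node.hasQuota && node.cachedSpace != subtree[0]) warnings.add(node.name);
        summary[0] += subtree[0];
    }

    // Builds the /hdfs-1377 example: A and B unquoted, C with a quota whose
    // cached value (30) is in fact correct.
    public static List<String> demoWarnings(boolean useFix) {
        INode root = new INode("/hdfs-1377", 0, false, 0);
        root.children.addAll(Arrays.asList(
            new INode("A", 10, false, 0),
            new INode("B", 20, false, 0),
            new INode("C", 30, true, 30)));
        List<String> warnings = new ArrayList<>();
        if (useFix) fixed(root, new long[1], warnings);
        else buggy(root, new long[1], warnings);
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println("buggy warns on: " + demoWarnings(false)); // [C] -- false positive
        System.out.println("fixed warns on: " + demoWarnings(true));  // []
    }
}
```

Note C's cache is correct here, yet the buggy variant still warns, which matches the report: the warning is spurious, not a sign of real corruption.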
[jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic
[ https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046429#comment-13046429 ] Hadoop QA commented on HDFS-2003: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481897/HDFS-2003.diff against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestHDFSTrash +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/749//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/749//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/749//console This message is automatically generated. 
> Separate FSEditLog reading logic from editLog memory state building logic > - > > Key: HDFS-2003 > URL: https://issues.apache.org/jira/browse/HDFS-2003 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: Edit log branch (HDFS-1073) >Reporter: Ivan Kelly >Assignee: Ivan Kelly > Fix For: Edit log branch (HDFS-1073) > > Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, > HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, > HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, > hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt > > > Currently FSEditLogLoader has code for reading from an InputStream > interleaved with code which updates the FSNameSystem and FSDirectory. This > makes it difficult to read an edit log without having a whole load of other > object initialised, which is problematic if you want to do things like count > how many transactions are in a file etc. > This patch separates the reading of the stream and the building of the memory > state. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
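The split that HDFS-2003 describes can be pictured as a reader that only decodes operations and hands each one to a visitor. The names below ({{Op}}, {{OpVisitor}}, {{countTransactions}}) are illustrative, not the actual classes in the patch; the point is that a consumer like a transaction counter needs no FSNamesystem or FSDirectory at all:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EditLogSketch {
    // A decoded edit-log operation; real ops would carry per-op fields.
    public static class Op {
        final String type;
        public Op(String type) { this.type = type; }
    }

    // The reading side knows only how to walk a stream of ops.
    interface OpVisitor { void visit(Op op); }

    static void readLog(Iterator<Op> source, OpVisitor visitor) {
        while (source.hasNext()) visitor.visit(source.next());
    }

    // With the split, counting transactions is just a visitor that
    // increments a counter -- no namesystem state is built.
    public static int countTransactions(List<Op> log) {
        final int[] n = new int[1];
        readLog(log.iterator(), op -> n[0]++);
        return n[0];
    }

    public static void main(String[] args) {
        List<Op> log = Arrays.asList(new Op("OP_MKDIR"), new Op("OP_ADD"), new Op("OP_CLOSE"));
        System.out.println(countTransactions(log)); // 3
    }
}
```

A second visitor that applies each op to the in-memory state would recover the loader's current behavior, which is the separation the patch aims for.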
[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection
[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046422#comment-13046422 ] Hadoop QA commented on HDFS-941: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481896/hdfs-941.txt against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestHDFSTrash +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/748//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/748//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/748//console This message is automatically generated. 
> Datanode xceiver protocol should allow reuse of a connection > > > Key: HDFS-941 > URL: https://issues.apache.org/jira/browse/HDFS-941 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node, hdfs client >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: bc Wong > Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, > HDFS-941-3.patch, HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, > HDFS-941-6.patch, HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, > hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png > > > Right now each connection into the datanode xceiver only processes one > operation. > In the case that an operation leaves the stream in a well-defined state (eg a > client reads to the end of a block successfully) the same connection could be > reused for a second operation. This should improve random read performance > significantly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
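The reuse idea can be sketched as a server loop that keeps accepting op codes on one stream until the client closes it, instead of tearing the connection down after the first op. This is a hedged illustration only: the opcode value and method names are made up, not the real DataTransferProtocol.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class XceiverReuseSketch {
    // Serve ops on one connection until the peer closes. For this to be
    // safe, each op handler must leave the stream at a clean op boundary
    // (e.g. after a client reads to the end of a block successfully).
    public static int serveConnection(InputStream in) throws IOException {
        int ops = 0;
        while (in.read() != -1) { // -1 means the peer closed cleanly
            // a real xceiver would dispatch on the opcode here (read block,
            // write block, ...) and stream that op's data before looping
            ops++;
        }
        return ops;
    }

    public static void main(String[] args) throws IOException {
        // three back-to-back hypothetical op codes on one "connection"
        byte[] wire = {81, 81, 81};
        System.out.println(serveConnection(new ByteArrayInputStream(wire))); // 3
    }
}
```

The win for random reads comes from skipping the TCP and handshake setup on every op after the first.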
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046397#comment-13046397 ] Hadoop QA commented on HDFS-988: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481892/hdfs-988-6.patch against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 33 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestHDFSTrash +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/747//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/747//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/747//console This message is automatically generated. 
> saveNamespace can corrupt edits log, apparently due to race conditions > -- > > Key: HDFS-988 > URL: https://issues.apache.org/jira/browse/HDFS-988 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.20-append, 0.21.0, 0.22.0 >Reporter: dhruba borthakur >Assignee: Eli Collins >Priority: Blocker > Fix For: 0.20-append, 0.22.0 > > Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, > hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, > hdfs-988-6.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt, > saveNamespace_20-append.patch > > > The adminstrator puts the namenode is safemode and then issues the > savenamespace command. This can corrupt the edits log. The problem is that > when the NN enters safemode, there could still be pending logSycs occuring > from other threads. Now, the saveNamespace command, when executed, would save > a edits log with partial writes. I have seen this happen on 0.20. > https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
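One way to picture the fix shape is to serialize the log-flushing path and the namespace-saving path on a single monitor, so a snapshot can never observe a half-written record. This is an illustrative sketch only; all names below are hypothetical and the actual patch synchronizes the real FSEditLog/FSImage methods:

```java
public class SaveNamespaceSketch {
    private final StringBuilder editLog = new StringBuilder();
    private final Object lock = new Object();

    // Analogue of logSync: the whole record lands atomically with
    // respect to snapshots because both paths hold the same lock.
    public void logSync(String record) {
        synchronized (lock) {
            editLog.append(record).append('\n');
        }
    }

    // Analogue of saveNamespace: sees only complete records, never a
    // partial write from a concurrent logSync.
    public String saveNamespace() {
        synchronized (lock) {
            return editLog.toString();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final SaveNamespaceSketch fs = new SaveNamespaceSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) fs.logSync("txn");
        });
        writer.start();
        String snapshot = fs.saveNamespace(); // races with the writer, yet stays well-formed
        writer.join();
        for (String line : snapshot.split("\n", -1))
            if (!line.isEmpty() && !line.equals("txn"))
                throw new AssertionError("partial record: " + line);
        System.out.println("snapshot well-formed");
    }
}
```

Without the shared lock, the snapshot in {{main}} could copy the buffer mid-append, which is the partial-write corruption the report describes.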
[jira] [Updated] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic
[ https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-2003: - Attachment: HDFS-2003.diff Addressed the two things from Todd's previous comment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
[ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046389#comment-13046389 ] Hadoop QA commented on HDFS-988: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12481892/hdfs-988-6.patch against trunk revision 1133476. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 33 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/746//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/746//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/746//console This message is automatically generated. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira