[jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
[ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700274#comment-14700274 ] Colin Patrick McCabe commented on HDFS-7446: I would like to see this backported to 2.6.1 just because otherwise it will create hassles for people who want to start using inotify. Do you think this is feasible? HDFS inotify should have the ability to determine what txid it has read up to - Key: HDFS-7446 URL: https://issues.apache.org/jira/browse/HDFS-7446 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7446.001.patch, HDFS-7446.002.patch, HDFS-7446.003.patch HDFS inotify should have the ability to determine what txid it has read up to. This will allow users who want to avoid missing any events to record this txid and use it to resume reading events at the spot they left off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
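As context for the API this change adds: below is a minimal sketch (assumed usage, not code from the attached patches) of the resume pattern it enables. A consumer records the txid of each consumed batch via {{EventBatch#getTxid()}} and passes it back to {{HdfsAdmin#getInotifyEventStream(long)}} after a restart; the URI and the way the txid is persisted are illustrative.
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class InotifyResumeSketch {
  public static void main(String[] args) throws Exception {
    long lastReadTxid = Long.parseLong(args[0]); // txid persisted by a previous run
    HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://nameservice"), new Configuration());
    DFSInotifyEventInputStream stream = admin.getInotifyEventStream(lastReadTxid);
    while (true) {
      EventBatch batch = stream.take(); // blocks until events are available
      // ... handle batch.getEvents() ...
      lastReadTxid = batch.getTxid();   // persist this to resume without missing events
    }
  }
}
{code}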
[jira] [Commented] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699337#comment-14699337 ] Rakesh R commented on HDFS-8897: Hi [~Alexandre LINTE], thanks for reporting this. The Jira description is a bit confusing; could you please give more details about your test scenario and the expected output? Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ... Key: HDFS-8897 URL: https://issues.apache.org/jira/browse/HDFS-8897 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.7.1 Environment: Centos 6.6 Reporter: LINTE When the balancer is launched, it should test whether a /system/balancer.id file already exists in HDFS. Even when the file doesn't exist, the balancer refuses to run: 15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox] 15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0] Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec java.io.IOException: Another Balancer is running.. Exiting ... Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds Looking at the audit log while trying to run the balancer, the balancer creates /system/balancer.id and then deletes it on exiting ... 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r- proto=rpc 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. The function checkAndMarkRunning returns null even if /system/balancer.id doesn't exist before entering this function; if it exists, it is deleted and the balancer exits with the same error.
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch (RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
      return null;
    } else {
      throw e;
    }
  }
}

Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8713) Convert DatanodeDescriptor to use SLF4J logging
[ https://issues.apache.org/jira/browse/HDFS-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699358#comment-14699358 ] Hadoop QA commented on HDFS-8713: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 32s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 5s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 172m 48s | Tests failed in hadoop-hdfs. | | | | 214m 24s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12743378/hdfs-8713.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 13604bd | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12007/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12007/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12007/console | This message was automatically generated. Convert DatanodeDescriptor to use SLF4J logging --- Key: HDFS-8713 URL: https://issues.apache.org/jira/browse/HDFS-8713 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.6-alpha Reporter: Andrew Wang Assignee: Andrew Wang Priority: Trivial Attachments: hdfs-8713.001.patch Let's convert this class to use SLF4J -- This message was sent by Atlassian JIRA (v6.3.4#6332)
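For reference, the typical shape of such a conversion, illustrative only and not the attached patch:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class DatanodeDescriptorLoggingSketch {
  // was: commons-logging's LogFactory.getLog(...)
  static final Logger LOG =
      LoggerFactory.getLogger(DatanodeDescriptorLoggingSketch.class);

  void example(String node) {
    // parameterized messages avoid string concatenation when the level is off
    LOG.warn("Removing stale storage on {}", node);
  }
}
{code}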
[jira] [Commented] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699454#comment-14699454 ] Rakesh R commented on HDFS-8897: I can see that hdfs://sandbox appears twice: {code}15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]{code} Are those duplicate entries, or am I missing something? Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ... Key: HDFS-8897 URL: https://issues.apache.org/jira/browse/HDFS-8897 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.7.1 Environment: Centos 6.6 Reporter: LINTE When the balancer is launched, it should test whether a /system/balancer.id file already exists in HDFS. Even when the file doesn't exist, the balancer refuses to run: 15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox] 15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0] Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec java.io.IOException: Another Balancer is running.. Exiting ... Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds Looking at the audit log while trying to run the balancer, the balancer creates /system/balancer.id and then deletes it on exiting ... 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r- proto=rpc 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. The function checkAndMarkRunning returns null even if /system/balancer.id doesn't exist before entering this function; if it exists, it is deleted and the balancer exits with the same error.
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch (RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
      return null;
    } else {
      throw e;
    }
  }
}

Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse the entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8845: --- Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to 2.8. Thanks, [~lichangleo]. DiskChecker should not traverse the entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Fix For: 2.8.0 Attachments: HDFS-8845.patch DiskChecker should not traverse the entire tree, because doing so causes heavy disk load in checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
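The idea of the fix, as a hedged sketch (method name and checks are illustrative, not the committed patch): validate only the directory itself instead of walking every file underneath it.
{code}
import java.io.File;
import java.io.IOException;

class ShallowDiskCheckSketch {
  /** Check only the directory itself rather than traversing its subtree. */
  static void checkDir(File dir) throws IOException {
    if (!dir.isDirectory()) {
      throw new IOException("Not a directory: " + dir);
    }
    if (!dir.canRead() || !dir.canWrite() || !dir.canExecute()) {
      throw new IOException("Insufficient permissions on: " + dir);
    }
  }
}
{code}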
[jira] [Updated] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8833: Attachment: HDFS-8833-HDFS-7285-merge.01.patch Updating the patch to:
# Remove all references to {{ErasureCodingZone}} in {{src/main}}; {{test}} will be handled separately.
# Update {{INodeFile}}, changing {{isStriped}} to {{erasureCodingPolicy}} in the header, and add an API.
# Change how {{FSDirErasureCodingOp#getErasureCodingPolicyForPath}} queries the EC policy, taking into consideration the policy stored in the file header.
# Update the main test, {{TestErasureCodingZones}}.
The version 00 patch is mostly refactoring; this patch has some logic-level changes. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8833-HDFS-7285-merge.00.patch, HDFS-8833-HDFS-7285-merge.01.patch We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons, but it doesn't make sense to carry them over to EC. This JIRA aims to store EC schema and cell size at the {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to the file header) as a follow-on. We should also disable changing the EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8908) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode
[ https://issues.apache.org/jira/browse/HDFS-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700478#comment-14700478 ] Hadoop QA commented on HDFS-8908: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 7m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 7s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 176m 23s | Tests failed in hadoop-hdfs. | | | | 199m 1s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750861/h8908_20150817.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / c77bd6a | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12011/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12011/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12011/console | This message was automatically generated. TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode -- Key: HDFS-8908 URL: https://issues.apache.org/jira/browse/HDFS-8908 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8908_20150817.patch See https://builds.apache.org/job/PreCommit-HDFS-Build/12005/testReport/org.apache.hadoop.hdfs/TestAppendSnapshotTruncate/testAST/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8911) NameNode Metric : Add WAL counters as a JMX metric
[ https://issues.apache.org/jira/browse/HDFS-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8911: --- Status: Patch Available (was: Open) NameNode Metric : Add WAL counters as a JMX metric -- Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8911.001.patch Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8911) NameNode Metric : Add WAL counters as a JMX metric
[ https://issues.apache.org/jira/browse/HDFS-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8911: --- Attachment: HDFS-8911.001.patch Adds TotalSyncCount and TotalSyncTimes metrics to NameNodeMetrics NameNode Metric : Add WAL counters as a JMX metric -- Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8911.001.patch Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
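Since the comment names the new counters, here is a hedged sketch of how such a counter is typically exposed through Hadoop's metrics2 framework (the class and wiring are illustrative, not the attached patch); sources registered this way are published over JMX automatically.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(name = "EditLogMetricsSketch", about = "Edit log sync metrics", context = "dfs")
class EditLogMetricsSketch {
  @Metric("Total number of edit log syncs")
  MutableCounterLong totalSyncCount;

  static EditLogMetricsSketch create() {
    DefaultMetricsSystem.initialize("NameNode");
    return DefaultMetricsSystem.instance().register(new EditLogMetricsSketch());
  }

  void onSync() {
    totalSyncCount.incr(); // appears as a JMX attribute on the metrics MBean
  }
}
{code}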
[jira] [Commented] (HDFS-8278) HDFS Balancer should consider remaining storage % when checking for under-utilized machines
[ https://issues.apache.org/jira/browse/HDFS-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700522#comment-14700522 ] Hadoop QA commented on HDFS-8278: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 48s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 26s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 10s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 174m 22s | Tests failed in hadoop-hdfs. | | | | 219m 35s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.fs.viewfs.TestViewFsWithXAttrs | | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750868/h8278_20150817.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c77bd6a | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12012/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12012/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12012/console | This message was automatically generated. HDFS Balancer should consider remaining storage % when checking for under-utilized machines --- Key: HDFS-8278 URL: https://issues.apache.org/jira/browse/HDFS-8278 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Tsz Wo Nicholas Sze Attachments: h8278_20150817.patch DFS balancer mistakenly identifies a node with very little storage space remaining as an underutilized node and tries to move large amounts of data to that particular node. All these block moves fail to execute successfully, as the % utilization is less relevant than the dfs remaining storage on that node. {code} 15/04/24 04:25:55 INFO balancer.Balancer: 0 over-utilized: [] 15/04/24 04:25:55 INFO balancer.Balancer: 1 underutilized: [172.19.1.46:50010:DISK] 15/04/24 04:25:55 INFO balancer.Balancer: Need to move 47.68 GB to make the cluster balanced. 
15/04/24 04:25:55 INFO balancer.Balancer: Decided to move 413.08 MB bytes from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK 15/04/24 04:25:55 INFO balancer.Balancer: Will move 413.08 MB in this iteration 15/04/24 04:25:55 WARN balancer.Dispatcher: Failed to move blk_1078689321_1099517353638 with size=131146 from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK through 172.19.1.53:50010: Got error, status message opReplaceBlock BP-942051088-172.18.1.41-1370508013893:blk_1078689321_1099517353638 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=225042432 B) is less than the block size (=268435456 B)., block move is failed {code} The machine in concern is under-full when it comes to the BP utilization, but has very little free space available for blocks. {code} Decommission Status : Normal Configured Capacity: 3826907185152 (3.48 TB) DFS Used: 2817262833664 (2.56 TB) Non DFS Used: 1000621305856 (931.90 GB) DFS Remaining: 9023045632 (8.40 GB) DFS Used%: 73.62% DFS Remaining%: 0.24% Configured Cache Capacity: 8589934592 (8 GB) Cache Used: 0 (0 B) Cache Remaining: 8589934592 (8 GB) Cache Used%: 0.00% Cache Remaining%: 100.00% Xceivers: 3 Last contact: Fri Apr 24 04:28:36 PDT 2015 {code} The machine has 0.40 Gb of non-RAM storage available on that node, so it is futile to attempt to move any blocks to that particular machine. This is a similar concern when a machine loses disks, since the comparisons of utilization always compare percentages per-node. Even that scenario needs to cap data movement to that node to the DFS Remaining % variable. Trying to move any more data than that to a given node will always fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
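A hypothetical illustration of the check this report argues for (names and threshold are assumptions, not the committed patch): a node whose absolute remaining space cannot hold even one block should not be treated as an under-utilized target, regardless of its utilization percentage.
{code}
class TargetCheckSketch {
  /**
   * Reject balancing targets that cannot absorb data: the node above has only
   * 0.24% remaining, and its fullest volume (~225 MB free) cannot even hold a
   * single 256 MB block, so every scheduled move would fail.
   */
  static boolean isUsableTarget(long dfsRemaining, long capacity,
                                long blockSize, double minRemainingPercent) {
    double remainingPercent = 100.0 * dfsRemaining / capacity;
    return dfsRemaining >= blockSize && remainingPercent >= minRemainingPercent;
  }
}
{code}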
[jira] [Commented] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files
[ https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700530#comment-14700530 ] Hadoop QA commented on HDFS-6955: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 50s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 54s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 24s | The applied patch generated 3 new checkstyle issues (total was 154, now 155). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 24s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 9s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 176m 49s | Tests failed in hadoop-hdfs. | | | | 222m 14s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750865/HDFS-6955-02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c77bd6a | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12013/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12013/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12013/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12013/console | This message was automatically generated. DN should reserve disk space for a full block when creating tmp files - Key: HDFS-6955 URL: https://issues.apache.org/jira/browse/HDFS-6955 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: kanaka kumar avvaru Attachments: HDFS-6955-01.patch, HDFS-6955-02.patch HDFS-6898 is introducing disk space reservation for RBW files to avoid running out of disk space midway through block creation. This Jira is to introduce similar reservation for tmp files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8862) Improve BlockManager#excessReplicateMap
[ https://issues.apache.org/jira/browse/HDFS-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700560#comment-14700560 ] Yi Liu commented on HDFS-8862: -- Thanks [~cmccabe] for the review. Will commit the patch later. {{LightWeightLinkedSet<BlockInfo>}} can shrink. Improve BlockManager#excessReplicateMap --- Key: HDFS-8862 URL: https://issues.apache.org/jira/browse/HDFS-8862 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8862.001.patch Per [~cmccabe]'s comments in HDFS-8792, this JIRA is to discuss improving {{BlockManager#excessReplicateMap}}. It's true that a HashMap never shrinks when elements are removed, but a TreeMap entry has to store more references (left, right, parent) than a HashMap entry (only one next reference), and even when removals leave some buckets empty, an empty HashMap bucket is just a {{null}} reference (4 bytes), so the two are close on memory. On the other hand, the key of {{excessReplicateMap}} is the datanode UUID, so the number of entries is almost fixed, which makes HashMap's memory footprint better than TreeMap's in this case. Most important is the search/insert/remove performance, where HashMap is clearly better than TreeMap. Since we don't need sorted order, we should use a HashMap instead of a TreeMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
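Paraphrasing the change under discussion as a sketch (field and value types follow the comments here, not the committed patch verbatim):
{code}
// Keys are datanode UUIDs, so the entry count is nearly fixed and no sorted
// iteration is needed: HashMap's O(1) expected operations beat TreeMap's O(log n).
private final Map<String, LightWeightLinkedSet<BlockInfo>> excessReplicateMap =
    new HashMap<>(); // previously: new TreeMap<>()
{code}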
[jira] [Updated] (HDFS-8862) BlockManager#excessReplicateMap should use a HashMap
[ https://issues.apache.org/jira/browse/HDFS-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8862: - Summary: BlockManager#excessReplicateMap should use a HashMap (was: Improve BlockManager#excessReplicateMap) BlockManager#excessReplicateMap should use a HashMap Key: HDFS-8862 URL: https://issues.apache.org/jira/browse/HDFS-8862 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8862.001.patch Per [~cmccabe]'s comments in HDFS-8792, this JIRA is to discuss improving {{BlockManager#excessReplicateMap}}. It's true that a HashMap never shrinks when elements are removed, but a TreeMap entry has to store more references (left, right, parent) than a HashMap entry (only one next reference), and even when removals leave some buckets empty, an empty HashMap bucket is just a {{null}} reference (4 bytes), so the two are close on memory. On the other hand, the key of {{excessReplicateMap}} is the datanode UUID, so the number of entries is almost fixed, which makes HashMap's memory footprint better than TreeMap's in this case. Most important is the search/insert/remove performance, where HashMap is clearly better than TreeMap. Since we don't need sorted order, we should use a HashMap instead of a TreeMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8912) Implement ShrinkableHashMap extends java HashMap and use properly
[ https://issues.apache.org/jira/browse/HDFS-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700620#comment-14700620 ] Yi Liu commented on HDFS-8912: -- Hi [~cmccabe], what do you think about it? Thanks. Implement ShrinkableHashMap extends java HashMap and use properly - Key: HDFS-8912 URL: https://issues.apache.org/jira/browse/HDFS-8912 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Currently {{LightWeightHashSet}} and {{LightWeightLinkedSet}} are used in HDFS; they have two advantages over the Java HashSet: each entry requires less memory, and they are shrinkable. In a real cluster, HDFS is a long-running service, and a {{set}} may become very large at some point and small again afterwards, so shrinking the {{set}} when its size hits the shrink threshold is necessary and improves NN memory usage. The same applies to {{map}}s: some HashMaps used in BlockManager (e.g., the hashmap in CorruptReplicasMap) would be better off shrinkable. I think it's worth implementing a ShrinkableHashMap that extends the Java HashMap; at a quick glance, little code seems to be needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
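A minimal sketch of the idea. Note that {{java.util.HashMap}} exposes no public API to reduce its capacity, so this sketch uses composition and rebuilds a right-sized map, rather than the subclassing the JIRA suggests; the class name and shrink policy are assumptions, not a committed design.
{code}
import java.util.HashMap;
import java.util.Map;

class ShrinkableHashMap<K, V> {
  private static final int MIN_CAPACITY = 16;
  private Map<K, V> inner = new HashMap<>(MIN_CAPACITY);
  private int peakSize; // high-water mark of the entry count

  V put(K key, V value) {
    V old = inner.put(key, value);
    peakSize = Math.max(peakSize, inner.size());
    return old;
  }

  V get(Object key) {
    return inner.get(key);
  }

  V remove(Object key) {
    V old = inner.remove(key);
    // Shrink once the map drops below a quarter of its peak: copying into a
    // fresh HashMap is the portable way to release the oversized table.
    if (peakSize >= 4 * MIN_CAPACITY && inner.size() < peakSize / 4) {
      Map<K, V> rebuilt = new HashMap<>(Math.max(MIN_CAPACITY, inner.size() * 2));
      rebuilt.putAll(inner);
      inner = rebuilt;
      peakSize = inner.size();
    }
    return old;
  }

  int size() {
    return inner.size();
  }
}
{code}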
[jira] [Commented] (HDFS-8911) NameNode Metric : Add WAL counters as a JMX metric
[ https://issues.apache.org/jira/browse/HDFS-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700485#comment-14700485 ] Arpit Agarwal commented on HDFS-8911: - +1 pending Jenkins. NameNode Metric : Add WAL counters as a JMX metric -- Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8911.001.patch Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8911) NameNode Metric : Add WAL counters as a JMX metric
[ https://issues.apache.org/jira/browse/HDFS-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700500#comment-14700500 ] Andrew Wang commented on HDFS-8911: --- This is a small correction, but the edit log is write behind not write ahead. Could you update the JIRA summary to just say edit log rather than WAL? NameNode Metric : Add WAL counters as a JMX metric -- Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8911.001.patch Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8862) BlockManager#excessReplicateMap should use a HashMap
[ https://issues.apache.org/jira/browse/HDFS-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8862: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) BlockManager#excessReplicateMap should use a HashMap Key: HDFS-8862 URL: https://issues.apache.org/jira/browse/HDFS-8862 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8862.001.patch Per [~cmccabe]'s comments in HDFS-8792, this JIRA is to discuss improving {{BlockManager#excessReplicateMap}}. It's true that a HashMap never shrinks when elements are removed, but a TreeMap entry has to store more references (left, right, parent) than a HashMap entry (only one next reference), and even when removals leave some buckets empty, an empty HashMap bucket is just a {{null}} reference (4 bytes), so the two are close on memory. On the other hand, the key of {{excessReplicateMap}} is the datanode UUID, so the number of entries is almost fixed, which makes HashMap's memory footprint better than TreeMap's in this case. Most important is the search/insert/remove performance, where HashMap is clearly better than TreeMap. Since we don't need sorted order, we should use a HashMap instead of a TreeMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8826) Balancer may not move blocks efficiently in some cases
[ https://issues.apache.org/jira/browse/HDFS-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700575#comment-14700575 ] Arpit Agarwal commented on HDFS-8826: - +1 for the patch. The test failures look unrelated, although a couple of the checkstyle issues look valid. Balancer may not move blocks efficiently in some cases -- Key: HDFS-8826 URL: https://issues.apache.org/jira/browse/HDFS-8826 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8826_20150811.patch, h8826_20150816.patch Balancer is inefficient in the following case: || Datanode || Utilization || Rack || | D1 | 95% | A | | D2 | 30% | B | | D3, D4, D5 | 0% | B | The average utilization is 25%, so D2 is within the 10% threshold. However, Balancer currently will first move blocks from D2 to D3, D4 and D5 since they are under the same rack. Then, it will move blocks from D1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8278) HDFS Balancer should consider remaining storage % when checking for under-utilized machines
[ https://issues.apache.org/jira/browse/HDFS-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8278: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks Jing for reviewing the patch. I have committed this. HDFS Balancer should consider remaining storage % when checking for under-utilized machines --- Key: HDFS-8278 URL: https://issues.apache.org/jira/browse/HDFS-8278 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Tsz Wo Nicholas Sze Fix For: 2.8.0 Attachments: h8278_20150817.patch DFS balancer mistakenly identifies a node with very little storage space remaining as an underutilized node and tries to move large amounts of data to that particular node. All these block moves fail to execute successfully, as the % utilization is less relevant than the dfs remaining storage on that node. {code} 15/04/24 04:25:55 INFO balancer.Balancer: 0 over-utilized: [] 15/04/24 04:25:55 INFO balancer.Balancer: 1 underutilized: [172.19.1.46:50010:DISK] 15/04/24 04:25:55 INFO balancer.Balancer: Need to move 47.68 GB to make the cluster balanced. 15/04/24 04:25:55 INFO balancer.Balancer: Decided to move 413.08 MB bytes from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK 15/04/24 04:25:55 INFO balancer.Balancer: Will move 413.08 MB in this iteration 15/04/24 04:25:55 WARN balancer.Dispatcher: Failed to move blk_1078689321_1099517353638 with size=131146 from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK through 172.19.1.53:50010: Got error, status message opReplaceBlock BP-942051088-172.18.1.41-1370508013893:blk_1078689321_1099517353638 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=225042432 B) is less than the block size (=268435456 B)., block move is failed {code} The machine in concern is under-full when it comes to the BP utilization, but has very little free space available for blocks. {code} Decommission Status : Normal Configured Capacity: 3826907185152 (3.48 TB) DFS Used: 2817262833664 (2.56 TB) Non DFS Used: 1000621305856 (931.90 GB) DFS Remaining: 9023045632 (8.40 GB) DFS Used%: 73.62% DFS Remaining%: 0.24% Configured Cache Capacity: 8589934592 (8 GB) Cache Used: 0 (0 B) Cache Remaining: 8589934592 (8 GB) Cache Used%: 0.00% Cache Remaining%: 100.00% Xceivers: 3 Last contact: Fri Apr 24 04:28:36 PDT 2015 {code} The machine has 0.40 Gb of non-RAM storage available on that node, so it is futile to attempt to move any blocks to that particular machine. This is a similar concern when a machine loses disks, since the comparisons of utilization always compare percentages per-node. Even that scenario needs to cap data movement to that node to the DFS Remaining % variable. Trying to move any more data than that to a given node will always fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8880) NameNode metrics logging
[ https://issues.apache.org/jira/browse/HDFS-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700457#comment-14700457 ] Jitendra Nath Pandey commented on HDFS-8880: +1 NameNode metrics logging Key: HDFS-8880 URL: https://issues.apache.org/jira/browse/HDFS-8880 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-8880.01.patch, HDFS-8880.02.patch, HDFS-8880.03.patch, HDFS-8880.04.patch, namenode-metrics.log The NameNode can periodically log metrics to help debugging when the cluster is not setup with another metrics monitoring scheme. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
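In the spirit of this change, a hedged sketch of periodic metrics logging (logger name, MBean pattern, and period are assumptions, not the committed patch): a scheduled task snapshots JMX attribute values into a dedicated log.
{code}
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class MetricsLoggerSketch {
  private static final Logger METRICS_LOG =
      LoggerFactory.getLogger("NameNodeMetricsLog");

  static void start(long periodSeconds) {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleWithFixedDelay(() -> {
      try {
        // Snapshot every attribute of the NameNode's Hadoop MBeans.
        for (ObjectName mbean :
            server.queryNames(new ObjectName("Hadoop:service=NameNode,*"), null)) {
          for (MBeanAttributeInfo attr : server.getMBeanInfo(mbean).getAttributes()) {
            METRICS_LOG.info("{}:{}={}", mbean, attr.getName(),
                server.getAttribute(mbean, attr.getName()));
          }
        }
      } catch (Exception e) {
        METRICS_LOG.warn("Failed to log metrics", e);
      }
    }, 0, periodSeconds, TimeUnit.SECONDS);
  }
}
{code}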
[jira] [Commented] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700527#comment-14700527 ] Andrew Wang commented on HDFS-8895: --- Quite a few flaky tests, but they all passed for me locally. Will commit shortly; thanks Eddy for reviewing. Remove deprecated BlockStorageLocation APIs --- Key: HDFS-8895 URL: https://issues.apache.org/jira/browse/HDFS-8895 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-8895.001.patch HDFS-8887 supersedes DistributedFileSystem#getFileBlockStorageLocations, so it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8895: -- Resolution: Fixed Fix Version/s: 3.0.0 Release Note: This removes the deprecated DistributedFileSystem#getFileBlockStorageLocations API used for getting VolumeIds of block replicas. Applications interested in the volume of a replica can instead consult BlockLocation#getStorageIds to obtain equivalent information. (was: This removes the deprecated DistributedFileSystem#getFileBlockStorageLocations API used for getting VolumeIds of block replicas. Instead, use BlockLocation#getStorageIds to get very similar information.) Status: Resolved (was: Patch Available) Committed to trunk, thanks again for reviewing Eddy. Remove deprecated BlockStorageLocation APIs --- Key: HDFS-8895 URL: https://issues.apache.org/jira/browse/HDFS-8895 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 3.0.0 Attachments: HDFS-8895.001.patch HDFS-8887 supersedes DistributedFileSystem#getFileBlockStorageLocations, so it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
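A short illustration of the migration path named in the release note (the file path and printing are hypothetical):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class StorageIdsSketch {
  static void printStorageIds(Configuration conf, Path file) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(file);
    for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
      for (String storageId : loc.getStorageIds()) { // one ID per replica
        System.out.println(loc + " -> " + storageId);
      }
    }
  }
}
{code}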
[jira] [Commented] (HDFS-8862) BlockManager#excessReplicateMap should use a HashMap
[ https://issues.apache.org/jira/browse/HDFS-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700577#comment-14700577 ] Yi Liu commented on HDFS-8862: -- One more discussion point: do you think it's worthwhile to extend the Java HashMap and implement {{shrink}}? It would be better to have a shrinkable HashMap in some places. From my point of view, I think it's worthwhile, and at a quick glance little code seems to be needed. BlockManager#excessReplicateMap should use a HashMap Key: HDFS-8862 URL: https://issues.apache.org/jira/browse/HDFS-8862 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8862.001.patch Per [~cmccabe]'s comments in HDFS-8792, this JIRA is to discuss improving {{BlockManager#excessReplicateMap}}. It's true that a HashMap never shrinks when elements are removed, but a TreeMap entry has to store more references (left, right, parent) than a HashMap entry (only one next reference), and even when removals leave some buckets empty, an empty HashMap bucket is just a {{null}} reference (4 bytes), so the two are close on memory. On the other hand, the key of {{excessReplicateMap}} is the datanode UUID, so the number of entries is almost fixed, which makes HashMap's memory footprint better than TreeMap's in this case. Most important is the search/insert/remove performance, where HashMap is clearly better than TreeMap. Since we don't need sorted order, we should use a HashMap instead of a TreeMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8826) Balancer may not move blocks efficiently in some cases
[ https://issues.apache.org/jira/browse/HDFS-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700576#comment-14700576 ] Arpit Agarwal commented on HDFS-8826: - Also it may be a good idea to add a separate option to source from the most over-utilized DataNodes first so the administrator does not have to pass the source DNs manually. We can add it in a separate Jira. Balancer may not move blocks efficiently in some cases -- Key: HDFS-8826 URL: https://issues.apache.org/jira/browse/HDFS-8826 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8826_20150811.patch, h8826_20150816.patch Balancer is inefficient in the following case: || Datanode || Utilization || Rack || | D1 | 95% | A | | D2 | 30% | B | | D3, D4, D5 | 0% | B | The average utilization is 25% so that D2 is within 10% threshold. However, Balancer currently will first move blocks from D2 to D3, D4 and D5 since they are under the same rack. Then, it will move blocks from D1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700380#comment-14700380 ] Hadoop QA commented on HDFS-8895: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 28s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 8m 9s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 42s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 8s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 175m 21s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 34s | Tests passed in hadoop-hdfs-client. | | | | 227m 43s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestCrcCorruption | | | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | | | hadoop.hdfs.server.namenode.ha.TestLossyRetryInvocationHandler | | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750398/HDFS-8895.001.patch | | Optional Tests | javac unit javadoc findbugs checkstyle | | git revision | trunk / e535e0f | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12010/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12010/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12010/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12010/console | This message was automatically generated. Remove deprecated BlockStorageLocation APIs --- Key: HDFS-8895 URL: https://issues.apache.org/jira/browse/HDFS-8895 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-8895.001.patch HDFS-8887 supercedes DistributedFileSystem#getFileBlockStorageLocations, so it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700454#comment-14700454 ] Zhe Zhang commented on HDFS-8833: - [~jingzhao] [~walter.k.su] [~andrew.wang] Thanks for the discussions. Again, the non-empty directory change is really simple so I left it as-is (allowing setting EC policy on non-empty dirs). Let's continue that discussion and reach a consensus. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8833-HDFS-7285-merge.00.patch, HDFS-8833-HDFS-7285-merge.01.patch We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature
[ https://issues.apache.org/jira/browse/HDFS-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700456#comment-14700456 ] Zhe Zhang commented on HDFS-8909: - [~jingzhao] I just realized we said {{HDFS-7285-rebase}} instead of {{HDFS-7285-merge}} above. The two branches have similar {{HEAD}} so it doesn't make much difference. But in general, which branch do you prefer we use for other pending tasks? Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature Key: HDFS-8909 URL: https://issues.apache.org/jira/browse/HDFS-8909 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Jing Zhao HDFS-8801 converts {{BlockInfoUC}} as a feature. We should consolidate {{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logics to use this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8792) BlockManager#postponedMisreplicatedBlocks should use a LightWeightHashSet to save memory
[ https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700543#comment-14700543 ] Yi Liu commented on HDFS-8792: -- Thanks [~cmccabe] for the review and commit! BlockManager#postponedMisreplicatedBlocks should use a LightWeightHashSet to save memory Key: HDFS-8792 URL: https://issues.apache.org/jira/browse/HDFS-8792 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8792.001.patch, HDFS-8792.002.patch, HDFS-8792.003.patch {{LightWeightHashSet}} requires less memory than the Java HashSet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
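The shape of the swap, paraphrased as a sketch (field and element types follow the JIRA title, not the committed patch verbatim):
{code}
// LightWeightHashSet is an HDFS-internal set that uses less memory per entry
// than java.util.HashSet and can shrink its backing table as elements leave.
private final LightWeightHashSet<Block> postponedMisreplicatedBlocks =
    new LightWeightHashSet<>(); // previously: new HashSet<>()
{code}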
[jira] [Updated] (HDFS-8912) Implement ShrinkableHashMap extends java HashMap and use properly
[ https://issues.apache.org/jira/browse/HDFS-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8912: - Description: Currently {{LightWeightHashSet}} and {{LightWeightLinkedSet}} are used in HDFS; they have two advantages over the Java HashSet: each entry requires less memory, and they are shrinkable. In a real cluster, HDFS is a long-running service, and a {{set}} may become large at some point and small again afterwards, so shrinking the {{set}} when its size hits the shrink threshold is necessary and improves NN memory usage. The same applies to {{map}}s: some HashMaps used in BlockManager (e.g., the hashmap in CorruptReplicasMap) would be better off shrinkable. I think it's worth implementing a ShrinkableHashMap that extends the Java HashMap; at a quick glance, little code seems to be needed. was: Currently {{LightWeightHashSet}} and {{LightWeightLinkedSet}} are used in HDFS; they have two advantages over the Java HashSet: each entry requires less memory, and they are shrinkable. In a real cluster, HDFS is a long-running service, and a {{set}} may become very large at some point and small again afterwards, so shrinking the {{set}} when its size hits the shrink threshold is necessary and improves NN memory usage. The same applies to {{map}}s: some HashMaps used in BlockManager (e.g., the hashmap in CorruptReplicasMap) would be better off shrinkable. I think it's worth implementing a ShrinkableHashMap that extends the Java HashMap; at a quick glance, little code seems to be needed. Implement ShrinkableHashMap extends java HashMap and use properly - Key: HDFS-8912 URL: https://issues.apache.org/jira/browse/HDFS-8912 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Currently {{LightWeightHashSet}} and {{LightWeightLinkedSet}} are used in HDFS; they have two advantages over the Java HashSet: each entry requires less memory, and they are shrinkable. In a real cluster, HDFS is a long-running service, and a {{set}} may become large at some point and small again afterwards, so shrinking the {{set}} when its size hits the shrink threshold is necessary and improves NN memory usage. The same applies to {{map}}s: some HashMaps used in BlockManager (e.g., the hashmap in CorruptReplicasMap) would be better off shrinkable. I think it's worth implementing a ShrinkableHashMap that extends the Java HashMap; at a quick glance, little code seems to be needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8792) BlockManager#postponedMisreplicatedBlocks should use a LightWeightHashSet to save memory
[ https://issues.apache.org/jira/browse/HDFS-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700333#comment-14700333 ] Hudson commented on HDFS-8792: -- FAILURE: Integrated in Hadoop-trunk-Commit #8314 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8314/]) HDFS-8792. BlockManager#postponedMisreplicatedBlocks should use a LightWeightHashSet to save memory (Yi Liu via Colin P. McCabe) (cmccabe: rev c77bd6af16cbc26f88a2c6d8220db83a3e1caa2c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestLightWeightHashSet.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/LightWeightHashSet.java BlockManager#postponedMisreplicatedBlocks should use a LightWeightHashSet to save memory Key: HDFS-8792 URL: https://issues.apache.org/jira/browse/HDFS-8792 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8792.001.patch, HDFS-8792.002.patch, HDFS-8792.003.patch {{LightWeightHashSet}} requires less memory than the Java HashSet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8845) DiskChecker should not traverse the entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8845: --- Summary: DiskChecker should not traverse the entire tree (was: DiskChecker should not traverse entire tree) DiskChecker should not traverse the entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8845.patch DiskChecker should not traverse the entire tree, because doing so causes heavy disk load in checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse the entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700353#comment-14700353 ] Andrew Wang commented on HDFS-8845: --- Yeah, sounds good to me too. Colin explained to me offline that the BlockScanner processes suspect blocks like these first, so anything that would get caught by a path checker will quickly be caught by the BlockScanner. Thanks all. DiskChecker should not traverse the entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Fix For: 2.8.0 Attachments: HDFS-8845.patch DiskChecker should not traverse the entire tree, because doing so causes heavy disk load in checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature
[ https://issues.apache.org/jira/browse/HDFS-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700556#comment-14700556 ] Walter Su commented on HDFS-8909: - I'm afraid the patch will make the earlier branch commits impossible to rebase. To [~zhz]: the patch should be squashed together with the others. Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature Key: HDFS-8909 URL: https://issues.apache.org/jira/browse/HDFS-8909 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8909.000.patch HDFS-8801 converts {{BlockInfoUC}} into a feature. We should consolidate the {{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logic to use this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8880) NameNode metrics logging
[ https://issues.apache.org/jira/browse/HDFS-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700606#comment-14700606 ] Hudson commented on HDFS-8880: -- FAILURE: Integrated in Hadoop-trunk-Commit #8315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8315/]) HDFS-8880. NameNode metrics logging. (Arpit Agarwal) (arp: rev a88f31ebf3433392419127816f168136de0a9e77) * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/log4j.properties * hadoop-common-project/hadoop-common/src/main/conf/log4j.properties * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetricsLogger.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/util/MBeans.java NameNode metrics logging Key: HDFS-8880 URL: https://issues.apache.org/jira/browse/HDFS-8880 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-8880.01.patch, HDFS-8880.02.patch, HDFS-8880.03.patch, HDFS-8880.04.patch, namenode-metrics.log The NameNode can periodically log metrics to help debugging when the cluster is not set up with another metrics monitoring scheme. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8845) DiskChecker should not traverse the entire tree
[ https://issues.apache.org/jira/browse/HDFS-8845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700604#comment-14700604 ] Hudson commented on HDFS-8845: -- FAILURE: Integrated in Hadoop-trunk-Commit #8315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8315/]) HDFS-8845. DiskChecker should not traverse the entire tree (Chang Li via Colin P. McCabe) (cmccabe: rev ec183faadcf7edaf432aca3b25d24215d505c2ec) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java DiskChecker should not traverse the entire tree --- Key: HDFS-8845 URL: https://issues.apache.org/jira/browse/HDFS-8845 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Fix For: 2.8.0 Attachments: HDFS-8845.patch DiskChecker should not traverse the entire tree, because doing so causes heavy disk load in checkDiskError() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8278) HDFS Balancer should consider remaining storage % when checking for under-utilized machines
[ https://issues.apache.org/jira/browse/HDFS-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700607#comment-14700607 ] Hudson commented on HDFS-8278: -- FAILURE: Integrated in Hadoop-trunk-Commit #8315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8315/]) HDFS-8278. When computing max-size-to-move in Balancer, count only the storage with remaining >= default block size. (szetszwo: rev 51a00964da0e399718d1cec25ff692a32d7642b7) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HDFS Balancer should consider remaining storage % when checking for under-utilized machines --- Key: HDFS-8278 URL: https://issues.apache.org/jira/browse/HDFS-8278 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Tsz Wo Nicholas Sze Fix For: 2.8.0 Attachments: h8278_20150817.patch The DFS balancer mistakenly identifies a node with very little storage space remaining as an under-utilized node and tries to move large amounts of data to that particular node. All these block moves fail to execute, as the % utilization is less relevant than the DFS remaining storage on that node.
{code}
15/04/24 04:25:55 INFO balancer.Balancer: 0 over-utilized: []
15/04/24 04:25:55 INFO balancer.Balancer: 1 underutilized: [172.19.1.46:50010:DISK]
15/04/24 04:25:55 INFO balancer.Balancer: Need to move 47.68 GB to make the cluster balanced.
15/04/24 04:25:55 INFO balancer.Balancer: Decided to move 413.08 MB bytes from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK
15/04/24 04:25:55 INFO balancer.Balancer: Will move 413.08 MB in this iteration
15/04/24 04:25:55 WARN balancer.Dispatcher: Failed to move blk_1078689321_1099517353638 with size=131146 from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK through 172.19.1.53:50010: Got error, status message opReplaceBlock BP-942051088-172.18.1.41-1370508013893:blk_1078689321_1099517353638 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=225042432 B) is less than the block size (=268435456 B)., block move is failed
{code}
The machine in question is under-full when it comes to the BP utilization, but has very little free space available for blocks.
{code}
Decommission Status : Normal
Configured Capacity: 3826907185152 (3.48 TB)
DFS Used: 2817262833664 (2.56 TB)
Non DFS Used: 1000621305856 (931.90 GB)
DFS Remaining: 9023045632 (8.40 GB)
DFS Used%: 73.62%
DFS Remaining%: 0.24%
Configured Cache Capacity: 8589934592 (8 GB)
Cache Used: 0 (0 B)
Cache Remaining: 8589934592 (8 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 3
Last contact: Fri Apr 24 04:28:36 PDT 2015
{code}
The machine has 0.40 Gb of non-RAM storage available on that node, so it is futile to attempt to move any blocks to that particular machine. This is a similar concern when a machine loses disks, since the utilization comparisons always compare percentages per node. Even that scenario needs to cap data movement to that node by the DFS Remaining % value. Trying to move any more data than that to a given node will always fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
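For clarity, here is the arithmetic behind the failed moves above as a tiny sketch. The numbers come straight from the Dispatcher warning; the check shown is a simplification of the idea behind HDFS-8278, not the actual patch code.
{code}
public class BalancerSpaceCheck {
  public static void main(String[] args) {
    // Numbers from the Dispatcher warning above.
    long mostAvailable = 225042432L;    // ~214.6 MB free on the best volume
    long defaultBlockSize = 268435456L; // 256 MB default block size
    // HDFS-8278: only count a storage as a move target when its remaining
    // space can hold at least one default-sized block.
    boolean usableTarget = mostAvailable >= defaultBlockSize;
    System.out.println("usable as balancing target? " + usableTarget); // false
  }
}
{code}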
[jira] [Commented] (HDFS-8862) BlockManager#excessReplicateMap should use a HashMap
[ https://issues.apache.org/jira/browse/HDFS-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700603#comment-14700603 ] Hudson commented on HDFS-8862: -- FAILURE: Integrated in Hadoop-trunk-Commit #8315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8315/]) HDFS-8862. BlockManager#excessReplicateMap should use a HashMap. (yliu) (yliu: rev 71566e23820d33e0110ca55eded3299735e970b9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt BlockManager#excessReplicateMap should use a HashMap Key: HDFS-8862 URL: https://issues.apache.org/jira/browse/HDFS-8862 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8862.001.patch Per [~cmccabe]'s comments in HDFS-8792, this JIRA is to discuss improving {{BlockManager#excessReplicateMap}}. It's true that a HashMap doesn't ever shrink when elements are removed, but a TreeMap entry needs to store more references (left, right, parent) than a HashMap entry (only one reference, next); even when removals leave some buckets empty, an empty HashMap bucket is just a {{null}} reference (4 bytes), so the two are close on this point. On the other hand, the key of {{excessReplicateMap}} is the datanode UUID, so the number of entries is almost fixed, and in this case HashMap memory usage is better than TreeMap memory usage. I think the most important factor is search/insert/remove performance, where HashMap is clearly better than TreeMap. Because we don't need sorting, we should use HashMap instead of TreeMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
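As a rough sketch of the change under discussion (the field declaration below only approximates the BlockManager code; the value type and generics are assumptions, not a verbatim diff):
{code}
import java.util.HashMap;
import java.util.Map;

public class ExcessReplicateMapSketch {
  // Keys are DataNode UUIDs, so no sort order is needed. A HashMap gives
  // O(1) search/insert/remove versus TreeMap's O(log n), each entry holds
  // one 'next' reference instead of left/right/parent, and because the set
  // of datanodes is roughly fixed, HashMap's inability to shrink is fine.
  //
  // Before (sorted): new TreeMap<String, Object>()
  private final Map<String, Object> excessReplicateMap = new HashMap<>();
  // (In BlockManager the value is a light-weight set of blocks; Object is
  // used here only to keep the sketch self-contained.)
}
{code}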
[jira] [Commented] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700605#comment-14700605 ] Hudson commented on HDFS-8895: -- FAILURE: Integrated in Hadoop-trunk-Commit #8315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8315/]) HDFS-8895. Remove deprecated BlockStorageLocation APIs. (wang: rev eee4d716b48074825e1afcd9c74038a393ddeb69) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/VolumeId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolServerSideTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/BlockStorageLocation.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientDatanodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientDatanodeProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestVolumeId.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsBlocksMetadata.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/HdfsVolumeId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockStorageLocationUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java Remove deprecated BlockStorageLocation APIs --- Key: HDFS-8895 URL: https://issues.apache.org/jira/browse/HDFS-8895 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 3.0.0 Attachments: HDFS-8895.001.patch HDFS-8887 supercedes DistributedFileSystem#getFileBlockStorageLocations, so it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8908) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode
[ https://issues.apache.org/jira/browse/HDFS-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700325#comment-14700325 ] Hadoop QA commented on HDFS-8908: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 5m 54s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 6s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 175m 2s | Tests failed in hadoop-hdfs. | | | | 195m 10s | | \\ \\ || Reason || Tests || | Failed build | hadoop-hdfs | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750861/h8908_20150817.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / e535e0f | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12009/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12009/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12009/console | This message was automatically generated. TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode -- Key: HDFS-8908 URL: https://issues.apache.org/jira/browse/HDFS-8908 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8908_20150817.patch See https://builds.apache.org/job/PreCommit-HDFS-Build/12005/testReport/org.apache.hadoop.hdfs/TestAppendSnapshotTruncate/testAST/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8911) NameNode Metric : Add WAL counters as a JMX metric
[ https://issues.apache.org/jira/browse/HDFS-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700503#comment-14700503 ] Anu Engineer commented on HDFS-8911: [~andrew.wang] Thanks for letting me know. I will fix that. NameNode Metric : Add WAL counters as a JMX metric -- Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8911.001.patch Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8911) NameNode Metric : Add Editlog counters as a JMX metric
[ https://issues.apache.org/jira/browse/HDFS-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8911: --- Summary: NameNode Metric : Add Editlog counters as a JMX metric (was: NameNode Metric : Add WAL counters as a JMX metric) NameNode Metric : Add Editlog counters as a JMX metric -- Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8911.001.patch Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8911) NameNode Metric : Add Editlog counters as a JMX metric
[ https://issues.apache.org/jira/browse/HDFS-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8911: --- Description: Today we log editlog metrics in the log. This JIRA proposes to expose those metrics via JMX. (was: Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX.) NameNode Metric : Add Editlog counters as a JMX metric -- Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8911.001.patch Today we log editlog metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-8862) BlockManager#excessReplicateMap should use a HashMap
[ https://issues.apache.org/jira/browse/HDFS-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700577#comment-14700577 ] Yi Liu edited comment on HDFS-8862 at 8/18/15 1:39 AM: --- One more discussion: do you think it's worth extending the Java HashMap to implement {{shrink}}? It would be better to have a shrinkable HashMap in some places. From my point of view, I think it's worthwhile, and at a quick glance little code seems to be needed. was (Author: hitliuyi): One more discussion: do you think it's worth extending the Java HashMap to implement {{shrink}}? It would be better to have the shrinked HashMap in some places. From my point of view, I think it's worthwhile, and at a quick glance little code seems to be needed. BlockManager#excessReplicateMap should use a HashMap Key: HDFS-8862 URL: https://issues.apache.org/jira/browse/HDFS-8862 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Yi Liu Assignee: Yi Liu Fix For: 2.8.0 Attachments: HDFS-8862.001.patch Per [~cmccabe]'s comments in HDFS-8792, this JIRA is to discuss improving {{BlockManager#excessReplicateMap}}. It's true that a HashMap doesn't ever shrink when elements are removed, but a TreeMap entry needs to store more references (left, right, parent) than a HashMap entry (only one reference, next); even when removals leave some buckets empty, an empty HashMap bucket is just a {{null}} reference (4 bytes), so the two are close on this point. On the other hand, the key of {{excessReplicateMap}} is the datanode UUID, so the number of entries is almost fixed, and in this case HashMap memory usage is better than TreeMap memory usage. I think the most important factor is search/insert/remove performance, where HashMap is clearly better than TreeMap. Because we don't need sorting, we should use HashMap instead of TreeMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature
[ https://issues.apache.org/jira/browse/HDFS-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700610#comment-14700610 ] Walter Su commented on HDFS-8909: - bq. Instead of continuing to do git rebase, maybe we should switch to git merge now. We can skip HDFS-8801 when merging trunk changes. bq. a small change in trunk can cause conflicts for rebasing a large number of commits in the feature branch --- quote from Jing Zhao \[jing.apa...@gmail.com\] in the common-dev mailing list. Totally agree. HDFS-8801 is an example. HDFS-8909 just tries to merge some parts from trunk to the branch, which is no different from 'git merge'. Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature Key: HDFS-8909 URL: https://issues.apache.org/jira/browse/HDFS-8909 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8909.000.patch HDFS-8801 converts {{BlockInfoUC}} into a feature. We should consolidate the {{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logic to use this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8912) Implement ShrinkableHashMap extends java HashMap and use properly
Yi Liu created HDFS-8912: Summary: Implement ShrinkableHashMap extends java HashMap and use properly Key: HDFS-8912 URL: https://issues.apache.org/jira/browse/HDFS-8912 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Currently {{LightWeightHashSet}} and {{LightWeightLinkedSet}} are used in HDFS; they have two advantages over the Java HashSet: each entry requires less memory, and the set is shrinkable. In a real cluster, HDFS is a long-running service, and a {{set}} may become very large at some point and small again later, so shrinking the {{set}} when its size hits the shrink threshold is necessary and can improve NN memory usage. The same applies to {{map}}s: some HashMaps used in BlockManager (e.g., the HashMap in CorruptReplicasMap) would be better off shrinkable. I think it is worth implementing a ShrinkableHashMap that extends the Java HashMap; at a quick glance, little code seems to be needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700639#comment-14700639 ] Hadoop QA commented on HDFS-8823: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 24s | The applied patch generated 7 new checkstyle issues (total was 649, now 651). | | {color:green}+1{color} | whitespace | 0m 7s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 37s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 3s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 175m 5s | Tests failed in hadoop-hdfs. | | | | 219m 20s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.web.TestWebHDFSAcl | | | hadoop.hdfs.TestAppendSnapshotTruncate | | Timed out tests | org.apache.hadoop.cli.TestHDFSCLI | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750885/HDFS-8823.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / ec183fa | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12014/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12014/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12014/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12014/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12014/console | This message was automatically generated. Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8823.000.patch, HDFS-8823.001.patch, HDFS-8823.002.patch, HDFS-8823.003.patch, HDFS-8823.004.patch, HDFS-8823.005.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. 
Currently the replication factors of all blocks have to be the same: the replication factors of these blocks are equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factors, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
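To illustrate the direction, here is a toy class invented for this example; the real changes are in the attached patches, and the names here are not from them.
{code}
// Sketch: once each block records its own replication factor, block
// management no longer needs to reach back into the INodeFile, and blocks
// of one file (e.g., blocks kept alive only by an old snapshot) can carry
// lower replication factors than the file's current setting.
public class BlockInfoSketch {
  private final long blockId;
  private short replication;

  public BlockInfoSketch(long blockId, short replication) {
    this.blockId = blockId;
    this.replication = replication;
  }

  public long getBlockId() { return blockId; }
  public short getReplication() { return replication; }
  public void setReplication(short replication) { this.replication = replication; }
}
{code}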
[jira] [Commented] (HDFS-8862) Improve BlockManager#excessReplicateMap
[ https://issues.apache.org/jira/browse/HDFS-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700352#comment-14700352 ] Colin Patrick McCabe commented on HDFS-8862: I guess the set of datanodes is not going to shrink that much over the life of the cluster, so the fact that this data structure can't shrink should be OK. We may want to look into whether that {{LightWeightLinkedSet<BlockInfo>}} can shrink... but that is outside the scope of this JIRA. +1. Improve BlockManager#excessReplicateMap --- Key: HDFS-8862 URL: https://issues.apache.org/jira/browse/HDFS-8862 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8862.001.patch Per [~cmccabe]'s comments in HDFS-8792, this JIRA is to discuss improving {{BlockManager#excessReplicateMap}}. It's true that a HashMap doesn't ever shrink when elements are removed, but a TreeMap entry needs to store more references (left, right, parent) than a HashMap entry (only one reference, next); even when removals leave some buckets empty, an empty HashMap bucket is just a {{null}} reference (4 bytes), so the two are close on this point. On the other hand, the key of {{excessReplicateMap}} is the datanode UUID, so the number of entries is almost fixed, and in this case HashMap memory usage is better than TreeMap memory usage. I think the most important factor is search/insert/remove performance, where HashMap is clearly better than TreeMap. Because we don't need sorting, we should use HashMap instead of TreeMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8846) Create edit log files with old layout version for upgrade testing
[ https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700361#comment-14700361 ] Colin Patrick McCabe commented on HDFS-8846: Hi [~zhz], It seems that you left out the binary changes: {code} diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-252-dfs-dir.tgz b/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-252-dfs-dir.tgz new file mode 100644 index 000..2aaab18 Binary files /dev/null and b/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-252-dfs-dir.tgz differ {code} You should create the patch with {{\-\-binary}} so that these are included Create edit log files with old layout version for upgrade testing - Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8846.00.patch Per discussion under HDFS-8480, we should create some edit log files with old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8435: -- Status: Open (was: Patch Available) createNonRecursive support needed in WebHdfsFileSystem to support HBase --- Key: HDFS-8435 URL: https://issues.apache.org/jira/browse/HDFS-8435 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Affects Versions: 2.6.0 Reporter: Vinoth Sathappan Assignee: Jakob Homan Attachments: HDFS-8435-branch-2.7.001.patch, HDFS-8435.001.patch, HDFS-8435.002.patch, HDFS-8435.003.patch The WebHdfsFileSystem implementation doesn't support createNonRecursive. HBase depends on it extensively for proper functioning. Currently, when the region servers are started over WebHDFS, they crash with:
createNonRecursive unsupported for this filesystem class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85)
at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:198)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature
[ https://issues.apache.org/jira/browse/HDFS-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700589#comment-14700589 ] Jing Zhao commented on HDFS-8909: - Instead of continuing to do git rebase, maybe we should switch to git merge now. We can skip HDFS-8801 when merging trunk changes. Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature Key: HDFS-8909 URL: https://issues.apache.org/jira/browse/HDFS-8909 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8909.000.patch HDFS-8801 converts {{BlockInfoUC}} into a feature. We should consolidate the {{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logic to use this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8911) NameNode Metric : Add WAL counters as a JMX metric
Anu Engineer created HDFS-8911: -- Summary: NameNode Metric : Add WAL counters as a JMX metric Key: HDFS-8911 URL: https://issues.apache.org/jira/browse/HDFS-8911 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Today we log Write Ahead Log metrics in the log. This JIRA proposes to expose those metrics via JMX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8846) Create edit log files with old layout version for upgrade testing
[ https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8846: Attachment: HDFS-8846.01.patch Thanks for the good catch Colin! Updating the patch with the binary diff. Create edit log files with old layout version for upgrade testing - Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8846.00.patch, HDFS-8846.01.patch Per discussion under HDFS-8480, we should create some edit log files with old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature
[ https://issues.apache.org/jira/browse/HDFS-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700481#comment-14700481 ] Jing Zhao commented on HDFS-8909: - I think that, to save us time from now on, we can use git merge to merge trunk changes into the EC feature branch. So either feature branch is OK with me. Which one do you prefer? Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature Key: HDFS-8909 URL: https://issues.apache.org/jira/browse/HDFS-8909 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Jing Zhao HDFS-8801 converts {{BlockInfoUC}} into a feature. We should consolidate the {{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logic to use this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8909) Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature
[ https://issues.apache.org/jira/browse/HDFS-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8909: Attachment: HDFS-8909.000.patch Patch against the HDFS-7285-REBASE branch. Erasure coding: update BlockInfoContiguousUC and BlockInfoStripedUC to use BlockUnderConstructionFeature Key: HDFS-8909 URL: https://issues.apache.org/jira/browse/HDFS-8909 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8909.000.patch HDFS-8801 converts {{BlockInfoUC}} into a feature. We should consolidate the {{BlockInfoContiguousUC}} and {{BlockInfoStripedUC}} logic to use this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8880) NameNode metrics logging
[ https://issues.apache.org/jira/browse/HDFS-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-8880: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Target Version/s: (was: 2.8.0) Status: Resolved (was: Patch Available) Committed to trunk and branch-2. NameNode metrics logging Key: HDFS-8880 URL: https://issues.apache.org/jira/browse/HDFS-8880 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-8880.01.patch, HDFS-8880.02.patch, HDFS-8880.03.patch, HDFS-8880.04.patch, namenode-metrics.log The NameNode can periodically log metrics to help debugging when the cluster is not set up with another metrics monitoring scheme. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8435: -- Attachment: HDFS-8435.004.patch Fixed the javadoc and whitespace complaints. Unfortunately, as we're adding a deprecated API to WebHDFS, the javac warning is unavoidable. Unit tests that failed or timed out on Jenkins pass repeatedly for me; I consider them spurious. createNonRecursive support needed in WebHdfsFileSystem to support HBase --- Key: HDFS-8435 URL: https://issues.apache.org/jira/browse/HDFS-8435 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Affects Versions: 2.6.0 Reporter: Vinoth Sathappan Assignee: Jakob Homan Attachments: HDFS-8435-branch-2.7.001.patch, HDFS-8435.001.patch, HDFS-8435.002.patch, HDFS-8435.003.patch, HDFS-8435.004.patch The WebHdfsFileSystem implementation doesn't support createNonRecursive. HBase depends on it extensively for proper functioning. Currently, when the region servers are started over WebHDFS, they crash with:
createNonRecursive unsupported for this filesystem class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85)
at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:198)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
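For context, this is roughly the (long-deprecated) FileSystem call that HBase's WAL writer makes and that the patch wires up for WebHDFS; the path, buffer size, replication, and block size below are placeholders for this example:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateNonRecursiveExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Unlike create(), createNonRecursive() fails when a parent directory
    // is missing; HBase relies on this failure mode for its WAL handling.
    Path wal = new Path("/hbase/WALs/example.log"); // placeholder path
    try (FSDataOutputStream out = fs.createNonRecursive(
        wal, true /* overwrite */, 4096, (short) 3, 128L << 20, null)) {
      out.writeBytes("entry");
    }
  }
}
{code}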
[jira] [Updated] (HDFS-8435) createNonRecursive support needed in WebHdfsFileSystem to support HBase
[ https://issues.apache.org/jira/browse/HDFS-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-8435: -- Status: Patch Available (was: Open) createNonRecursive support needed in WebHdfsFileSystem to support HBase --- Key: HDFS-8435 URL: https://issues.apache.org/jira/browse/HDFS-8435 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Affects Versions: 2.6.0 Reporter: Vinoth Sathappan Assignee: Jakob Homan Attachments: HDFS-8435-branch-2.7.001.patch, HDFS-8435.001.patch, HDFS-8435.002.patch, HDFS-8435.003.patch, HDFS-8435.004.patch The WebHdfsFileSystem implementation doesn't support createNonRecursive. HBase depends on it extensively for proper functioning. Currently, when the region servers are started over WebHDFS, they crash with:
createNonRecursive unsupported for this filesystem class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1137)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1112)
at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1088)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.init(ProtobufLogWriter.java:85)
at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createWriter(HLogFactory.java:198)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8901: Summary: Use ByteBuffer in striping positional read (was: Use ByteBuffer/DirectByteBuffer in striping positional read) Use ByteBuffer in striping positional read -- Key: HDFS-8901 URL: https://issues.apache.org/jira/browse/HDFS-8901 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Native erasure coders prefer direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striping positional read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
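A small sketch of the intent: here {{preferDirect}} stands in for however the coder advertises its preference (Hadoop's raw coder interface exposes a similar capability flag), and the cell size is arbitrary.
{code}
import java.nio.ByteBuffer;

public final class StripingBufferAllocator {
  private StripingBufferAllocator() {}

  // A native coder avoids an extra copy when handed direct buffers, while
  // a pure-Java coder is fastest on heap buffers backed by byte[].
  public static ByteBuffer allocateChunkBuffer(boolean preferDirect, int cellSize) {
    return preferDirect
        ? ByteBuffer.allocateDirect(cellSize)
        : ByteBuffer.allocate(cellSize);
  }
}
{code}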
[jira] [Commented] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699538#comment-14699538 ] LINTE commented on HDFS-8897: - Thank you for your attention. In fact the balancer seems to look at 2 values:
- fs.defaultFS
- dfs.nameservices
I had fs.defaultFS = hdfs://sandbox/ and dfs.nameservices = sandbox. I removed the / at the end of fs.defaultFS and that solved my error. The balancer should use only one of these values. Regards, Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ... Key: HDFS-8897 URL: https://issues.apache.org/jira/browse/HDFS-8897 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.7.1 Environment: Centos 6.6 Reporter: LINTE When the balancer is launched, it should test whether there is already a /system/balancer.id file in HDFS. Even when the file doesn't exist, the balancer refuses to run:
15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
Looking at the audit log file when trying to run the balancer, the balancer creates /system/balancer.id and then deletes it on exiting ...
2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r- proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc
The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. The function checkAndMarkRunning returns null even if /system/balancer.id doesn't exist before entering this function; if it exists, then it is deleted and the balancer exits with the same error.
{code}
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch (RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
      return null;
    } else {
      throw e;
    }
  }
}
{code}
Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8905) Refactor DFSInputStream#ReaderStrategy
Kai Zheng created HDFS-8905: --- Summary: Refactor DFSInputStream#ReaderStrategy Key: HDFS-8905 URL: https://issues.apache.org/jira/browse/HDFS-8905 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng The DFSInputStream#ReaderStrategy family doesn't look very good. This refactors it a little bit to make it more sensible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8902) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful)
[ https://issues.apache.org/jira/browse/HDFS-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699582#comment-14699582 ] Kai Zheng commented on HDFS-8902: - Thanks [~hitliuyi] for the pointer! I hadn't noticed HDFS-8668; I think I can use it to tie all the related issues together. My rough code indicated that the required change isn't trivial and would better be broken down into pieces, at least two or three. Yes, I can take HDFS-8668 and see how best to proceed. Sounds good? Thanks. Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful) - Key: HDFS-8902 URL: https://issues.apache.org/jira/browse/HDFS-8902 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We should choose an on-heap ByteBuffer or a direct ByteBuffer according to the erasure coder in use in striping read (positional and stateful), for performance reasons. A pure-Java coder favors the on-heap buffer, while a native coder prefers the direct one, which avoids a data copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699488#comment-14699488 ] LINTE commented on HDFS-8897: - Hi, Below is a part of hdfs-site.xml; NameNode HA is used, maybe that is the origin of this issue? It was working fine with HDFS 2.6.0.
{code}
<property>
  <name>dfs.nameservices</name>
  <value>sandbox</value>
</property>
<property>
  <name>dfs.ha.namenodes.sandbox</name>
  <value>nn1,nn2</value>
</property>
{code}
Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ... Key: HDFS-8897 URL: https://issues.apache.org/jira/browse/HDFS-8897 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.7.1 Environment: Centos 6.6 Reporter: LINTE When the balancer is launched, it should test whether there is already a /system/balancer.id file in HDFS. Even when the file doesn't exist, the balancer refuses to run:
15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
Looking at the audit log file when trying to run the balancer, the balancer creates /system/balancer.id and then deletes it on exiting ...
2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r- proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc
The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. The function checkAndMarkRunning returns null even if /system/balancer.id doesn't exist before entering this function; if it exists, then it is deleted and the balancer exits with the same error.
{code}
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      fs.delete(idPath, true);
    }
    final FSDataOutputStream fsout = fs.create(idPath, false);
    // mark balancer idPath to be deleted during filesystem closure
    fs.deleteOnExit(idPath);
    if (write2IdFile) {
      fsout.writeBytes(InetAddress.getLocalHost().getHostName());
      fsout.hflush();
    }
    return fsout;
  } catch (RemoteException e) {
    if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
      return null;
    } else {
      throw e;
    }
  }
}
{code}
Regards -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8904) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping recovery on DataNode side
Kai Zheng created HDFS-8904: --- Summary: Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping recovery on DataNode side Key: HDFS-8904 URL: https://issues.apache.org/jira/browse/HDFS-8904 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We should choose an on-heap ByteBuffer or a direct ByteBuffer according to the erasure coder in use in striping recovery on the DataNode side, mirroring the corresponding work on the client side, for performance reasons. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8907) Configurable striping read buffer threshold
[ https://issues.apache.org/jira/browse/HDFS-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8907: Description: In the striping input stream, positional read merges all the possible strips together, while stateful read reads one strip at a time. The former is efficient but may require chunk buffers too large for a client to afford; the latter is simple but can be improved for better throughput. This would consolidate the two and use a configurable (new or existing) buffer threshold to control how reads proceed. Fixed chunk buffers for the read will be allocated accordingly and reused again and again, as the existing stateful read does. The number of aligned strips to read at a time may be computed against the threshold. (was: In the striping input stream, positional read merges all the possible strips together, while stateful read reads one strip at a time. The former is efficient but may require chunk buffers too large for a client to afford; the latter is simple but can be improved for better throughput. This would consolidate the two and use a configurable (new or existing) buffer threshold to control how reads proceed. Fixed chunk buffers for the read will be allocated accordingly and reused again and again. The number of aligned strips to read at a time may be computed against the threshold. ) Configurable striping read buffer threshold -- Key: HDFS-8907 URL: https://issues.apache.org/jira/browse/HDFS-8907 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng In the striping input stream, positional read merges all the possible strips together, while stateful read reads one strip at a time. The former is efficient but may require chunk buffers too large for a client to afford; the latter is simple but can be improved for better throughput. This would consolidate the two and use a configurable (new or existing) buffer threshold to control how reads proceed. Fixed chunk buffers for the read will be allocated accordingly and reused again and again, as the existing stateful read does. The number of aligned strips to read at a time may be computed against the threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8907) Configurable striping read buffer threshold
Kai Zheng created HDFS-8907: --- Summary: Configurable striping read buffer threshold Key: HDFS-8907 URL: https://issues.apache.org/jira/browse/HDFS-8907 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng In the striping input stream, positional read merges all the possible strips together, while stateful read reads one strip at a time. The former is efficient but may require chunk buffers too large for a client to afford; the latter is simple but can be improved for better throughput. This would consolidate the two and use a configurable (new or existing) buffer threshold to control how reads proceed. Fixed chunk buffers for the read will be allocated accordingly and reused again and again. The number of aligned strips to read at a time may be computed against the threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
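A minimal sketch of how such a threshold could bound each read; the sizing rule and names are invented for illustration and are not from any patch:
{code}
public final class StripingReadSizer {
  private StripingReadSizer() {}

  // Read as many whole strips per iteration as fit under the byte
  // threshold, but always at least one, so a fixed, reusable buffer of
  // stripsPerIteration * cellSize * dataBlkNum bytes suffices.
  public static int stripsPerIteration(long thresholdBytes, int cellSize,
      int dataBlkNum) {
    long stripBytes = (long) cellSize * dataBlkNum; // one full strip
    return (int) Math.max(1L, thresholdBytes / stripBytes);
  }
}
{code}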
[jira] [Comment Edited] (HDFS-8902) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful)
[ https://issues.apache.org/jira/browse/HDFS-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699579#comment-14699579 ] Yi Liu edited comment on HDFS-8902 at 8/17/15 2:15 PM: --- Kai, HDFS-8668 will handle the Java ByteBuffer and direct buffer for all EC-related encoding/decoding, so I think the several new JIRAs related to this that you just created are duplicates. If you want, you can take that JIRA, but I think we can handle them all in one JIRA; there is no need to create separate JIRAs. was (Author: hitliuyi): Kai, HDFS-8668 will handle the Java ByteBuffer and direct buffer for all EC-related encoding/decoding, so I think the several new JIRAs you just created are duplicates. If you want, you can take that JIRA, but I think we can handle them all in one JIRA; there is no need to create separate JIRAs. Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful) - Key: HDFS-8902 URL: https://issues.apache.org/jira/browse/HDFS-8902 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We should choose an on-heap ByteBuffer or a direct ByteBuffer according to the erasure coder in use in striping read (positional and stateful), for performance reasons. A pure-Java coder favors the on-heap buffer, while a native coder prefers the direct one, which avoids a data copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8902) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful)
[ https://issues.apache.org/jira/browse/HDFS-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699602#comment-14699602 ] Kai Zheng commented on HDFS-8902: - Thank you! Yes, HDFS-8668 looks like a good umbrella to contain all the pieces. Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful) - Key: HDFS-8902 URL: https://issues.apache.org/jira/browse/HDFS-8902 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We should choose an on-heap ByteBuffer or a direct ByteBuffer according to the erasure coder in use in striping read (positional and stateful), for performance reasons. A pure-Java coder favors the on-heap buffer, while a native coder prefers the direct one, which avoids a data copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8901: Description: The native erasure coder prefers direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striping positional read. It will also avoid unnecessary data copying between striping read chunk buffers and decode input buffers. (was: The native erasure coder prefers direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striping positional read. ) Use ByteBuffer in striping positional read -- Key: HDFS-8901 URL: https://issues.apache.org/jira/browse/HDFS-8901 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng The native erasure coder prefers direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striping positional read. It will also avoid unnecessary data copying between striping read chunk buffers and decode input buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8668) Erasure Coding: revisit buffer used for encoding and decoding.
[ https://issues.apache.org/jira/browse/HDFS-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8668: - Assignee: Kai Zheng (was: Yi Liu) Erasure Coding: revisit buffer used for encoding and decoding. -- Key: HDFS-8668 URL: https://issues.apache.org/jira/browse/HDFS-8668 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Kai Zheng For encoding and decoding buffers, some places currently use a Java heap ByteBuffer, some use a direct ByteBuffer, and some use a Java byte array. If the coder implementation is native, we should use a direct ByteBuffer. This jira is to revisit all encoding/decoding buffers and improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
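[Editor's note] A minimal sketch of the buffer choice the HDFS-8668 description calls for. The boolean flag stands in for whatever hint the coder API eventually exposes (a preferDirectBuffer()-style method is an assumption here, not confirmed API):
{code}
import java.nio.ByteBuffer;

public class CoderBuffers {
  /**
   * Pick the buffer type by coder implementation: a native coder wants a
   * direct buffer (no copy across JNI); a pure-Java coder is happier on-heap.
   */
  public static ByteBuffer allocate(boolean preferDirect, int size) {
    return preferDirect ? ByteBuffer.allocateDirect(size)
                        : ByteBuffer.allocate(size);
  }
}
{code}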
[jira] [Commented] (HDFS-8902) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful)
[ https://issues.apache.org/jira/browse/HDFS-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699587#comment-14699587 ] Yi Liu commented on HDFS-8902: -- Sure, I just assigned it to you, thanks for working on it. Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful) - Key: HDFS-8902 URL: https://issues.apache.org/jira/browse/HDFS-8902 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We would choose ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful), for performance consideration. Pure Java implemented coder favors on heap one, though native coder likes more direct one, avoiding data copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8668) Erasure Coding: revisit buffer used for encoding and decoding.
[ https://issues.apache.org/jira/browse/HDFS-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699599#comment-14699599 ] Kai Zheng commented on HDFS-8668: - Thanks [~hitliuyi] for the issue and for assigning it to me. Since some parts aren't easy (we first have to switch from byte arrays to ByteBuffer) and other parts couple with the buffer pool work, I'll break this overall effort into smaller issues and work on them. Erasure Coding: revisit buffer used for encoding and decoding. -- Key: HDFS-8668 URL: https://issues.apache.org/jira/browse/HDFS-8668 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Kai Zheng For encoding and decoding buffers, some places currently use a Java heap ByteBuffer, some use a direct ByteBuffer, and some use a Java byte array. If the coder implementation is native, we should use a direct ByteBuffer. This jira is to revisit all encoding/decoding buffers and improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8897) Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ...
[ https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699513#comment-14699513 ] Rakesh R commented on HDFS-8897: My observation about the case: the Balancer is seeing two nameservice IDs that both point to the same cluster, one with a trailing slash ({{hdfs://sandbox/}}) and the other without ({{hdfs://sandbox}}). While running, the balancer establishes a NameNodeConnector per nameservice and internally creates the idFilePath {{balancer.id}} to prevent simultaneous balancer operations. Since both nameservice IDs point to the same cluster, {{balancer.id}} creation succeeds for the first connector; when the balancer then tries to create {{balancer.id}} for the second connector, it sees that the idFilePath already exists, resulting in failure. IMHO, we should find the reason for the two occurrences of the same cluster ID to understand this well, right? bq. It was working fine with hdfs 2.6.0. The validation that prevents simultaneous balancing was modified in 2.7.1; that's why you are not seeing the problem with 2.6.0. Loadbalancer always exits with : java.io.IOException: Another Balancer is running.. Exiting ... Key: HDFS-8897 URL: https://issues.apache.org/jira/browse/HDFS-8897 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.7.1 Environment: Centos 6.6 Reporter: LINTE When the balancer is launched, it should test whether a /system/balancer.id file already exists in HDFS. Even when the file doesn't exist, the balancer refuses to run:
15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox]
15/08/14 16:35:12 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
Looking at the audit log while trying to run the balancer, the balancer creates /system/balancer.id and then deletes it on exiting ...
2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=create src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r- proto=rpc
2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo src=/system/balancer.id dst=null perm=null proto=rpc
2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true ugi=hdfs@SANDBOX.HADOOP (auth:KERBEROS) ip=/x.x.x.x cmd=delete src=/system/balancer.id dst=null perm=null proto=rpc
The error seems to be located in org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java. The function checkAndMarkRunning returns null even if /system/balancer.id doesn't exist before entering the function; if it does exist, it is deleted and the balancer exits with the same error.
{code}
private OutputStream checkAndMarkRunning() throws IOException {
  try {
    if (fs.exists(idPath)) {
      // try appending to it so that it will fail fast if another balancer is
      // running.
      IOUtils.closeStream(fs.append(idPath));
      // (snippet truncated in the original report)
{code}
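[Editor's note] A sketch of the failure mode and one possible normalization, based purely on Rakesh's analysis above; this is illustrative code, not the actual Balancer fix:
{code}
import java.net.URI;
import java.util.Collection;
import java.util.LinkedHashSet;

public class NameserviceDedup {
  /**
   * "hdfs://sandbox/" and "hdfs://sandbox" differ only by a trailing slash,
   * so naive URI equality yields two NameNodeConnectors to the same cluster,
   * and the second balancer.id creation fails with "Another Balancer is
   * running". Normalizing before building connectors would avoid that.
   */
  public static Collection<URI> dedupe(Collection<URI> namenodes) {
    Collection<URI> unique = new LinkedHashSet<>();
    for (URI nn : namenodes) {
      String s = nn.toString();
      while (s.endsWith("/")) {
        s = s.substring(0, s.length() - 1);  // strip trailing slashes
      }
      unique.add(URI.create(s));
    }
    return unique;
  }
}
{code}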
[jira] [Created] (HDFS-8901) Use ByteBuffer/DirectByteBuffer in striping positional read
Kai Zheng created HDFS-8901: --- Summary: Use ByteBuffer/DirectByteBuffer in striping positional read Key: HDFS-8901 URL: https://issues.apache.org/jira/browse/HDFS-8901 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng The native erasure coder prefers direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striping positional read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8902) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful)
Kai Zheng created HDFS-8902: --- Summary: Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful) Key: HDFS-8902 URL: https://issues.apache.org/jira/browse/HDFS-8902 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We would choose an on-heap ByteBuffer or a direct ByteBuffer according to the erasure coder used in striping read (positional and stateful), for performance. A pure-Java coder favors the on-heap buffer, while a native coder prefers the direct one, avoiding data copies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8903) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping write
Kai Zheng created HDFS-8903: --- Summary: Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping write Key: HDFS-8903 URL: https://issues.apache.org/jira/browse/HDFS-8903 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We would choose an on-heap ByteBuffer or a direct ByteBuffer according to the erasure coder used in striping write, for performance. A pure-Java coder favors the on-heap buffer, while a native coder prefers the direct one, avoiding data copies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8902) Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful)
[ https://issues.apache.org/jira/browse/HDFS-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699579#comment-14699579 ] Yi Liu commented on HDFS-8902: -- Kai, HDFS-8668 will handle the java bytebuffer and direct buffer for all EC related encoding/decoding, so I think several new JIRAs you just created are duplicated. If you want, you can take that JIRA, but I think we can handle them all in one JIRA, no need to create separate JIRAs. Uses ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful) - Key: HDFS-8902 URL: https://issues.apache.org/jira/browse/HDFS-8902 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng We would choose ByteBuffer on heap or direct ByteBuffer according to used erasure coder in striping read (position and stateful), for performance consideration. Pure Java implemented coder favors on heap one, though native coder likes more direct one, avoiding data copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8705) BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in all locales
[ https://issues.apache.org/jira/browse/HDFS-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699478#comment-14699478 ] Walter Su commented on HDFS-8705: - You can use {{StringUtils.toLowerCase(String)}} instead. BlockStoragePolicySuite uses equalsIgnoreCase for name lookup, won't work in all locales Key: HDFS-8705 URL: https://issues.apache.org/jira/browse/HDFS-8705 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.8.0 Reporter: Steve Loughran Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-8705.patch {{BlockStoragePolicySuite.getPolicy(name)}} is using {{equalsIgnoreCase()}} to find a policy matching a name. This will not work in all locales. It must use {{toLowerCase(Locale.ENGLISH).equals(name)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
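[Editor's note] A small runnable demonstration of why the locale matters here. Walter's suggestion works because Hadoop's StringUtils.toLowerCase lower-cases with a fixed English locale; the storage-policy name used below is just a convenient example containing the letter 'I':
{code}
import java.util.Locale;

public class LocaleLowerCaseDemo {
  public static void main(String[] args) {
    String policy = "LAZY_PERSIST";
    // In a Turkish locale, 'I' lower-cases to dotless 'ı', so the name no
    // longer matches "lazy_persist".
    System.out.println(policy.toLowerCase(new Locale("tr")));   // lazy_persıst
    // Pinning the locale makes the comparison behave the same everywhere,
    // which is what StringUtils.toLowerCase(String) does via Locale.ENGLISH.
    System.out.println(policy.toLowerCase(Locale.ENGLISH)
        .equals("lazy_persist"));                               // true
  }
}
{code}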
[jira] [Created] (HDFS-8906) Non Authenticated Data node Allowed to Join HDFS
John J. Howard created HDFS-8906: Summary: Non Authenticated Data node Allowed to Join HDFS Key: HDFS-8906 URL: https://issues.apache.org/jira/browse/HDFS-8906 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 0.20.2 Environment: CentOS 6.7 Reporter: John J. Howard Priority: Minor An attacker with network access to a Hadoop cluster can create a spoof datanode that the namenode will accept into the cluster without authentication, allowing the attacker to run MapReduce jobs on the cluster in order to steal data. The spoof datanode is created by adding the namenode RSA SSH public key to the known hosts directory, starting Hadoop services, setting the IP address to be the same as a legitimate node on the Hadoop cluster and sending the namenode a heartbeat message with an empty namespace ID. This will cause the namenode to think that the spoof datanode is a node that had previously crashed and lost its data. The namenode will then connect to the spoof datanode using its SSH credentials and start replicating data on the spoof datanode, incorporating the spoof datanode into the cluster. Once incorporated, the spoof node can start issuing MapReduce jobs to retrieve cluster data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700751#comment-14700751 ] Hadoop QA commented on HDFS-8833: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750930/HDFS-8833-HDFS-7285-merge.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / b57c9a3 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12017/console |
This message was automatically generated. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8833-HDFS-7285-merge.00.patch, HDFS-8833-HDFS-7285-merge.01.patch We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8846) Create edit log files with old layout version for upgrade testing
[ https://issues.apache.org/jira/browse/HDFS-8846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700746#comment-14700746 ] Hadoop QA commented on HDFS-8846: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 6m 31s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 8m 14s | There were no new javac warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 46s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 46s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 1m 6s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 193m 14s | Tests failed in hadoop-hdfs. |
| | | 215m 13s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestNameNodeMetricsLogger |
| | hadoop.hdfs.TestHDFSTrash |
| Timed out tests | org.apache.hadoop.cli.TestHDFSCLI |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750924/HDFS-8846.01.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 71566e2 |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12015/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12015/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12015/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12015/console |
This message was automatically generated. Create edit log files with old layout version for upgrade testing - Key: HDFS-8846 URL: https://issues.apache.org/jira/browse/HDFS-8846 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8846.00.patch, HDFS-8846.01.patch Per discussion under HDFS-8480, we should create some edit log files with an old layout version, to test whether they can be correctly handled in upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8906) Non Authenticated Data node Allowed to Join HDFS
[ https://issues.apache.org/jira/browse/HDFS-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699627#comment-14699627 ] Allen Wittenauer commented on HDFS-8906: Hadoop 0.20.2 had no (real) security features in it. This is the least of its problems: setting hadoop.job.ugi would allow anyone to connect as anyone else. This and other issues have since been fixed in subsequent versions of Hadoop. Given that 0.20.2 is over 5 years old at this point and unless there is something else, I'll be closing this as won't fix. Non Authenticated Data node Allowed to Join HDFS Key: HDFS-8906 URL: https://issues.apache.org/jira/browse/HDFS-8906 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 0.20.2 Environment: CentOS 6.7 Reporter: John J. Howard Priority: Minor Labels: security An attacker with network access to a Hadoop cluster can create a spoof datanode that the namenode will accept into the cluster without authentication, allowing the attacker to run MapReduce jobs on the cluster in order to steal data. The spoof datanode is created by adding the namenode RSA SSH public key to the known hosts directory, starting Hadoop services, setting the IP address to be the same as a legitimate node on the Hadoop cluster and sending the namenode a heartbeat message with an empty namespace ID. This will cause the namenode to think that the spoof datanode is a node that had previously crashed and lost its data. The namenode will then connect to the spoof datanode using its SSH credentials and start replicating data on the spoof datanode, incorporating the spoof datanode into the cluster. Once incorporated, the spoof node can start issuing MapReduce jobs to retrieve cluster data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build copy list in distcp
[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699664#comment-14699664 ] Yongjun Zhang commented on HDFS-8828: - Hi [~jingzhao], Thanks for the review and good comments! Some thoughts: {{DistCpOptions}} is currently the only vehicle for passing info between the different stages of distcp. To address your comments 1 and 2, we need to add something new to pass additional info (which is derived data, and can be held by a new class) between the sync and copyListing stages. There are two choices: 1. Pass an object of this new class as a standalone parameter between stages, which requires changing quite a few method signatures, even though most of those places don't use the new parameter. 2. Put the object of this class as a member of DistCpOptions, so there is no need to change method signatures. We could create a new class {{DistCpDerivedInput}} to hold the derived input, with all derived data as its members. If we define DistCpOptions as holding only command-line options, then this choice is not perfect; however, if we define DistCpOptions as possibly containing derived input too, then it's OK. Which choice do you like better? Or do you have additional thoughts? Thanks. Utilize Snapshot diff report to build copy list in distcp - Key: HDFS-8828 URL: https://issues.apache.org/jira/browse/HDFS-8828 Project: Hadoop HDFS Issue Type: Improvement Components: distcp, snapshots Reporter: Yufei Gu Assignee: Yufei Gu Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, HDFS-8828.006.patch, HDFS-8828.007.patch Some users reported a huge time cost to build the file copy list in distcp (30 hours for 1.6M files). We can leverage the snapshot diff report to build a copy list containing only the files/dirs changed between two snapshots (or a snapshot and a normal dir). This speeds up the process in two ways: 1. less copy-list building time; 2. fewer file copy MR jobs. The HDFS snapshot diff report provides information about file/directory creation, deletion, rename and modification between two snapshots, or between a snapshot and a normal directory. HDFS-7535 synchronizes deletion and rename, then falls back to the default distcp, so it still relies on the default distcp to build the complete list of files under the source dir. This patch puts only created and modified files into the copy list based on the snapshot diff report, so we can minimize the number of files to copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
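[Editor's note] To make option 2 above concrete, a rough sketch of the holder class named in the comment; the fields are purely illustrative and nothing here is committed DistCp API:
{code}
/**
 * Holder for data derived from the snapshot diff, attached to DistCpOptions
 * so existing method signatures between distcp stages stay unchanged.
 */
public class DistCpDerivedInput {
  private boolean useSnapshotDiff;    // whether the diff-based copy list applies
  private String fromSnapshot;        // hypothetical, e.g. "s1"
  private String toSnapshot;          // hypothetical, e.g. "s2"

  public boolean isUseSnapshotDiff() { return useSnapshotDiff; }
  public void setUseSnapshotDiff(boolean b) { this.useSnapshotDiff = b; }
  public String getFromSnapshot() { return fromSnapshot; }
  public void setFromSnapshot(String s) { this.fromSnapshot = s; }
  public String getToSnapshot() { return toSnapshot; }
  public void setToSnapshot(String s) { this.toSnapshot = s; }
  // DistCpOptions would carry one instance of this class.
}
{code}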
[jira] [Commented] (HDFS-8713) Convert DatanodeDescriptor to use SLF4J logging
[ https://issues.apache.org/jira/browse/HDFS-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699845#comment-14699845 ] Andrew Wang commented on HDFS-8713: --- Thanks for reminding me about this one Yi, I ran the timed out test locally successfully. Will commit to trunk and branch-2 based on Eddy's earlier +1. Convert DatanodeDescriptor to use SLF4J logging --- Key: HDFS-8713 URL: https://issues.apache.org/jira/browse/HDFS-8713 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.6-alpha Reporter: Andrew Wang Assignee: Andrew Wang Priority: Trivial Attachments: hdfs-8713.001.patch Let's convert this class to use SLF4J -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8908) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode
Tsz Wo Nicholas Sze created HDFS-8908: - Summary: TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode Key: HDFS-8908 URL: https://issues.apache.org/jira/browse/HDFS-8908 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor See https://builds.apache.org/job/PreCommit-HDFS-Build/12005/testReport/org.apache.hadoop.hdfs/TestAppendSnapshotTruncate/testAST/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8713) Convert DatanodeDescriptor to use SLF4J logging
[ https://issues.apache.org/jira/browse/HDFS-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-8713: -- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks again Eddy + Yi. Convert DatanodeDescriptor to use SLF4J logging --- Key: HDFS-8713 URL: https://issues.apache.org/jira/browse/HDFS-8713 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.6-alpha Reporter: Andrew Wang Assignee: Andrew Wang Priority: Trivial Fix For: 2.8.0 Attachments: hdfs-8713.001.patch Let's convert this class to use SLF4J -- This message was sent by Atlassian JIRA (v6.3.4#6332)
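[Editor's note] For readers unfamiliar with the conversion, this is the general shape of a commons-logging to SLF4J switch; the class below is an illustrative stand-in, not the actual HDFS-8713 patch:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class Slf4jConversionExample {
  // SLF4J replaces the commons-logging Log/LogFactory pair; parameterized
  // messages avoid string concatenation when the level is disabled.
  private static final Logger LOG =
      LoggerFactory.getLogger(Slf4jConversionExample.class);

  void report(String node, long capacity) {
    LOG.debug("Node {} reported capacity {}", node, capacity);
  }
}
{code}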
[jira] [Commented] (HDFS-8895) Remove deprecated BlockStorageLocation APIs
[ https://issues.apache.org/jira/browse/HDFS-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699899#comment-14699899 ] Andrew Wang commented on HDFS-8895: --- Kicking a rebuild since Jenkins ate the test output (?). [~eddyxu] you mind reviewing this one too since you looked at HDFS-8887? Remove deprecated BlockStorageLocation APIs --- Key: HDFS-8895 URL: https://issues.apache.org/jira/browse/HDFS-8895 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-8895.001.patch HDFS-8887 supercedes DistributedFileSystem#getFileBlockStorageLocations, so it can be removed from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699910#comment-14699910 ] Jing Zhao commented on HDFS-8833: - Yes, a conversion tool can be helpful. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8833-HDFS-7285-merge.00.patch We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699916#comment-14699916 ] Zhe Zhang commented on HDFS-7285: - [~vinayrpet] Sure, let's solicit more feedback. Your rebase doesn't cause any additional failures: [Jenkins results | https://builds.apache.org/job/Hadoop-HDFS-7285-REBASE/]. Did you run Jenkins before posting the branch? Otherwise, a nice ace :) Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: Consolidated-20150707.patch, Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFS-7285-merge-consolidated-01.patch, HDFS-7285-merge-consolidated-trunk-01.patch, HDFS-7285-merge-consolidated.trunk.03.patch, HDFS-7285-merge-consolidated.trunk.04.patch, HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrifice of data reliability, comparing to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contribute packages in HDFS but had been removed since Hadoop 2.0 for maintain reason. The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are intended not to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, makes it self-contained and independently maintained. This design lays the EC feature on the storage type support and considers compatible with existing HDFS features like caching, snapshot, encryption, high availability and etc. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and makes the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6955) DN should reserve disk space for a full block when creating tmp files
[ https://issues.apache.org/jira/browse/HDFS-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699926#comment-14699926 ] kanaka kumar avvaru commented on HDFS-6955: --- Updated patch for white space errors and a checkstyle issue {{FsVolumeImpl.java:462:69: 'reserved' hides a field.}}. DN should reserve disk space for a full block when creating tmp files - Key: HDFS-6955 URL: https://issues.apache.org/jira/browse/HDFS-6955 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Assignee: kanaka kumar avvaru Attachments: HDFS-6955-01.patch, HDFS-6955-02.patch HDFS-6898 is introducing disk space reservation for RBW files to avoid running out of disk space midway through block creation. This Jira is to introduce similar reservation for tmp files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
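[Editor's note] The reservation idea from HDFS-6898/HDFS-6955 in miniature; method and field names below are invented for illustration and do not match the FsVolumeImpl API:
{code}
import java.util.concurrent.atomic.AtomicLong;

class VolumeSpaceReserver {
  private final AtomicLong reservedForReplicas = new AtomicLong();

  /** Reserve a full block's worth of space when a tmp/RBW file is created. */
  void reserve(long blockSize) {
    reservedForReplicas.addAndGet(blockSize);
  }

  /** Release the reservation when the replica is finalized or aborted. */
  void release(long bytes) {
    reservedForReplicas.addAndGet(-bytes);
  }

  /** Available = raw free space minus what in-flight replicas may still use. */
  long getAvailable(long rawFree) {
    return Math.max(0L, rawFree - reservedForReplicas.get());
  }
}
{code}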
[jira] [Updated] (HDFS-6407) Add sorting and pagination in the datanode tab of the NN Web UI
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6407: - Issue Type: Improvement (was: Bug) Add sorting and pagination in the datanode tab of the NN Web UI --- Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Haohui Mai Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.011.patch, HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting table.png old ui supported clicking on column header to sort on that column. The new ui seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanodes information, directory listings and snapshots. When there are many items in the tables, it is useful to have ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8278) HDFS Balancer should consider remaining storage % when checking for under-utilized machines
[ https://issues.apache.org/jira/browse/HDFS-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8278: -- Attachment: h8278_20150817.patch h8278_20150817.patch: counts only the storage with remaining storage >= default block size. I also removed the use of threshold in computeMaxSize2Move(..). HDFS Balancer should consider remaining storage % when checking for under-utilized machines --- Key: HDFS-8278 URL: https://issues.apache.org/jira/browse/HDFS-8278 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover Affects Versions: 2.8.0 Reporter: Gopal V Assignee: Tsz Wo Nicholas Sze Attachments: h8278_20150817.patch DFS balancer mistakenly identifies a node with very little storage space remaining as an under-utilized node and tries to move large amounts of data to that particular node. All these block moves fail to execute successfully, as the % utilization is less relevant than the DFS remaining storage on that node.
{code}
15/04/24 04:25:55 INFO balancer.Balancer: 0 over-utilized: []
15/04/24 04:25:55 INFO balancer.Balancer: 1 underutilized: [172.19.1.46:50010:DISK]
15/04/24 04:25:55 INFO balancer.Balancer: Need to move 47.68 GB to make the cluster balanced.
15/04/24 04:25:55 INFO balancer.Balancer: Decided to move 413.08 MB bytes from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK
15/04/24 04:25:55 INFO balancer.Balancer: Will move 413.08 MB in this iteration
15/04/24 04:25:55 WARN balancer.Dispatcher: Failed to move blk_1078689321_1099517353638 with size=131146 from 172.19.1.52:50010:DISK to 172.19.1.46:50010:DISK through 172.19.1.53:50010: Got error, status message opReplaceBlock BP-942051088-172.18.1.41-1370508013893:blk_1078689321_1099517353638 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The volume with the most available space (=225042432 B) is less than the block size (=268435456 B)., block move is failed
{code}
The machine in question is under-full in terms of BP utilization, but has very little free space available for blocks.
{code}
Decommission Status : Normal
Configured Capacity: 3826907185152 (3.48 TB)
DFS Used: 2817262833664 (2.56 TB)
Non DFS Used: 1000621305856 (931.90 GB)
DFS Remaining: 9023045632 (8.40 GB)
DFS Used%: 73.62%
DFS Remaining%: 0.24%
Configured Cache Capacity: 8589934592 (8 GB)
Cache Used: 0 (0 B)
Cache Remaining: 8589934592 (8 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 3
Last contact: Fri Apr 24 04:28:36 PDT 2015
{code}
The machine has 0.40 Gb of non-RAM storage available on that node, so it is futile to attempt to move any blocks to that particular machine. This is a similar concern when a machine loses disks, since the comparisons of utilization always compare percentages per-node. Even that scenario needs to cap data movement to that node to the DFS Remaining % variable. Trying to move any more data than that to a given node will always fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
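[Editor's note] The essence of the check the patch adds, per Nicholas's summary above; a sketch with illustrative names, not the Balancer source:
{code}
class StorageFilter {
  /**
   * When classifying under-utilized nodes, count a storage only if it could
   * actually accept a block. A node whose fullest volume cannot fit one block
   * (the DiskOutOfSpaceException above: ~225 MB free vs. a ~268 MB block)
   * must not be chosen as a balancing target.
   */
  static boolean usableTarget(long remainingBytes, long defaultBlockSize) {
    return remainingBytes >= defaultBlockSize;
  }
}
{code}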
[jira] [Updated] (HDFS-8883) NameNode Metrics : Add FSNameSystem lock Queue Length
[ https://issues.apache.org/jira/browse/HDFS-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8883: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks [~anu] for the contribution. I've committed the patch to trunk and branch-2. NameNode Metrics : Add FSNameSystem lock Queue Length - Key: HDFS-8883 URL: https://issues.apache.org/jira/browse/HDFS-8883 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: HDFS-8883.001.patch FSNameSystemLock can have contention when NameNode is under load. This patch adds LockQueueLength -- the number of threads waiting on FSNameSystemLock -- as a metric in NameNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
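[Editor's note] What the new metric measures, in miniature: the JDK lock already tracks its wait queue, so exposing it is cheap. A sketch, not the committed FSNamesystem code:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockQueueMetricExample {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

  /** Estimated count of threads queued on either the read or write lock. */
  public int getLockQueueLength() {
    return fsLock.getQueueLength();
  }
}
{code}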
[jira] [Updated] (HDFS-8908) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode
[ https://issues.apache.org/jira/browse/HDFS-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8908: -- Attachment: h8908_20150817.patch TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode -- Key: HDFS-8908 URL: https://issues.apache.org/jira/browse/HDFS-8908 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8908_20150817.patch See https://builds.apache.org/job/PreCommit-HDFS-Build/12005/testReport/org.apache.hadoop.hdfs/TestAppendSnapshotTruncate/testAST/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8801) Convert BlockInfoUnderConstruction as a feature
[ https://issues.apache.org/jira/browse/HDFS-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699936#comment-14699936 ] Haohui Mai commented on HDFS-8801: -- The current approach requires exposing {{setGenerationStampAndVerifyReplicas()}} and {{commitBlock()}} into the {{BlockInfo}} class. It's not ideal and it requires further refactoring, but I think given the scope of the changes it's okay to address it in a separate jira. +1. Convert BlockInfoUnderConstruction as a feature --- Key: HDFS-8801 URL: https://issues.apache.org/jira/browse/HDFS-8801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Assignee: Jing Zhao Attachments: HDFS-8801.000.patch Per discussion under HDFS-8499, with the erasure coding feature, there will be 4 types of {{BlockInfo}} forming a multi-inheritance: {{complete+contiguous}}, {{complete+striping}}, {{UC+contiguous}}, {{UC+striped}}. We had the same challenge with {{INodeFile}} and the solution was building feature classes like {{FileUnderConstructionFeature}}. This JIRA aims to implement the same idea on {{BlockInfo}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8908) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode
[ https://issues.apache.org/jira/browse/HDFS-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8908: -- Attachment: (was: h8908_20150817.patch) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode -- Key: HDFS-8908 URL: https://issues.apache.org/jira/browse/HDFS-8908 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8908_20150817.patch See https://builds.apache.org/job/PreCommit-HDFS-Build/12005/testReport/org.apache.hadoop.hdfs/TestAppendSnapshotTruncate/testAST/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6407) new namenode UI, lost ability to sort columns in datanode tab
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699893#comment-14699893 ] Benoy Antony commented on HDFS-6407: It will be good to specify the version information of the datatables component. This will help in maintaining this functionality. For other Js components, the version information is included in the file name. new namenode UI, lost ability to sort columns in datanode tab - Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Haohui Mai Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.011.patch, HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting table.png old ui supported clicking on column header to sort on that column. The new ui seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanodes information, directory listings and snapshots. When there are many items in the tables, it is useful to have ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699906#comment-14699906 ] Zhe Zhang commented on HDFS-7285: - Thanks Walter for the questions. bq. Will you rebase HDFS-7285 weekly after squashes it? Yes that's my plan. Actually since we are close to merging I plan to rebase more frequently. Currently I'm rebasing HDFS-7285-merge daily bq. Should we squash HDFS-8854 and HDFS-8833 as well? To make the future rebasing easier, also to try to avoid a second squash. [~andrew.wang] has started a discussion thread on common-dev regarding the rebase workflow. I'll wait until we reach a consensus there before squashing the 2 new big patches. bq. The commit message is inaccurate because HDFS-7285 is not finished yet. Good point. I'll reword it in next rebase. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: Consolidated-20150707.patch, Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFS-7285-merge-consolidated-01.patch, HDFS-7285-merge-consolidated-trunk-01.patch, HDFS-7285-merge-consolidated.trunk.03.patch, HDFS-7285-merge-consolidated.trunk.04.patch, HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrifice of data reliability, comparing to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contribute packages in HDFS but had been removed since Hadoop 2.0 for maintain reason. The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are intended not to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, makes it self-contained and independently maintained. This design lays the EC feature on the storage type support and considers compatible with existing HDFS features like caching, snapshot, encryption, high availability and etc. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and makes the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6407) Add sorting and pagination in the datanode tab of the NN Web UI
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6407: - Summary: Add sorting and pagination in the datanode tab of the NN Web UI (was: new namenode UI, lost ability to sort columns in datanode tab) Add sorting and pagination in the datanode tab of the NN Web UI --- Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Haohui Mai Priority: Critical Labels: BB2015-05-TBR Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.011.patch, HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting table.png old ui supported clicking on column header to sort on that column. The new ui seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanodes information, directory listings and snapshots. When there are many items in the tables, it is useful to have ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6407) Add sorting and pagination in the datanode tab of the NN Web UI
[ https://issues.apache.org/jira/browse/HDFS-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6407: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks all for the reviews and the contribution. Add sorting and pagination in the datanode tab of the NN Web UI --- Key: HDFS-6407 URL: https://issues.apache.org/jira/browse/HDFS-6407 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Nathan Roberts Assignee: Haohui Mai Priority: Critical Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: 002-datanodes-sorted-capacityUsed.png, 002-datanodes.png, 002-filebrowser.png, 002-snapshots.png, HDFS-6407-002.patch, HDFS-6407-003.patch, HDFS-6407.008.patch, HDFS-6407.009.patch, HDFS-6407.010.patch, HDFS-6407.011.patch, HDFS-6407.4.patch, HDFS-6407.5.patch, HDFS-6407.6.patch, HDFS-6407.7.patch, HDFS-6407.patch, browse_directory.png, datanodes.png, snapshots.png, sorting 2.png, sorting table.png old ui supported clicking on column header to sort on that column. The new ui seems to have dropped this very useful feature. There are a few tables in the Namenode UI to display datanodes information, directory listings and snapshots. When there are many items in the tables, it is useful to have ability to sort on the different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8908) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode
[ https://issues.apache.org/jira/browse/HDFS-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-8908: -- Status: Patch Available (was: Open) TestAppendSnapshotTruncate may fail with IOException: Failed to replace a bad datanode -- Key: HDFS-8908 URL: https://issues.apache.org/jira/browse/HDFS-8908 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h8908_20150817.patch See https://builds.apache.org/job/PreCommit-HDFS-Build/12005/testReport/org.apache.hadoop.hdfs/TestAppendSnapshotTruncate/testAST/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)