[jira] [Commented] (HDFS-7018) Implement C interface for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251256#comment-14251256 ] Zhanwei Wang commented on HDFS-7018: Hi [~wheat9] and [~cmccabe] Would you please review the new patch? Thanks. > Implement C interface for libhdfs3 > -- > > Key: HDFS-7018 > URL: https://issues.apache.org/jira/browse/HDFS-7018 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7018-pnative.002.patch, HDFS-7018.patch > > > Implement C interface for libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7018) Implement C interface for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7018: --- Attachment: HDFS-7018-pnative.002.patch > Implement C interface for libhdfs3 > -- > > Key: HDFS-7018 > URL: https://issues.apache.org/jira/browse/HDFS-7018 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7018-pnative.002.patch, HDFS-7018.patch > > > Implement C interface for libhdfs3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7527: Attachment: HDFS-7527.002.patch > TestDecommission.testIncludeByRegistrationName fails occasionally in trunk > --- > > Key: HDFS-7527 > URL: https://issues.apache.org/jira/browse/HDFS-7527 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: Yongjun Zhang >Assignee: Binglin Chang > Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch > > > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ > {quote} > Error Message > test timed out after 36 milliseconds > Stacktrace > java.lang.Exception: test timed out after 36 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) > 2014-12-15 12:00:19,958 ERROR datanode.DataNode > (BPServiceActor.java:run(836)) - Initialization failed for Block pool > BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to > localhost/127.0.0.1:40565 Datanode denied communication with namenode because > the host is not in the include-list: DatanodeRegistration(127.0.0.1, > datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, > infoSecurePort=0, ipcPort=43726, > storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) > 2014-12-15 12:00:29,087 FATAL datanode.DataNode > (BPServiceActor.java:run(841)) - Initialization failed for Block pool > BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to > localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) > at java.lang.Thread.run(Thread.java:745) > {quote} > Found by tool proposed in HADOOP-11045: > {quote} > [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j > Hadoop-Hdfs-trunk -n 5 | tee bt.log > Recently FAILED builds in url: > https://builds.apache.org//job/Hadoop-Hdfs-trunk > THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, > as listed below: > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport > (2014-12-15 03:30:01) > Failed test: > org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName > Failed test: > org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect > Failed test: > org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport > (2014-12-13 10:32:27) > Failed test: > org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport > (2014-12-13 03:30:01) > Failed test: > org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport > (2014-12-11 03:30:01) > Failed test: > org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect > Failed test: > org.apache.hadoop.hdfs.server.n
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251244#comment-14251244 ] Binglin Chang commented on HDFS-7527: - Makes sense; it looks like the behavior was changed at some point. Updated the patch to partially support dfs.datanode.hostname (if it is an IP address, or the hostname resolves to a proper IP address), and changed the test to properly wait for the excluded datanode to come back (using DataNode.isDatanodeFullyStarted rather than checking the ALIVE node count). Note that fully restoring the old behavior requires a lot more changes; for now I have made only minimal ones. > TestDecommission.testIncludeByRegistrationName fails occasionally in trunk > --- > > Key: HDFS-7527 > URL: https://issues.apache.org/jira/browse/HDFS-7527 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: Yongjun Zhang >Assignee: Binglin Chang > Attachments: HDFS-7527.001.patch > > > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ > {quote} > Error Message > test timed out after 36 milliseconds > Stacktrace > java.lang.Exception: test timed out after 36 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) > 2014-12-15 12:00:19,958 ERROR datanode.DataNode > (BPServiceActor.java:run(836)) - Initialization failed for Block pool > BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to > localhost/127.0.0.1:40565 Datanode denied communication with namenode because > the host is not in the include-list: DatanodeRegistration(127.0.0.1, > datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, > infoSecurePort=0, ipcPort=43726, > storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) > 2014-12-16 12:00:29,087 FATAL datanode.DataNode > (BPServiceActor.java:run(841)) - Initialization failed for Block pool > BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to > localhost/127.0.0.1:40565. Exiting. 
> java.io.IOException: DN shut down before block pool connected > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) > at java.lang.Thread.run(Thread.java:745) > {quote} > Found by tool proposed in HADOOP-11045: > {quote} > [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j > Hadoop-Hdfs-trunk -n 5 | tee bt.log > Recently FAILED builds in url: > https://builds.apache.org//job/Hadoop-Hdfs-trunk > THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, > as listed below: > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport > (2014-12-15 03:30:01) > Failed test: > org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName > Failed test: > org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect > Failed test: > org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport > (2014-12-13 10:32:27) > Failed test: > org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegist
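The wait described in the comment above can be sketched with Hadoop's polling test helper, {{GenericTestUtils.waitFor}} (which takes a Guava {{Supplier}}). This is a rough fragment assuming a running {{MiniDFSCluster}} in the test, not the actual patch:
{code}
// Wait for the restarted datanode to finish registering with the NN,
// instead of counting ALIVE nodes (which races with registration).
final DataNode dn = cluster.getDataNodes().get(0);
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return dn.isDatanodeFullyStarted();
  }
}, 100, 60000); // poll every 100 ms, give up after 60 s
{code}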
[jira] [Commented] (HDFS-7373) Clean up temporary files after fsimage transfer failures
[ https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251147#comment-14251147 ] Akira AJISAKA commented on HDFS-7373: - +1 (binding). > Clean up temporary files after fsimage transfer failures > > > Key: HDFS-7373 > URL: https://issues.apache.org/jira/browse/HDFS-7373 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-7373.patch > > > When a fsimage (e.g. checkpoint) transfer fails, a temporary file is left in > each storage directory. If the size of name space is large, these files can > take up quite a bit of space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251112#comment-14251112 ] Hadoop QA commented on HDFS-7530: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687861/HDFS-7530.003.patch against trunk revision 9937eef. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDecommission org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9068//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9068//console This message is automatically generated. > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, > HDFS-7530.003.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251106#comment-14251106 ] Colin Patrick McCabe commented on HDFS-7443: It appears that the old software could sometimes create a duplicate copy of the same block in two different {{subdir}} folders on the same volume. In all the cases in which we've seen this, the block files were identical. Two files, both for the same block id, in separate directories. This appears to be a bug, since obviously we don't want to store the same block twice on the same volume. This causes the {{EEXIST}} problem on upgrade, since the new block layout only has one place where each block ID can go. Unfortunately, the hardlink code doesn't print the name of the file which caused the problem, making diagnosis more difficult than it should be. One easy way around this is to check for duplicate block IDs on each volume before upgrading, and manually remove the duplicates. We should also consider logging an error message and continuing the upgrade process when we encounter this. [~kihwal], I'm not sure why, in your case, the DataNode retried the hard link process multiple times. I'm also not sure why you ended up with a jumbled {{previous.tmp}} directory. When we reproduced this on CDH5.2, we did not have that problem, for whatever reason. > Datanode upgrade to BLOCKID_BASED_LAYOUT sometimes fails > > > Key: HDFS-7443 > URL: https://issues.apache.org/jira/browse/HDFS-7443 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Kihwal Lee >Priority: Blocker > > When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of > datanodes were not coming up. They tried the data file layout upgrade for > BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. > All failures were caused by {{NativeIO.link()}} throwing IOException saying > {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon > retried when the block pool initialization was retried whenever > {{BPServiceActor}} was registering with the namenode. After many retries, > datanodes terminated. This would leave {{previous.tmp}} and {{current}} with > no {{VERSION}} file in the block pool slice storage directory. > Although {{previous.tmp}} contained the old {{VERSION}} file, the content was > in the new layout and the subdirs were all newly created ones. This > shouldn't have happened because the upgrade-recovery logic in {{Storage}} > removes {{current}} and renames {{previous.tmp}} to {{current}} before > retrying. All successfully upgraded volumes had old state preserved in their > {{previous}} directory. > In summary there were two observed issues. > - Upgrade failure with {{link()}} failing with {{EEXIST}} > - {{previous.tmp}} contained not the content of original {{current}}, but > half-upgraded one. > We did not see this in smaller scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
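The pre-upgrade check suggested above can be approximated with a small standalone scan. This is a sketch over the pre-HDFS-6482 {{finalized/subdirX/subdirY}} layout; {{DuplicateBlockFinder}} is a hypothetical helper using only the JDK, not DataNode code:
{code}
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Stream;

/** Report block files that appear under more than one subdir of a volume. */
public class DuplicateBlockFinder {
  public static void main(String[] args) throws IOException {
    // args[0]: a block pool slice, e.g. .../current/BP-xxx/current/finalized
    Path finalized = Paths.get(args[0]);
    final Map<String, List<Path>> byBlockId = new HashMap<>();
    try (Stream<Path> files = Files.walk(finalized)) {
      files.filter(p -> p.getFileName().toString().matches("blk_-?\\d+")) // skip .meta files
           .forEach(p -> byBlockId.computeIfAbsent(
               p.getFileName().toString(), k -> new ArrayList<>()).add(p));
    }
    for (Map.Entry<String, List<Path>> e : byBlockId.entrySet()) {
      if (e.getValue().size() > 1) {
        System.out.println("duplicate " + e.getKey() + " -> " + e.getValue());
      }
    }
  }
}
{code}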
[jira] [Commented] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251093#comment-14251093 ] Hadoop QA commented on HDFS-7543: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687868/HDFS-7543.000.patch against trunk revision 9937eef. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestPread org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9067//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9067//console This message is automatically generated. > Avoid path resolution when getting FileStatus for audit logs > > > Key: HDFS-7543 > URL: https://issues.apache.org/jira/browse/HDFS-7543 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7543.000.patch > > > The current API of {{getAuditFileInfo()}} forces parsing the paths again when > generating the {{HdfsFileStatus}} for audit logs. This jira proposes to > avoid the repeated parsing by passing the {{INodesInPath}} object instead of > the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
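A sketch of the signature change this JIRA proposes, with simplified bodies; {{dir.getAuditFileInfo(iip)}} stands in for whatever FSDirectory helper the patch actually adds:
{code}
// Before: the path string is parsed and resolved a second time
// just to build the HdfsFileStatus for the audit log.
private HdfsFileStatus getAuditFileInfo(String path, boolean resolveSymlink)
    throws IOException {
  return (isAuditEnabled() && isExternalInvocation())
      ? dir.getFileInfo(path, resolveSymlink) : null;
}

// After (sketch): reuse the INodesInPath the operation already resolved.
private HdfsFileStatus getAuditFileInfo(INodesInPath iip) throws IOException {
  return (isAuditEnabled() && isExternalInvocation())
      ? dir.getAuditFileInfo(iip) : null; // no second path resolution
}
{code}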
[jira] [Commented] (HDFS-7539) Namenode can't leave safemode because of Datanodes' IPC socket timeout
[ https://issues.apache.org/jira/browse/HDFS-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251085#comment-14251085 ] Suresh Srinivas commented on HDFS-7539: --- I think the NN JVM parameters should be configured correctly. That said, the DNs continue to reconnect, right? > Namenode can't leave safemode because of Datanodes' IPC socket timeout > -- > > Key: HDFS-7539 > URL: https://issues.apache.org/jira/browse/HDFS-7539 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.5.1 > Environment: 1 master, 1 secondary and 128 slaves, each node has x24 > cores, 48GB memory. fsimage is 4GB. >Reporter: hoelog > > During namenode startup, datanodes seem to be waiting for the namenode's > response through IPC to register block pools. > Here is the DN's log - > {code} > 2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Acknowledging ACTIVE Namenode Block pool > BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid > 2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 > {code} > But the namenode is too busy to respond, and the datanodes hit a socket > timeout - the default is 1 minute. > {code} > 2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > IOException in offerService > java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to > NN.x.com:9000 failed on socket timeout exception: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected > local=/10.x.x.84:57924 remote=NN.x.com/10.x.x.143:9000]; For more details > see: http://wiki.apache.org/hadoop/SocketTimeout > {code} > The same events repeat and eventually the NN drops most connection attempts > from the DNs. So the NN can't leave safemode. > DN's log - > {code} > 2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > IOException in offerService > java.io.IOException: failed on local exception java.io.ioexception connection > reset by peer > {code} > There are no troubles in the network, configuration or servers. I think the > NN is too busy to respond to the DNs within a minute. > I configured "ipc.ping.interval" to 15 mins in core-site.xml, and that > was helpful for my cluster. > {code} > > ipc.ping.interval > 90 > > {code} > In my cluster, the namenode took 1 min ~ 5 mins to respond to the DNs' > requests. > It would be helpful if there were a more elegant solution. > {code} > 2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Acknowledging ACTIVE Namenode Block pool > BP-877672386-10.x.x.143-1412666752827 (Datanode Uuid > c4f7beea-b8e9-404f-bc81-6e87e37263d2) service to NN/10.x.x.143:9000 > 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Sent 1 blockreports 2090961 blocks total. Took 1690 msec to generate and > 193738 msecs for RPC and NN processing. 
Got back commands > org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@20e68e11 > 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Got finalize command for block pool BP-877672386-10.x.x.143-1412666752827 > 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: Computing capacity > for map BlockMap > 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: VM type = > 64-bit > 2014-12-16 23:31:32,044 INFO org.apache.hadoop.util.GSet: 0.5% max memory 3.6 > GB = 18.2 MB > 2014-12-16 23:31:32,045 INFO org.apache.hadoop.util.GSet: capacity = > 2^21 = 2097152 entries > 2014-12-16 23:31:32,046 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block > Verification Scanner initialized with interval 504 hours for block pool > BP-877672386-10.114.130.143-1412666752827 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
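For reference, the property above in well-formed core-site.xml; the snippet in the report lost its XML tags, and the 900000 ms value here is an assumption matching the stated 15 minutes (the inline value appears truncated):
{code}
<property>
  <name>ipc.ping.interval</name>
  <!-- assumed: 15 minutes expressed in milliseconds -->
  <value>900000</value>
</property>
{code}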
[jira] [Commented] (HDFS-7539) Namenode can't leave safemode because of Datanodes' IPC socket timeout
[ https://issues.apache.org/jira/browse/HDFS-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251065#comment-14251065 ] hoelog commented on HDFS-7539: -- Actually the NN hangs for 1~2 minutes because of GC. This problem may not appear when the NN has enough memory. > Namenode can't leave safemode because of Datanodes' IPC socket timeout > -- > > Key: HDFS-7539 > URL: https://issues.apache.org/jira/browse/HDFS-7539 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.5.1 > Environment: 1 master, 1 secondary and 128 slaves, each node has x24 > cores, 48GB memory. fsimage is 4GB. >Reporter: hoelog > > During namenode startup, datanodes seem to be waiting for the namenode's > response through IPC to register block pools. > Here is the DN's log - > {code} > 2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Acknowledging ACTIVE Namenode Block pool > BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid > 2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 > {code} > But the namenode is too busy to respond, and the datanodes hit a socket > timeout - the default is 1 minute. > {code} > 2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > IOException in offerService > java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to > NN.x.com:9000 failed on socket timeout exception: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected > local=/10.x.x.84:57924 remote=NN.x.com/10.x.x.143:9000]; For more details > see: http://wiki.apache.org/hadoop/SocketTimeout > {code} > The same events repeat and eventually the NN drops most connection attempts > from the DNs. So the NN can't leave safemode. > DN's log - > {code} > 2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > IOException in offerService > java.io.IOException: failed on local exception java.io.ioexception connection > reset by peer > {code} > There are no troubles in the network, configuration or servers. I think the > NN is too busy to respond to the DNs within a minute. > I configured "ipc.ping.interval" to 15 mins in core-site.xml, and that > was helpful for my cluster. > {code} > > ipc.ping.interval > 90 > > {code} > In my cluster, the namenode took 1 min ~ 5 mins to respond to the DNs' > requests. > It would be helpful if there were a more elegant solution. > {code} > 2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Acknowledging ACTIVE Namenode Block pool > BP-877672386-10.x.x.143-1412666752827 (Datanode Uuid > c4f7beea-b8e9-404f-bc81-6e87e37263d2) service to NN/10.x.x.143:9000 > 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Sent 1 blockreports 2090961 blocks total. Took 1690 msec to generate and > 193738 msecs for RPC and NN processing. 
Got back commands > org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@20e68e11 > 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Got finalize command for block pool BP-877672386-10.x.x.143-1412666752827 > 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: Computing capacity > for map BlockMap > 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: VM type = > 64-bit > 2014-12-16 23:31:32,044 INFO org.apache.hadoop.util.GSet: 0.5% max memory 3.6 > GB = 18.2 MB > 2014-12-16 23:31:32,045 INFO org.apache.hadoop.util.GSet: capacity = > 2^21 = 2097152 entries > 2014-12-16 23:31:32,046 INFO > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block > Verification Scanner initialized with interval 504 hours for block pool > BP-877672386-10.114.130.143-1412666752827 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251036#comment-14251036 ] Hadoop QA commented on HDFS-7529: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687849/HDFS-7529.001.patch against trunk revision 0da1330. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9065//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9065//console This message is automatically generated. > Consolidate encryption zone related implementation into a single class > -- > > Key: HDFS-7529 > URL: https://issues.apache.org/jira/browse/HDFS-7529 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch > > > This jira proposes to consolidate encryption zone related implementation to a > single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250973#comment-14250973 ] Hadoop QA commented on HDFS-7528: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687844/HDFS-7530.003.patch against trunk revision 316613b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDecommission Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9064//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9064//console This message is automatically generated. > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7528.000.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get
[ https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250970#comment-14250970 ] Hadoop QA commented on HDFS-7544: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687882/HDFS-7544.001.patch against trunk revision 3b173d9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverControllerStress Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9069//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9069//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9069//console This message is automatically generated. > ChunkedArrayList: fix removal via iterator and implement get > > > Key: HDFS-7544 > URL: https://issues.apache.org/jira/browse/HDFS-7544 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7544.001.patch > > > ChunkedArrayList: implement removal via iterator and get. Previously, > calling remove on a ChunkedArrayList iterator would cause the returned size > to be incorrect later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures
[ https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250967#comment-14250967 ] Hadoop QA commented on HDFS-7542: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687843/HDFS-7542.001.patch against trunk revision 316613b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestRollingUpgradeRollback The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TesTests org.apache.hadoop.h> Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9063//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9063//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9063//console This message is automatically generated. > Add an option to DFSAdmin -safemode wait to ignore connection failures > -- > > Key: HDFS-7542 > URL: https://issues.apache.org/jira/browse/HDFS-7542 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Affects Versions: 2.6.0 >Reporter: Stephen Chu >Assignee: Stephen Chu >Priority: Minor > Attachments: HDFS-7542.001.patch > > > Currently, the _dfsadmin -safemode wait_ command aborts when connection to > the NN fails (network glitch, ConnectException when NN is unreachable, > EOFException if network link shut down). > In certain situations, users have asked for an option to make the command > resilient to connection failures. This is useful so that the admin can > initiate the wait command despite the NN not being fully up or survive > intermittent network issues. With this option, the admin can rely on the wait > command continuing to poll instead of aborting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7431) log message for InvalidMagicNumberException may be incorrect
[ https://issues.apache.org/jira/browse/HDFS-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250957#comment-14250957 ] Yi Liu commented on HDFS-7431: -- The test failure is not related. > log message for InvalidMagicNumberException may be incorrect > > > Key: HDFS-7431 > URL: https://issues.apache.org/jira/browse/HDFS-7431 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-7431.001.patch, HDFS-7431.002.patch, > HDFS-7431.003.patch > > > In secure mode, HDFS now supports Datanodes that don't require root or > jsvc if {{dfs.data.transfer.protection}} is configured. > For the {{InvalidMagicNumberException}} log message, we miss one case: > when the datanodes run on an unprivileged port and > {{dfs.data.transfer.protection}} is configured to {{authentication}} but > {{dfs.encrypt.data.transfer}} is not configured. A SASL handshake is required, > and when an old-version DFS client is used, {{InvalidMagicNumberException}} is > thrown and we write this log: > {quote} > Failed to read expected encryption handshake from client at Perhaps the > client is running an older version of Hadoop which does not support encryption > {quote} > Recently I ran HDFS built from trunk with security enabled, but the client was > version 2.5.1. I then got the above log message, even though I had not actually > configured encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7461) Reduce impact of laggards on Mover
[ https://issues.apache.org/jira/browse/HDFS-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250945#comment-14250945 ] Arpit Agarwal commented on HDFS-7461: - Hi Benoy, do you have a prototype/rough patch? > Reduce impact of laggards on Mover > -- > > Key: HDFS-7461 > URL: https://issues.apache.org/jira/browse/HDFS-7461 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: continuousmovement.pdf > > > The current Mover logic is as follows: > {code} > for (Path target : targetPaths) { > hasRemaining |= processPath(target.toUri().getPath()); > } > // wait for pending move to finish and retry the failed migration > hasRemaining |= Dispatcher.waitForMoveCompletion(storages.targets.values()); > {code} > The _processPath_ will schedule moves, but it is bounded by the number of > concurrent moves (default is 5 per node). Once block moves are scheduled, > it will wait for ALL scheduled moves to finish in _waitForMoveCompletion_. > One slow move could keep the Mover idle for a long time. > It would be a performance improvement to schedule the next moves as soon as > any (source, target) slot is available instead of waiting for all the > scheduled moves to finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
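The continuous-scheduling idea can be sketched with a counting semaphore: a laggard then holds one slot instead of stalling the whole iteration. {{SlotScheduler}} and the {{Runnable}} moves are illustrative stand-ins for the Dispatcher's (source, target) pairs, not the Mover's actual API:
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

/** Start the next move as soon as any slot frees, rather than draining all moves first. */
class SlotScheduler {
  private final Semaphore slots;
  private final ExecutorService pool = Executors.newCachedThreadPool();

  SlotScheduler(int maxConcurrentMoves) {
    slots = new Semaphore(maxConcurrentMoves);
  }

  void runAll(List<Runnable> moves) throws InterruptedException {
    for (final Runnable move : moves) {
      slots.acquire(); // blocks until one slot frees, not until all moves finish
      pool.execute(new Runnable() {
        @Override
        public void run() {
          try {
            move.run(); // perform one block move
          } finally {
            slots.release(); // a slow move ties up one slot only
          }
        }
      });
    }
    // Waiting for the final outstanding moves to drain is omitted for brevity.
  }
}
{code}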
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250936#comment-14250936 ] Lei (Eddy) Xu commented on HDFS-7531: - Thanks for the reviews, [~cmccabe] and [~wheat9]! > Improve the concurrent access on FsVolumeList > - > > Key: HDFS-7531 > URL: https://issues.apache.org/jira/browse/HDFS-7531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Fix For: 2.7.0 > > Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, > HDFS-7531.002.patch > > > {{FsVolumeList}} uses {{synchronized}} to protect the update on > {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, > {{getAvailable()}}) iterate {{volumes}} without protection. > This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to > provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250912#comment-14250912 ] Hudson commented on HDFS-7531: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6743 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6743/]) HDFS-7531. Improve the concurrent access on FsVolumeList (Lei Xu via Colin P. McCabe) (cmccabe: rev 3b173d95171d01ab55042b1162569d1cf14a8d43) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java > Improve the concurrent access on FsVolumeList > - > > Key: HDFS-7531 > URL: https://issues.apache.org/jira/browse/HDFS-7531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Fix For: 2.7.0 > > Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, > HDFS-7531.002.patch > > > {{FsVolumeList}} uses {{synchronized}} to protect the update on > {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, > {{getAvailable()}}) iterate {{volumes}} without protection. > This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to > provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7545) Data striping support in HDFS client
Zhe Zhang created HDFS-7545: --- Summary: Data striping support in HDFS client Key: HDFS-7545 URL: https://issues.apache.org/jira/browse/HDFS-7545 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Attachments: DataStripingSupportinHDFSClient.pdf Data striping is a commonly used data layout with critical benefits in the context of erasure coding. This JIRA aims to extend HDFS client to work with striped blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7545) Data striping support in HDFS client
[ https://issues.apache.org/jira/browse/HDFS-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7545: Attachment: DataStripingSupportinHDFSClient.pdf > Data striping support in HDFS client > > > Key: HDFS-7545 > URL: https://issues.apache.org/jira/browse/HDFS-7545 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang > Attachments: DataStripingSupportinHDFSClient.pdf > > > Data striping is a commonly used data layout with critical benefits in the > context of erasure coding. This JIRA aims to extend HDFS client to work with > striped blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7531: --- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) +1. Thanks, Eddy > Improve the concurrent access on FsVolumeList > - > > Key: HDFS-7531 > URL: https://issues.apache.org/jira/browse/HDFS-7531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Fix For: 2.7.0 > > Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, > HDFS-7531.002.patch > > > {{FsVolumeList}} uses {{synchronized}} to protect the update on > {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, > {{getAvailable()}}) iterate {{volumes}} without protection. > This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to > provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
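The {{AtomicReference}} approach amounts to copy-on-write: readers iterate an immutable snapshot with no lock held, while writers swap in a fresh list atomically. A minimal sketch, with a generic element type standing in for {{FsVolumeImpl}} (the real FsVolumeList carries more state than shown):
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class VolumeList<V> {
  private final AtomicReference<List<V>> volumes =
      new AtomicReference<List<V>>(Collections.<V>emptyList());

  List<V> snapshot() {
    return volumes.get(); // safe to iterate without locking
  }

  void addVolume(V v) {
    while (true) { // classic CAS retry loop
      List<V> cur = volumes.get();
      List<V> next = new ArrayList<V>(cur);
      next.add(v);
      if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
        return;
      }
    }
  }
}
{code}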
[jira] [Commented] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get
[ https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250898#comment-14250898 ] Andrew Wang commented on HDFS-7544: --- Also looks like a nice change, +1 pending > ChunkedArrayList: fix removal via iterator and implement get > > > Key: HDFS-7544 > URL: https://issues.apache.org/jira/browse/HDFS-7544 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7544.001.patch > > > ChunkedArrayList: implement removal via iterator and get. Previously, > calling remove on a ChunkedArrayList iterator would cause the returned size > to be incorrect later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get
[ https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7544: --- Status: Patch Available (was: Open) > ChunkedArrayList: fix removal via iterator and implement get > > > Key: HDFS-7544 > URL: https://issues.apache.org/jira/browse/HDFS-7544 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7544.001.patch > > > ChunkedArrayList: implement removal via iterator and get. Previously, > calling remove on a ChunkedArrayList iterator would cause the returned size > to be incorrect later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get
[ https://issues.apache.org/jira/browse/HDFS-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7544: --- Attachment: HDFS-7544.001.patch > ChunkedArrayList: fix removal via iterator and implement get > > > Key: HDFS-7544 > URL: https://issues.apache.org/jira/browse/HDFS-7544 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7544.001.patch > > > ChunkedArrayList: implement removal via iterator and get. Previously, > calling remove on a ChunkedArrayList iterator would cause the returned size > to be incorrect later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7544) ChunkedArrayList: fix removal via iterator and implement get
Colin Patrick McCabe created HDFS-7544: -- Summary: ChunkedArrayList: fix removal via iterator and implement get Key: HDFS-7544 URL: https://issues.apache.org/jira/browse/HDFS-7544 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe ChunkedArrayList: implement removal via iterator and get. Previously, calling remove on a ChunkedArrayList iterator would cause the returned size to be incorrect later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
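The bug and fix can be illustrated with a toy chunked list; this is a simplified stand-in, not the real {{ChunkedArrayList}}, but it shows the remove() hook that keeps the cached size correct after iterator removal:
{code}
import java.util.*;

/** Toy list of chunks showing why iterator.remove() must also update size. */
class ChunkedList<T> implements Iterable<T> {
  private static final int CHUNK_CAPACITY = 8;
  private final List<List<T>> chunks = new ArrayList<>();
  private int size; // cached so size() stays O(1)

  public void add(T v) {
    if (chunks.isEmpty()
        || chunks.get(chunks.size() - 1).size() == CHUNK_CAPACITY) {
      chunks.add(new ArrayList<T>(CHUNK_CAPACITY));
    }
    chunks.get(chunks.size() - 1).add(v);
    size++;
  }

  public int size() {
    return size;
  }

  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {
      private int chunkIdx = 0;
      private Iterator<T> cur = chunks.isEmpty()
          ? Collections.<T>emptyIterator() : chunks.get(0).iterator();
      private Iterator<T> lastReturnedFrom; // iterator that produced the last element

      @Override
      public boolean hasNext() {
        while (!cur.hasNext() && chunkIdx + 1 < chunks.size()) {
          cur = chunks.get(++chunkIdx).iterator();
        }
        return cur.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        lastReturnedFrom = cur;
        return cur.next();
      }

      @Override
      public void remove() {
        lastReturnedFrom.remove(); // delete from the owning chunk
        size--;                    // the fix: omit this and size() goes stale
      }
    };
  }
}
{code}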
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250854#comment-14250854 ] Colin Patrick McCabe commented on HDFS-7411: * Can you rebase this on trunk? ChunkedArrayList has moved and this caused a patch application failure. * I would really prefer that size stay an O(1) operation for ChunkedArrayList. We should be able to do this by hooking into the iterator's remove() method, creating a custom iterator if needed. If that's too complex to do in this jira, then let's at least file a follow-on. {code} <property> <name>dfs.namenode.decommission.blocks.per.node</name> <value>40</value> <description>The approximate number of blocks per node. This affects the number of blocks processed per decommission interval, as defined in dfs.namenode.decommission.interval. This is multiplied by dfs.namenode.decommission.nodes.per.interval to define the actual processing rate.</description> </property> {code} * Why do we need this parameter? The NameNode already tracks how many blocks each DataNode has in each storage. That information is in DatanodeStorageInfo#size. {code} <property> <name>dfs.namenode.decommission.max.concurrent.tracked.nodes</name> <value>100</value> <description>The maximum number of decommission-in-progress datanodes that will be tracked at one time by the namenode. Tracking a decommission-in-progress datanode consumes additional NN memory proportional to the number of blocks on the datanode. Having a conservative limit reduces the potential impact of decommissioning a large number of nodes at once.</description> </property> {code} * Should this be called something like dfs.namenode.decommission.max.concurrent.nodes? I'm confused by the mention of "tracking" here. It seems to imply that setting this too low would allow more nodes to be decommissioned, but we'd stop tracking the decommissioning? {code} - static final Log LOG = LogFactory.getLog(BlockManager.class); + static final Logger LOG = LoggerFactory.getLogger(BlockManager.class); {code} If you're going to change this, you need to change all the unit tests that are changing the log level, so that they use the correct function to do so. {code} hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java: ((Log4JLogger)LogFactory.getLog(BlockManager.class)).getLogger().setLevel(Level.ALL); hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java: ((Log4JLogger)LogFactory.getLog(BlockManager.class)).getLogger().setLevel(Level.ALL); and many more... {code} I can see that you fixed it in TestPendingInvalidateBlock.java, but there are a lot more locations. You probably need something like the GenericTestUtils#disableLog function I created in the HDFS-7430 patch. I guess we could split that off into a separate patch if it's important enough. Or perhaps we could just put off changing this until a follow-on JIRA? {code} if (node.isAlive) { return true; } else { ... long block ... } {code} We can reduce the indentation by getting rid of the else block here. Similarly with the other nested 'else'. {code} - LOG.fatal("ReplicationMonitor thread received Runtime exception. ", t); + LOG.error("ReplicationMonitor thread received Runtime exception. ", + t); {code} What's the rationale for changing the log level here? {code} /** - * Decommission the node if it is in exclude list. + * Decommission the node if it is in the host exclude list. 
+ * + * @param nodeReg datanode */ - private void checkDecommissioning(DatanodeDescriptor nodeReg) { + void checkDecommissioning(DatanodeDescriptor nodeReg) { {code} I realize this isn't introduced by this patch, but this function seems misleadingly named. Perhaps it should be named something like "startDecommissioningIfExcluded"? It's definitely not just a "check." More comments coming... > Refactor and improve decommissioning logic into DecommissionManager > --- > > Key: HDFS-7411 > URL: https://issues.apache.org/jira/browse/HDFS-7411 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, > hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch > > > Would be nice to split out decommission logic from DatanodeManager to > DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250855#comment-14250855 ] Charles Lamb commented on HDFS-7530: Thanks for the review and the kick in the head to Mr. Jenkins. > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, > HDFS-7530.003.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250848#comment-14250848 ] Charles Lamb commented on HDFS-7529: Hi @wheat9, While the .001 patch fixes the formatting issues, the larger problem is that by calling provider.getMetadata() inside the lock, you're doing an RPC (inside the lock). While it is true that you may have been able to contact the KMS during ensureKeysAreInitialized, that may not be true when you try later and there can be an arbitrarily long delay. BTW, there's a plurality mismatch between ensureKeysAreInitialized (plural) and the method it calls (generateEncryptedDataEncryptionKey, which is singular). Charles > Consolidate encryption zone related implementation into a single class > -- > > Key: HDFS-7529 > URL: https://issues.apache.org/jira/browse/HDFS-7529 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch > > > This jira proposes to consolidate encryption zone related implementation to a > single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
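The locking concern above, sketched in shape; {{createEncryptionZoneInt}} and the surrounding wiring are placeholder names, and the point is only that the KMS round trip should happen before the namesystem lock is taken:
{code}
// Risky shape: a KMS RPC while holding the FSNamesystem write lock.
writeLock();
try {
  EncryptedKeyVersion edek = provider.generateEncryptedKey(keyName); // remote call!
  createEncryptionZoneInt(src, edek);
} finally {
  writeUnlock();
}

// Preferred shape: do the remote call first, then take the lock briefly.
EncryptedKeyVersion edek = provider.generateEncryptedKey(keyName); // no lock held
writeLock();
try {
  createEncryptionZoneInt(src, edek); // pure in-memory namesystem update
} finally {
  writeUnlock();
}
{code}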
[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250845#comment-14250845 ] Andrew Wang commented on HDFS-7530: --- Patch changes looks good, I'll rekick Jenkins and commit if it comes back clean. > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, > HDFS-7530.003.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7543: - Issue Type: Sub-task (was: Improvement) Parent: HDFS-7508 > Avoid path resolution when getting FileStatus for audit logs > > > Key: HDFS-7543 > URL: https://issues.apache.org/jira/browse/HDFS-7543 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7543.000.patch > > > The current API of {{getAuditFileInfo()}} forces parsing the paths again when > generating the {{HdfsFileStatus}} for audit logs. This jira proposes to > avoid the repeated parsing by passing the {{INodesInPath}} object instead of > the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250839#comment-14250839 ] Yongjun Zhang commented on HDFS-6833: - Hi [~sinchii], Thanks for your new rev and sorry for late response. It looks good to me except for two minor things that you can take care of after getting committer's review: * {{public int getNumDeletingBlocks(String bpid)}} is not used anywhere. Consider removing. It might be possible that we need to have such an util in the future, if so, the method need to be implemented in the ReplicaMap protected with the internal mutex. * About {{if (m != null) {}} in {{void removeBlocks(String bpid, Set blockIds)}}, it's better to check if m is null and return if so right after getting m, instead of doing the check again and again in the loop. Or you can put the loop within {{if (m != null) {...}.}} HI [~cnauroth] and [~szetszwo], thanks for your earlier review. Wonder if any of you would have time to take a look at the latest? thanks much. > DirectoryScanner should not register a deleting block with memory of DataNode > - > > Key: HDFS-6833 > URL: https://issues.apache.org/jira/browse/HDFS-6833 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0, 2.5.0, 2.5.1 >Reporter: Shinichi Yamashita >Assignee: Shinichi Yamashita >Priority: Critical > Attachments: HDFS-6833-10.patch, HDFS-6833-11.patch, > HDFS-6833-12.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, > HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, > HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, > HDFS-6833.patch, HDFS-6833.patch > > > When a block is deleted in DataNode, the following messages are usually > output. > {code} > 2014-08-07 17:53:11,606 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: > Scheduling blk_1073741825_1001 file > /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 > for deletion > 2014-08-07 17:53:11,617 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: > Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file > /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 > {code} > However, DirectoryScanner may be executed when DataNode deletes the block in > the current implementation. And the following messsages are output. 
> {code} > 2014-08-07 17:53:30,519 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: > Scheduling blk_1073741825_1001 file > /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 > for deletion > 2014-08-07 17:53:31,426 INFO > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool > BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata > files:0, missing block files:0, missing blocks in memory:1, mismatched > blocks:0 > 2014-08-07 17:53:31,426 WARN > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added > missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED > getNumBytes() = 21230663 > getBytesOnDisk() = 21230663 > getVisibleLength()= 21230663 > getVolume() = /hadoop/data1/dfs/data/current > getBlockFile()= > /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 > unlinked =false > 2014-08-07 17:53:31,531 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: > Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file > /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 > {code} > Deleting block information is registered in DataNode's memory. > And when DataNode sends a block report, NameNode receives wrong block > information. > For example, when we execute recommission or change the number of > replication, NameNode may delete the right block as "ExcessReplicate" by this > problem. > And "Under-Replicated Blocks" and "Missing Blocks" occur. > When DataNode run DirectoryScanner, DataNode should not register a deleting > block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
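To illustrate the review point above about {{removeBlocks}}: checking {{m}} once and returning early keeps the loop free of repeated null checks. A minimal sketch of the shape, with assumed field names ({{map}}, {{mutex}}) rather than the actual ReplicaMap internals:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ReplicaMapSketch {
  private final Object mutex = new Object();
  private final Map<String, Map<Long, Object>> map =
      new HashMap<String, Map<Long, Object>>();

  void removeBlocks(String bpid, Set<Long> blockIds) {
    synchronized (mutex) {
      Map<Long, Object> m = map.get(bpid);
      if (m == null) {
        return; // check once right after getting m, not on every iteration
      }
      for (Long blockId : blockIds) {
        m.remove(blockId);
      }
    }
  }
}
{code}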
[jira] [Commented] (HDFS-7509) Avoid resolving path multiple times
[ https://issues.apache.org/jira/browse/HDFS-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250835#comment-14250835 ] Konstantin Shvachko commented on HDFS-7509: --- Hey Jing. Looks like your commit message for this issue incorrectly references HDFS-7059 on both commits. > Avoid resolving path multiple times > --- > > Key: HDFS-7509 > URL: https://issues.apache.org/jira/browse/HDFS-7509 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: 2.7.0 > > Attachments: HDFS-7509.000.patch, HDFS-7509.001.patch, > HDFS-7509.002.patch, HDFS-7509.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7543: - Status: Patch Available (was: Open) > Avoid path resolution when getting FileStatus for audit logs > > > Key: HDFS-7543 > URL: https://issues.apache.org/jira/browse/HDFS-7543 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7543.000.patch > > > The current API of {{getAuditFileInfo()}} forces parsing the paths again when > generating the {{HdfsFileStatus}} for audit logs. This jira proposes to > avoid the repeated parsing by passing the {{INodesInPath}} object instead of > the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
[ https://issues.apache.org/jira/browse/HDFS-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7543: - Attachment: HDFS-7543.000.patch > Avoid path resolution when getting FileStatus for audit logs > > > Key: HDFS-7543 > URL: https://issues.apache.org/jira/browse/HDFS-7543 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7543.000.patch > > > The current API of {{getAuditFileInfo()}} forces parsing the paths again when > generating the {{HdfsFileStatus}} for audit logs. This jira proposes to > avoid the repeated parsing by passing the {{INodesInPath}} object instead of > the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7543) Avoid path resolution when getting FileStatus for audit logs
Haohui Mai created HDFS-7543: Summary: Avoid path resolution when getting FileStatus for audit logs Key: HDFS-7543 URL: https://issues.apache.org/jira/browse/HDFS-7543 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai The current API of {{getAuditFileInfo()}} forces parsing the paths again when generating the {{HdfsFileStatus}} for audit logs. This jira proposes to avoid the repeated parsing by passing the {{INodesInPath}} object instead of the path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads
[ https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250832#comment-14250832 ] Andrew Wang commented on HDFS-7430: --- Cool, looks like you hit a lot of these. I did another review pass: Nits: * DFSConfigKeys, I agree the spacing is erratic in this file, but adding some spaces to line up the variable names would keep the new keys consistent with the variables immediately around them * Still need javadoc {{}} tags in a lot of places. It's not a big deal, so if you do another pass and think it looks fine we can leave it. * TestFsDatasetImpl, FsVolumeImpl, FsDatasetSpi, FsDatasetImpl unused imports * @VisibleForTesting could be added to BlockScanner#Conf#INTERNAL_DFS_BLOCK_SCANNER_THRESHOLD... * Still some lines longer than 80 chars Some more time conversions that could be done with TimeUnit (see the sketch after this message): * VolumeScanner#positiveMsToHours, the else case * testScanRateImpl FsDatasetImpl * I'd still like to use JSON to save the iterator :) Pretty sure Jackson can pretty print it for you. * I also still like the iterator-of-iterators idea a lot, since we could probably use the same iterator implementation at each level. Iterating would be simpler, the serde would be harder, but overall I think simpler code and more friendly for Java programmers. * BlockIterator still implements Closeable, unnecessary? VolumeScanner {code} // Find out how many bytes per second we should scan. long neededBytesPerSec = conf.targetBytesPerSec - (scannedBytesSum / MINUTES_PER_HOUR); {code} Still mismatched? * Guessing the JDK7 file listing goodness is coming in the next patch, since it's still using File#list Tests: * Did you look into the failed test I posted earlier? Any RCA? * The bugs found in my previous review seem worth unit testing, e.g. the OBO with the binarySearch index and the neededBytesPerSec that still looks off, the {{<=}} in place of {{<}} that affected continuous scans. Might be fun trying to write some actual stripped down unit tests, rather than poking with a full minicluster. > Refactor the BlockScanner to use O(1) memory and use multiple threads > - > > Key: HDFS-7430 > URL: https://issues.apache.org/jira/browse/HDFS-7430 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, > HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png > > > We should update the BlockScanner to use a constant amount of memory by > keeping track of what block was scanned last, rather than by tracking the > scan status of all blocks in memory. Also, instead of having just one > thread, we should have a verification thread per hard disk (or other volume), > scanning at a configurable rate of bytes per second. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
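On the TimeUnit nit above, the hand-rolled unit constants can generally be replaced like this (a generic sketch, not the patch code; the {{MINUTES_PER_HOUR}}-style constants go away):

{code}
import java.util.concurrent.TimeUnit;

class TimeUnitExample {
  public static void main(String[] args) {
    long scanPeriodMs = 7200000L;
    // Instead of scanPeriodMs / MS_PER_HOUR with a hand-maintained constant:
    long hours = TimeUnit.MILLISECONDS.toHours(scanPeriodMs); // 2
    // Scaling a per-second rate to per-hour without a magic 3600:
    long bytesPerSec = 1048576L;
    long bytesPerHour = bytesPerSec * TimeUnit.HOURS.toSeconds(1);
    System.out.println(hours + " hours, " + bytesPerHour + " bytes/hour");
  }
}
{code}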
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250828#comment-14250828 ] Hadoop QA commented on HDFS-7531: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687810/HDFS-7531.002.patch against trunk revision f2d150e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9061//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9061//console This message is automatically generated. > Improve the concurrent access on FsVolumeList > - > > Key: HDFS-7531 > URL: https://issues.apache.org/jira/browse/HDFS-7531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, > HDFS-7531.002.patch > > > {{FsVolumeList}} uses {{synchronized}} to protect the update on > {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, > {{getAvailable()}}) iterate {{volumes}} without protection. > This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to > provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
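For reference, the {{AtomicReference}} scheme proposed in this issue boils down to copy-on-write over an immutable snapshot, so readers can iterate without locking. A minimal sketch of the pattern (illustrative names, not the actual FsVolumeList code):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class VolumeListSketch<V> {
  private final AtomicReference<List<V>> volumes =
      new AtomicReference<List<V>>(Collections.<V>emptyList());

  // Readers iterate a stable snapshot without taking any lock.
  List<V> snapshot() {
    return volumes.get();
  }

  // Writers swap in a whole new list atomically (copy-on-write).
  void addVolume(V v) {
    while (true) {
      List<V> cur = volumes.get();
      List<V> next = new ArrayList<V>(cur);
      next.add(v);
      if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
        return;
      }
    }
  }
}
{code}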
[jira] [Updated] (HDFS-6662) [ UI ] Not able to open file from UI if file path contains "%"
[ https://issues.apache.org/jira/browse/HDFS-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerson Carlos updated HDFS-6662: Attachment: hdfs-6662.001.patch > [ UI ] Not able to open file from UI if file path contains "%" > -- > > Key: HDFS-6662 > URL: https://issues.apache.org/jira/browse/HDFS-6662 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1 >Reporter: Brahma Reddy Battula >Priority: Critical > Attachments: hdfs-6662.001.patch, hdfs-6662.patch > > > 1. Write a file into HDFS in such a way that the file name is like 1%2%3%4 > 2. Browse the file using the NameNode UI > the following exception is thrown: > "Path does not exist on HDFS or WebHDFS is disabled. Please check your path > or enable WebHDFS" > HBase writes its WAL file data in HDFS with % characters in the file names > e.g.: > /hbase/WALs/HOST-,60020,1404731504691/HOST-***-130%2C60020%2C1404731504691.1404812663950.meta > > the above file does not open in the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6662) [ UI ] Not able to open file from UI if file path contains "%"
[ https://issues.apache.org/jira/browse/HDFS-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250805#comment-14250805 ] Gerson Carlos commented on HDFS-6662: - Thanks, Haohui, for noticing it. In fact, I had to add {{encodeURIComponent()}} with some adjustments, because it encodes even the separator {{/}}, thus breaking the URI. But now it handles the slash and other reserved characters (&, =, +, for example) as well (see the sketch after this message). This update is in the second patch version. I intend to add the unit test soon as well. > [ UI ] Not able to open file from UI if file path contains "%" > -- > > Key: HDFS-6662 > URL: https://issues.apache.org/jira/browse/HDFS-6662 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1 >Reporter: Brahma Reddy Battula >Priority: Critical > Attachments: hdfs-6662.001.patch, hdfs-6662.patch > > > 1. Write a file into HDFS in such a way that the file name is like 1%2%3%4 > 2. Browse the file using the NameNode UI > the following exception is thrown: > "Path does not exist on HDFS or WebHDFS is disabled. Please check your path > or enable WebHDFS" > HBase writes its WAL file data in HDFS with % characters in the file names > e.g.: > /hbase/WALs/HOST-,60020,1404731504691/HOST-***-130%2C60020%2C1404731504691.1404812663950.meta > > the above file does not open in the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
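The approach described above — encode each path component while leaving the {{/}} separators intact — can be sketched in Java as follows (illustrative only; the actual fix lives in the NameNode UI's JavaScript around {{encodeURIComponent()}}):

{code}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PathEncodeSketch {
  // Encode each segment so characters like % & = + are escaped,
  // while the / separators are preserved.
  static String encodePath(String path) throws UnsupportedEncodingException {
    String[] segments = path.split("/", -1);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < segments.length; i++) {
      if (i > 0) {
        sb.append('/');
      }
      // URLEncoder is form-encoding, so map '+' back to '%20' for spaces.
      sb.append(URLEncoder.encode(segments[i], "UTF-8").replace("+", "%20"));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(encodePath("/hbase/WALs/1%2%3%4"));
    // prints: /hbase/WALs/1%252%253%254
  }
}
{code}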
[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250799#comment-14250799 ] Hadoop QA commented on HDFS-7530: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687861/HDFS-7530.003.patch against trunk revision 9937eef. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9066//console This message is automatically generated. > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, > HDFS-7530.003.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7530: --- Attachment: HDFS-7530.003.patch [~andrew.wang], Good points. I think that .003 addresses them. Charles > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch, > HDFS-7530.003.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7528: --- Attachment: (was: HDFS-7530.003.patch) > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7528.000.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250769#comment-14250769 ] Hudson commented on HDFS-7528: -- FAILURE: Integrated in Hadoop-trunk-Commit #6738 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6738/]) HDFS-7528. Consolidate symlink-related implementation into a single class. Contributed by Haohui Mai. (wheat9: rev 0da1330bfd3080a7ad95a4b48ba7b7ac89c3608f) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSymlinkOp.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250767#comment-14250767 ] Colin Patrick McCabe commented on HDFS-7527: I am -1 for removing this test right now, until we understand this issue better. Putting "registration names" in the host include and exclude files used to work. If it stopped working, then that's a bug that we should fix. Or, alternately, we should have a JIRA to remove registration names entirely. Last time we proposed that, it got rejected, though. See HDFS-5237. One example of where you might want to set registration names is if you're on an AWS instance with internal and external IP interfaces. On each datanode, you would set {{dfs.datanode.hostname}} to the internal IP address to ensure that traffic flowed over the internal interface, rather than the (expensive) external interfaces. In this case, you should be able to specify what nodes are in the cluster using these same registration names, even if doing reverse DNS on the datanode hostnames returns another IP address as the first entry. > TestDecommission.testIncludeByRegistrationName fails occassionally in trunk > --- > > Key: HDFS-7527 > URL: https://issues.apache.org/jira/browse/HDFS-7527 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: Yongjun Zhang >Assignee: Binglin Chang > Attachments: HDFS-7527.001.patch > > > https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ > {quote} > Error Message > test timed out after 36 milliseconds > Stacktrace > java.lang.Exception: test timed out after 36 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) > 2014-12-15 12:00:19,958 ERROR datanode.DataNode > (BPServiceActor.java:run(836)) - Initialization failed for Block pool > BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to > localhost/127.0.0.1:40565 Datanode denied communication with namenode because > the host is not in the include-list: DatanodeRegistration(127.0.0.1, > datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, > infoSecurePort=0, ipcPort=43726, > storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) > 2014-12-15 12:00:29,087 FATAL datanode.DataNode > 
(BPServiceActor.java:run(841)) - Initialization failed for Block pool > BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to > localhost/127.0.0.1:40565. Exiting. > java.io.IOException: DN shut down before block pool connected > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) > at java.lang.Thread.run(Thread.java:745) > {quote} > Found by tool proposed in HADOOP-11045: > {quote} > [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j > Hadoop-Hdfs-trunk -n 5 | tee bt.log > Recently FAILED builds in url: > https://builds.apache.org//job/Hadoop-Hdfs-trunk > THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, > as listed below: > ===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport > (2014-12-15 03:30:01) > Failed test: > org.a
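As a concrete illustration of the AWS scenario Colin describes, the per-datanode setting looks like this ({{dfs.datanode.hostname}} is the real property name; the address is made up). The include file would then list these same registration names:

{code}
<!-- hdfs-site.xml on one datanode; 10.0.0.5 is a hypothetical internal IP -->
<property>
  <name>dfs.datanode.hostname</name>
  <value>10.0.0.5</value>
</property>
{code}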
[jira] [Updated] (HDFS-7539) Namenode can't leave safemode because of Datanodes' IPC socket timeout
[ https://issues.apache.org/jira/browse/HDFS-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hoelog updated HDFS-7539: - Description: During namenode startup, data nodes seem to wait for the namenode's response through IPC to register block pools. Here is the DN's log - {code} 2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid 2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 {code} But the namenode is too busy to respond, and the datanodes hit a socket timeout - the default is 1 minute. {code} 2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to NN.x.com:9000 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.x.x.84:57924 remote=NN.x.com/10.x.x.143:9000]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout {code} The same events repeat, and eventually the NN drops most connection attempts from the DNs, so the NN can't leave safemode. DN's log - {code} 2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.io.IOException: failed on local exception java.io.ioexception connection reset by peer {code} There are no problems in the network, configuration, or servers. I think the NN is too busy to respond to the DNs within a minute. I configured "ipc.ping.interval" to 15 mins in core-site.xml, and that was helpful for my cluster. {code} <property><name>ipc.ping.interval</name><value>900000</value></property> {code} In my cluster, the namenode took 1 min ~ 5 mins to respond to the DNs' requests. It would be helpful if there were a more elegant solution. {code} 2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-877672386-10.x.x.143-1412666752827 (Datanode Uuid c4f7beea-b8e9-404f-bc81-6e87e37263d2) service to NN/10.x.x.143:9000 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Sent 1 blockreports 2090961 blocks total. Took 1690 msec to generate and 193738 msecs for RPC and NN processing. Got back commands org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@20e68e11 2014-12-16 23:31:32,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize command for block pool BP-877672386-10.x.x.143-1412666752827 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlockMap 2014-12-16 23:31:32,032 INFO org.apache.hadoop.util.GSet: VM type = 64-bit 2014-12-16 23:31:32,044 INFO org.apache.hadoop.util.GSet: 0.5% max memory 3.6 GB = 18.2 MB 2014-12-16 23:31:32,045 INFO org.apache.hadoop.util.GSet: capacity = 2^21 = 2097152 entries 2014-12-16 23:31:32,046 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-877672386-10.114.130.143-1412666752827 {code} was: During the starting of namenode, data nodes seem waiting namenode's response through IPC to register block pools. 
here is DN's log - 2014-12-16 20:28:09,064 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-877672386-10.114.130.143-1412666752827 (Datanode Uuid 2117395f-e034-4b4a-adec-8a28464f4796) service to NN.x.com/10.x.x143:9000 But namenode is too busy to responde it, and datanodes occur socket timeout - default is 1 minute. 2014-12-16 20:29:09,857 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.net.SocketTimeoutException: Call From DN1.x.com/10.x.x.84 to NN.x.com:9000 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.x.x.84:57924 remote=NN.x.com/10.x.x.143:9000]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout same events repeat and eventually NN drops most connecting trials from DN. So NN can't leave safemode. DN's log - 2014-12-16 20:32:25,895 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.io.IOException: failed on local exception java.io.ioexception connection reset by peer There is no troubles in the network, configuration or servers. I think NN is too busy to respond to DN in a minute. I configured "ipc.ping.interval" to 15 mins In the core-site.xml, and that was helpful for my cluster. ipc.ping.interval 90 In my cluster, namenode responded 1 min ~ 5 mins for the DNs' request. It will be helpful if there is more elegant solution. 2014-12-16 23:28:16,598 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7528: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks Brandon for the review. > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250732#comment-14250732 ] Haohui Mai commented on HDFS-7529: -- The v1 patch fixes various formatting issues in the v0 patch. > Consolidate encryption zone related implementation into a single class > -- > > Key: HDFS-7529 > URL: https://issues.apache.org/jira/browse/HDFS-7529 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch > > > This jira proposes to consolidate encryption zone related implementation to a > single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250726#comment-14250726 ] Brandon Li commented on HDFS-7528: -- +1 to Haohui's patch. > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7521) Refactor DN state management
[ https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250719#comment-14250719 ] Zhe Zhang commented on HDFS-7521: - [~mingma] OK, I missed the other arrow going up. Is this the only case where the two state machines are not independent? If so, how does this corner case affect potential formal verification? > Refactor DN state management > > > Key: HDFS-7521 > URL: https://issues.apache.org/jira/browse/HDFS-7521 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: DNStateMachines.png, HDFS-7521.patch > > > There are two aspects w.r.t. DN state management in NN. > * State machine management within active NN > NN maintains states of each data node regarding whether it is running or > being decommissioned. But the state machine isn’t well defined. We have dealt > with some corner case bug in this area. It will be useful if we can refactor > the code to use clear state machine definition that define events, available > states and actions for state transitions. It has these benefits. > ** Make it easy to define correctness of DN state management. Currently some > of the state transitions aren't defined in the code. For example, if admins > remove a node from include host file while the node is being decommissioned, > it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That might not be the > intention. If we have state machine definition, we can identify this case. > ** Make it easy to add new state for DN later. For example, people discussed > about new “maintenance” state for DN to support the scenario where admins > need to take the machine/rack down for 30 minutes for repair. > We can refactor DN with clear state machine definition based on YARN state > related components. > * State machine consistency between active and standby NN > Another dimension of state machine management is consistency across NN pairs. > We have dealt with bugs due to different live nodes between active NN and > standby NN. Current design is to have each NN manage its own state based on > the events it receives. For example, DNs will send heartbeat to both NNs; > admins will issue decommission commands to both NNs. Alternative design > approach could be to have ZK manage the state. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7521) Refactor DN state management
[ https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250713#comment-14250713 ] Ming Ma commented on HDFS-7521: --- Folks, thanks for the comments. [~wheat9], I agree with you that a simpler solution is better. This state machine lib has been used in YARN and MR and has proved to be quite useful for debugging, especially when a new state needs to be added. When we fixed corner cases in DN state management, we actually wanted to investigate ways to do formal checking on the NN, but there is no good way to do that without a state machine, as you mentioned. I definitely want to hear what others have to say about the need for a state machine lib. [~zhz], the main reason to have two state machines is to reduce the overall number of possible states. For the most part, liveness and admin are independent. For the case you mentioned, it is specified in the diagram: In_Service can be transitioned to either the Decommission_In_Progress or the Decommissioned state upon receiving a DECOMMISSION_REQUESTED event (see the sketch after this message). Yeah, you can't tell from the diagram what the decision is based on; only the source code has the answer. > Refactor DN state management > > > Key: HDFS-7521 > URL: https://issues.apache.org/jira/browse/HDFS-7521 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: DNStateMachines.png, HDFS-7521.patch > > > There are two aspects w.r.t. DN state management in NN. > * State machine management within active NN > NN maintains states of each data node regarding whether it is running or > being decommissioned. But the state machine isn’t well defined. We have dealt > with some corner case bug in this area. It will be useful if we can refactor > the code to use clear state machine definition that define events, available > states and actions for state transitions. It has these benefits. > ** Make it easy to define correctness of DN state management. Currently some > of the state transitions aren't defined in the code. For example, if admins > remove a node from include host file while the node is being decommissioned, > it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That might not be the > intention. If we have state machine definition, we can identify this case. > ** Make it easy to add new state for DN later. For example, people discussed > about new “maintenance” state for DN to support the scenario where admins > need to take the machine/rack down for 30 minutes for repair. > We can refactor DN with clear state machine definition based on YARN state > related components. > * State machine consistency between active and standby NN > Another dimension of state machine management is consistency across NN pairs. > We have dealt with bugs due to different live nodes between active NN and > standby NN. Current design is to have each NN manage its own state based on > the events it receives. For example, DNs will send heartbeat to both NNs; > admins will issue decommission commands to both NNs. Alternative design > approach could be to have ZK manage the state. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
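To make the two-state-machine idea above concrete, here is a minimal enum sketch of independent liveness and admin states, with the one coupling point Ming describes (illustrative only; the patch itself uses the YARN state machine library):

{code}
class DatanodeStateSketch {
  enum Liveness { RUNNING, DEAD }
  enum Admin { IN_SERVICE, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

  private Liveness liveness = Liveness.RUNNING;
  private Admin admin = Admin.IN_SERVICE;

  // The one place the two machines interact: the target admin state
  // depends on liveness when DECOMMISSION_REQUESTED arrives.
  void onDecommissionRequested() {
    if (admin != Admin.IN_SERVICE) {
      return;
    }
    // A dead node has no replicas to drain, so it can go straight to
    // DECOMMISSIONED; a running node must drain replicas first.
    admin = (liveness == Liveness.DEAD)
        ? Admin.DECOMMISSIONED
        : Admin.DECOMMISSION_IN_PROGRESS;
  }
}
{code}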
[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250698#comment-14250698 ] Haohui Mai commented on HDFS-7529: -- bq. Wouldn't it be better to fail fast in this case? Did you copy the wrong code to #ensureKeysAreInitialized? Likewise, I think that the checks for nullness of provider, keyName, and metadata can be removed from #createEncryptionZoneInt, right? Duplicating the checks is intentional, to define well-formed steps as is implied by the name {{ensureKeysAreInitialized()}}. bq. are now inside the FSN#writeLock(). I suppose that's not the end of the world, but every little bit of extra code inside the writeLock() hurts. The performance benefit is minimal, as {{getPermissionChecker()}} is eventually synchronized in {{UserGroupInformation#getCurrentUser()}}. Making it consistent with other operations allows further refactoring. > Consolidate encryption zone related implementation into a single class > -- > > Key: HDFS-7529 > URL: https://issues.apache.org/jira/browse/HDFS-7529 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch > > > This jira proposes to consolidate encryption zone related implementation to a > single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7529: - Attachment: HDFS-7529.001.patch > Consolidate encryption zone related implementation into a single class > -- > > Key: HDFS-7529 > URL: https://issues.apache.org/jira/browse/HDFS-7529 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7529.000.patch, HDFS-7529.001.patch > > > This jira proposes to consolidate encryption zone related implementation to a > single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250667#comment-14250667 ] Andrew Wang commented on HDFS-7528: --- Charles, I think this was posted on the wrong JIRA :) > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250668#comment-14250668 ] Brandon Li commented on HDFS-7528: -- [~clamb], I guess you posted patch on a wrong JIRA. :-) > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7521) Refactor DN state management
[ https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250661#comment-14250661 ] Zhe Zhang commented on HDFS-7521: - Explicitly defining DN states sounds like a great idea to me. It'd be very useful in supporting the increasingly complex management tasks. I'm not entirely sure if _liveness_ and _admin_ should be 2 independent state machines. For example, in the current transition diagram, upon receiving {{DECOMMISSION_REQUESTED}}, {{In_Service}} always transitions to {{Decommission_In_Progress}} (let me know if I'm understanding it wrong). I think it should rather depend on whether the DN is {{Running}} or {{Dead}}. > Refactor DN state management > > > Key: HDFS-7521 > URL: https://issues.apache.org/jira/browse/HDFS-7521 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: DNStateMachines.png, HDFS-7521.patch > > > There are two aspects w.r.t. DN state management in NN. > * State machine management within active NN > NN maintains states of each data node regarding whether it is running or > being decommissioned. But the state machine isn’t well defined. We have dealt > with some corner case bug in this area. It will be useful if we can refactor > the code to use clear state machine definition that define events, available > states and actions for state transitions. It has these benefits. > ** Make it easy to define correctness of DN state management. Currently some > of the state transitions aren't defined in the code. For example, if admins > remove a node from include host file while the node is being decommissioned, > it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That might not be the > intention. If we have state machine definition, we can identify this case. > ** Make it easy to add new state for DN later. For example, people discussed > about new “maintenance” state for DN to support the scenario where admins > need to take the machine/rack down for 30 minutes for repair. > We can refactor DN with clear state machine definition based on YARN state > related components. > * State machine consistency between active and standby NN > Another dimension of state machine management is consistency across NN pairs. > We have dealt with bugs due to different live nodes between active NN and > standby NN. Current design is to have each NN manage its own state based on > the events it receives. For example, DNs will send heartbeat to both NNs; > admins will issue decommission commands to both NNs. Alternative design > approach could be to have ZK manage the state. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7528) Consolidate symlink-related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7528: --- Attachment: HDFS-7530.003.patch [~andrew.wang], Good points. New diffs address them. > Consolidate symlink-related implementation into a single class > -- > > Key: HDFS-7528 > URL: https://issues.apache.org/jira/browse/HDFS-7528 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7528.000.patch, HDFS-7530.003.patch > > > The jira proposes to consolidate symlink-related implementation into a single > class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures
[ https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-7542: -- Status: Patch Available (was: Open) > Add an option to DFSAdmin -safemode wait to ignore connection failures > -- > > Key: HDFS-7542 > URL: https://issues.apache.org/jira/browse/HDFS-7542 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Affects Versions: 2.6.0 >Reporter: Stephen Chu >Assignee: Stephen Chu >Priority: Minor > Attachments: HDFS-7542.001.patch > > > Currently, the _dfsadmin -safemode wait_ command aborts when connection to > the NN fails (network glitch, ConnectException when NN is unreachable, > EOFException if network link shut down). > In certain situations, users have asked for an option to make the command > resilient to connection failures. This is useful so that the admin can > initiate the wait command despite the NN not being fully up or survive > intermittent network issues. With this option, the admin can rely on the wait > command continuing to poll instead of aborting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures
[ https://issues.apache.org/jira/browse/HDFS-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-7542: -- Attachment: HDFS-7542.001.patch > Add an option to DFSAdmin -safemode wait to ignore connection failures > -- > > Key: HDFS-7542 > URL: https://issues.apache.org/jira/browse/HDFS-7542 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Affects Versions: 2.6.0 >Reporter: Stephen Chu >Assignee: Stephen Chu >Priority: Minor > Attachments: HDFS-7542.001.patch > > > Currently, the _dfsadmin -safemode wait_ command aborts when connection to > the NN fails (network glitch, ConnectException when NN is unreachable, > EOFException if network link shut down). > In certain situations, users have asked for an option to make the command > resilient to connection failures. This is useful so that the admin can > initiate the wait command despite the NN not being fully up or survive > intermittent network issues. With this option, the admin can rely on the wait > command continuing to poll instead of aborting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7542) Add an option to DFSAdmin -safemode wait to ignore connection failures
Stephen Chu created HDFS-7542: - Summary: Add an option to DFSAdmin -safemode wait to ignore connection failures Key: HDFS-7542 URL: https://issues.apache.org/jira/browse/HDFS-7542 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Currently, the _dfsadmin -safemode wait_ command aborts when connection to the NN fails (network glitch, ConnectException when NN is unreachable, EOFException if network link shut down). In certain situations, users have asked for an option to make the command resilient to connection failures. This is useful so that the admin can initiate the wait command despite the NN not being fully up or survive intermittent network issues. With this option, the admin can rely on the wait command continuing to poll instead of aborting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
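The proposed behavior amounts to wrapping the poll loop with retry-on-connect-failure; a rough sketch of the idea (hypothetical helper, not the actual patch):

{code}
import java.io.IOException;
import java.net.ConnectException;

class SafemodeWaitSketch {
  interface NamenodeProbe {
    boolean isInSafeMode() throws IOException; // stand-in for the DFS call
  }

  // Keep polling; optionally swallow connection failures instead of aborting.
  static void waitForSafemodeExit(NamenodeProbe nn, boolean ignoreConnFailures)
      throws IOException, InterruptedException {
    while (true) {
      try {
        if (!nn.isInSafeMode()) {
          return;
        }
      } catch (ConnectException e) {
        if (!ignoreConnFailures) {
          throw e; // old behavior: abort on connection failure
        }
        // new behavior: NN not up yet, keep waiting
      }
      Thread.sleep(5000);
    }
  }
}
{code}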
[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250593#comment-14250593 ] Andrew Wang commented on HDFS-7530: --- Hey Charles, note that Colin actually fixed binary diff application in HADOOP-10926. You just need to generate the diff with git and without --no-prefix. Doesn't matter here though since xml is text. For the CLI test, can we add an "ls" at the end so we can check the rename? An empty substring comparator is never going to trigger. I think that the existing test is supposed to test renaming an EZ file to a non-EZ, could we add that too? Thanks! > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7521) Refactor DN state management
[ https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250563#comment-14250563 ] Haohui Mai commented on HDFS-7521: -- bq. Regarding the state machine abstraction, the patch has DN's state modified asynchronously by the dispatcher thread instead of synchronously by the caller thread. The motivation is to have one thread modify DN's state and might help to simplify the lock management in NN. But not sure if that is really worthwhile. Changing from async to sync is pretty straightforward. This part of the code has been quite complex. I'm concerned about the additional complexity. For example, what are the principles to ensure that the system is properly synchronized? I suggest starting from a simple implementation, stabilizing it, and moving towards a sophisticated solution if required. bq. IMO, reusing existing state machine lib is beneficial. It declares how states transition and any actions required. If you look at the state machine lib's internal implementation, it is similar to we would have implemented. ... Another nice thing of reusing existing state machine lib is you can generate the state machine diagram easily. I have yet to be convinced that a dedicated library is required for this jira. For this use case, a well-formed state machine is so simple that there should be no need for a library. A dedicated state machine library is hugely beneficial if you want to (1) write declarative programs, or (2) run some formal checking. (See P2 and MACE). I think this is out of the scope of this jira. I see a lot of value in simplifying DN state management using explicit state machines. Given the current complexity we have in the code, however, my suggestion is to start simple. It would be great to start refactoring the code to make it closer to a state machine (which is anyway required), then we can explore additional issues once we get there. > Refactor DN state management > > > Key: HDFS-7521 > URL: https://issues.apache.org/jira/browse/HDFS-7521 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: DNStateMachines.png, HDFS-7521.patch > > > There are two aspects w.r.t. DN state management in NN. > * State machine management within active NN > NN maintains states of each data node regarding whether it is running or > being decommissioned. But the state machine isn’t well defined. We have dealt > with some corner case bug in this area. It will be useful if we can refactor > the code to use clear state machine definition that define events, available > states and actions for state transitions. It has these benefits. > ** Make it easy to define correctness of DN state management. Currently some > of the state transitions aren't defined in the code. For example, if admins > remove a node from include host file while the node is being decommissioned, > it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That might not be the > intention. If we have state machine definition, we can identify this case. > ** Make it easy to add new state for DN later. For example, people discussed > about new “maintenance” state for DN to support the scenario where admins > need to take the machine/rack down for 30 minutes for repair. > We can refactor DN with clear state machine definition based on YARN state > related components. > * State machine consistency between active and standby NN > Another dimension of state machine management is consistency across NN pairs. 
> We have dealt with bugs due to different live nodes between active NN and > standby NN. Current design is to have each NN manage its own state based on > the events it receives. For example, DNs will send heartbeat to both NNs; > admins will issue decommission commands to both NNs. Alternative design > approach could be to have ZK manage the state. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250562#comment-14250562 ] Hadoop QA commented on HDFS-7540: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687812/HDFS-7540.002.patch against trunk revision f2d150e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9062//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9062//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9062//console This message is automatically generated. > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250523#comment-14250523 ] Charles Lamb commented on HDFS-7530: The test failures appear to be unrelated. > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7529) Consolidate encryption zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250501#comment-14250501 ] Charles Lamb commented on HDFS-7529: Hi [~wheat9], Thanks for looking into this. I have a few comments and then a bunch of formatting nits that are introduced as part of this patch. FSDirEncryptionZoneOp.java: In #ensureKeysAreInitialized, why do you return if provider, keyName, or metadata are null? The existing code would throw an exception, which the new code eventually does, but not before it has waited around to take the writeLock(). Wouldn't it be better to fail fast in this case (see the sketch after this message)? Did you copy the wrong code to #ensureKeysAreInitialized? Likewise, I think that the checks for nullness of provider, keyName, and metadata can be removed from #createEncryptionZoneInt, right? These two lines: {code} +final byte[][] pathComponents = + FSDirectory.getPathComponentsForReservedPath(src); +FSPermissionChecker pc = fsn.getPermissionChecker(); {code} are now inside the FSN#writeLock(). I suppose that's not the end of the world, but every little bit of extra code inside the writeLock() hurts. Same issue with the call to #logAuditEvent (for the failure case only) being inside the writeLock() now. IWBNI that call could be moved out of the scope of the lock. The same general comment for #getEZForPath. auditStat can be made final in that method. Formatting issues: You introduced a newline at the end of #createEncryptionZone. #getFileEncryptionInfo. The formatting for the call to getEZForPath is weird. Bump the 'iip);' up a line. In that same method, the call to unprotectedGetXAttrByName busts the 80 char limit. I realize that this was already in the codebase before this patch, but it was introduced in the previous Jira (the one which introduced FSDirXAttrOp) so we might as well fix it now for cleanliness purposes. In #createEncryptionZoneInt, the block comment did not get re-indented -2 when you moved it so it's out of alignment now. FSNamesystem.java: Call to FSDirEncryptionZoneOp.getFileEncryptionInfo could use some formatting. It exceeds the 80 char limit. Ditto the call to #generateEncryptedDataEncryptionKey. FSDirStatAndListingOp.java: Lines 204 and 423 exceed the 80 char limit. > Consolidate encryption zone related implementation into a single class > -- > > Key: HDFS-7529 > URL: https://issues.apache.org/jira/browse/HDFS-7529 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-7529.000.patch > > > This jira proposes to consolidate encryption zone related implementation to a > single class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
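For what it's worth, the fail-fast shape Charles suggests would validate before taking the write lock, e.g. with Guava's {{Preconditions}} (a sketch with assumed parameter types and messages; the real code presumably throws IOException rather than NullPointerException):

{code}
import com.google.common.base.Preconditions;

class EzPreconditionSketch {
  static void ensureKeysAreInitialized(Object provider, String keyName,
      Object metadata) {
    // Throw immediately, before the caller waits on the FSN write lock.
    Preconditions.checkNotNull(provider, "No KeyProvider is configured");
    Preconditions.checkNotNull(keyName, "Must specify a key name");
    Preconditions.checkNotNull(metadata, "Key %s doesn't exist", keyName);
  }
}
{code}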
[jira] [Commented] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250500#comment-14250500 ] Hadoop QA commented on HDFS-7530: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687763/HDFS-7530.002.patch against trunk revision e996a1b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.hdfs.TestDecommission Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9058//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9058//console This message is automatically generated. > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250463#comment-14250463 ] Andrew Wang commented on HDFS-7540: --- LGTM +1 pending Jenkins, thanks Colin > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7540: --- Attachment: HDFS-7540.002.patch > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250450#comment-14250450 ] Colin Patrick McCabe commented on HDFS-7540: bq. I wonder if we should really return a ChunkedArrayList here. It only implements a subset of the AbstractList interface, and this is a pretty general-purpose method. For huge dirs, we should probably just be using the DirectoryStream iterator directly. I do see the use of these helper functions for quick-and-dirty listings though. I think maybe later {{ChunkedArrayList}} will become more general-purpose. But you're right; for now, we'd better use {{ArrayList}}. bq. Need {{<p/>}} tag for javadoc linebreak ok bq. I read the docs at http://docs.oracle.com/javase/7/docs/api/java/nio/file/DirectoryStream.html and it'd be nice to do like the example and unwrap the DirectoryIteratorException into an IOException. Yeah, that's important... I/O errors should result in I/O exceptions. Looks like {{DirectoryIteratorException}} is a {{RuntimeException}}... probably in order to conform to the {{Iterator}} interface. I removed the variant that returns a list of File, since I found that the JDK6 file listing interfaces actually returned an array of String, so returning a list of String is compatible-ish. > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch, HDFS-7540.002.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
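For reference, a minimal sketch of the helper shape settled on above, assuming JDK7 NIO: it returns a plain {{ArrayList}} of entry names and unwraps {{DirectoryIteratorException}} into the {{IOException}} it carries, as in the javadoc example. This illustrates the approach discussed, not the attached patch:
{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ListDirectorySketch {
  /**
   * List the entry names in a directory without hiding IOExceptions,
   * unlike File#list which returns null on failure.
   */
  public static List<String> listDirectory(File dir) throws IOException {
    List<String> names = new ArrayList<String>();
    try (DirectoryStream<Path> stream =
             Files.newDirectoryStream(dir.toPath())) {
      for (Path entry : stream) {
        names.add(entry.getFileName().toString());
      }
    } catch (DirectoryIteratorException e) {
      // The iterator wraps I/O errors in a RuntimeException to satisfy
      // the Iterator interface; rethrow the IOException it carries.
      throw e.getCause();
    }
    return names;
  }
}
{code}
{{DirectoryIteratorException#getCause()}} is declared to return {{IOException}}, so the rethrow needs no cast.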
[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads
[ https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250452#comment-14250452 ] Hadoop QA commented on HDFS-7430: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687802/HDFS-7430.006.patch against trunk revision 9b4ba40. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 13 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9060//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9060//console This message is automatically generated. > Refactor the BlockScanner to use O(1) memory and use multiple threads > - > > Key: HDFS-7430 > URL: https://issues.apache.org/jira/browse/HDFS-7430 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, > HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png > > > We should update the BlockScanner to use a constant amount of memory by > keeping track of what block was scanned last, rather than by tracking the > scan status of all blocks in memory. Also, instead of having just one > thread, we should have a verification thread per hard disk (or other volume), > scanning at a configurable rate of bytes per second. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250441#comment-14250441 ] Hadoop QA commented on HDFS-7540: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12687794/HDFS-7540.001.patch against trunk revision 4281c96. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9059//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9059//artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9059//console This message is automatically generated. > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7531: Attachment: HDFS-7531.002.patch Thanks for the suggestions [~cmccabe]. I updated the patch based on your comments. > Improve the concurrent access on FsVolumeList > - > > Key: HDFS-7531 > URL: https://issues.apache.org/jira/browse/HDFS-7531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch, > HDFS-7531.002.patch > > > {{FsVolumeList}} uses {{synchronized}} to protect the update on > {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, > {{getAvailable()}}) iterate {{volumes}} without protection. > This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to > provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7541) Support for fast HDFS datanode rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7541: -- Attachment: SupportforfastHDFSdatanoderollingupgrade.pdf We ([~ctrezzo], [~jmeagher], [~lohit], [~l201514], [~kihwal], and others) discussed ways to address this. Attached is the initial high-level design document. * Upgrade domain support. HDFS-3566 outlines the idea, but it isn't applicable to Hadoop 2 and it uses network topology to store the upgrade domain definition. We can make the load balancer more extensible to support different policies. * Have NN support for a new "maintenance" datanode state (see the sketch below). In this state, the DN won't process read/write requests, but its replicas will remain in the BlockMap and thus are still considered valid from the block replication point of view. Appreciate any input. > Support for fast HDFS datanode rolling upgrade > -- > > Key: HDFS-7541 > URL: https://issues.apache.org/jira/browse/HDFS-7541 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: SupportforfastHDFSdatanoderollingupgrade.pdf > > > Current HDFS DN rolling upgrade step requires sequential DN restart to > minimize the impact on data availability and read/write operations. The side > effect is longer upgrade duration for large clusters. This might be > acceptable for DN JVM quick restart to update hadoop code/configuration. > However, for OS upgrade that requires machine reboot, the overall upgrade > duration will be too long if we continue to do sequential DN rolling restart. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
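As a rough illustration of the proposed "maintenance" state (all names here are hypothetical, not from a patch): the new admin state would sit alongside the existing decommission states, with replicas on the node still counted as valid while the node is skipped for reads and writes.
{code}
// Hypothetical sketch; only NORMAL/DECOMMISSION* exist in
// DatanodeInfo.AdminStates today.
enum AdminState {
  NORMAL,
  DECOMMISSION_INPROGRESS,
  DECOMMISSIONED,
  MAINTENANCE  // proposed: node briefly down for repair/reboot
}

class DatanodeSketch {
  AdminState adminState = AdminState.NORMAL;

  /** Replicas on a node in maintenance still count as valid. */
  boolean replicasCountAsLive() {
    return adminState == AdminState.NORMAL
        || adminState == AdminState.MAINTENANCE;
  }

  /** But the node is skipped when serving or placing replicas. */
  boolean eligibleForIO() {
    return adminState == AdminState.NORMAL;
  }
}
{code}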
[jira] [Updated] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7285: Attachment: HDFSErasureCodingDesign-20141217.pdf Based on feedback from the meetup and a deeper study of file size distribution (detailed report to be posted later), *Data Striping* is added to this updated design, mainly to support EC on small files. A few highlights compared to the first version: # _Client_: extended with striping and codec logic # _NameNode_: {{INodeFile}} extended to store both block and {{BlockGroup}} information; optimizations are proposed to reduce memory usage caused by striping and parity data # _DataNode_ remains mostly unchanged from the original EC design # Prioritizing _EC with striping_ as the focus of the initial phase, and putting _EC with contiguous (non-striping) layout_ in a 2nd phase > Erasure Coding Support inside HDFS > -- > > Key: HDFS-7285 > URL: https://issues.apache.org/jira/browse/HDFS-7285 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Weihua Jiang >Assignee: Zhe Zhang > Attachments: HDFSErasureCodingDesign-20141028.pdf, > HDFSErasureCodingDesign-20141217.pdf > > > Erasure Coding (EC) can greatly reduce the storage overhead without > sacrificing data reliability, compared to the existing HDFS 3-replica > approach. For example, if we use a 10+4 Reed Solomon coding, we can allow > loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite > attractive alternative for big data storage, particularly for cold data. > Facebook had a related open source project called HDFS-RAID. It used to be > one of the contrib packages in HDFS but has been removed since Hadoop 2.0 > for maintenance reasons. The drawbacks are: 1) it is on top of HDFS and > depends on MapReduce to do encoding and decoding tasks; 2) it can only be > used for cold files that are intended not to be appended anymore; 3) the pure > Java EC coding implementation is extremely slow in practical use. Due to > these, it might not be a good idea to just bring HDFS-RAID back. > We (Intel and Cloudera) are working on a design to build EC into HDFS that > gets rid of any external dependencies, makes it self-contained and > independently maintained. This design lays the EC feature on top of the > storage type support and is designed to be compatible with existing HDFS > features like caching, snapshots, encryption, and high availability. This > design will also support different EC coding schemes, implementations and > policies for different deployment scenarios. By utilizing advanced libraries > (e.g. the Intel ISA-L library), an implementation can greatly improve the > performance of EC encoding/decoding and make the EC solution even more > attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
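To make the striping arithmetic concrete, a small sketch under assumed parameters: the 10+4 Reed-Solomon schema quoted in this issue and a hypothetical 64 KB striping cell (the actual cell size is whatever the design document specifies).
{code}
public class StripingSketch {
  static final int DATA_BLOCKS = 10;        // RS data units (10+4 example)
  static final int PARITY_BLOCKS = 4;       // RS parity units
  static final long CELL_SIZE = 64 * 1024;  // assumed striping cell size

  /** Index of the data block within the block group holding this offset. */
  static int blockIndexFor(long fileOffset) {
    long cell = fileOffset / CELL_SIZE;
    return (int) (cell % DATA_BLOCKS);
  }

  /** Storage overhead of the RS schema: parity over data. */
  static double rsOverhead() {
    return (double) PARITY_BLOCKS / DATA_BLOCKS; // 0.4, i.e. 40%
  }

  public static void main(String[] args) {
    System.out.println("offset 1 MB -> data block " + blockIndexFor(1 << 20));
    System.out.println("RS overhead: " + rsOverhead() * 100 + "%");
  }
}
{code}
Compare the 40% parity overhead with the 200% overhead of 3-replica storage; that gap is the motivation for EC on cold data.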
[jira] [Updated] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6403: Labels: supportability (was: ) > Add metrics for log warnings reported by JVM pauses > --- > > Key: HDFS-6403 > URL: https://issues.apache.org/jira/browse/HDFS-6403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Labels: supportability > Fix For: 2.5.0 > > Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch, > HDFS-6403.003.patch, HDFS-6403.004.patch, HDFS-6403.005.patch > > > HADOOP-9618 logs warnings when there are long GC pauses. If this is exposed > as a metric, then they can be monitored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6959) Make the HDFS home directory location customizable.
[ https://issues.apache.org/jira/browse/HDFS-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6959: Labels: supportability (was: ) > Make the HDFS home directory location customizable. > --- > > Key: HDFS-6959 > URL: https://issues.apache.org/jira/browse/HDFS-6959 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Kevin Odell >Assignee: Yongjun Zhang >Priority: Minor > Labels: supportability > Fix For: 2.6.0 > > Attachments: HADOOP-10334.001.patch, HADOOP-10334.002.patch, > HADOOP-10334.002.patch, HDFS-6959.001.patch, HDFS-6959.002.patch > > > The path is currently hardcoded: > public Path getHomeDirectory() { > return makeQualified(new Path("/user/" + dfs.ugi.getShortUserName())); > } > It would be nice to have that as a customizable value. > Thank you -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250393#comment-14250393 ] Plamen Jeliazkov commented on HDFS-7056: I just checked all the failures reported by the Jenkins bot. # FindBugs is unrelated -- we don't modify getBlockLocations. The issue comes from HDFS-7463. # JavaDoc warnings are from stuff we've never touched. # Java warnings are from files we've never touched. # All the "failed / timed out" tests passed on my local machine. (Except TestOfflineEditsViewer, which passes once you have the correct edits files loaded) > Snapshot support for truncate > - > > Key: HDFS-7056 > URL: https://issues.apache.org/jira/browse/HDFS-7056 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx > > > Implementation of truncate in HDFS-3107 does not allow truncating files which > are in a snapshot. It is desirable to be able to truncate and still keep the > old state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250388#comment-14250388 ] Andrew Wang commented on HDFS-7540: --- Thanks for working on this Colin. It'll be nice to swap this in where we can; JDK7 does a much better job of exposing filesystem APIs. I wonder if we should really return a ChunkedArrayList here. It only implements a subset of the AbstractList interface, and this is a pretty general-purpose method. For huge dirs, we should probably just be using the DirectoryStream iterator directly. I do see the use of these helper functions for quick-and-dirty listings though. I'd be okay providing variants of these functions that return a ChunkedArrayList, but it seems like the default should just be a normal ArrayList. A couple of other things: * Need {{<p/>}} tag for javadoc linebreak * I read the docs at http://docs.oracle.com/javase/7/docs/api/java/nio/file/DirectoryStream.html and it'd be nice to do like the example and unwrap the DirectoryIteratorException into an IOException. > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads
[ https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7430: --- Attachment: HDFS-7430.006.patch updated > Refactor the BlockScanner to use O(1) memory and use multiple threads > - > > Key: HDFS-7430 > URL: https://issues.apache.org/jira/browse/HDFS-7430 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, > HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png > > > We should update the BlockScanner to use a constant amount of memory by > keeping track of what block was scanned last, rather than by tracking the > scan status of all blocks in memory. Also, instead of having just one > thread, we should have a verification thread per hard disk (or other volume), > scanning at a configurable rate of bytes per second. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7541) Support for fast HDFS datanode rolling upgrade
Ming Ma created HDFS-7541: - Summary: Support for fast HDFS datanode rolling upgrade Key: HDFS-7541 URL: https://issues.apache.org/jira/browse/HDFS-7541 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Current HDFS DN rolling upgrade step requires sequential DN restart to minimize the impact on data availability and read/write operations. The side effect is longer upgrade duration for large clusters. This might be acceptable for DN JVM quick restart to update hadoop code/configuration. However, for OS upgrade that requires machine reboot, the overall upgrade duration will be too long if we continue to do sequential DN rolling restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7540: --- Attachment: HDFS-7540.001.patch > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7540) Add IOUtils#listDirectory
[ https://issues.apache.org/jira/browse/HDFS-7540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7540: --- Status: Patch Available (was: Open) > Add IOUtils#listDirectory > - > > Key: HDFS-7540 > URL: https://issues.apache.org/jira/browse/HDFS-7540 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-7540.001.patch > > > We should have a drop-in replacement for File#listDir that doesn't hide > IOExceptions, and which returns a ChunkedArrayList rather than a single large > array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7540) Add IOUtils#listDirectory
Colin Patrick McCabe created HDFS-7540: -- Summary: Add IOUtils#listDirectory Key: HDFS-7540 URL: https://issues.apache.org/jira/browse/HDFS-7540 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe We should have a drop-in replacement for File#listDir that doesn't hide IOExceptions, and which returns a ChunkedArrayList rather than a single large array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7531) Improve the concurrent access on FsVolumeList
[ https://issues.apache.org/jira/browse/HDFS-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250291#comment-14250291 ] Colin Patrick McCabe commented on HDFS-7531: Oops, just found two minor comment issues when I was going to commit this. {code} /** * Read access to this atomic reference array is not synchronized. * This list is replaced on modification holding "this" lock. */ {code} Can you remove this comment? It is no longer accurate because we don't hold the "this" lock when replacing the atomic reference. I don't think we need the first part, either... it's assumed that objects in {{AtomicReference}} are accessed locklessly unless otherwise noted. {code} /** * Returns a unmodifiable list view of all the volumes. * Note that this list is unmodifiable. */ {code} This comment is a bit redundant. If it's an "unmodifiable list" then we don't need to also note that it is unmodifiable. Let's get rid of the second line, and change unmodifiable to "immutable" since that's more idiomatic. Thanks. > Improve the concurrent access on FsVolumeList > - > > Key: HDFS-7531 > URL: https://issues.apache.org/jira/browse/HDFS-7531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-7531.000.patch, HDFS-7531.001.patch > > > {{FsVolumeList}} uses {{synchronized}} to protect the update on > {{FsVolumeList#volumes}}, while various operations (e.g., {{checkDirs()}}, > {{getAvailable()}}) iterate {{volumes}} without protection. > This JIRA proposes to use {{AtomicReference}} to encapsulate {{volumes}} to > provide better concurrent access. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
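The pattern under discussion, sketched minimally (class and method names simplified from the actual FsVolumeList): reads go through the {{AtomicReference}} with no lock, and updates swap in a fresh immutable list via compare-and-set.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class VolumeListSketch<V> {
  private final AtomicReference<List<V>> volumes =
      new AtomicReference<List<V>>(Collections.<V>emptyList());

  /** Lockless read: an immutable snapshot of the current volumes. */
  List<V> getVolumes() {
    return volumes.get();
  }

  /** Copy-on-write update: retry the CAS until the new list is in. */
  void addVolume(V volume) {
    while (true) {
      List<V> cur = volumes.get();
      List<V> next = new ArrayList<V>(cur);
      next.add(volume);
      if (volumes.compareAndSet(cur, Collections.unmodifiableList(next))) {
        return;
      }
    }
  }
}
{code}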
[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250253#comment-14250253 ] Frantisek Vacek commented on HDFS-7392: --- I'm too busy to implement the promised patch, so I'm adding part of the log to show what is wrong with the connection timeout. Please let me know if it helps. Fanda Everlasting attempt to open nonexistent HDFS URI hdfs://share.merck.com/OneLevelHeader.xlsx opening path: /OneLevelHeader.xlsx ... DEBUG [main] (Client.java:426) - The ping interval is 6 ms. DEBUG [main] (Client.java:695) - Connecting to share.merck.com/54.40.29.223:8020 INFO [main] (Client.java:814) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 0 time(s); maxRetries=45 WARN [main] (Client.java:568) - Address change detected. Old: share.merck.com/54.40.29.223:8020 New: share.merck.com/54.40.29.65:8020 INFO [main] (Client.java:814) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 0 time(s); maxRetries=45 INFO [main] (Client.java:814) - Retrying connect to server: share.merck.com/54.40.29.65:8020. Already tried 1 time(s); maxRetries=45 WARN [main] (Client.java:568) - Address change detected. Old: share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020 INFO [main] (Client.java:814) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 0 time(s); maxRetries=45 INFO [main] (Client.java:814) - Retrying connect to server: share.merck.com/54.40.29.223:8020. Already tried 1 time(s); maxRetries=45 > org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever > - > > Key: HDFS-7392 > URL: https://issues.apache.org/jira/browse/HDFS-7392 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Frantisek Vacek >Assignee: Yi Liu > Attachments: 1.png, 2.png > > > In some specific circumstances, > org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times > out and lasts forever. > The specific circumstances are: > 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should > point to a valid IP address but with no name node service running on it. > 2) There should be at least 2 IP addresses for such a URI. See output below: > {quote} > [~/proj/quickbox]$ nslookup share.example.com > Server: 127.0.1.1 > Address:127.0.1.1#53 > share.example.com canonical name = > internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com. > Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com > Address: 192.168.1.223 > Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com > Address: 192.168.1.65 > {quote} > In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() > sometimes returns true (even if the address didn't actually change, see img. > 1) and the timeoutFailures counter is set to 0 (see img. 2). The > maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt > is repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
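The failure mode described above, reduced to a self-contained sketch. {{connectToServer()}} and {{updateAddress()}} are stand-ins for the real socket connect and the DNS re-resolution in org.apache.hadoop.ipc.Client.Connection; the loop below is simplified, not the actual Hadoop code.
{code}
import java.io.IOException;
import java.net.SocketTimeoutException;

abstract class RetryLoopSketch {
  abstract void connectToServer() throws IOException;
  abstract boolean updateAddress(); // true if DNS now resolves elsewhere

  void setupConnection() throws IOException {
    final int maxRetries = 45;
    int timeoutFailures = 0;
    while (true) {
      try {
        connectToServer();
        return;
      } catch (SocketTimeoutException e) {
        if (updateAddress()) {
          // Round-robin DNS flips between the two ELB addresses, so this
          // branch is taken on almost every attempt and the counter
          // resets -- the maxRetries check below can never trip.
          timeoutFailures = 0;
        }
        if (++timeoutFailures >= maxRetries) {
          throw e;
        }
      }
    }
  }
}
{code}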
[jira] [Commented] (HDFS-7521) Refactor DN state management
[ https://issues.apache.org/jira/browse/HDFS-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250228#comment-14250228 ] Yongjun Zhang commented on HDFS-7521: - Hi [~mingma], Thanks for the info. I checked with [~andrew.wang]; he stated that it's intentional to leave a DN in that state, so that when the DN is revived, decommissioning can continue, which makes sense to me. Thanks Andrew. > Refactor DN state management > > > Key: HDFS-7521 > URL: https://issues.apache.org/jira/browse/HDFS-7521 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > Attachments: DNStateMachines.png, HDFS-7521.patch > > > There are two aspects w.r.t. DN state management in NN. > * State machine management within active NN > NN maintains states of each data node regarding whether it is running or > being decommissioned. But the state machine isn't well defined. We have dealt > with some corner-case bugs in this area. It would be useful if we could > refactor the code to use a clear state machine definition that defines > events, available states, and actions for state transitions. It has these > benefits. > ** Make it easy to define correctness of DN state management. Currently some > of the state transitions aren't defined in the code. For example, if admins > remove a node from the include host file while the node is being > decommissioned, it will be transitioned to DEAD and DECOMM_IN_PROGRESS. That > might not be the intention. If we have a state machine definition, we can > identify this case. > ** Make it easy to add a new state for DN later. For example, people have > discussed a new "maintenance" state for DN to support the scenario where > admins need to take the machine/rack down for 30 minutes for repair. > We can refactor DN state handling with a clear state machine definition based > on YARN's state-related components. > * State machine consistency between active and standby NN > Another dimension of state machine management is consistency across NN pairs. > We have dealt with bugs due to different live nodes between active NN and > standby NN. The current design is to have each NN manage its own state based > on the events it receives. For example, DNs will send heartbeats to both NNs; > admins will issue decommission commands to both NNs. An alternative design > approach could be to have ZK manage the state. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7373) Clean up temporary files after fsimage transfer failures
[ https://issues.apache.org/jira/browse/HDFS-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250144#comment-14250144 ] Kihwal Lee commented on HDFS-7373: -- [~ajisakaa] Can I get an official binding +1? :) > Clean up temporary files after fsimage transfer failures > > > Key: HDFS-7373 > URL: https://issues.apache.org/jira/browse/HDFS-7373 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-7373.patch > > > When a fsimage (e.g. checkpoint) transfer fails, a temporary file is left in > each storage directory. If the size of name space is large, these files can > take up quite a bit of space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7530) Allow renaming an Encryption Zone root
[ https://issues.apache.org/jira/browse/HDFS-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7530: --- Attachment: HDFS-7530.002.patch [~andrew.wang], Thank you for the review. The .002 patch fixes the failing test. It turns out that the CLI test was testing exactly what this patch is for, so I just modified the description and the expected output to reflect success rather than failure. I also added the message to the assertTrue. As you know, Jenkins will continue to fail TestCryptoAdminCLI since test-patch.sh will not apply the testCryptoConf.xml file. > Allow renaming an Encryption Zone root > -- > > Key: HDFS-7530 > URL: https://issues.apache.org/jira/browse/HDFS-7530 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Minor > Attachments: HDFS-7530.001.patch, HDFS-7530.002.patch > > > It should be possible to do > hdfs dfs -mv /ezroot /newnameforezroot -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
[ https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250007#comment-14250007 ] Hudson commented on HDFS-7536: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/]) HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs > -- > > Key: HDFS-7536 > URL: https://issues.apache.org/jira/browse/HDFS-7536 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.6.0 >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: HADOOP-11413.001.patch > > > in org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, and we can > remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization
[ https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250005#comment-14250005 ] Hudson commented on HDFS-7494: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/]) HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev a97a1e73177974cff8afafad6ca43a96563f3c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Checking of closed in DFSInputStream#pread() should be protected by > synchronization > --- > > Key: HDFS-7494 > URL: https://issues.apache.org/jira/browse/HDFS-7494 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Fix For: 2.7.0 > > Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch > > > {code} > private int pread(long position, byte[] buffer, int offset, int length) > throws IOException { > // sanity checks > dfsClient.checkOpen(); > if (closed) { > {code} > Checking of closed should be protected by holding lock on > "DFSInputStream.this" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
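A minimal sketch of the locking pattern the HDFS-7494 description calls for; the names mirror {{DFSInputStream}} but this is illustrative, not the committed patch.
{code}
import java.io.IOException;

class ClosedCheckSketch {
  private boolean closed = false;

  synchronized void close() {
    closed = true;
  }

  int pread(long position, byte[] buffer, int offset, int length)
      throws IOException {
    // An unsynchronized read of 'closed' could race with close();
    // holding the lock on 'this' for the check makes it safe.
    synchronized (this) {
      if (closed) {
        throw new IOException("Stream closed");
      }
    }
    // ... the positioned read itself proceeds without the lock ...
    return 0; // placeholder for bytes read
  }
}
{code}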
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250006#comment-14250006 ] Hudson commented on HDFS-6425: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/]) HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport latency. Contributed by Ming Ma. (kihwal: rev b7923a356e9f111619375b94d12749d634069347) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Large postponedMisreplicatedBlocks has impact on blockReport latency > > > Key: HDFS-6425 > URL: https://issues.apache.org/jira/browse/HDFS-6425 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.7.0 > > Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, > HDFS-6425-Test-Case.pdf, HDFS-6425.patch > > > Sometimes we have large number of over replicates when NN fails over. When > the new active NN took over, over replicated blocks will be put to > postponedMisreplicatedBlocks until all DNs for that block aren't stale > anymore. > We have a case where NNs flip flop. Before postponedMisreplicatedBlocks > became empty, NN fail over again and again. So postponedMisreplicatedBlocks > just kept increasing until the cluster is stable. > In addition, large postponedMisreplicatedBlocks could make > rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks > takes write lock. So it could slow down the block report processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249951#comment-14249951 ] Hudson commented on HDFS-6425: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/]) HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport latency. Contributed by Ming Ma. (kihwal: rev b7923a356e9f111619375b94d12749d634069347) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java > Large postponedMisreplicatedBlocks has impact on blockReport latency > > > Key: HDFS-6425 > URL: https://issues.apache.org/jira/browse/HDFS-6425 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.7.0 > > Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, > HDFS-6425-Test-Case.pdf, HDFS-6425.patch > > > Sometimes we have large number of over replicates when NN fails over. When > the new active NN took over, over replicated blocks will be put to > postponedMisreplicatedBlocks until all DNs for that block aren't stale > anymore. > We have a case where NNs flip flop. Before postponedMisreplicatedBlocks > became empty, NN fail over again and again. So postponedMisreplicatedBlocks > just kept increasing until the cluster is stable. > In addition, large postponedMisreplicatedBlocks could make > rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks > takes write lock. So it could slow down the block report processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization
[ https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249950#comment-14249950 ] Hudson commented on HDFS-7494: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/]) HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev a97a1e73177974cff8afafad6ca43a96563f3c61) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Checking of closed in DFSInputStream#pread() should be protected by > synchronization > --- > > Key: HDFS-7494 > URL: https://issues.apache.org/jira/browse/HDFS-7494 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Fix For: 2.7.0 > > Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch > > > {code} > private int pread(long position, byte[] buffer, int offset, int length) > throws IOException { > // sanity checks > dfsClient.checkOpen(); > if (closed) { > {code} > Checking of closed should be protected by holding lock on > "DFSInputStream.this" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
[ https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249952#comment-14249952 ] Hudson commented on HDFS-7536: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/]) HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs > -- > > Key: HDFS-7536 > URL: https://issues.apache.org/jira/browse/HDFS-7536 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.6.0 >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: HADOOP-11413.001.patch > > > in org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, and we can > remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249908#comment-14249908 ] Hudson commented on HDFS-6425: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/]) HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport latency. Contributed by Ming Ma. (kihwal: rev b7923a356e9f111619375b94d12749d634069347) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java > Large postponedMisreplicatedBlocks has impact on blockReport latency > > > Key: HDFS-6425 > URL: https://issues.apache.org/jira/browse/HDFS-6425 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.7.0 > > Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, > HDFS-6425-Test-Case.pdf, HDFS-6425.patch > > > Sometimes we have large number of over replicates when NN fails over. When > the new active NN took over, over replicated blocks will be put to > postponedMisreplicatedBlocks until all DNs for that block aren't stale > anymore. > We have a case where NNs flip flop. Before postponedMisreplicatedBlocks > became empty, NN fail over again and again. So postponedMisreplicatedBlocks > just kept increasing until the cluster is stable. > In addition, large postponedMisreplicatedBlocks could make > rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks > takes write lock. So it could slow down the block report processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
[ https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249909#comment-14249909 ] Hudson commented on HDFS-7536: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/]) HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java > Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs > -- > > Key: HDFS-7536 > URL: https://issues.apache.org/jira/browse/HDFS-7536 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.6.0 >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: HADOOP-11413.001.patch > > > in org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, and we can > remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization
[ https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249907#comment-14249907 ] Hudson commented on HDFS-7494: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/]) HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev a97a1e73177974cff8afafad6ca43a96563f3c61) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > Checking of closed in DFSInputStream#pread() should be protected by > synchronization > --- > > Key: HDFS-7494 > URL: https://issues.apache.org/jira/browse/HDFS-7494 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Fix For: 2.7.0 > > Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch > > > {code} > private int pread(long position, byte[] buffer, int offset, int length) > throws IOException { > // sanity checks > dfsClient.checkOpen(); > if (closed) { > {code} > Checking of closed should be protected by holding lock on > "DFSInputStream.this" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7536) Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs
[ https://issues.apache.org/jira/browse/HDFS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249899#comment-14249899 ] Hudson commented on HDFS-7536: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/]) HDFS-7536. Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs. Contributed by Haohui Mai. (wheat9: rev 565d72fe6e15fa104a623defe7f96446a13d268c) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Remove unused CryptoCodec in org.apache.hadoop.fs.Hdfs > -- > > Key: HDFS-7536 > URL: https://issues.apache.org/jira/browse/HDFS-7536 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.6.0 >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Minor > Fix For: 2.7.0 > > Attachments: HADOOP-11413.001.patch > > > in org.apache.hadoop.fs.Hdfs, the {{CryptoCodec}} is unused, and we can > remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7494) Checking of closed in DFSInputStream#pread() should be protected by synchronization
[ https://issues.apache.org/jira/browse/HDFS-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249897#comment-14249897 ] Hudson commented on HDFS-7494: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/]) HDFS-7494. Checking of closed in DFSInputStream#pread() should be protected by synchronization (Ted Yu via Colin P. McCabe) (cmccabe: rev a97a1e73177974cff8afafad6ca43a96563f3c61) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > Checking of closed in DFSInputStream#pread() should be protected by > synchronization > --- > > Key: HDFS-7494 > URL: https://issues.apache.org/jira/browse/HDFS-7494 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Fix For: 2.7.0 > > Attachments: hdfs-7494-001.patch, hdfs-7494-002.patch > > > {code} > private int pread(long position, byte[] buffer, int offset, int length) > throws IOException { > // sanity checks > dfsClient.checkOpen(); > if (closed) { > {code} > Checking of closed should be protected by holding lock on > "DFSInputStream.this" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6425) Large postponedMisreplicatedBlocks has impact on blockReport latency
[ https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249898#comment-14249898 ] Hudson commented on HDFS-6425: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/]) HDFS-6425. Large postponedMisreplicatedBlocks has impact on blockReport latency. Contributed by Ming Ma. (kihwal: rev b7923a356e9f111619375b94d12749d634069347) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencingWithReplication.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java > Large postponedMisreplicatedBlocks has impact on blockReport latency > > > Key: HDFS-6425 > URL: https://issues.apache.org/jira/browse/HDFS-6425 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 2.7.0 > > Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, > HDFS-6425-Test-Case.pdf, HDFS-6425.patch > > > Sometimes we have large number of over replicates when NN fails over. When > the new active NN took over, over replicated blocks will be put to > postponedMisreplicatedBlocks until all DNs for that block aren't stale > anymore. > We have a case where NNs flip flop. Before postponedMisreplicatedBlocks > became empty, NN fail over again and again. So postponedMisreplicatedBlocks > just kept increasing until the cluster is stable. > In addition, large postponedMisreplicatedBlocks could make > rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks > takes write lock. So it could slow down the block report processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7501) TransactionsSinceLastCheckpoint can be negative on SBNs
[ https://issues.apache.org/jira/browse/HDFS-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-7501: -- Status: In Progress (was: Patch Available) Resetting state while awaiting an improved test. > TransactionsSinceLastCheckpoint can be negative on SBNs > --- > > Key: HDFS-7501 > URL: https://issues.apache.org/jira/browse/HDFS-7501 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.0 >Reporter: Harsh J >Assignee: Gautam Gopalakrishnan >Priority: Trivial > Attachments: HDFS-7501.patch > > > The metric TransactionsSinceLastCheckpoint is derived as FSEditLog.txid minus > NNStorage.mostRecentCheckpointTxId. > In Standby mode, the former does not increment beyond the loaded or > last-when-active value, but the latter does change due to checkpoints done > regularly in this mode. Thereby, the SBN will eventually end up showing > negative values for TransactionsSinceLastCheckpoint. > This is not an issue as the metric only makes sense to be monitored on the > Active NameNode, but we should perhaps just show the value 0 by detecting if > the NN is in SBN state, as allowing a negative number is confusing to view > within a chart that tracks it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
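A sketch of the proposed presentation fix, following the derivation in the description; {{isStandby}} is a stand-in for however HA state would be queried, and none of this is actual NameNode code.
{code}
class CheckpointMetricSketch {
  /** Clamp the metric so SBN checkpoints can't drive it negative. */
  static long transactionsSinceLastCheckpoint(long editLogTxId,
      long mostRecentCheckpointTxId, boolean isStandby) {
    if (isStandby) {
      // The SBN tails edits rather than generating them, so the raw
      // difference goes negative after each checkpoint; report 0.
      return 0;
    }
    return Math.max(0, editLogTxId - mostRecentCheckpointTxId);
  }
}
{code}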