[jira] [Commented] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569413#comment-13569413 ]

Hadoop QA commented on HDFS-4404:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567696/hdfs-4404.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3941//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3941//console

This message is automatically generated.
> Create file failure when the machine of first attempted NameNode is down
>
> Key: HDFS-4404
> URL: https://issues.apache.org/jira/browse/HDFS-4404
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, hdfs-client
> Affects Versions: 2.0.2-alpha
> Reporter: liaowenrui
> Assignee: Todd Lipcon
> Priority: Critical
> Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, hdfs-4404.txt, hdfs-4404.txt
>
> Test environment: NN1, NN2, DN1, DN2, DN3
> machine1: NN1, DN1
> machine2: NN2, DN2
> machine3: DN3
> machine1 is down.
>
> 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - Connecting to /160.161.0.155:8020
> 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
> java.net.SocketTimeoutException: 1 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
>   at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>   at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474)
>   at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568)
>   at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1156)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>   at $Proxy9.create(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
>   at $Proxy10.create(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1261)
>   at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715)
>   at test.TestLease.main(TestLease.java:45)
> 2013-01-12 09:51:38,443 DEBUG ipc.Client (Client.java:close(940)) - IPC Client (31594013) connection to /160.161.0.155:8020 from hdfs/had...@hadoop.com: closed
> 2013-01-12 09:52:47,834 WARN retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(95)) - Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create. Not retrying because the invoked method is not idempotent, and unable to determine whether it was invoked
> java.net.SocketTimeoutException: Call From szxy1x001833091/172.0.0.13 to vm2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 1 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:743)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1180)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>   at $Proxy9.create(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
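The log above shows the crux: the timeout happens while waiting for the channel to be ready for *connect*, yet the retry handler refuses to fail over because create() is not idempotent and it "cannot determine whether it was invoked". If the connection was never established, the RPC cannot have executed, so failing over to the other NameNode is safe even for a non-idempotent call. The sketch below illustrates that distinction only; it is not the actual Hadoop retry code (the method name, the proxy list, and the use of plain JDK exception types are all illustrative assumptions — the real client uses its own retry-policy and connect-timeout exception classes).

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Toy failover sketch: connect-phase failures are safe to retry on another NameNode. */
public class FailoverSketch {
    public static <T> T invokeWithFailover(List<Callable<T>> namenodes, boolean idempotent)
            throws Exception {
        Exception last = null;
        for (Callable<T> nn : namenodes) {
            try {
                return nn.call();
            } catch (ConnectException e) {
                // The connection was never established, so the request cannot
                // have executed: failing over is safe even for create().
                last = e;
            } catch (SocketTimeoutException e) {
                // The request may already be in flight on the server; for a
                // non-idempotent call the outcome is ambiguous, so give up.
                if (!idempotent) {
                    throw e;
                }
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> nns = new ArrayList<>();
        nns.add(() -> { throw new ConnectException("NN1 machine is down"); });
        nns.add(() -> "created");
        // The connect failure on NN1 is safely skipped; NN2 serves the create.
        System.out.println(invokeWithFailover(nns, false)); // prints "created"
    }
}
```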
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569385#comment-13569385 ]

Hudson commented on HDFS-4452:

Integrated in Hadoop-trunk-Commit #3314 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3314/])
HDFS-4452. getAdditionalBlock() can create multiple blocks if the client times out and retries. Contributed by Konstantin Shvachko. (Revision 1441681)

Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441681
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java

> getAdditionalBlock() can create multiple blocks if the client times out and retries.
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.0.2-alpha
> Reporter: Konstantin Shvachko
> Assignee: Konstantin Shvachko
> Priority: Critical
> Fix For: 2.0.3-alpha
> Attachments: getAdditionalBlock-branch2.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can time out and will reissue the same request. The two requests will race with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in creating two new blocks on the NameNode while the client will know of only one of them. This eventually results in {{NotReplicatedYetException}} because the extra block is never reported by any DataNode, which stalls file creation and leaves the file in an invalid state with an empty block in the middle.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
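The retry race described in HDFS-4452 can be sketched in miniature: if the most recently allocated block for the file has never received any bytes, a second addBlock() call is almost certainly a retry of the first, and the NameNode should hand back the existing block rather than allocate a duplicate. This is a toy model under that assumption — the class, fields, and retry check below are illustrative, not the actual FSNamesystem fix, which analyzes the file's block state in more detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the addBlock() retry hazard (not the real FSNamesystem code). */
public class AddBlockRetrySketch {
    static final class Block {
        final long id;
        long bytesWritten; // stays 0 until a DataNode reports data for the block
        Block(long id) { this.id = id; }
    }

    private final List<Block> blocks = new ArrayList<>();
    private long nextId = 1;

    /**
     * Allocate the next block of the file. If the previous allocation is still
     * empty, treat this call as a timed-out client's retry and return the
     * existing block instead of creating a duplicate empty block.
     */
    public Block getAdditionalBlock() {
        if (!blocks.isEmpty()) {
            Block last = blocks.get(blocks.size() - 1);
            if (last.bytesWritten == 0) {
                return last; // retry of the same request: no new block
            }
        }
        Block b = new Block(nextId++);
        blocks.add(b);
        return b;
    }

    public int blockCount() { return blocks.size(); }

    public static void main(String[] args) {
        AddBlockRetrySketch ns = new AddBlockRetrySketch();
        Block first = ns.getAdditionalBlock();    // original request
        Block retry = ns.getAdditionalBlock();    // timed-out client retries
        System.out.println(first.id == retry.id); // true: same block handed back
        System.out.println(ns.blockCount());      // 1: no duplicate empty block
    }
}
```

Without the emptiness check, the retry would append a second block that no DataNode ever reports, which is exactly the stalled-file state the issue describes.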
[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-4452:

Resolution: Fixed
Fix Version/s: 2.0.3-alpha
Status: Resolved (was: Patch Available)

I just committed this.
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569379#comment-13569379 ]

Hadoop QA commented on HDFS-4452:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567681/getAdditionalBlock.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3940//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3940//console

This message is automatically generated.
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569371#comment-13569371 ]

Hadoop QA commented on HDFS-4462:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567678/HDFS-4462.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3939//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3939//console

This message is automatically generated.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.0.2-alpha
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an upgrade with respect to the NN's metadata (i.e. the layout versions are different), and in this case it will proceed with the checkpoint, despite storage signatures not matching precisely, if the BP ID and Cluster ID do match exactly. However, in situations where we're upgrading from versions of HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}
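The inconsistent-checkpoint error above boils down to a field-by-field comparison in which the pre-federation image legitimately carries empty cluster and block-pool IDs. A minimal sketch of a comparison that tolerates them is below; this is not the actual SecondaryNameNode logic, just an illustration of the idea (the method name and parameter layout are invented, and the real code compares more signature fields than these).

```java
/** Toy checkpoint-signature check modeled on the HDFS-4462 symptom. */
public class CheckpointFieldsSketch {
    public static boolean consistent(int nnNamespaceId, String nnClusterId, String nnBpId,
                                     int localNamespaceId, String localClusterId, String localBpId) {
        // namespaceID predates federation and must always match exactly.
        if (nnNamespaceId != localNamespaceId) {
            return false;
        }
        // Pre-federation images carry empty cluster/blockpool IDs; treat an
        // empty local value as "not yet assigned" rather than a mismatch.
        boolean clusterOk = localClusterId.isEmpty() || localClusterId.equals(nnClusterId);
        boolean bpOk = localBpId.isEmpty() || localBpId.equals(nnBpId);
        return clusterOk && bpOk;
    }

    public static void main(String[] args) {
        // Values taken from the quoted error: the NN has federation IDs, the
        // local pre-federation metadata has none ("Expecting ... 0; ; .").
        System.out.println(consistent(
                403832480, "CID-0df6ff22-1165-4c7d-9630-429972a7737c",
                "BP-1520616013-172.21.3.106-1359680537136",
                403832480, "", "")); // true: upgrade checkpoint may proceed
    }
}
```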
[jira] [Resolved] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
[ https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HDFS-4464:

Resolution: Fixed
Fix Version/s: Snapshot (HDFS-2802)
Hadoop Flags: Reviewed

I have committed this.

> Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
>
> Key: HDFS-4464
> URL: https://issues.apache.org/jira/browse/HDFS-4464
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Fix For: Snapshot (HDFS-2802)
> Attachments: h4464_20120201b.patch, h4464_20120201.patch
>
> Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive methods for deleting inodes and collecting blocks for further block deletion/update.
[jira] [Commented] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569366#comment-13569366 ]

Aaron T. Myers commented on HDFS-4404:

The latest patch looks good to me. +1 pending Jenkins.
[jira] [Updated] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-4404:

Attachment: hdfs-4404.txt

Fix the javadoc warning (missed a '}' character).
[jira] [Commented] (HDFS-4453) Make a simple doc to describe the usage and design of the shortcircuit read feature
[ https://issues.apache.org/jira/browse/HDFS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569354#comment-13569354 ]

Colin Patrick McCabe commented on HDFS-4453:

bq. Regarding /var/lib/hadoop-hdfs vs /var/run/hadoop-hdfs -- why's it problematic if /var/run is a tmpfs? We shouldn't need it to persist cross-reboot, and /var/run is generally not cleaned by a tmpwatch process. tmpfs is also better in that it will continue to work even if a local disk dies.

If the /var/run/hadoop-hdfs directory gets removed, HDFS itself can't recreate it (you need root permissions to create a directory in /var/run). So after a reboot, things would probably stop working.

> Make a simple doc to describe the usage and design of the shortcircuit read feature
>
> Key: HDFS-4453
> URL: https://issues.apache.org/jira/browse/HDFS-4453
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, hdfs-client
> Reporter: Brandon Li
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-4453.001.patch, HDFS-4453.002.patch
>
> It would be nice to have a document describing the configuration and design of this feature, and also its relationship with the previous short-circuit read implementation (HDFS-2246): for example, can they co-exist, is this one planned to replace HDFS-2246, or can it fall back on HDFS-2246 when Unix domain sockets are not supported?
[jira] [Updated] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
[ https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HDFS-4464:

Attachment: h4464_20120201b.patch

Sure, destroySubtreeAndCollectBlocks sounds like a better name. h4464_20120201b.patch renames the method and also revises the javadoc.
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569341#comment-13569341 ]

Chris Nauroth commented on HDFS-4462:

+1 for the new patch. I confirmed that it fixed the test failure.
[jira] [Commented] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569329#comment-13569329 ]

Suresh Srinivas commented on HDFS-4465:

Aaron, given you have worked on it, if you want, feel free to assign this jira to yourself.

> Optimize datanode ReplicasMap and ReplicaInfo
>
> Key: HDFS-4465
> URL: https://issues.apache.org/jira/browse/HDFS-4465
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: dn-memory-improvements.patch
>
> In Hadoop, a lot of optimization has been done to make NameNode data structures memory efficient. Similar optimizations are necessary for the DataNode process. With the growth in storage per DataNode and the number of blocks hosted on a DataNode, this jira intends to optimize the long-lived ReplicasMap and ReplicaInfo objects.
[jira] [Commented] (HDFS-4453) Make a simple doc to describe the usage and design of the shortcircuit read feature
[ https://issues.apache.org/jira/browse/HDFS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569328#comment-13569328 ] Todd Lipcon commented on HDFS-4453: --- Section 4: I think you have some typos there: - you mean GetBlockLocalPathInfo, not GetBlockLocations - you said 'clients' twice where I think one of them should read 'servers' {code} + To configure short-circuit local reads, you will need to put + <<>> in your <<>>. You can check if you have done + this by running + {code} Users don't generally set this via {{LD_LIBRARY_PATH}}. Instead, it goes in the appropriate directory inside the install tree. Do we have other docs already about how to enable the native code? Might be better to refer to those. - Regarding {{/var/lib/hadoop-hdfs}} vs {{/var/run/hadoop-hdfs}} -- why's it problematic if /var/run is a tmpfs? We shouldn't need it to persist cross-reboot, and /var/run is generally _not_ cleaned by a tmpwatch process. tmpfs is also better in that it will continue to work even if a local disk dies. - In the example config, I would not use the _PORT trick - it only really makes sense for dev setups like the minicluster, and otherwise may just confuse the user. - Please specify that you need to set these two configurations both on clients and on servers. > Make a simple doc to describe the usage and design of the shortcircuit read > feature > --- > > Key: HDFS-4453 > URL: https://issues.apache.org/jira/browse/HDFS-4453 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Brandon Li >Assignee: Colin Patrick McCabe > Attachments: HDFS-4453.001.patch, HDFS-4453.002.patch > > > It would be nice to have a document to describe the configuration and design > of this feature. 
Also its relationship with the previous short-circuit read > implementation (HDFS-2246): for example, can they co-exist, is this one > planned to replace HDFS-2246, or can it fall back on HDFS-2246 when unix > domain sockets are not supported. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
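Pulling the review comments above together, a minimal hdfs-site.xml fragment might look like the following. This is a hedged sketch: the property names are assumed from the HDFS-347-era implementation, the socket path is a placeholder, and (per the comments) both properties would be set identically on clients and servers, with a fixed path under /var/run rather than the _PORT trick.

```xml
<!-- Sketch only: property names assumed, not confirmed by this thread.
     Set on BOTH clients and servers (datanodes). -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- Fixed path under /var/run (tmpfs is fine; the socket does not need
       to persist across reboots), and no _PORT substitution. -->
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn_socket</value>
</property>
```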
[jira] [Commented] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
[ https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569326#comment-13569326 ] Jing Zhao commented on HDFS-4464: - The name of deleteSubtreeAndCollectBlocks may be a little confusing, since when the parameter snapshot is null the function acts more like a destructor of the subtree. Maybe we can rename the function and make its javadoc clearer. Besides that, +1 for the patch. > Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot > > > Key: HDFS-4464 > URL: https://issues.apache.org/jira/browse/HDFS-4464 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h4464_20120201.patch > > > Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive > methods for deleting inodes and collecting blocks for further block > deletion/update. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4457) WebHDFS obtains/sets delegation token service hostname using wrong config leading to issues when NN is configured with 0.0.0.0 RPC IP
[ https://issues.apache.org/jira/browse/HDFS-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569307#comment-13569307 ] Alejandro Abdelnur commented on HDFS-4457: -- Daryn, Regarding your concern about having a proxy in the middle, I don't see that as a problem. When you have an HTTP proxy in between, the client still targets the real server hostname; it is the HTTP stack in the client that redirects the request to the proxy with the real server hostname used by the client. Then on the server side (webhdfs NN), from the HTTP request you can infer the exact name of the server used by the client (proxy or not). Regarding NAT in the middle doing port redirection, that should not be an issue either, as the host:port information used by the client is transmitted in the HTTP 'Host' header, which contains the host:port used by the client when opening the connection (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, section '14.23 Host'), and this is required in HTTP/1.1. Regarding your second concern, is that really a problem? Tokens are transient and I would not expect them to be valid across system updates. On the *fetchdt*, got it. Still, that requires spawning a JVM to get the token. > WebHDFS obtains/sets delegation token service hostname using wrong config > leading to issues when NN is configured with 0.0.0.0 RPC IP > - > > Key: HDFS-4457 > URL: https://issues.apache.org/jira/browse/HDFS-4457 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 1.1.1, 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur >Priority: Critical > Attachments: HDFS_4457.patch, HDFS_4457.patch > > > If the NameNode RPC address is configured with a wildcard IP 0.0.0.0, then > delegation tokens are configured with 0.0.0.0 as service and this breaks > clients trying to use those tokens.
> Looking at NamenodeWebHdfsMethods#generateDelegationToken() the problem is > SecurityUtil.setTokenService(t, namenode.getHttpAddress());, tracing back > what is being used to resolve getHttpAddress(), the NameNodeHttpServer is > resolving the httpAddress doing httpAddress = new > InetSocketAddress(bindAddress.getAddress(), httpServer.getPort()); > and, if using "0.0.0.0" in the configuration, you get 0.0.0.0 from > bindAddress.getAddress(). > Normally (non-webhdfs) this is not an issue because it is the responsibility > of the client, but in the case of WebHDFS, WebHDFS does it before returning > the string version of the token (it must be this way because the client may > not be a java client at all and cannot manipulate the DelegationToken as > such). > The solution (thanks to Eric Sammer for helping figure this out) is for > WebHDFS to use the exact hostname that came in the HTTP request as the > service to set in the delegation tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
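The fix direction described above (set the token service from the hostname the client actually contacted, taken from the HTTP request) can be sketched as a small helper. This is an illustrative sketch, not the actual patch; the class and method names are assumptions, and IPv6 literal addresses are deliberately not handled here.

```java
// Hypothetical sketch of deriving a delegation token service string from
// the HTTP/1.1 "Host" header ("host" or "host:port") instead of the NN's
// bind address. Names are illustrative, not from the actual patch.
public class TokenServiceFromHost {
  /**
   * Returns "host:port" for the given Host header value, appending a
   * default port when the header carries none. Does not handle bracketed
   * IPv6 literals.
   */
  static String tokenService(String hostHeader, int defaultPort) {
    int colon = hostHeader.lastIndexOf(':');
    if (colon < 0) {
      // Host header without an explicit port: the client used the default.
      return hostHeader + ":" + defaultPort;
    }
    return hostHeader;
  }

  public static void main(String[] args) {
    System.out.println(tokenService("nn1.example.com", 50070));
    // -> nn1.example.com:50070
    System.out.println(tokenService("nn1.example.com:8020", 50070));
    // -> nn1.example.com:8020
  }
}
```

Because the Host header is mandatory in HTTP/1.1, this gives the server the exact authority the client (or its proxy) used, which is the property the discussion above relies on.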
[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-4452: -- Attachment: getAdditionalBlock.patch Patch for trunk. Incorporated Cos's comments. Corrected couple comment and log messages, minor code cleanup. > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Attachments: getAdditionalBlock-branch2.patch, > getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, > TestAddBlockRetry.java > > > HDFS client tries to addBlock() to a file. If NameNode is busy the client can > timeout and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in invalid state with an empty block in the middle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
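The race described above can be closed by making getAdditionalBlock() idempotent: before allocating a new block, recognize a retried request whose block was already created. The following toy sketch shows the idea only; it is not the actual patch, and the String stand-ins for blocks and the retry-detection shape are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a retry-safe addBlock(): if the client's "previous" block
// is already the penultimate block of the file, the last block was created
// by an earlier (timed-out) copy of this same request, so return it
// instead of allocating a duplicate.
public class AddBlockRetrySketch {
  static final List<String> blocks = new ArrayList<>();

  static String getAdditionalBlock(String previous) {
    int n = blocks.size();
    if (n >= 2 && blocks.get(n - 2).equals(previous)) {
      // Retried request: the block it asked for already exists.
      return blocks.get(n - 1);
    }
    String newBlock = "blk_" + (n + 1);  // fresh allocation
    blocks.add(newBlock);
    return newBlock;
  }

  public static void main(String[] args) {
    String b1 = getAdditionalBlock(null);
    String b2 = getAdditionalBlock(b1);
    // Client timed out and reissues the identical request:
    String b2again = getAdditionalBlock(b1);
    System.out.println(b2.equals(b2again));  // true: no duplicate block created
  }
}
```

Without the check, the retried call would append a second new block that the client never learns about, which is exactly the empty-block-in-the-middle state the description mentions.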
[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-4452: -- Hadoop Flags: Reviewed Status: Patch Available (was: Open) > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Attachments: getAdditionalBlock-branch2.patch, > getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, > TestAddBlockRetry.java > > > HDFS client tries to addBlock() to a file. If NameNode is busy the client can > timeout and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in invalid state with an empty block in the middle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-4452: -- Attachment: getAdditionalBlock-branch2.patch Patch for branch 2. Somebody "conveniently" renamed local variables and changed types. > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Attachments: getAdditionalBlock-branch2.patch, > getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java > > > HDFS client tries to addBlock() to a file. If NameNode is busy the client can > timeout and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in invalid state with an empty block in the middle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-4452: -- Status: Open (was: Patch Available) > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Attachments: getAdditionalBlock-branch2.patch, > getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java > > > HDFS client tries to addBlock() to a file. If NameNode is busy the client can > timeout and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in invalid state with an empty block in the middle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-4462: - Attachment: HDFS-4462.patch Missed a test failure from the last patch. This patch should fix the test failure, > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch, > HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
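The consistency check the description talks about can be sketched as follows. This is a hedged illustration of the idea, not the actual patch: when the layout versions differ (an upgrade is in progress) and the 2NN's stored IDs are empty pre-federation values, an exact match cannot be required. Method and parameter names are assumptions.

```java
// Illustrative sketch of a relaxed 2NN checkpoint-fields check for
// pre-federation upgrades. Pre-federation images have no cluster ID, so
// the stored value is empty and must be tolerated during an upgrade.
public class CheckpointFieldsCheck {
  static boolean consistent(String storedClusterId, String nnClusterId,
                            int storedLayoutVersion, int nnLayoutVersion) {
    if (storedLayoutVersion != nnLayoutVersion) {
      // Upgrade in progress: accept an empty (pre-federation) stored ID.
      return storedClusterId.isEmpty() || storedClusterId.equals(nnClusterId);
    }
    // Same layout version: require an exact match as before.
    return storedClusterId.equals(nnClusterId);
  }

  public static void main(String[] args) {
    // Pre-federation image (LV -19, empty cluster ID) vs upgraded NN (LV -40):
    System.out.println(consistent("", "CID-0df6ff22", -19, -40));          // true
    // Genuinely different clusters at the same layout version:
    System.out.println(consistent("CID-other", "CID-0df6ff22", -40, -40)); // false
  }
}
```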
[jira] [Commented] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569265#comment-13569265 ] Hadoop QA commented on HDFS-4404: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567643/hdfs-4404.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3938//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3938//console This message is automatically generated. > Create file failure when the machine of first attempted NameNode is down > > > Key: HDFS-4404 > URL: https://issues.apache.org/jira/browse/HDFS-4404 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, hdfs-client >Affects Versions: 2.0.2-alpha >Reporter: liaowenrui >Assignee: Todd Lipcon >Priority: Critical > Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, > hdfs-4404.txt > > > test Environment: NN1,NN2,DN1,DN2,DN3 > machine1:NN1,DN1 > machine2:NN2,DN2 > machine3:DN3 > mathine1 is down. 
> 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - > Connecting to /160.161.0.155:8020 > 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing > ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting > for channel to be ready for connect. ch : > java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020] > java.net.SocketTimeoutException: 1 millis timeout while waiting for > channel to be ready for connect. ch : > java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020] > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489) > at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474) > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568) > at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286) > at org.apache.hadoop.ipc.Client.call(Client.java:1156) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184) > at $Proxy9.create(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84) > at $Proxy10.create(Unknown Source) > at 
org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1261) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715) > at test.TestLease.main(TestLease.
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569238#comment-13569238 ] Hadoop QA commented on HDFS-4462: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567641/HDFS-4462.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestStartupOptionUpgrade {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3937//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3937//console This message is automatically generated. > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. 
Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
[ https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-4465: - Attachment: dn-memory-improvements.patch Hey Suresh, thanks a lot for filing this issue. A little while back I threw together a few changes to see how much memory overhead improvement we could get in the DN with minimal effort. Here's a little patch (not necessarily ready for commit) which shows the changes I made. This patch does three things: # Reduce the number of repeated String/char[] objects by storing a single reference to a base path and then per replica it stores an int[] containing integers denoting the subdirs from base dir to replica file, e.g. "1, 34, 2". # Switch to using the LighWeightGSet instead of standard java.util structures where possible in the DN. We already did this in the NN, but with a little adaptation we can do it for some of the DN's data structures as well. # Intern File objects where possible. Even though interning repeated Strings/char[] underlying file objects is a step in the right direction, we can do a little bit better by doing our own interning of File objects to further reduce overhead from repeated objects. Using this patch I was able to see per-replica heap usage go from ~650 bytes per replica in my test setup to ~250 bytes per replica. Feel free to take this patch and run with it, use it for ideas, or ignore it entirely. > Optimize datanode ReplicasMap and ReplicaInfo > - > > Key: HDFS-4465 > URL: https://issues.apache.org/jira/browse/HDFS-4465 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Suresh Srinivas >Assignee: Suresh Srinivas > Attachments: dn-memory-improvements.patch > > > In Hadoop a lot of optimization has been done in namenode data structures to > be memory efficient. Similar optimizations are necessary for Datanode > process. 
With the growth in storage per datanode and number of blocks hosted > on datanode, this jira intends to optimize long lived ReplicasMap and > ReplicaInfo objects. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
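Idea #1 from the patch description above (a shared base path per volume plus a per-replica int[] naming the subdir chain) can be sketched like this. It is a rough illustration under assumed names, not the attached patch; the "subdir" naming convention is taken from the description's example "1, 34, 2".

```java
import java.io.File;

// Sketch: store one shared base-path reference per volume and a small
// int[] per replica, rebuilding the File lazily instead of keeping a full
// path String/char[] per replica.
public class CompactReplicaPath {
  private final String basePath;      // shared across all replicas on a volume
  private final int[] subdirs;        // e.g. {1, 34, 2} -> subdir1/subdir34/subdir2
  private final String blockFileName;

  CompactReplicaPath(String basePath, int[] subdirs, String blockFileName) {
    this.basePath = basePath;
    this.subdirs = subdirs;
    this.blockFileName = blockFileName;
  }

  /** Materialize the full path only when it is actually needed. */
  File getBlockFile() {
    StringBuilder sb = new StringBuilder(basePath);
    for (int d : subdirs) {
      sb.append(File.separator).append("subdir").append(d);
    }
    sb.append(File.separator).append(blockFileName);
    return new File(sb.toString());
  }

  public static void main(String[] args) {
    CompactReplicaPath r = new CompactReplicaPath(
        "/data/1/dfs/dn/current", new int[] {1, 34, 2}, "blk_1001");
    System.out.println(r.getBlockFile().getPath());
  }
}
```

The win is that the long, repeated prefix is stored once per volume, while each replica carries only a few ints and the block file name, consistent with the ~650-to-~250 bytes-per-replica improvement reported above.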
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569213#comment-13569213 ] Konstantin Boudnik commented on HDFS-4452: -- The patch looks mostly good. A few minor comments: - the logger in the class should be initialized for {{TestAddBlockRetry}}, not {{TestFSDirectory}} - formatting only change {noformat} LocatedBlock getAdditionalBlock(String src, - String clientName, - ExtendedBlock previous, - HashMap excludedNodes - ) + String clientName, + ExtendedBlock previous, + HashMap excludedNodes) {noformat} Test is failing without the corresponding change in the code, so it seems right on the money. +1 > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Attachments: getAdditionalBlock.patch, getAdditionalBlock.patch, > TestAddBlockRetry.java > > > HDFS client tries to addBlock() to a file. If NameNode is busy the client can > timeout and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in invalid state with an empty block in the middle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync follwed by closing that file
[ https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3119: - Fix Version/s: 0.23.7 Committed to branch-0.23. > Overreplicated block is not deleted even after the replication factor is > reduced after sync follwed by closing that file > > > Key: HDFS-3119 > URL: https://issues.apache.org/jira/browse/HDFS-3119 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.24.0 >Reporter: J.Andreina >Assignee: Ashish Singhi >Priority: Minor > Labels: patch > Fix For: 0.24.0, 2.0.0-alpha, 0.23.7 > > Attachments: HDFS-3119-1.patch, HDFS-3119-1.patch, HDFS-3119.patch > > > cluster setup: > -- > 1NN,2 DN,replication factor 2,block report interval 3sec ,block size-256MB > step1: write a file "filewrite.txt" of size 90bytes with sync(not closed) > step2: change the replication factor to 1 using the command: "./hdfs dfs > -setrep 1 /filewrite.txt" > step3: close the file > * At the NN side the file "Decreasing replication from 2 to 1 for > /filewrite.txt" , logs has occured but the overreplicated blocks are not > deleted even after the block report is sent from DN > * while listing the file in the console using "./hdfs dfs -ls " the > replication factor for that file is mentioned as 1 > * In fsck report for that files displays that the file is replicated to 2 > datanodes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
[ https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4464: - Attachment: h4464_20120201.patch h4464_20120201.patch: combine those methods to deleteSubtreeAndCollectBlocks. > Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot > > > Key: HDFS-4464 > URL: https://issues.apache.org/jira/browse/HDFS-4464 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h4464_20120201.patch > > > Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive > methods for deleting inodes and collecting blocks for further block > deletion/update. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo
Suresh Srinivas created HDFS-4465: - Summary: Optimize datanode ReplicasMap and ReplicaInfo Key: HDFS-4465 URL: https://issues.apache.org/jira/browse/HDFS-4465 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Suresh Srinivas Assignee: Suresh Srinivas In Hadoop a lot of optimization has been done in namenode data structures to be memory efficient. Similar optimizations are necessary for Datanode process. With the growth in storage per datanode and number of blocks hosted on datanode, this jira intends to optimize long lived ReplicasMap and ReplicaInfo objects. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569198#comment-13569198 ] Suresh Srinivas commented on HDFS-4461: --- I think my earlier comments perhaps are not clear. Let me give it another try :) +1 for optimizing the data structures in datanode. bq. Suresh – we routinely see users with millions of replicas per DN now that 48TB+ configurations have become commodity. Sure, we should also encourage users to use things like HAR to coalesce into larger blocks, but easy wins on DN memory usage are a no-brainer IMO. This is again not the point I am making either. I know and understand that the number of blocks in the DN is growing. Data structures in the datanode need to be optimized. At the same time, as the DNs support more storage, the DN heap also needs to be suitably increased. What my previous comments related to is the assertion that DirectoryScanner is causing OOM. OOM is not caused by the scanner. It is caused by incorrectly sizing the datanode JVM heap, unless one shows a leak in DirectoryScanner. So the comment was to edit the description to reflect it. We need to also optimize the long lived data structures in datanode. I thought one would start with that instead of DirectoryScanner, which creates short lived objects. Created HDFS-4465 to track that. > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file.
Since those File objects contain full paths, users who > pick a lengthly path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This would be a nice efficiency improvement. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
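The cheaper representation proposed in the description above (store strings, create File objects on demand) can be sketched as follows. This is an illustrative sketch, not any of the attached patches; the class, field names, and example suffix are assumptions.

```java
import java.io.File;

// Sketch: a ScanInfo-like record that keeps one shared reference to the
// volume root plus only the short path suffix per block, and materializes
// File objects lazily instead of storing two full-path Files per block.
public class ScanInfoSketch {
  private final File volumeRoot;  // one shared object per volume
  private final String suffix;    // short, e.g. "current/subdir3/blk_42"

  ScanInfoSketch(File volumeRoot, String suffix) {
    this.volumeRoot = volumeRoot;
    this.suffix = suffix;
  }

  /** Block file rebuilt on demand; the long volume prefix is stored once. */
  File getBlockFile() {
    return new File(volumeRoot, suffix);
  }

  /** Metadata file derived from the same suffix (naming is illustrative). */
  File getMetaFile() {
    return new File(volumeRoot, suffix + ".meta");
  }

  public static void main(String[] args) {
    File root = new File("/data/very/long/volume/path");
    ScanInfoSketch s = new ScanInfoSketch(root, "current/subdir3/blk_42");
    System.out.println(s.getBlockFile().getPath());
  }
}
```

Each scanned block then costs one shared reference plus a short String instead of N_blocks copies of the volume path prefix, which is the saving the description estimates.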
[jira] [Updated] (HDFS-4444) Add space between total transaction time and number of transactions in FSEditLog#printStatistics
[ https://issues.apache.org/jira/browse/HDFS-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated HDFS-: Fix Version/s: 0.23.7 > Add space between total transaction time and number of transactions in > FSEditLog#printStatistics > > > Key: HDFS- > URL: https://issues.apache.org/jira/browse/HDFS- > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Stephen Chu >Assignee: Stephen Chu >Priority: Trivial > Fix For: 1.2.0, 2.0.3-alpha, 0.23.7 > > Attachments: HDFS-.patch.001, HDFS-.patch.branch-1 > > > Currently, when we log statistics, we see something like > {code} > 13/01/25 23:16:59 INFO namenode.FSNamesystem: Number of transactions: 0 Total > time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number > of syncs: 0 SyncTimes(ms): 0 > {code} > Notice how the value for total transactions time and "Number of transactions > batched in Syncs" needs a space to separate them. > FSEditLog#printStatistics: > {code} > private void printStatistics(boolean force) { > long now = now(); > if (lastPrintTime + 6 > now && !force) { > return; > } > lastPrintTime = now; > StringBuilder buf = new StringBuilder(); > buf.append("Number of transactions: "); > buf.append(numTransactions); > buf.append(" Total time for transactions(ms): "); > buf.append(totalTimeTransactions); > buf.append("Number of transactions batched in Syncs: "); > buf.append(numTransactionsBatchedInSync); > buf.append(" Number of syncs: "); > buf.append(editLogStream.getNumSync()); > buf.append(" SyncTimes(ms): "); > buf.append(journalSet.getSyncTimes()); > LOG.info(buf); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
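The bug quoted above comes down to one append missing its leading space. A minimal standalone demonstration of the fixed formatting (method name and reduced field set are illustrative, not the actual FSEditLog code):

```java
// Demonstrates the one-character fix: the append for "Number of
// transactions batched in Syncs" needs a leading space, like the other
// fields, so the two values do not run together.
public class PrintStatisticsFix {
  static String render(long totalTimeTransactions, long numBatched) {
    StringBuilder buf = new StringBuilder();
    buf.append("Total time for transactions(ms): ");
    buf.append(totalTimeTransactions);
    buf.append(" Number of transactions batched in Syncs: "); // fixed: leading space
    buf.append(numBatched);
    return buf.toString();
  }

  public static void main(String[] args) {
    System.out.println(render(0, 0));
    // -> Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0
  }
}
```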
[jira] [Commented] (HDFS-4457) WebHDFS obtains/sets delegation token service hostname using wrong config leading to issues when NN is configured with 0.0.0.0 RPC IP
[ https://issues.apache.org/jira/browse/HDFS-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569183#comment-13569183 ] Daryn Sharp commented on HDFS-4457: --- Ignoring the issue of the client relying on the server to set the service - which is what I don't approve of since it's a big step backwards - you still have the problem if a proxy is between the client and the webhdfs server. The token will contain the hostname that the proxy used to contact the server, not the hostname the client used to contact the proxy. The proxy, or even some form of NAT may be redirecting the port. The server doesn't know this, only the client knows what port it thinks it contacted. The remote server also doesn't have the ability to know if the client has use_ip enabled or disabled. Basically, only the client that requested the token knows the exact host:port authority it used to request the token. When it attempts to re-contact that service, it needs to match the service with the authority. My second concern is that you must be assuming the key to store the token in the credentials. It currently happens to be the token's service, but it's a private implementation detail. If the key format changes, and the passed-along token is added to the credentials with the old format, then job submission will attempt to reacquire the token and fail. Fetchdt solves this by allowing you to acquire tokens and opaquely pass them along in binary form. What error are you encountering with fetchdt? 
It's working for me on a production cluster: {noformat} $ hdfs fetchdt -fs webhdfs://host /tmp/tokens Fetched token for host:50070 into file:/tmp/tokens {noformat} > WebHDFS obtains/sets delegation token service hostname using wrong config > leading to issues when NN is configured with 0.0.0.0 RPC IP > - > > Key: HDFS-4457 > URL: https://issues.apache.org/jira/browse/HDFS-4457 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 1.1.1, 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur >Priority: Critical > Attachments: HDFS_4457.patch, HDFS_4457.patch > > > If the NameNode RPC address is configured with a wildcard IP 0.0.0.0, then > delegation tokens are configured with 0.0.0.0 as service and this breaks > clients trying to use those tokens. > Looking at NamenodeWebHdfsMethods#generateDelegationToken() the problem is > SecurityUtil.setTokenService(t, namenode.getHttpAddress());, tracing back > what is being used to resolve getHttpAddress() the NameNodeHttpServer is > resolving the httpAddress doing a httpAddress = new > InetSocketAddress(bindAddress.getAddress(), httpServer.getPort()); > , and if using "0.0.0.0" in the configuration, you get 0.0.0.0 from > bindAddress.getAddress(). > Normally (non webhdfs) this is not an issue because it is the responsibility > of the client, but in the case of WebHDFS, WebHDFS does it before returning > the string version of the token (it must be this way because the client may > not be a java client at all and cannot manipulate the DelegationToken as > such). > The solution (thanks to Eric Sammer for helping figure this out) is for > WebHDFS to use the exact hostname that came in the HTTP request as the > service to set in the delegation tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
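The fix proposed in the issue description above, setting the token service from the authority the client actually used in the HTTP request rather than from the NameNode's (possibly 0.0.0.0) bind address, can be sketched as follows. The helper name is hypothetical; the real change lives inside NamenodeWebHdfsMethods.

```java
// Hedged sketch of the approach described above: derive the delegation
// token "service" string from the host:port the client used in the HTTP
// request. buildTokenService is an illustrative helper, not the patch's API.
public class TokenServiceSketch {
    public static String buildTokenService(String requestHost, int requestPort) {
        // Only the client knows the exact authority it used to reach the
        // service (possibly through a proxy or NAT), so the service must
        // reflect the request, never the server's bind address.
        return requestHost + ":" + requestPort;
    }

    public static void main(String[] args) {
        // With a wildcard bind address the old code produced "0.0.0.0:<port>";
        // using the request authority avoids that.
        System.out.println(buildTokenService("nn1.example.com", 50070));
    }
}
```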
[jira] [Updated] (HDFS-4444) Add space between total transaction time and number of transactions in FSEditLog#printStatistics
[ https://issues.apache.org/jira/browse/HDFS-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated HDFS-4444: Fix Version/s: (was: 0.23.7) > Add space between total transaction time and number of transactions in > FSEditLog#printStatistics > > > Key: HDFS-4444 > URL: https://issues.apache.org/jira/browse/HDFS-4444 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Stephen Chu >Assignee: Stephen Chu >Priority: Trivial > Fix For: 1.2.0, 2.0.3-alpha > > Attachments: HDFS-4444.patch.001, HDFS-4444.patch.branch-1 > > > Currently, when we log statistics, we see something like > {code} > 13/01/25 23:16:59 INFO namenode.FSNamesystem: Number of transactions: 0 Total > time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number > of syncs: 0 SyncTimes(ms): 0 > {code} > Notice how the value for total transaction time and "Number of transactions > batched in Syncs" need a space to separate them. > FSEditLog#printStatistics: > {code} > private void printStatistics(boolean force) { > long now = now(); > if (lastPrintTime + 60000 > now && !force) { > return; > } > lastPrintTime = now; > StringBuilder buf = new StringBuilder(); > buf.append("Number of transactions: "); > buf.append(numTransactions); > buf.append(" Total time for transactions(ms): "); > buf.append(totalTimeTransactions); > buf.append("Number of transactions batched in Syncs: "); > buf.append(numTransactionsBatchedInSync); > buf.append(" Number of syncs: "); > buf.append(editLogStream.getNumSync()); > buf.append(" SyncTimes(ms): "); > buf.append(journalSet.getSyncTimes()); > LOG.info(buf); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
Tsz Wo (Nicholas), SZE created HDFS-4464: Summary: Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot Key: HDFS-4464 URL: https://issues.apache.org/jira/browse/HDFS-4464 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive methods for deleting inodes and collecting blocks for further block deletion/update. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569151#comment-13569151 ] Chris Nauroth commented on HDFS-4462: - +1 for the new patch Tests pass with the new patch too. Thank you for addressing the extremely paranoid feedback. :-) > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down
[ https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-4404: -- Attachment: hdfs-4404.txt Attached patch adds a unit test and addresses some of the feedback above. Uma -- I didn't change the "Local Exception" wrapping case to use the new code, since that would be a behavioral change which I think is outside the scope of this bug fix. > Create file failure when the machine of first attempted NameNode is down > > > Key: HDFS-4404 > URL: https://issues.apache.org/jira/browse/HDFS-4404 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, hdfs-client >Affects Versions: 2.0.2-alpha >Reporter: liaowenrui >Assignee: Todd Lipcon >Priority: Critical > Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, > hdfs-4404.txt > > > test Environment: NN1,NN2,DN1,DN2,DN3 > machine1:NN1,DN1 > machine2:NN2,DN2 > machine3:DN3 > machine1 is down. > 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - > Connecting to /160.161.0.155:8020 > 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing > ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting > for channel to be ready for connect. ch : > java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020] > java.net.SocketTimeoutException: 1 millis timeout while waiting for > channel to be ready for connect. 
ch : > java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020] > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489) > at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474) > at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568) > at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286) > at org.apache.hadoop.ipc.Client.call(Client.java:1156) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184) > at $Proxy9.create(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84) > at $Proxy10.create(Unknown Source) > at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1261) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806) > at 
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715) > at test.TestLease.main(TestLease.java:45) > 2013-01-12 09:51:38,443 DEBUG ipc.Client (Client.java:close(940)) - IPC > Client (31594013) connection to /160.161.0.155:8020 from > hdfs/had...@hadoop.com: closed > 2013-01-12 09:52:47,834 WARN retry.RetryInvocationHandler > (RetryInvocationHandler.java:invoke(95)) - Exception while invoking class > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create. > Not retrying because the invoked method is not idempotent, and unable to > determine whether it was invoked > java.net.SocketTimeoutException: Call From szxy1x001833091/172.0.0.13 to > vm2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: > 1 millis timeout while waiting for channel to be ready for connect. ch : > java.nio.channels.SocketChannel[connection-pending > remote=/160.161.0.155:8020]; For more details see: > http://wiki.apache.org/hadoop/SocketTimeout > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:743) > at org.apache.hadoop.ipc.Client.call(Client.java:1180) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184) >
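The "Not retrying because the invoked method is not idempotent, and unable to determine whether it was invoked" message in the log above reflects a safety rule in the client's failover logic: a non-idempotent call such as create() may only be retried when the client can prove the RPC never reached the server (e.g. the connect itself failed). A hedged sketch of that decision; the enum and method names are illustrative, not the actual RetryInvocationHandler API.

```java
// Sketch of the failover-retry decision described above. The bug in
// HDFS-4404 was that a connect-phase timeout arrived wrapped in a way
// that hid the fact the call never executed, so this safe-retry branch
// was never taken.
public class RetrySketch {
    public enum FailureKind { CONNECT_FAILED, SENT_BUT_NO_REPLY }

    public static boolean shouldFailoverAndRetry(boolean idempotent,
                                                 FailureKind kind) {
        // A connect failure means the RPC never executed, so retrying
        // against the other NameNode is always safe.
        if (kind == FailureKind.CONNECT_FAILED) {
            return true;
        }
        // Otherwise the call may already have executed on the server,
        // and only idempotent operations can be safely re-invoked.
        return idempotent;
    }

    public static void main(String[] args) {
        System.out.println(shouldFailoverAndRetry(false, FailureKind.CONNECT_FAILED));
        System.out.println(shouldFailoverAndRetry(false, FailureKind.SENT_BUT_NO_REPLY));
    }
}
```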
[jira] [Updated] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-4462: - Attachment: HDFS-4462.patch Thanks a lot for the review, Chris, and for running those additional tests. Your suggestion does seem pretty paranoid (odds are 1 over 2^31), but better to be overly conservative in cases such as this. :) Please take a look at the updated patch. This patch expressly checks to see if the local metadata's layout version supports federation or not, and only compares the namespace IDs if it doesn't support federation. If federation is supported, all three fields are compared. > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569083#comment-13569083 ] Andy Isaacson commented on HDFS-4461: - The actual OOM backtrace is on the DN thread: {noformat} at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:25) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat()Lorg/apache/hadoop/hdfs/server/protocol/HeartbeatResponse; (BPServiceActor.java:434) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService()V (BPServiceActor.java:520) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run()V (BPServiceActor.java:673) at java.lang.Thread.run()V (Thread.java:662) {noformat} > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This would be a nice efficiency improvement. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569063#comment-13569063 ] Chris Nauroth commented on HDFS-4462: - Hi, Aaron. The code looks good. I applied the patch to branch-2 and ran multiple test suites related to checkpoints and 2NN. {code} - boolean isSameCluster(FSImage si) { -return namespaceID == si.getStorage().namespaceID && - clusterID.equals(si.getClusterID()) && - blockpoolID.equals(si.getBlockPoolID()); + boolean namespaceIdMatches(FSImage si) { +return namespaceID == si.getStorage().namespaceID; } {code} Considering that namespace ID is an integer, whereas cluster ID is based on a GUID, it seems there is higher likelihood of accidental collision. Then, {{CheckpointSignature#validateStorageInfo}} could misidentify a match. It's still highly unlikely (but non-zero). I'm wondering if a safer change would be (pseudo-code): {code} if namespace ID + cluster ID + blockpool ID are defined on both compare all 3 fields else if only namespace ID is defined on one of them compare only namespace ID {code} This would keep the logic the same for upgrades between 2 post-federation versions, and just change the logic for the case of pre-fed -> post-fed. Or am I being too paranoid? :-) > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. 
However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
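Chris's pseudo-code above can be sketched as plain Java. The field names mirror the pseudo-code rather than the actual HDFS-4462 patch, and pre-federation metadata is modeled as empty cluster/blockpool IDs, matching the empty fields in the error message quoted in the description.

```java
// Hedged sketch of the comparison logic proposed in the comment above:
// compare all three IDs only when both sides carry federation metadata;
// otherwise fall back to comparing the namespace ID alone.
public class CheckpointSignatureSketch {
    public static boolean fieldsMatch(int nsId1, String clusterId1, String bpId1,
                                      int nsId2, String clusterId2, String bpId2) {
        // Pre-federation metadata has no cluster or blockpool IDs; they
        // show up as empty strings (see the error message in the issue).
        boolean federated1 = !clusterId1.isEmpty() && !bpId1.isEmpty();
        boolean federated2 = !clusterId2.isEmpty() && !bpId2.isEmpty();
        if (federated1 && federated2) {
            // Post-federation on both sides: all three fields must match.
            return nsId1 == nsId2
                && clusterId1.equals(clusterId2)
                && bpId1.equals(bpId2);
        }
        // Pre-fed -> post-fed upgrade: only the namespace ID is comparable.
        return nsId1 == nsId2;
    }
}
```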
[jira] [Commented] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569018#comment-13569018 ] Hudson commented on HDFS-4456: -- Integrated in Hadoop-trunk-Commit #3311 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3311/]) HDFS-4456. Add concat to HttpFS and WebHDFS REST API docs. (plamenj2003 via tucu) (Revision 1441603) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441603 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ConcatSourcesParam.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm > Add concat to HttpFS and WebHDFS REST API docs > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 
adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569013#comment-13569013 ] Andy Isaacson commented on HDFS-4461: - bq. A server generally has a lot of String objects. There are also file objects in ReplicasMap, string paths tracked in many other places as well. The cluster in question has about 1.5 million blocks per DN, across 12 datadirs. This hprof shows 1,858,340 BlockScanInfo objects. MAT computed the "Retained Heap" of FsDatasetImpl at 980 MB and the "Retained Heap" of the DirectoryScanner thread at 1.4 GB. bq. ScanInfo is a short lived object, unlike other data structures that are long lived. It doesn't matter how narrow the peak is, if it exceeds the maximum permissible value. In this case we seem to have a complete set of ScanInfo objects (for the entire dataset) active on the heap, with the DirectoryScanner thread in the process of reconcile()ing them when it OOMs. > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This would be a nice efficiency improvement. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-4456: - Resolution: Fixed Fix Version/s: (was: 3.0.0) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Plamen. Committed to trunk and branch-2. > Add concat to HttpFS and WebHDFS REST API docs > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4459) command manual dfsadmin missing entry for restoreFailedStorage option
[ https://issues.apache.org/jira/browse/HDFS-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569009#comment-13569009 ] Hadoop QA commented on HDFS-4459: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567514/hdfs4459.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3936//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3936//console This message is automatically generated. 
> command manual dfsadmin missing entry for restoreFailedStorage option > - > > Key: HDFS-4459 > URL: https://issues.apache.org/jira/browse/HDFS-4459 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Thomas Graves >Assignee: Andy Isaacson > Attachments: hdfs4459.txt > > > Generating the latest site docs it doesn't show the -restoreFailedStorage > option under the dfsadmin section of commands_manual.html > Also it appears the table header is concatenated with the first row: > COMMAND_OPTION -report -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-4456: - Summary: Add concat to HttpFS and WebHDFS REST API docs (was: Add concat to WebHDFS REST API) > Add concat to HttpFS and WebHDFS REST API docs > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 3.0.0, 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569001#comment-13569001 ] Alejandro Abdelnur commented on HDFS-4456: -- got it, +1 for https://issues.apache.org/jira/secure/attachment/12567454/HDFS-4456.trunk.patch then. > Add concat to WebHDFS REST API > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 3.0.0, 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568994#comment-13568994 ] Plamen Jeliazkov commented on HDFS-4456: The one that introduces the extra warning had no failing tests. > Add concat to WebHDFS REST API > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 3.0.0, 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568989#comment-13568989 ] Todd Lipcon commented on HDFS-4461: --- Looks like we can cut the memory usage in half again -- storing both the metafile path and the block file path is redundant, since you can always compute the block path from the meta path by chopping off the "_.meta" suffix. Suresh -- we routinely see users with millions of replicas per DN now that 48TB+ configurations have become commodity. Sure, we should also encourage users to use things like HAR to coalesce into larger blocks, but easy wins on DN memory usage are a no-brainer IMO. > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This would be a nice efficiency improvement. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
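Todd's observation can be sketched as a small helper. HDFS meta files are named like blk_<blockId>_<genStamp>.meta, so the block file path is recovered by dropping the trailing "_<genStamp>.meta" suffix; the helper name and exact parsing below are illustrative assumptions, not the committed HDFS-4461 patch.

```java
// Hypothetical helper (not the actual patch): recompute the block file path
// from the meta file path, so only one of the two paths needs to be stored.
class MetaPathUtil {
    // Meta files are named blk_<blockId>_<genStamp>.meta; dropping the
    // trailing "_<genStamp>.meta" suffix yields the block file path.
    static String blockPathFromMetaPath(String metaPath) {
        if (!metaPath.endsWith(".meta")) {
            throw new IllegalArgumentException("not a meta file: " + metaPath);
        }
        String noExt = metaPath.substring(0, metaPath.length() - ".meta".length());
        int genStampSep = noExt.lastIndexOf('_');
        return noExt.substring(0, genStampSep);
    }
}
```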
[jira] [Updated] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-4461: --- Description: In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. This object contains two File objects-- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthly path for their volume roots will end up using an extra N_blocks * path_prefix bytes per block scanned. We also don't really need to store File objects-- storing strings and then creating File objects as needed would be cheaper. This would be a nice efficiency improvement. (was: In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. This object contains two File objects-- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthly path for their volume roots will end up using an extra N_blocks * path_prefix bytes per block scanned. We also don't really need to store File objects-- storing strings and then creating File objects as needed would be cheaper. This has been causing out-of-memory conditions for users who pick such long volume paths.) > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. 
Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This would be a nice efficiency improvement. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
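The cheaper layout the description suggests can be sketched as follows. CompactScanInfo is a hypothetical illustration of the idea (store the volume root once per volume, keep only per-block suffix strings, and materialize File objects on demand), not the actual ScanInfo change:

```java
import java.io.File;

// Hypothetical sketch: the volume root String is shared by every block on a
// volume, and File objects are created only when a caller needs one.
class CompactScanInfo {
    private final String volumeRoot;   // shared: one String per volume, not per block
    private final String blockSuffix;  // e.g. "current/blk_1"
    private final String metaSuffix;   // e.g. "current/blk_1_1001.meta"

    CompactScanInfo(String volumeRoot, String blockSuffix, String metaSuffix) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
        this.metaSuffix = metaSuffix;
    }

    File getBlockFile() { return new File(volumeRoot, blockSuffix); }
    File getMetaFile()  { return new File(volumeRoot, metaSuffix); }
}
```

The prefix cost is then paid once per volume rather than N_blocks times.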
[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568985#comment-13568985 ] Alejandro Abdelnur commented on HDFS-4456: -- No the other way around, the one that introduces an extra warning. > Add concat to WebHDFS REST API > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 3.0.0, 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568984#comment-13568984 ] Plamen Jeliazkov commented on HDFS-4456: The unit test failure associated with the Generics removal patch does not appear to be related by the way; code-wise. I will verify with a full test run on my own local machine though and get back to you with those results. > Add concat to WebHDFS REST API > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 3.0.0, 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4463) ActiveStandbyElector can join election even before Service HEALTHY, and results in null data at ActiveBreadCrumb
[ https://issues.apache.org/jira/browse/HDFS-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568983#comment-13568983 ] Todd Lipcon commented on HDFS-4463: --- Good work figuring this one out. I've seen it once or twice but hadn't been able to track down the bug. > ActiveStandbyElector can join election even before Service HEALTHY, and > results in null data at ActiveBreadCrumb > > > Key: HDFS-4463 > URL: https://issues.apache.org/jira/browse/HDFS-4463 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.2-alpha >Reporter: Vinay >Assignee: Vinay >Priority: Critical > > ActiveStandbyElector can store null at ActiveBreadCrumb in the below race > condition. At further all failovers will fail resulting NPE. > 1. ZKFC restarted. > 2. due to less machine busy, first zk connection is expired even before the > health monitoring returned the status. > 3. On re-establishment transitionToActive will be called, at this time > appData will be null, > 4. So now ActiveBreadCrumb will have null. > 5. After this any failovers will fail throwing > {noformat}java.lang.NullPointerException > at > org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171) > at > org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:892) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:797) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:475) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:545) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497){noformat} > Should not join the election before service is HEALTHY -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
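The fix direction stated at the end of the report ("should not join the election before service is HEALTHY") amounts to gating election entry on the last health verdict. A minimal sketch, with illustrative names rather than the real ActiveStandbyElector/ZKFC API:

```java
// Sketch only: gate joining the election on a HEALTHY health report, so a ZK
// (re)connection that arrives before the first report cannot publish null appData.
class ElectorGuard {
    enum HealthState { INITIALIZING, HEALTHY, UNHEALTHY }

    private HealthState lastHealthState = HealthState.INITIALIZING;
    private boolean inElection = false;

    // Called by the health monitor whenever it produces a verdict.
    void onHealthReport(HealthState state) {
        lastHealthState = state;
        maybeJoinElection();
    }

    // Called on ZK connection establishment; a no-op until HEALTHY is known.
    void maybeJoinElection() {
        if (lastHealthState == HealthState.HEALTHY) {
            inElection = true;
        }
    }

    boolean isInElection() { return inElection; }
}
```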
[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API
[ https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568982#comment-13568982 ] Plamen Jeliazkov commented on HDFS-4456: Are you saying you would like to push the Generics removal patch then rather than the other one? I will check the tests and make sure it is passing normally with the Generics removal patch. > Add concat to WebHDFS REST API > -- > > Key: HDFS-4456 > URL: https://issues.apache.org/jira/browse/HDFS-4456 > Project: Hadoop HDFS > Issue Type: New Feature > Components: webhdfs >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Plamen Jeliazkov > Fix For: 3.0.0, 2.0.3-alpha > > Attachments: HDFS-3598.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, > HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, > HDFS-4456.trunk.patch > > > HDFS-3598 adds the concat feature to WebHDFS. The REST API should be updated > accordingly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568979#comment-13568979 ] Suresh Srinivas commented on HDFS-4461: --- bq. If someone is running with around 200,000 blocks (a reasonable number), and a 50 to 80 character path, this change saves between 50 and 100 MB of heap space during the DirectoryScanner run. That's what we should be focusing on here-- the efficiency improvement. After all, that is why I marked this JIRA as "improvement" rather than "bug" I think you are missing the point I made earlier. In the description you say: bq. This has been causing out-of-memory conditions for users who pick such long volume paths. It is not correct to attribute the inefficiency in memory of DirectoryScanner to OOM. So please update the description to say DirectoryScanner can be made more efficient. bq. I saw more than 1 million ScanInfo objects I am interested in seeing the number of blocks in this particular setup and if we are leaking these objects. I am more leaning towards incorrect datanode configuration in the setup where you saw OOM. Can you provide details on what the heap size of datanode is, the number of blocks on the datanode etc.? > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. 
Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This has been causing out-of-memory conditions for users > who pick such long volume paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4459) command manual dfsadmin missing entry for restoreFailedStorage option
[ https://issues.apache.org/jira/browse/HDFS-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-4459: Status: Patch Available (was: Open) > command manual dfsadmin missing entry for restoreFailedStorage option > - > > Key: HDFS-4459 > URL: https://issues.apache.org/jira/browse/HDFS-4459 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Thomas Graves >Assignee: Andy Isaacson > Attachments: hdfs4459.txt > > > Generating the latest site docs it doesn't show the -restoreFailedStorage > option under the dfsadmin section of commands_manual.html > Also it appears the table header is concatenated with the first row: > COMMAND_OPTION -report -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4450) Duplicate data node on the name node after formatting data node
[ https://issues.apache.org/jira/browse/HDFS-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568973#comment-13568973 ] Suresh Srinivas commented on HDFS-4450: --- Also, please provide what you have set the parameter "dfs.datanode.address" to in your configuration. > Duplicate data node on the name node after formatting data node > --- > > Key: HDFS-4450 > URL: https://issues.apache.org/jira/browse/HDFS-4450 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: WenJin Ma > Attachments: exception.bmp, normal.bmp > > Original Estimate: 168h > Remaining Estimate: 168h > > Duplicate data node on the name node after formatting data node. > When we register a data node, we use nodeReg.getXferPort() to find the > DatanodeDescriptor. > {code} > DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr( > nodeReg.getIpAddr(), nodeReg.getXferPort()); > {code} > but when adding a data node we use node.getIpAddr(). > {code} > /** add node to the map >* return true if the node is added; false otherwise. >*/ > boolean add(DatanodeDescriptor node) { > hostmapLock.writeLock().lock(); > try { > if (node==null || contains(node)) { > return false; > } > > String ipAddr = node.getIpAddr(); > DatanodeDescriptor[] nodes = map.get(ipAddr); > DatanodeDescriptor[] newNodes; > if (nodes==null) { > newNodes = new DatanodeDescriptor[1]; > newNodes[0]=node; > } else { // rare case: more than one datanode on the host > newNodes = new DatanodeDescriptor[nodes.length+1]; > System.arraycopy(nodes, 0, newNodes, 0, nodes.length); > newNodes[nodes.length] = node; > } > map.put(ipAddr, newNodes); > return true; > } finally { > hostmapLock.writeLock().unlock(); > } > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
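The mismatch the reporter shows (lookup keyed on (ip, xferPort), insertion keyed on ip alone) can be made concrete with a toy map that keys both operations on the same "ip:port" string. This is an illustrative sketch of one possible fix, not the actual host2DatanodeMap code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy version of the host-to-datanode map: add() and lookup use the same
// composite key, so a re-registered datanode cannot be added twice while
// remaining invisible to getDatanodeByXferAddr-style lookups.
class Host2DatanodeMapSketch {
    private final Map<String, String> map = new HashMap<>(); // key -> descriptor id

    private static String key(String ipAddr, int xferPort) {
        return ipAddr + ":" + xferPort;
    }

    // Returns true if the node was added; false if it was already present.
    boolean add(String ipAddr, int xferPort, String descriptor) {
        return map.putIfAbsent(key(ipAddr, xferPort), descriptor) == null;
    }

    String getByXferAddr(String ipAddr, int xferPort) {
        return map.get(key(ipAddr, xferPort));
    }
}
```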
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568959#comment-13568959 ] Colin Patrick McCabe commented on HDFS-4461: If someone is running with around 200,000 blocks (a reasonable number), and a 50 to 80 character path, this change saves between 50 and 100 MB of heap space during the DirectoryScanner run. That's what we should be focusing on here-- the efficiency improvement. After all, that is why I marked this JIRA as "improvement" rather than "bug" :) bq. Or at least the number of ScanInfo objects you saw. I saw more than 1 million {{ScanInfo}} objects. This means that either the number of blocks on the DN is much higher than we recommend, or there is another leak in the {{DirectoryScanner}}. I am trying to get confirmation that the number of blocks is really that high. If it isn't, then we will start looking more closely for memory leaks in the scanner. We've found that the block scanner often delivers the finishing blow to DNs that are already overloaded. This makes sense-- if your heap is already near max size, asking you to allocate a few hundred megabytes might finish you off. > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. Since those File objects contain full paths, users who > pick a lengthly path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. 
We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This has been causing out-of-memory conditions for users > who pick such long volume paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
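Colin's 50-100 MB figure checks out on a back-of-the-envelope basis, assuming UTF-16 Strings (2 bytes per char) plus roughly 40 bytes of per-String object overhead; both constants are JVM-dependent assumptions, not measurements from the heap dump discussed here:

```java
// Rough heap cost of duplicating the volume-path prefix in both File paths
// of every ScanInfo. Assumes 2 bytes/char (UTF-16) plus ~40 bytes of String
// object overhead; both are approximations that vary by JVM.
class HeapEstimate {
    static long redundantPrefixBytes(long numBlocks, int prefixChars) {
        long filesPerBlock = 2;                    // block file + meta file
        long bytesPerCopy = 2L * prefixChars + 40; // chars + object overhead
        return numBlocks * filesPerBlock * bytesPerCopy;
    }
}
```

For 200,000 blocks this gives about 56 MB at a 50-character prefix and 80 MB at an 80-character prefix, consistent with the range quoted above.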
[jira] [Updated] (HDFS-1765) Block Replication should respect under-replication block priority
[ https://issues.apache.org/jira/browse/HDFS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-1765: - Target Version/s: 0.23.3, 0.24.0 (was: 0.24.0, 0.23.3) Fix Version/s: 0.23.7 Committed to branch-0.23. > Block Replication should respect under-replication block priority > - > > Key: HDFS-1765 > URL: https://issues.apache.org/jira/browse/HDFS-1765 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.0 >Reporter: Hairong Kuang >Assignee: Uma Maheswara Rao G > Fix For: 2.0.0-alpha, 0.23.7 > > Attachments: HDFS-1765.patch, HDFS-1765.patch, HDFS-1765.patch, > HDFS-1765.patch, HDFS-1765.pdf, underReplicatedQueue.pdf > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently under-replicated blocks are assigned different priorities depending > on how many replicas a block has. However the replication monitor works on > blocks in a round-robin fashion. So the newly added high priority blocks > won't get replicated until all low-priority blocks are done. One example is > that on decommissioning datanode WebUI we often observe that "blocks with > only decommissioning replicas" do not get scheduled to replicate before other > blocks, so risking data availability if the node is shutdown for repair > before decommission completes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568946#comment-13568946 ] Aaron T. Myers commented on HDFS-4462: -- [~acmurthy] Blocker? Probably not. Pretty good to have? I think so. There's a pretty simple work-around: when upgrading from a pre-federation version of HDFS, blow away your 2NN checkpoint dirs before starting up your 2NN again. A problem will arise if an admin doesn't notice that all of their 2NN checkpoints are failing post-upgrade. Regardless, it's a pretty simple change - I'm hoping it can get committed today. > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
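The upgrade scenario above can be sketched as a relaxed signature check: a pre-federation checkpoint has empty clusterId/blockpoolId (the trailing empty fields in the "Expecting respectively" line), so the 2NN should tolerate that specific mismatch when the namespaceID matches. The class, method, and exact rule below are illustrative, not the committed HDFS-4462 patch:

```java
// Sketch of a relaxed checkpoint-fields comparison: accept empty (pre-federation)
// cluster/blockpool IDs on the 2NN side as long as the namespaceID matches.
class CheckpointFieldsCheck {
    static boolean compatible(int myNamespaceId, String myClusterId, String myBlockpoolId,
                              int nnNamespaceId, String nnClusterId, String nnBlockpoolId) {
        if (myNamespaceId != nnNamespaceId) {
            return false;  // never checkpoint across namespaces
        }
        // Pre-federation metadata carried no cluster or blockpool IDs at all.
        boolean preFederation = myClusterId.isEmpty() && myBlockpoolId.isEmpty();
        return preFederation
            || (myClusterId.equals(nnClusterId) && myBlockpoolId.equals(nnBlockpoolId));
    }
}
```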
[jira] [Commented] (HDFS-4463) ActiveStandbyElector can join election even before Service HEALTHY, and results in null data at ActiveBreadCrumb
[ https://issues.apache.org/jira/browse/HDFS-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568938#comment-13568938 ] Colin Patrick McCabe commented on HDFS-4463: moving to HDFS, since it's about ZKFC. > ActiveStandbyElector can join election even before Service HEALTHY, and > results in null data at ActiveBreadCrumb > > > Key: HDFS-4463 > URL: https://issues.apache.org/jira/browse/HDFS-4463 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.2-alpha >Reporter: Vinay >Assignee: Vinay >Priority: Critical > > ActiveStandbyElector can store null at ActiveBreadCrumb in the below race > condition. At further all failovers will fail resulting NPE. > 1. ZKFC restarted. > 2. due to less machine busy, first zk connection is expired even before the > health monitoring returned the status. > 3. On re-establishment transitionToActive will be called, at this time > appData will be null, > 4. So now ActiveBreadCrumb will have null. > 5. After this any failovers will fail throwing > {noformat}java.lang.NullPointerException > at > org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171) > at > org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:892) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:797) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:475) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:545) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497){noformat} > Should not join the election before service is HEALTHY -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (HDFS-4463) ActiveStandbyElector can join election even before Service HEALTHY, and results in null data at ActiveBreadCrumb
[ https://issues.apache.org/jira/browse/HDFS-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe moved HADOOP-9275 to HDFS-4463: Component/s: (was: ha) ha Affects Version/s: (was: 2.0.2-alpha) 2.0.2-alpha Key: HDFS-4463 (was: HADOOP-9275) Project: Hadoop HDFS (was: Hadoop Common) > ActiveStandbyElector can join election even before Service HEALTHY, and > results in null data at ActiveBreadCrumb > > > Key: HDFS-4463 > URL: https://issues.apache.org/jira/browse/HDFS-4463 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.0.2-alpha >Reporter: Vinay >Assignee: Vinay >Priority: Critical > > ActiveStandbyElector can store null at ActiveBreadCrumb in the below race > condition. At further all failovers will fail resulting NPE. > 1. ZKFC restarted. > 2. due to less machine busy, first zk connection is expired even before the > health monitoring returned the status. > 3. On re-establishment transitionToActive will be called, at this time > appData will be null, > 4. So now ActiveBreadCrumb will have null. > 5. After this any failovers will fail throwing > {noformat}java.lang.NullPointerException > at > org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171) > at > org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:892) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:797) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:475) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:545) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497){noformat} > Should not join the election before service is HEALTHY -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568937#comment-13568937 ] Arun C Murthy commented on HDFS-4462: - [~atm] Is this a 2.0.3 blocker? Tx. > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4457) WebHDFS obtains/sets delegation token service hostname using wrong config leading to issues when NN is configured with 0.0.0.0 RPC IP
[ https://issues.apache.org/jira/browse/HDFS-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568929#comment-13568929 ] Aaron T. Myers commented on HDFS-4457: -- Daryn, does Tucu's explanation address your concerns? I think Tucu's latest comment makes sense - you're right that the client should be setting the token service, and in this case the client is effectively doing just that since the server is using the host/port as sent by the client when creating the DT. The patch looks good to me, but I don't want to commit it if you have more pending comments. Please let me know. > WebHDFS obtains/sets delegation token service hostname using wrong config > leading to issues when NN is configured with 0.0.0.0 RPC IP > - > > Key: HDFS-4457 > URL: https://issues.apache.org/jira/browse/HDFS-4457 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 1.1.1, 2.0.2-alpha >Reporter: Alejandro Abdelnur >Assignee: Alejandro Abdelnur >Priority: Critical > Attachments: HDFS_4457.patch, HDFS_4457.patch > > > If the NameNode RPC address is configured with a wildcard IP 0.0.0.0, then > delegation tokens are configured with 0.0.0.0 as service and this breaks > clients trying to use those tokens. > Looking at NamenodeWebHdfsMethods#generateDelegationToken() the problem is > SecurityUtil.setTokenService(t, namenode.getHttpAddress());, tracing back > what is being used to resolve getHttpAddress() the NameNodeHttpServer is > resolving the httpAddress doing a httpAddress = new > InetSocketAddress(bindAddress.getAddress(), httpServer.getPort()); > , and if using "0.0.0.0" in the configuration, you get 0.0.0.0 from > bindAddress.getAddress().
> Normally (non webhdfs) this is not an issue because it is the responsibility > of the client, but in the case of WebHDFS, WebHDFS does it before returning > the string version of the token (it must be this way because the client may > not be a java client at all and cannot manipulate the DelegationToken as > such). > The solution (thanks to Eric Sammer for helping figure this out) is for > WebHDFS to use the exact hostname that came in the HTTP request as the > service to set in the delegation tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568930#comment-13568930 ] Suresh Srinivas commented on HDFS-4461: --- bq. we analyzed a DN heap dump from a production cluster with Eclipse Memory Analyzer and found that the memory was full of ScanInfo objects. The memory histogram showed that java.lang.String was the third-largest consumer of memory in the system. Unfortunately I can't share the heap dump. A server generally has a lot of String objects. There are also File objects in ReplicasMap, and string paths tracked in many other places as well. This patch indeed saves a few bytes. However I do not think this is either the cause of the OOME or is likely to solve that issue. ScanInfo is a short-lived object, unlike other data structures that are long lived. Can you answer the following question, which I previously asked: bq. How many blocks per storage directory do you have, when OOME happened? Or at least the number of ScanInfo objects you saw. > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. 
This has been causing out-of-memory conditions for users > who pick such long volume paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4450) Duplicate data node on the name node after formatting data node
[ https://issues.apache.org/jira/browse/HDFS-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568924#comment-13568924 ] Suresh Srinivas commented on HDFS-4450: --- bq. Can you post the lines from the logs that correspond to datanode dn0 registration before format and after format? I should have been more clear. What I asked for is, from the namenode logs, please get the two registration requests from dn0, one before you shut it down and one after you restart. The log lines should look like: {noformat} 2013-02-01 10:11:10,522 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from DatanodeRegistration(10.28.176.234, storageID=DS-685519412-10.28.176.234-50010-1359684666375, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-fe3b5079-a34a-4912-b8a8-50443d038749;nsid=1321646662;c=0) storage DS-685519412-10.28.176.234-50010-1359684666375 {noformat} > Duplicate data node on the name node after formatting data node > --- > > Key: HDFS-4450 > URL: https://issues.apache.org/jira/browse/HDFS-4450 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: WenJin Ma > Attachments: exception.bmp, normal.bmp > > Original Estimate: 168h > Remaining Estimate: 168h > > Duplicate data node on the name node after formatting data node. > When we register a data node, we use nodeReg.getXferPort() to find the > DatanodeDescriptor. > {code} > DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr( > nodeReg.getIpAddr(), nodeReg.getXferPort()); > {code} > but adding a data node uses node.getIpAddr(). > {code} > /** add node to the map >* return true if the node is added; false otherwise. 
> */
> boolean add(DatanodeDescriptor node) {
>   hostmapLock.writeLock().lock();
>   try {
>     if (node == null || contains(node)) {
>       return false;
>     }
>     String ipAddr = node.getIpAddr();
>     DatanodeDescriptor[] nodes = map.get(ipAddr);
>     DatanodeDescriptor[] newNodes;
>     if (nodes == null) {
>       newNodes = new DatanodeDescriptor[1];
>       newNodes[0] = node;
>     } else { // rare case: more than one datanode on the host
>       newNodes = new DatanodeDescriptor[nodes.length + 1];
>       System.arraycopy(nodes, 0, newNodes, 0, nodes.length);
>       newNodes[nodes.length] = node;
>     }
>     map.put(ipAddr, newNodes);
>     return true;
>   } finally {
>     hostmapLock.writeLock().unlock();
>   }
> }
> {code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
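The asymmetry quoted above (lookup keyed on IP plus transfer port, add keyed on IP only) can be sketched with a minimal, hypothetical stand-in for the host-to-nodes map. The class and field names below are invented for illustration; the point is only that a re-registration that the lookup fails to find gets appended as a second entry for the same host.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the lookup/add asymmetry in the quoted code.
public class Host2NodesSketch {
    static class Node {
        final String ip;
        final int xferPort;
        Node(String ip, int xferPort) { this.ip = ip; this.xferPort = xferPort; }
    }

    final Map<String, List<Node>> map = new HashMap<>();

    // Lookup keys on BOTH ip and xferPort, mirroring getDatanodeByXferAddr().
    Node getByXferAddr(String ip, int xferPort) {
        for (Node n : map.getOrDefault(ip, Collections.emptyList())) {
            if (n.xferPort == xferPort) return n;
        }
        return null;
    }

    // add() keys on ip only, mirroring the quoted add().
    boolean add(Node node) {
        return map.computeIfAbsent(node.ip, k -> new ArrayList<>()).add(node);
    }

    public static void main(String[] args) {
        Host2NodesSketch m = new Host2NodesSketch();
        m.add(new Node("10.0.0.1", 50010));
        // Node re-registers with a different transfer port: the lookup
        // misses, so registration falls through to add() and duplicates it.
        if (m.getByXferAddr("10.0.0.1", 50011) == null) {
            m.add(new Node("10.0.0.1", 50011));
        }
        System.out.println(m.map.get("10.0.0.1").size()); // 2 entries, same host
    }
}
```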
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568920#comment-13568920 ] Colin Patrick McCabe commented on HDFS-4461: bq. I doubt that the directory scanner is the cause of the OOM error. It is probably happening due to some other issue. How many blocks per storage directory do you have, when OOME happened? we analyzed a DN heap dump from a production cluster with Eclipse Memory Analyzer and found that the memory was full of ScanInfo objects. The memory histogram showed that {{java.lang.String}} was the third-largest consumer of memory in the system. Unfortunately I can't share the heap dump. bq. I have a hard time understanding the picture. How many bytes are we saving per ScanInfo? In the particular case shown in memory-analysis.png, we save 86 characters in each string. The volume prefix that we avoid storing is {{/home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/}}. Java uses 2 bytes per character (UCS-2 encoding), and we store both metaPath and blockPath, so multiply that by 4 to get 344. Then add the overhead of using two File objects that contain the path string instead of just the string itself-- probably around an extra 16 bytes per object, for 376 bytes in total saved per {{ScanInfo}}. You might think that {{/home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/}} is an unrealistically long volume path, but here is an example of a real volume path in use on a production cluster: {{/mnt/hdfs/hdfs01/10769eef-a23a-4300-b45b-749221786109/dfs/dn}}. Putting the disk UUID into the volume is an obvious thing to do if you're a system administrator. 
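The arithmetic in the comment above can be checked directly: 86 prefix characters, stored twice (block path and meta path) at 2 bytes per character, plus an assumed ~16 bytes of object overhead for each of the two File objects no longer kept. The constants below come straight from the comment; the per-object overhead is the commenter's estimate, not a measured figure.

```java
// Back-of-the-envelope check of the per-ScanInfo saving described above.
public class ScanInfoSavings {
    public static void main(String[] args) {
        int prefixChars = 86;        // shared volume-path prefix length
        int bytesPerChar = 2;        // Java stores chars as 2-byte code units
        int paths = 2;               // blockPath and metaPath both carry the prefix
        int stringBytes = prefixChars * bytesPerChar * paths;  // 86 * 4 = 344
        int fileOverhead = 16 * paths;                         // estimated, ~32
        System.out.println(stringBytes + fileOverhead);        // 376 bytes per ScanInfo
    }
}
```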
> DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This has been causing out-of-memory conditions for users > who pick such long volume paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568901#comment-13568901 ] Suresh Srinivas commented on HDFS-4461: --- bq. This has been causing out-of-memory conditions for users who pick such long volume paths. I doubt that the directory scanner is the cause of the OOM error. It is probably happening due to some other issue. How many blocks per storage directory do you have, when OOME happened? bq. here's a before vs. after picture of a memory analysis. you can see that in the "after" picture, we are no longer storing the path prefix twice per block in the ScanInfo class I have a hard time understanding the picture. How many bytes are we saving per ScanInfo? > DirectoryScanner: volume path prefix takes up memory for every block that is > scanned > - > > Key: HDFS-4461 > URL: https://issues.apache.org/jira/browse/HDFS-4461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.3-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, > memory-analysis.png > > > In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. > This object contains two File objects-- one for the metadata file, and one > for the block file. Since those File objects contain full paths, users who > pick a lengthy path for their volume roots will end up using an extra > N_blocks * path_prefix bytes per block scanned. We also don't really need to > store File objects-- storing strings and then creating File objects as needed > would be cheaper. This has been causing out-of-memory conditions for users > who pick such long volume paths. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks
[ https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-2476: - Target Version/s: 0.23.3, 0.24.0 (was: 0.24.0, 0.23.3) Fix Version/s: 0.23.7 > More CPU efficient data structure for > under-replicated/over-replicated/invalidate blocks > > > Key: HDFS-2476 > URL: https://issues.apache.org/jira/browse/HDFS-2476 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 0.23.0 >Reporter: Tomasz Nykiel >Assignee: Tomasz Nykiel > Fix For: 2.0.0-alpha, 0.23.7 > > Attachments: hashStructures.patch, hashStructures.patch-2, > hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, > hashStructures.patch-6, hashStructures.patch-7, hashStructures.patch-8, > hashStructures.patch-9 > > > This patch introduces two hash data structures for storing under-replicated, > over-replicated and invalidated blocks. > 1. LightWeightHashSet > 2. LightWeightLinkedSet > Currently in all these cases we are using java.util.TreeSet which adds > unnecessary overhead. > The main bottlenecks addressed by this patch are: > -cluster instability times, when these queues (especially under-replicated) > tend to grow quite drastically, > -initial cluster startup, when the queues are initialized, after leaving > safemode, > -block reports, > -explicit acks for block addition and deletion > 1. The introduced structures are CPU-optimized. > 2. They shrink and expand according to current capacity. > 3. Add/contains/delete ops are performed in O(1) time (unlike current log n > for TreeSet). > 4. The sets are equipped with fast access methods for polling a number of > elements (get+remove), which are used for handling the queues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks
[ https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568767#comment-13568767 ] Kihwal Lee commented on HDFS-2476: -- Committed to the current branch-0.23. > More CPU efficient data structure for > under-replicated/over-replicated/invalidate blocks > > > Key: HDFS-2476 > URL: https://issues.apache.org/jira/browse/HDFS-2476 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 0.23.0 >Reporter: Tomasz Nykiel >Assignee: Tomasz Nykiel > Fix For: 2.0.0-alpha > > Attachments: hashStructures.patch, hashStructures.patch-2, > hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, > hashStructures.patch-6, hashStructures.patch-7, hashStructures.patch-8, > hashStructures.patch-9 > > > This patch introduces two hash data structures for storing under-replicated, > over-replicated and invalidated blocks. > 1. LightWeightHashSet > 2. LightWeightLinkedSet > Currently in all these cases we are using java.util.TreeSet which adds > unnecessary overhead. > The main bottlenecks addressed by this patch are: > -cluster instability times, when these queues (especially under-replicated) > tend to grow quite drastically, > -initial cluster startup, when the queues are initialized, after leaving > safemode, > -block reports, > -explicit acks for block addition and deletion > 1. The introduced structures are CPU-optimized. > 2. They shrink and expand according to current capacity. > 3. Add/contains/delete ops are performed in O(1) time (unlike current log n > for TreeSet). > 4. The sets are equipped with fast access methods for polling a number of > elements (get+remove), which are used for handling the queues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
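The access pattern the HDFS-2476 description calls "fast access methods for polling a number of elements (get+remove)" can be sketched with the JDK's LinkedHashSet standing in for LightWeightLinkedSet. This is an illustrative sketch, not Hadoop's implementation: a hash-based set with predictable iteration order gives O(1) add/contains/remove (versus O(log n) for TreeSet) and lets a queue consumer take and delete the first k elements in one pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of the "poll k elements" (get + remove) pattern described above,
// using LinkedHashSet as a stand-in for LightWeightLinkedSet.
public class PollKSketch {
    static <T> List<T> pollN(LinkedHashSet<T> set, int n) {
        List<T> out = new ArrayList<>(n);
        Iterator<T> it = set.iterator();
        while (it.hasNext() && out.size() < n) {
            out.add(it.next());
            it.remove();           // get + remove in the same pass
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashSet<Long> underReplicated =
            new LinkedHashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        List<Long> batch = pollN(underReplicated, 3);
        System.out.println(batch);            // [1, 2, 3]
        System.out.println(underReplicated);  // [4, 5]
    }
}
```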
[jira] [Commented] (HDFS-4451) hdfs balancer command returns exit code 1 on success instead of 0
[ https://issues.apache.org/jira/browse/HDFS-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568750#comment-13568750 ] Hudson commented on HDFS-4451: -- Integrated in Hadoop-Mapreduce-trunk #1331 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1331/]) Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to incompatible section. (Revision 1441123) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > hdfs balancer command returns exit code 1 on success instead of 0 > - > > Key: HDFS-4451 > URL: https://issues.apache.org/jira/browse/HDFS-4451 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.0.2-alpha > Environment: Centos 6.3, JDK 1.6.0_25 >Reporter: Joshua Blatt > Fix For: 2.0.3-alpha > > Attachments: HDFS-4451.patch, HDFS-4451.patch, HDFS-4451.patch > > > Though the org.apache.hadoop.util.Tool interface javadocs indicate > implementations should return 0 on success, the > org.apache.hadoop.hdfs.server.balancer.Balancer.Cli implementation returns the > int values of this enum instead: > // Exit status > enum ReturnStatus { > SUCCESS(1), > IN_PROGRESS(0), > ALREADY_RUNNING(-1), > NO_MOVE_BLOCK(-2), > NO_MOVE_PROGRESS(-3), > IO_EXCEPTION(-4), > ILLEGAL_ARGS(-5), > INTERRUPTED(-6); > This created an issue for us when we tried to run the hdfs balancer as a cron > job. Cron sends emails whenever an executable it runs exits non-zero. We'd > either have to disable all emails and miss real issues or fix this bug. > I think both SUCCESS and IN_PROGRESS ReturnStatuses should lead to exit 0. > Marking this change as incompatible because existing scripts which interpret > exit 1 as success will be broken (unless they defensively/liberally interpret > both exit 1 and exit 0 as success). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
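The mapping the HDFS-4451 reporter proposes can be sketched as follows. The enum values mirror the snippet quoted in the issue description; the `toExitCode` method is a hypothetical illustration of the proposed behavior (SUCCESS and IN_PROGRESS both exit 0), not the committed patch.

```java
// Sketch of the proposed exit-code mapping: both SUCCESS and IN_PROGRESS
// translate to process exit status 0, so cron only mails on real failures.
public class BalancerExitSketch {
    enum ReturnStatus {
        SUCCESS(1), IN_PROGRESS(0), ALREADY_RUNNING(-1), NO_MOVE_BLOCK(-2),
        NO_MOVE_PROGRESS(-3), IO_EXCEPTION(-4), ILLEGAL_ARGS(-5), INTERRUPTED(-6);
        final int code;
        ReturnStatus(int code) { this.code = code; }
    }

    // Hypothetical mapping onto the Tool contract (0 means success).
    static int toExitCode(ReturnStatus s) {
        switch (s) {
            case SUCCESS:
            case IN_PROGRESS:
                return 0;
            default:
                return s.code;   // error codes stay distinguishable
        }
    }

    public static void main(String[] args) {
        System.out.println(toExitCode(ReturnStatus.SUCCESS));      // 0
        System.out.println(toExitCode(ReturnStatus.IO_EXCEPTION)); // -4
    }
}
```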
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568746#comment-13568746 ] Hudson commented on HDFS-4151: -- Integrated in Hadoop-Mapreduce-trunk #1331 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1331/]) Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to incompatible section. (Revision 1441123) HDFS-4151. hdfs balancer command returns exit code 1 on success instead of 0. Contributed by Joshua Blatt. (Revision 1441113) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441113 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java > Passing INodesInPath instead of INode[] in FSDirectory > -- > > Key: HDFS-4151 > URL: https://issues.apache.org/jira/browse/HDFS-4151 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Fix For: 3.0.0 > > Attachments: h4151_20121104.patch, h4151_20121105.patch > > > Currently, many methods in FSDirectory pass INode[] as a parameter. It is > better to pass INodesInPath so that we can add more path information later > on. This is especially useful in Snapshot implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4189) rename getter method getMutableX and getXMutable to getXAndEnsureMutable
[ https://issues.apache.org/jira/browse/HDFS-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568734#comment-13568734 ] Hudson commented on HDFS-4189: -- Integrated in Hadoop-Hdfs-Snapshots-Branch-build #88 (See [https://builds.apache.org/job/Hadoop-Hdfs-Snapshots-Branch-build/88/]) HDFS-4189. Renames the getMutableXxx methods to getXxx4Write and fix a bug that some getExistingPathINodes calls should be getINodesInPath4Write. (Revision 1441193) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441193 Files : * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-2802.txt * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java > rename getter method getMutableX and getXMutable to getXAndEnsureMutable > > > Key: HDFS-4189 > URL: https://issues.apache.org/jira/browse/HDFS-4189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Tsz Wo 
(Nicholas), SZE >Priority: Minor > Fix For: Snapshot (HDFS-2802) > > Attachments: h4189_20130130.patch, h4189_20130131.patch > > > The method names with the form "getMutableXxx" may be misleading. Let's > rename them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4361) When listing snapshottable directories, only return those where the user has permission to take snapshots
[ https://issues.apache.org/jira/browse/HDFS-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568735#comment-13568735 ] Hudson commented on HDFS-4361: -- Integrated in Hadoop-Hdfs-Snapshots-Branch-build #88 (See [https://builds.apache.org/job/Hadoop-Hdfs-Snapshots-Branch-build/88/]) HDFS-4361. When listing snapshottable directories, only return those where the user has permission to take snapshots. Contributed by Jing Zhao (Revision 1441202) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441202 Files : * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-2802.txt * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java * /hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshottableDirListing.java > When listing snapshottable directories, only return those where the user has > permission to take snapshots > - > > Key: HDFS-4361 > URL: https://issues.apache.org/jira/browse/HDFS-4361 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Jing Zhao >Assignee: Jing Zhao > Fix For: Snapshot (HDFS-2802) > > Attachments: HDFS-4361.001.patch, HDFS-4361.002.patch, > HDFS-4361.003.patch, HDFS-4361.004.patch > > > Currently, all snapshottable directories are returned for any user. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568733#comment-13568733 ] Hadoop QA commented on HDFS-4452: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567572/getAdditionalBlock.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3935//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3935//console This message is automatically generated. > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Attachments: getAdditionalBlock.patch, getAdditionalBlock.patch, > TestAddBlockRetry.java > > > HDFS client tries to addBlock() to a file. 
If NameNode is busy the client can > time out and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in an invalid state with an empty block in the middle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
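One way to make such a retry idempotent can be sketched as below. This is a hypothetical illustration with invented names, not the attached patch: if the retried request's "previous block" already matches the penultimate block of the file, the earlier call must have allocated the last block, so the allocator returns it instead of creating a second one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of retry detection in an addBlock()-style allocator.
public class AddBlockRetrySketch {
    static final List<String> blocks = new ArrayList<>();
    static int nextId = 1;

    static synchronized String getAdditionalBlock(String previous) {
        int n = blocks.size();
        // Retry case: "previous" is already the block before the last one,
        // so a prior call for this request already allocated the last block.
        if (n >= 2 && blocks.get(n - 2).equals(previous)) {
            return blocks.get(n - 1);
        }
        String b = "blk_" + (nextId++);   // normal case: allocate a new block
        blocks.add(b);
        return b;
    }

    public static void main(String[] args) {
        String b1 = getAdditionalBlock(null);
        String b2 = getAdditionalBlock(b1);
        String retry = getAdditionalBlock(b1);  // client timed out and retried
        System.out.println(b2.equals(retry));   // true: no duplicate block
        System.out.println(blocks.size());      // 2
    }
}
```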
[jira] [Commented] (HDFS-4451) hdfs balancer command returns exit code 1 on success instead of 0
[ https://issues.apache.org/jira/browse/HDFS-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568729#comment-13568729 ] Hudson commented on HDFS-4451: -- Integrated in Hadoop-Hdfs-trunk #1303 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1303/]) Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to incompatible section. (Revision 1441123) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > hdfs balancer command returns exit code 1 on success instead of 0 > - > > Key: HDFS-4451 > URL: https://issues.apache.org/jira/browse/HDFS-4451 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.0.2-alpha > Environment: Centos 6.3, JDK 1.6.0_25 >Reporter: Joshua Blatt > Fix For: 2.0.3-alpha > > Attachments: HDFS-4451.patch, HDFS-4451.patch, HDFS-4451.patch > > > Though the org.apache.hadoop.util.Tool interface javadocs indicate > implementations should return 0 on success, the > org.apache.hadoop.hdfs.server.balancer.Balancer.Cli implementation returns the > int values of this enum instead: > // Exit status > enum ReturnStatus { > SUCCESS(1), > IN_PROGRESS(0), > ALREADY_RUNNING(-1), > NO_MOVE_BLOCK(-2), > NO_MOVE_PROGRESS(-3), > IO_EXCEPTION(-4), > ILLEGAL_ARGS(-5), > INTERRUPTED(-6); > This created an issue for us when we tried to run the hdfs balancer as a cron > job. Cron sends emails whenever an executable it runs exits non-zero. We'd > either have to disable all emails and miss real issues or fix this bug. > I think both SUCCESS and IN_PROGRESS ReturnStatuses should lead to exit 0. > Marking this change as incompatible because existing scripts which interpret > exit 1 as success will be broken (unless they defensively/liberally interpret > both exit 1 and exit 0 as success). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568725#comment-13568725 ] Hudson commented on HDFS-4151: -- Integrated in Hadoop-Hdfs-trunk #1303 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1303/]) Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to incompatible section. (Revision 1441123) HDFS-4151. hdfs balancer command returns exit code 1 on success instead of 0. Contributed by Joshua Blatt. (Revision 1441113) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441113 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java > Passing INodesInPath instead of INode[] in FSDirectory > -- > > Key: HDFS-4151 > URL: https://issues.apache.org/jira/browse/HDFS-4151 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Fix For: 3.0.0 > > Attachments: h4151_20121104.patch, h4151_20121105.patch > > > Currently, many methods in FSDirectory pass INode[] as a parameter. It is > better to pass INodesInPath so that we can add more path information later > on. This is especially useful in Snapshot implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2495) Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock
[ https://issues.apache.org/jira/browse/HDFS-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568711#comment-13568711 ] Hudson commented on HDFS-2495: -- Integrated in Hadoop-Hdfs-0.23-Build #512 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/512/]) merge -r 1199023:1199024 Merging from trunk to branch-0.23 to fix HDFS-2495 (Revision 1441249) Result = SUCCESS kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441249 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java > Increase granularity of write operations in ReplicationMonitor thus reducing > contention for write lock > -- > > Key: HDFS-2495 > URL: https://issues.apache.org/jira/browse/HDFS-2495 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 0.23.0 >Reporter: Tomasz Nykiel >Assignee: Tomasz Nykiel > Fix For: 2.0.0-alpha, 0.23.7 > > Attachments: replicationMon.patch, replicationMon.patch-1 > > > For processing blocks in ReplicationMonitor > (BlockManager.computeReplicationWork), we first obtain a list of blocks to be > replicated by calling chooseUnderReplicatedBlocks, and then for each block > which was found, we call computeReplicationWorkForBlock. The latter processes > a block in three stages, acquiring the writelock twice per call: > 1. obtaining block related info (livenodes, srcnode, etc.) under lock > 2. choosing target for replication > 3. 
scheduling replication (under lock) > We would like to change this behaviour and decrease contention for the write > lock, by batching blocks and executing steps 1, 2 and 3 for sets of blocks, rather than > for each one separately. This would decrease the number of writeLock acquisitions to 2, > from 2*numberOfBlocks. > Also, the info-level logging can be pushed outside the writelock. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
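The batching described above can be sketched as follows; the class and method names are illustrative stand-ins, not the actual BlockManager code, and a plain list stands in for the real block structures:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the HDFS-2495 batching: the three stages run once
// per *batch* of under-replicated blocks, so the write lock is acquired
// twice per batch instead of twice per block.
class ReplicationBatchSketch {
    final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
    int writeLockAcquisitions = 0;

    int processBatch(List<String> underReplicatedBlocks) {
        List<String> work;
        // Stage 1: gather per-block info for the whole batch under one lock.
        nsLock.writeLock().lock();
        try {
            writeLockAcquisitions++;
            work = new ArrayList<>(underReplicatedBlocks);
        } finally {
            nsLock.writeLock().unlock();
        }

        // Stage 2: choose replication targets outside the lock; target
        // selection does not mutate the namespace.

        // Stage 3: schedule all the transfers under a second, final lock.
        int scheduled;
        nsLock.writeLock().lock();
        try {
            writeLockAcquisitions++;
            scheduled = work.size();
        } finally {
            nsLock.writeLock().unlock();
        }
        return scheduled;
    }
}
```

For n blocks processed as one batch this is 2 write-lock acquisitions instead of 2*n, matching the reduction claimed in the description.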
[jira] [Commented] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.
[ https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568710#comment-13568710 ] Hudson commented on HDFS-2477: -- Integrated in Hadoop-Hdfs-0.23-Build #512 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/512/]) Merging r1196676 and r1197801 from trunk to branch-0.23 to fix HDFS-2477 (Revision 1441131) Result = SUCCESS kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441131 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java > Optimize computing the diff between a block report and the namenode state. 
> -- > > Key: HDFS-2477 > URL: https://issues.apache.org/jira/browse/HDFS-2477 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 0.23.0 >Reporter: Tomasz Nykiel >Assignee: Tomasz Nykiel > Fix For: 2.0.0-alpha, 0.23.7 > > Attachments: reportDiff.patch, reportDiff.patch-2, > reportDiff.patch-3, reportDiff.patch-4, reportDiff.patch-5 > > > When a block report is processed at the NN, the BlockManager.reportDiff > traverses all blocks contained in the report, and for each one block, which > is also present in the corresponding datanode descriptor, the block is moved > to the head of the list of the blocks in this datanode descriptor. > With HDFS-395 the huge majority of the blocks in the report, are also present > in the datanode descriptor, which means that almost every block in the report > will have to be moved to the head of the list. > Currently this operation is performed by DatanodeDescriptor.moveBlockToHead, > which removes a block from a list and then inserts it. In this process, we > call findDatanode several times (afair 6 times for each moveBlockToHead > call). findDatanode is relatively expensive, since it linearly goes through > the triplets to locate the given datanode. > With this patch, we do some memoization of findDatanode, so we can reclaim 2 > findDatanode calls. Our experiments show that this can improve the reportDiff > (which is executed under write lock) by around 15%. Currently with HDFS-395, > reportDiff is responsible for almost 100% of the block report processing time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
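The moveBlockToHead memoization above can be sketched with a plain list standing in for the triplets structure; the lookup counter models the linear findDatanode scans the patch saves. All names are illustrative, not the actual DatanodeDescriptor code:

```java
import java.util.LinkedList;

// Hedged sketch of the HDFS-2477 idea: moving a reported block to the head
// of the datanode's block list without re-running the expensive linear
// datanode lookup on every move.
class MoveToHeadSketch {
    final LinkedList<Long> blocks = new LinkedList<>();
    int lookups = 0;

    // Linear scan standing in for the expensive findDatanode() call.
    int find(long b) {
        lookups++;
        return blocks.indexOf(b);
    }

    // Unpatched shape: the remove half and the relink half each resolve
    // the block's position with a linear scan (modeled as two find calls).
    void moveToHeadNaive(long b) {
        int i = find(b);   // lookup #1: locate the block for removal
        blocks.remove(i);
        blocks.addFirst(b);
        find(b);           // lookup #2: models re-resolving on relink
    }

    // Memoized shape: the report traversal already knows the index and
    // passes it in, so no extra scan is paid per moved block.
    void moveToHeadMemoized(long b, int knownIndex) {
        blocks.remove(knownIndex);
        blocks.addFirst(b);
    }
}
```

Since reportDiff moves nearly every reported block (HDFS-395 makes most reported blocks already known), shaving lookups per move is what yields the ~15% improvement the description measures.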
[jira] [Commented] (HDFS-395) DFS Scalability: Incremental block reports
[ https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568706#comment-13568706 ] Hudson commented on HDFS-395: - Integrated in Hadoop-Hdfs-0.23-Build #512 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/512/]) merge -r 1161991:1161992 Merging from trunk to branch-0.23 to fix HDFS-395 (Revision 1441117) Result = SUCCESS kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441117 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetAsyncDiskService.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockCommand.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReceivedDeletedBlockInfo.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java > DFS Scalability: Incremental block reports > -- > > Key: HDFS-395 > URL: https://issues.apache.org/jira/browse/HDFS-395 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: dhruba borthakur >Assignee: Tomasz Nykiel > Fix For: 2.0.0-alpha, 0.23.7 > > Attachments: blockReportPeriod.patch, explicitAcks.patch-3, > explicitAcks.patch-4, explicitAcks.patch-5, explicitAcks.patch-6, > explicitDeleteAcks.patch > > > I have a cluster that has 1800 datanodes. Each datanode has around 5 > blocks and sends a block report to the namenode once every hour. This means > that the namenode processes a block report once every 2 seconds. Each block > report contains all blocks that the datanode currently hosts. This makes the > namenode compare a huge number of blocks that practically remains the same > between two consecutive reports. This wastes CPU on the namenode. > The problem becomes worse when the number of datanodes increases. > One proposal is to make succeeding block reports (after a successful send of > a full block report) be incremental. This will make the namenode process only > those blocks that were added/deleted in the last period. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
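The incremental-report proposal above amounts to set differencing: after one successful full report, a datanode need only send the blocks added and deleted since the previous report. A minimal sketch, with long block IDs standing in for real Block objects:

```java
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of incremental block reports: compute the two small deltas
// a datanode would send instead of re-listing every hosted block.
class IncrementalReportSketch {
    static Set<Long> added(Set<Long> previous, Set<Long> current) {
        Set<Long> d = new HashSet<>(current);
        d.removeAll(previous);   // blocks the NameNode has not yet seen
        return d;
    }

    static Set<Long> deleted(Set<Long> previous, Set<Long> current) {
        Set<Long> d = new HashSet<>(previous);
        d.removeAll(current);    // blocks removed since the last report
        return d;
    }
}
```

The NameNode then processes only these two sets each period, instead of comparing tens of thousands of mostly unchanged blocks per report.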
[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.
[ https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-4452: -- Attachment: getAdditionalBlock.patch The same changes now with the test case, which succeeds with the patch and fails on trunk with the expected error: {code} java.lang.AssertionError: Must be one block expected:<1> but was:<2> {code} > getAdditionalBlock() can create multiple blocks if the client times out and > retries. > > > Key: HDFS-4452 > URL: https://issues.apache.org/jira/browse/HDFS-4452 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Critical > Attachments: getAdditionalBlock.patch, getAdditionalBlock.patch, > TestAddBlockRetry.java > > > HDFS client tries to addBlock() to a file. If NameNode is busy the client can > time out and will reissue the same request again. The two requests will race > with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in > creating two new blocks on the NameNode while the client will know of only > one of them. This eventually results in {{NotReplicatedYetException}} because > the extra block is never reported by any DataNode, which stalls file creation > and puts it in an invalid state with an empty block in the middle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
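A retry-safe getAdditionalBlock() can be sketched as follows: if the "previous block" the client reports is already the penultimate block of the file, the call is a retry and the already-allocated last block is returned instead of a second new one. The names and string block IDs are illustrative stand-ins, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hedged sketch of idempotent block allocation under client retries
// (the race described in HDFS-4452).
class AddBlockRetrySketch {
    final List<String> fileBlocks = new ArrayList<>();
    private int nextId = 0;

    synchronized String getAdditionalBlock(String clientPrevBlock) {
        int n = fileBlocks.size();
        String penultimate = n >= 2 ? fileBlocks.get(n - 2) : null;
        if (n >= 1 && Objects.equals(clientPrevBlock, penultimate)) {
            // Retry detected: a block was already allocated for this
            // request, so hand back the existing last block.
            return fileBlocks.get(n - 1);
        }
        String b = "blk_" + (nextId++);
        fileBlocks.add(b);
        return b;
    }
}
```

With this check, two racing requests for the same position yield the same block, so the file never gains the unreported extra block that triggers NotReplicatedYetException.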
[jira] [Commented] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data
[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568655#comment-13568655 ] Liang Xie commented on HDFS-347: Hi [~cmccabe], would you mind giving a patch against branch-2 if possible? It'll be appreciated:) I could be a volunteer to do a simple performance test on our hbase test cluster which is built with branch-2, to see whether there is a performance improvement on application-side or not, thanks in advance. > DFS read performance suboptimal when client co-located on nodes with data > - > > Key: HDFS-347 > URL: https://issues.apache.org/jira/browse/HDFS-347 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs-client, performance >Reporter: George Porter >Assignee: Colin Patrick McCabe > Attachments: 2013.01.28.design.pdf, 2013.01.31.consolidated2.patch, > 2013.01.31.consolidated.patch, all.tsv, BlockReaderLocal1.txt, full.patch, > HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, > HDFS-347-016_cleaned.patch, HDFS-347.016.patch, HDFS-347.017.clean.patch, > HDFS-347.017.patch, HDFS-347.018.clean.patch, HDFS-347.018.patch2, > HDFS-347.019.patch, HDFS-347.020.patch, HDFS-347.021.patch, > HDFS-347.022.patch, HDFS-347.024.patch, HDFS-347.025.patch, > HDFS-347.026.patch, HDFS-347.027.patch, HDFS-347.029.patch, > HDFS-347.030.patch, HDFS-347.033.patch, HDFS-347.035.patch, > HDFS-347-branch-20-append.txt, hdfs-347-merge.txt, hdfs-347-merge.txt, > hdfs-347-merge.txt, hdfs-347.png, hdfs-347.txt, local-reads-doc > > > One of the major strategies Hadoop uses to get scalable data processing is to > move the code to the data. However, putting the DFS client on the same > physical node as the data blocks it acts on doesn't improve read performance > as much as expected. > After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem > is due to the HDFS streaming protocol causing many more read I/O operations > (iops) than necessary. 
Consider the case of a DFSClient fetching a 64 MB > disk block from the DataNode process (running in a separate JVM) running on > the same machine. The DataNode will satisfy the single disk block request by > sending data back to the HDFS client in 64-KB chunks. In BlockSender.java, > this is done in the sendChunk() method, relying on Java's transferTo() > method. Depending on the host O/S and JVM implementation, transferTo() is > implemented as either a sendfilev() syscall or a pair of mmap() and write(). > In either case, each chunk is read from the disk by issuing a separate I/O > operation for each chunk. The result is that the single request for a 64-MB > block ends up hitting the disk as over a thousand smaller requests for 64-KB > each. > Since the DFSClient runs in a different JVM and process than the DataNode, > shuttling data from the disk to the DFSClient also results in context > switches each time network packets get sent (in this case, the 64-KB chunk > turns into a large number of 1500-byte packet send operations). Thus we see > a large number of context switches for each block send operation. > I'd like to get some feedback on the best way to address this, but I think > providing a mechanism for a DFSClient to directly open data blocks that > happen to be on the same machine would help. It could do this by examining the set of > LocatedBlocks returned by the NameNode, marking those that should be resident > on the local host. Since the DataNode and DFSClient (probably) share the > same hadoop configuration, the DFSClient should be able to find the files > holding the block data, and it could directly open them and send data back to > the client. This would avoid the context switches imposed by the network > layer, and would allow for much larger read buffers than 64KB, which should > reduce the number of iops imposed by each read block operation. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
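The local-read proposal in the HDFS-347 description boils down to a placement check on the client: inspect the replica locations the NameNode returned and, when one is on the local host, prefer opening the replica file directly over the chunked streaming protocol. A simplified sketch, with strings standing in for LocatedBlock; not the actual HDFS-347 code:

```java
import java.util.List;

// Hedged sketch of short-circuit local reads: decide per block whether a
// direct local read is possible, avoiding per-chunk I/O and the
// client/DataNode context switches described in the issue.
class LocalReadSketch {
    static boolean hasLocalReplica(List<String> replicaHosts, String localHost) {
        return replicaHosts.contains(localHost);
    }

    static String chooseReadPath(List<String> replicaHosts, String localHost) {
        // Short-circuit: open the replica file on local disk directly.
        // Remote: fall back to the DataNode's 64-KB chunked streaming.
        return hasLocalReplica(replicaHosts, localHost) ? "short-circuit" : "remote";
    }
}
```

A real implementation also needs the client to locate and be permitted to open the block files, which is the bulk of the actual patch series attached to the issue.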
[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory
[ https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568647#comment-13568647 ] Hudson commented on HDFS-4151: -- Integrated in Hadoop-Yarn-trunk #114 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/114/]) Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to incompatible section. (Revision 1441123) HDFS-4151. hdfs balancer command returns exit code 1 on success instead of 0. Contributed by Joshua Blatt. (Revision 1441113) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441113 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java > Passing INodesInPath instead of INode[] in FSDirectory > -- > > Key: HDFS-4151 > URL: https://issues.apache.org/jira/browse/HDFS-4151 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Fix For: 3.0.0 > > Attachments: h4151_20121104.patch, h4151_20121105.patch > > > Currently, many methods in FSDirectory pass INode[] as a parameter. It is > better to pass INodesInPath so that we can add more path information later > on. This is especially useful in Snapshot implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4451) hdfs balancer command returns exit code 1 on success instead of 0
[ https://issues.apache.org/jira/browse/HDFS-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568651#comment-13568651 ] Hudson commented on HDFS-4451: -- Integrated in Hadoop-Yarn-trunk #114 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/114/]) Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to incompatible section. (Revision 1441123) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > hdfs balancer command returns exit code 1 on success instead of 0 > - > > Key: HDFS-4451 > URL: https://issues.apache.org/jira/browse/HDFS-4451 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 2.0.2-alpha > Environment: Centos 6.3, JDK 1.6.0_25 >Reporter: Joshua Blatt > Fix For: 2.0.3-alpha > > Attachments: HDFS-4451.patch, HDFS-4451.patch, HDFS-4451.patch > > > Though the org.apache.hadoop.util.Tool interface javadocs indicate > implementations should return 0 on success, the > org.apache.hadoop.hdfs.server.balance.Balancer.Cli implementation returns the > int values of this enum instead: > // Exit status > enum ReturnStatus { > SUCCESS(1), > IN_PROGRESS(0), > ALREADY_RUNNING(-1), > NO_MOVE_BLOCK(-2), > NO_MOVE_PROGRESS(-3), > IO_EXCEPTION(-4), > ILLEGAL_ARGS(-5), > INTERRUPTED(-6); > This created an issue for us when we tried to run the hdfs balancer as a cron > job. Cron sends emails whenever a executable it runs exits non-zero. We'd > either have to disable all emails and miss real issues or fix this bug. > I think both SUCCESS and IN_PROGRESS ReturnStatuses should lead to exit 0. > Marking this change as incompatible because existing scripts which interpret > exit 1 as success will be broken (unless they defensively/liberally interpret > both exit 1 and exit 0 as success). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
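The fix proposed for HDFS-4451 can be sketched by keeping the enum quoted in the issue but mapping both SUCCESS and IN_PROGRESS to process exit code 0, as the Tool contract requires. Only the enum values come from the issue text; the toExitCode mapping is illustrative:

```java
// Hedged sketch of the balancer exit-code fix: distinct internal statuses,
// but 0 reported to the shell for both success-like outcomes so cron jobs
// do not treat a clean run as a failure.
class BalancerExitSketch {
    enum ReturnStatus {
        SUCCESS(1), IN_PROGRESS(0), ALREADY_RUNNING(-1), NO_MOVE_BLOCK(-2),
        NO_MOVE_PROGRESS(-3), IO_EXCEPTION(-4), ILLEGAL_ARGS(-5), INTERRUPTED(-6);

        final int code;
        ReturnStatus(int code) { this.code = code; }
    }

    static int toExitCode(ReturnStatus r) {
        switch (r) {
            case SUCCESS:
            case IN_PROGRESS:
                return 0;      // Tool contract: 0 means success
            default:
                return r.code; // keep distinct non-zero failure codes
        }
    }
}
```

Note that on POSIX systems a negative value passed to System.exit() is reported modulo 256, so a real implementation might prefer small positive failure codes; the sketch just preserves the enum's values.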
[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
[ https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568560#comment-13568560 ] Hadoop QA commented on HDFS-4462: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567534/HDFS-4462.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3934//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3934//console This message is automatically generated. > 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation > version of HDFS > --- > > Key: HDFS-4462 > URL: https://issues.apache.org/jira/browse/HDFS-4462 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Attachments: HDFS-4462.patch, HDFS-4462.patch > > > The 2NN currently has logic to detect when its on-disk FS metadata needs an > upgrade with respect to the NN's metadata (i.e. 
the layout versions are > different) and in this case it will proceed with the checkpoint despite > storage signatures not matching precisely if the BP ID and Cluster ID do > match exactly. However, in situations where we're upgrading from versions of > HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints > will always fail with an error like the following: > {noformat} > 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent > checkpoint fields. > LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = > CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = > BP-1520616013-172.21.3.106-1359680537136. > Expecting respectively: -19; 403832480; 0; ; . > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
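An upgrade-tolerant version of the checkpoint-field comparison described above can be sketched as follows: when the layout versions differ (an upgrade is in progress), empty local cluster IDs (pre-federation metadata had none) should not fail the comparison as long as the namespaceID matches. The field handling is illustrative, not the actual HDFS-4462 patch:

```java
// Hedged sketch of a 2NN checkpoint-fields check that tolerates metadata
// carried over from a pre-federation HDFS (no cluster ID / blockpool ID).
class CheckpointFieldsSketch {
    static boolean compatible(int localLV, int remoteLV,
                              int localNsId, int remoteNsId,
                              String localClusterId, String remoteClusterId) {
        if (localNsId != remoteNsId) {
            return false;                                  // never reconcilable
        }
        if (localLV == remoteLV) {
            return localClusterId.equals(remoteClusterId); // normal checkpoint
        }
        // Upgrade path: accept an empty pre-federation ID or an exact match.
        return localClusterId.isEmpty() || localClusterId.equals(remoteClusterId);
    }
}
```

This matches the failure quoted in the description, where the 2NN expected LV -19 with empty IDs against the NN's LV -40 with populated IDs, a combination the sketch would accept because the namespaceIDs (403832480) agree.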