[jira] [Commented] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569413#comment-13569413
 ] 

Hadoop QA commented on HDFS-4404:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12567696/hdfs-4404.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3941//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3941//console

This message is automatically generated.

> Create file failure when the machine of first attempted NameNode is down
> 
>
> Key: HDFS-4404
> URL: https://issues.apache.org/jira/browse/HDFS-4404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: liaowenrui
>Assignee: Todd Lipcon
>Priority: Critical
> Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, 
> hdfs-4404.txt, hdfs-4404.txt
>
>
> Test environment: NN1, NN2, DN1, DN2, DN3
> machine1: NN1, DN1
> machine2: NN2, DN2
> machine3: DN3
> machine1 is down.
> 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - 
> Connecting to /160.161.0.155:8020
> 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing 
> ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting 
> for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
> java.net.SocketTimeoutException: 1 millis timeout while waiting for 
> channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474)
>  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568)
>  at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1156)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>  at $Proxy9.create(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
>  at $Proxy10.create(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1261)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715)
>  at test.TestLease.main(TestLease.java:45)
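For context, a minimal sketch of the HA client setup the reporter describes, with two NameNodes behind one logical nameservice; the nameservice and host names here are hypothetical, and the configuration keys are the standard HDFS HA client settings. With machine1 down, the create() call is expected to fail over to nn2 rather than abort as in the trace above.

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaCreateExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // One logical nameservice backed by two NameNodes (hypothetical hosts).
    conf.set("dfs.nameservices", "mycluster");
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "machine1:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "machine2:8020");
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // The client addresses the nameservice, not a concrete NameNode host.
    FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
    fs.create(new Path("/tmp/ha-create-test")).close();
    fs.close();
  }
}
{code}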

[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569385#comment-13569385
 ] 

Hudson commented on HDFS-4452:
--

Integrated in Hadoop-trunk-Commit #3314 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3314/])
HDFS-4452. getAdditionalBlock() can create multiple blocks if the client 
times out and retries. Contributed by Konstantin Shvachko. (Revision 1441681)

 Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441681
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java


> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Fix For: 2.0.3-alpha
>
> Attachments: getAdditionalBlock-branch2.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can 
> time out and reissue the same request. The two requests then race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client knows of only 
> one of them. This eventually results in {{NotReplicatedYetException}}, because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and leaves the file in an invalid state with an empty block in the middle.
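To make the failure mode concrete, here is an illustrative sketch (hypothetical types, not the FSNamesystem code or the attached patch) of the kind of check that makes block allocation safe against such a retry: if the retried request still names the same previous block but a newer last block has already been allocated, the existing block is returned instead of allocating a second one.

{code}
import java.util.HashMap;
import java.util.Map;

class AllocationSketch {
  // Last allocated block per file path (stand-in for the real inode state).
  private final Map<String, String> lastBlock = new HashMap<>();
  private long counter = 0;

  synchronized String getAdditionalBlock(String src, String previous) {
    String last = lastBlock.get(src);
    if (last != null && !last.equals(previous)) {
      // A retry of an earlier request: the first invocation already
      // allocated a block past 'previous'; return it rather than
      // creating a duplicate, empty block.
      return last;
    }
    String fresh = "blk_" + (++counter);  // pretend block allocation
    lastBlock.put(src, fresh);
    return fresh;
  }
}
{code}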

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-4452:
--

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha
   Status: Resolved  (was: Patch Available)

I just committed this.

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Fix For: 2.0.3-alpha
>
> Attachments: getAdditionalBlock-branch2.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can 
> time out and reissue the same request. The two requests then race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client knows of only 
> one of them. This eventually results in {{NotReplicatedYetException}}, because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and leaves the file in an invalid state with an empty block in the middle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569379#comment-13569379
 ] 

Hadoop QA commented on HDFS-4452:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12567681/getAdditionalBlock.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3940//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3940//console

This message is automatically generated.

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock-branch2.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can 
> time out and reissue the same request. The two requests then race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client knows of only 
> one of them. This eventually results in {{NotReplicatedYetException}}, because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and leaves the file in an invalid state with an empty block in the middle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569371#comment-13569371
 ] 

Hadoop QA commented on HDFS-4462:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12567678/HDFS-4462.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3939//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3939//console

This message is automatically generated.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch, 
> HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e., the layout versions are 
> different); in this case it will proceed with the checkpoint, despite the 
> storage signatures not matching precisely, as long as the BP ID and Cluster ID 
> match exactly. However, when upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}
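The error above is easier to read against a sketch of the comparison it implies. The following is a hedged illustration (method and parameter names are hypothetical, not the SecondaryNameNode code): the namespace ID must always match, an identical layout version demands exact ID matches, and on the upgrade path empty pre-federation IDs are tolerated.

{code}
class CheckpointSignatureSketch {
  static boolean consistent(int oldLV, int newLV,
                            int oldNsId, int newNsId,
                            String oldClusterId, String newClusterId,
                            String oldBpId, String newBpId) {
    if (oldNsId != newNsId) {
      return false;  // namespace ID must always match
    }
    if (oldLV == newLV) {
      // Same layout version: require exact matches.
      return oldClusterId.equals(newClusterId) && oldBpId.equals(newBpId);
    }
    // Upgrade path: a pre-federation image has empty cluster/block-pool IDs.
    return (oldClusterId.isEmpty() || oldClusterId.equals(newClusterId))
        && (oldBpId.isEmpty() || oldBpId.equals(newBpId));
  }
}
{code}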

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot

2013-02-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-4464.
--

   Resolution: Fixed
Fix Version/s: Snapshot (HDFS-2802)
 Hadoop Flags: Reviewed

I have committed this.

> Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
> 
>
> Key: HDFS-4464
> URL: https://issues.apache.org/jira/browse/HDFS-4464
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4464_20120201b.patch, h4464_20120201.patch
>
>
> Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive 
> methods for deleting inodes and collecting blocks for further block 
> deletion/update.
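As a rough illustration of what combining the two recursions means (hypothetical INode type, not the actual patch), a single walk can both detach the inodes and gather their blocks for later deletion:

{code}
import java.util.ArrayList;
import java.util.List;

class SubtreeSketch {
  static class INode {
    List<INode> children = new ArrayList<>();
    List<String> blocks = new ArrayList<>();
  }

  /** One recursion: destroy the subtree and collect its blocks. */
  static void destroySubtreeAndCollectBlocks(INode node, List<String> collected) {
    for (INode child : node.children) {
      destroySubtreeAndCollectBlocks(child, collected);
    }
    collected.addAll(node.blocks);  // blocks to be deleted/updated later
    node.blocks.clear();
    node.children.clear();          // detach so the subtree can be reclaimed
  }
}
{code}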

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down

2013-02-01 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569366#comment-13569366
 ] 

Aaron T. Myers commented on HDFS-4404:
--

The latest patch looks good to me. +1 pending Jenkins.

> Create file failure when the machine of first attempted NameNode is down
> 
>
> Key: HDFS-4404
> URL: https://issues.apache.org/jira/browse/HDFS-4404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: liaowenrui
>Assignee: Todd Lipcon
>Priority: Critical
> Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, 
> hdfs-4404.txt, hdfs-4404.txt
>
>
> Test environment: NN1, NN2, DN1, DN2, DN3
> machine1: NN1, DN1
> machine2: NN2, DN2
> machine3: DN3
> machine1 is down.
> 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - 
> Connecting to /160.161.0.155:8020
> 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing 
> ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting 
> for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
> java.net.SocketTimeoutException: 1 millis timeout while waiting for 
> channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474)
>  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568)
>  at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1156)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>  at $Proxy9.create(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
>  at $Proxy10.create(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1261)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715)
>  at test.TestLease.main(TestLease.java:45)
> 2013-01-12 09:51:38,443 DEBUG ipc.Client (Client.java:close(940)) - IPC 
> Client (31594013) connection to /160.161.0.155:8020 from 
> hdfs/had...@hadoop.com: closed
> 2013-01-12 09:52:47,834 WARN  retry.RetryInvocationHandler 
> (RetryInvocationHandler.java:invoke(95)) - Exception while invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create. 
> Not retrying because the invoked method is not idempotent, and unable to 
> determine whether it was invoked
> java.net.SocketTimeoutException: Call From szxy1x001833091/172.0.0.13 to 
> vm2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 
> 1 millis timeout while waiting for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending 
> remote=/160.161.0.155:8020]; For more details see:  
> http://wiki.apache.org/hadoop/SocketTimeout
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:743)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1180)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>  at $Proxy9.create(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)

[jira] [Updated] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down

2013-02-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-4404:
--

Attachment: hdfs-4404.txt

Fix the javadoc warning (missed a '}' character)

> Create file failure when the machine of first attempted NameNode is down
> 
>
> Key: HDFS-4404
> URL: https://issues.apache.org/jira/browse/HDFS-4404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: liaowenrui
>Assignee: Todd Lipcon
>Priority: Critical
> Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, 
> hdfs-4404.txt, hdfs-4404.txt
>
>
> Test environment: NN1, NN2, DN1, DN2, DN3
> machine1: NN1, DN1
> machine2: NN2, DN2
> machine3: DN3
> machine1 is down.
> 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - 
> Connecting to /160.161.0.155:8020
> 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing 
> ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting 
> for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
> java.net.SocketTimeoutException: 1 millis timeout while waiting for 
> channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474)
>  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568)
>  at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1156)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>  at $Proxy9.create(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
>  at $Proxy10.create(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1261)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715)
>  at test.TestLease.main(TestLease.java:45)
> 2013-01-12 09:51:38,443 DEBUG ipc.Client (Client.java:close(940)) - IPC 
> Client (31594013) connection to /160.161.0.155:8020 from 
> hdfs/had...@hadoop.com: closed
> 2013-01-12 09:52:47,834 WARN  retry.RetryInvocationHandler 
> (RetryInvocationHandler.java:invoke(95)) - Exception while invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create. 
> Not retrying because the invoked method is not idempotent, and unable to 
> determine whether it was invoked
> java.net.SocketTimeoutException: Call From szxy1x001833091/172.0.0.13 to 
> vm2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 
> 1 millis timeout while waiting for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending 
> remote=/160.161.0.155:8020]; For more details see:  
> http://wiki.apache.org/hadoop/SocketTimeout
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:743)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1180)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>  at $Proxy9.create(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

[jira] [Commented] (HDFS-4453) Make a simple doc to describe the usage and design of the shortcircuit read feature

2013-02-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569354#comment-13569354
 ] 

Colin Patrick McCabe commented on HDFS-4453:


bq. Regarding /var/lib/hadoop-hdfs vs /var/run/hadoop-hdfs – why's it 
problematic if /var/run is a tmpfs? We shouldn't need it to persist 
cross-reboot, and /var/run is generally not cleaned by a tmpwatch process. 
tmpfs is also better in that it will continue to work even if a local disk dies.

If the /var/run/hadoop-hdfs directory gets removed, hdfs itself can't recreate 
it (you need root permissions to create a directory in /var/run). So after a 
reboot, things would probably stop working.

> Make a simple doc to describe the usage and design of the shortcircuit read 
> feature
> ---
>
> Key: HDFS-4453
> URL: https://issues.apache.org/jira/browse/HDFS-4453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Reporter: Brandon Li
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4453.001.patch, HDFS-4453.002.patch
>
>
> It would be nice to have a document describing the configuration and design 
> of this feature, and also its relationship with the previous short-circuit read 
> implementation (HDFS-2246): for example, can they co-exist, is this one 
> planned to replace HDFS-2246, or can it fall back on HDFS-2246 when Unix 
> domain sockets are not supported?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot

2013-02-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4464:
-

Attachment: h4464_20120201b.patch

Sure, destroySubtreeAndCollectBlocks sounds like a better name.

h4464_20120201b.patch: renames the method and also revises the javadoc.

> Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
> 
>
> Key: HDFS-4464
> URL: https://issues.apache.org/jira/browse/HDFS-4464
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4464_20120201b.patch, h4464_20120201.patch
>
>
> Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive 
> methods for deleting inodes and collecting blocks for further block 
> deletion/update.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569341#comment-13569341
 ] 

Chris Nauroth commented on HDFS-4462:
-

+1 for the new patch

I confirmed that it fixed the test failure.


> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch, 
> HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e., the layout versions are 
> different); in this case it will proceed with the checkpoint, despite the 
> storage signatures not matching precisely, as long as the BP ID and Cluster ID 
> match exactly. However, when upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-02-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569329#comment-13569329
 ] 

Suresh Srinivas commented on HDFS-4465:
---

Aaron, given that you have worked on this, feel free to assign this jira to 
yourself if you want.

> Optimize datanode ReplicasMap and ReplicaInfo
> -
>
> Key: HDFS-4465
> URL: https://issues.apache.org/jira/browse/HDFS-4465
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: dn-memory-improvements.patch
>
>
> In Hadoop, a lot of optimization has been done in the namenode data structures 
> to make them memory efficient. Similar optimizations are necessary for the 
> DataNode process. With the growth in storage per datanode and the number of 
> blocks hosted on a datanode, this jira intends to optimize the long-lived 
> ReplicasMap and ReplicaInfo objects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4453) Make a simple doc to describe the usage and design of the shortcircuit read feature

2013-02-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569328#comment-13569328
 ] 

Todd Lipcon commented on HDFS-4453:
---

Section 4: I think you have some typos there:
- you mean GetBlockLocalPathInfo, not GetBlockLocations
- you said 'clients' twice where I think one of them should read 'servers'

{code}
+  To configure short-circuit local reads, you will need to put
+  <<>> in your <<>>.  You can check if you have done
+  this by running 
+
{code}

Users don't generally set this via {{LD_LIBRARY_PATH}}. Instead, it goes in the 
appropriate directory inside the install tree. Do we have other docs already 
about how to enable the native code? Might be better to refer to those.


- Regarding {{/var/lib/hadoop-hdfs}} vs {{/var/run/hadoop-hdfs}} -- why's it 
problematic if /var/run is a tmpfs? We shouldn't need it to persist 
cross-reboot, and /var/run is generally _not_ cleaned by a tmpwatch process. 
tmpfs is also better in that it will continue to work even if a local disk dies.

- In the example config, I would not use the _PORT trick - it only really makes 
sense for dev setups like the minicluster, and otherwise may just confuse the 
user.

- Please specify that you need to set these two configurations both on clients 
and on servers.
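For reference, a minimal sketch of the two settings under discussion, applied programmatically; the socket path is one plausible choice given the /var/run vs /var/lib debate above, and per the last point the settings must be in effect on the DataNodes as well as on the clients.

{code}
import org.apache.hadoop.conf.Configuration;

public class ShortCircuitConfSketch {
  public static Configuration withShortCircuitReads() {
    Configuration conf = new Configuration();
    // Enable short-circuit local reads.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Unix domain socket shared by the DataNode and local clients
    // (a fixed path, avoiding the _PORT substitution discouraged above).
    conf.set("dfs.domain.socket.path", "/var/run/hadoop-hdfs/dn_socket");
    return conf;
  }
}
{code}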

> Make a simple doc to describe the usage and design of the shortcircuit read 
> feature
> ---
>
> Key: HDFS-4453
> URL: https://issues.apache.org/jira/browse/HDFS-4453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client
>Reporter: Brandon Li
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4453.001.patch, HDFS-4453.002.patch
>
>
> It would be nice to have a document describing the configuration and design 
> of this feature, and also its relationship with the previous short-circuit read 
> implementation (HDFS-2246): for example, can they co-exist, is this one 
> planned to replace HDFS-2246, or can it fall back on HDFS-2246 when Unix 
> domain sockets are not supported?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot

2013-02-01 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569326#comment-13569326
 ] 

Jing Zhao commented on HDFS-4464:
-

The name deleteSubtreeAndCollectBlocks may be a little bit confusing, since 
when the snapshot parameter is null the function acts more like a destructor 
of the subtree. Maybe we can rename the function and make its javadoc clearer.

Besides that, +1 for the patch.

> Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
> 
>
> Key: HDFS-4464
> URL: https://issues.apache.org/jira/browse/HDFS-4464
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4464_20120201.patch
>
>
> Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive 
> methods for deleting inodes and collecting blocks for further block 
> deletion/update.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4457) WebHDFS obtains/sets delegation token service hostname using wrong config leading to issues when NN is configured with 0.0.0.0 RPC IP

2013-02-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569307#comment-13569307
 ] 

Alejandro Abdelnur commented on HDFS-4457:
--

Daryn,

Regarding your concern about having a proxy in the middle, I don't see that as 
a problem: when you have an HTTP proxy in between, the client still targets the 
real server hostname; it is the HTTP stack in the client that redirects 
the request to the proxy with the real server hostname used by the client. Then 
on the server side (the webhdfs NN) you can infer from the HTTP request the 
exact name of the server the client used (proxy or not).

Regarding NAT in the middle doing port redirection, that should not be an issue 
either, as the host:port information used by the client is transmitted in the 
HTTP 'Host' header, which contains the host:port the client used when 
opening the connection (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, 
section '14.23 Host'); this header is required in HTTP/1.1.

Regarding your second concern: is that really a problem? Tokens are transient, 
and I would not expect them to be valid across system updates.

On *fetchdt*, got it. Still, that requires spawning a JVM to get the token.



> WebHDFS obtains/sets delegation token service hostname using wrong config 
> leading to issues when NN is configured with 0.0.0.0 RPC IP
> -
>
> Key: HDFS-4457
> URL: https://issues.apache.org/jira/browse/HDFS-4457
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 1.1.1, 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Attachments: HDFS_4457.patch, HDFS_4457.patch
>
>
> If the NameNode RPC address is configured with a wildcard IP (0.0.0.0), then 
> delegation tokens are configured with 0.0.0.0 as their service, and this breaks 
> clients trying to use those tokens.
> Looking at NamenodeWebHdfsMethods#generateDelegationToken(), the problem is 
> SecurityUtil.setTokenService(t, namenode.getHttpAddress());. Tracing back 
> what is used to resolve getHttpAddress(), the NameNodeHttpServer resolves 
> the httpAddress via httpAddress = new 
> InetSocketAddress(bindAddress.getAddress(), httpServer.getPort());, and if 
> "0.0.0.0" is used in the configuration, you get 0.0.0.0 from 
> bindAddress.getAddress().
> Normally (outside webhdfs) this is not an issue, because it is the 
> responsibility of the client, but WebHDFS does it before returning the 
> string version of the token (it must be this way because the client may 
> not be a Java client at all and cannot manipulate the DelegationToken as 
> such).
> The solution (thanks to Eric Sammer for helping figure this out) is for 
> WebHDFS to use the exact hostname that came in the HTTP request as the 
> service to set in the delegation tokens.
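A hedged sketch of that approach (illustrative only, not the committed patch): in a servlet-based handler, the server name and port seen by the container come from the HTTP/1.1 Host header, so deriving the token service from them never yields the wildcard bind address.

{code}
import javax.servlet.http.HttpServletRequest;

class TokenServiceSketch {
  /** Service string for the delegation token, as the client addressed us. */
  static String tokenService(HttpServletRequest request) {
    // getServerName()/getServerPort() reflect the Host header, so a
    // 0.0.0.0 RPC bind address never leaks into the token service.
    return request.getServerName() + ":" + request.getServerPort();
  }
}
{code}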

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-4452:
--

Attachment: getAdditionalBlock.patch

Patch for trunk. Incorporated Cos's comments. Corrected a couple of comments 
and log messages; minor code cleanup.

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock-branch2.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can 
> time out and reissue the same request. The two requests then race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client knows of only 
> one of them. This eventually results in {{NotReplicatedYetException}}, because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and leaves the file in an invalid state with an empty block in the middle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-4452:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock-branch2.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can 
> time out and reissue the same request. The two requests then race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client knows of only 
> one of them. This eventually results in {{NotReplicatedYetException}}, because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and leaves the file in an invalid state with an empty block in the middle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-4452:
--

Attachment: getAdditionalBlock-branch2.patch

Patch for branch 2. Somebody "conveniently" renamed local variables and changed 
types.

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock-branch2.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can 
> time out and reissue the same request. The two requests then race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client knows of only 
> one of them. This eventually results in {{NotReplicatedYetException}}, because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and leaves the file in an invalid state with an empty block in the middle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-4452:
--

Status: Open  (was: Patch Available)

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock-branch2.patch, 
> getAdditionalBlock.patch, getAdditionalBlock.patch, TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the client can 
> time out and reissue the same request. The two requests then race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client knows of only 
> one of them. This eventually results in {{NotReplicatedYetException}}, because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and leaves the file in an invalid state with an empty block in the middle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4462:
-

Attachment: HDFS-4462.patch

Missed a test failure from the last patch. This patch should fix the test 
failure.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch, 
> HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e., the layout versions are 
> different); in this case it will proceed with the checkpoint, despite the 
> storage signatures not matching precisely, as long as the BP ID and Cluster ID 
> match exactly. However, when upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569265#comment-13569265
 ] 

Hadoop QA commented on HDFS-4404:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12567643/hdfs-4404.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3938//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3938//console

This message is automatically generated.

> Create file failure when the machine of first attempted NameNode is down
> 
>
> Key: HDFS-4404
> URL: https://issues.apache.org/jira/browse/HDFS-4404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: liaowenrui
>Assignee: Todd Lipcon
>Priority: Critical
> Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, 
> hdfs-4404.txt
>
>
> Test environment: NN1, NN2, DN1, DN2, DN3
> machine1: NN1, DN1
> machine2: NN2, DN2
> machine3: DN3
> machine1 is down.
> 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - 
> Connecting to /160.161.0.155:8020
> 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing 
> ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting 
> for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
> java.net.SocketTimeoutException: 1 millis timeout while waiting for 
> channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474)
>  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568)
>  at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1156)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>  at $Proxy9.create(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
>  at $Proxy10.create(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1261)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715)
>  at test.TestLease.main(TestLease.java:45)

[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569238#comment-13569238
 ] 

Hadoop QA commented on HDFS-4462:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12567641/HDFS-4462.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.TestStartupOptionUpgrade

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3937//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3937//console

This message is automatically generated.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e., the layout versions are 
> different); in this case it will proceed with the checkpoint, despite the 
> storage signatures not matching precisely, as long as the BP ID and Cluster ID 
> match exactly. However, when upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-02-01 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4465:
-

Attachment: dn-memory-improvements.patch

Hey Suresh, thanks a lot for filing this issue. A little while back I threw 
together a few changes to see how much memory overhead improvement we could get 
in the DN with minimal effort. Here's a little patch (not necessarily ready for 
commit) which shows the changes I made. This patch does three things:

# Reduce the number of repeated String/char[] objects by storing a single 
reference to a base path; each replica then stores an int[] of the subdir 
indices from the base dir to the replica file, e.g. "1, 34, 2".
# Switch to using the LightWeightGSet instead of standard java.util structures 
where possible in the DN. We already did this in the NN, but with a little 
adaptation we can do it for some of the DN's data structures as well.
# Intern File objects where possible. Even though interning the repeated 
Strings/char[] underlying File objects is a step in the right direction, we can 
do a little better by doing our own interning of File objects to further 
reduce the overhead from repeated objects.

Using this patch I was able to see per-replica heap usage go from ~650 bytes 
per replica in my test setup to ~250 bytes per replica.

Feel free to take this patch and run with it, use it for ideas, or ignore it 
entirely.
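
For reference, here is a minimal, hypothetical sketch of idea #1 above -- this 
is not the attached patch; the class and field names are made up for 
illustration:

{code}
import java.io.File;

// One shared base-path String per volume plus a small int[] per replica,
// instead of a full path String per replica.
class CompactReplicaPath {
  private final String basePath;   // shared reference, e.g. the volume root
  private final int[] subdirs;     // e.g. {1, 34, 2} -> subdir1/subdir34/subdir2
  private final String blockName;  // e.g. "blk_123"

  CompactReplicaPath(String basePath, int[] subdirs, String blockName) {
    this.basePath = basePath;
    this.subdirs = subdirs;
    this.blockName = blockName;
  }

  // Rebuild the full File lazily, only when it is actually needed.
  File toFile() {
    StringBuilder sb = new StringBuilder(basePath);
    for (int d : subdirs) {
      sb.append(File.separatorChar).append("subdir").append(d);
    }
    sb.append(File.separatorChar).append(blockName);
    return new File(sb.toString());
  }
}
{code}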

> Optimize datanode ReplicasMap and ReplicaInfo
> -
>
> Key: HDFS-4465
> URL: https://issues.apache.org/jira/browse/HDFS-4465
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: dn-memory-improvements.patch
>
>
> In Hadoop a lot of optimization has been done in namenode data structures to 
> be memory efficient. Similar optimizations are necessary for Datanode 
> process. With the growth in storage per datanode and number of blocks hosted 
> on datanode, this jira intends to optimize long lived ReplicasMap and 
> ReplicaInfo objects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569213#comment-13569213
 ] 

Konstantin Boudnik commented on HDFS-4452:
--

The patch looks mostly good. A few minor comments:
- the logger in the class should be initialized for {{TestAddBlockRetry}}, not 
{{TestFSDirectory}} (see the one-line sketch below)
- a formatting-only change:
{noformat}
   LocatedBlock getAdditionalBlock(String src,
- String clientName,
- ExtendedBlock previous,
- HashMap<Node, Node> excludedNodes
- ) 
+  String clientName,
+  ExtendedBlock previous,
+  HashMap<Node, Node> excludedNodes)
{noformat}

The test fails without the corresponding change in the code, so it seems right 
on the money.
+1
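
As a one-line illustration of the first comment (assuming the commons-logging 
style used elsewhere in the HDFS tests):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// inside TestAddBlockRetry: logger bound to the test's own class
public static final Log LOG = LogFactory.getLog(TestAddBlockRetry.class);
{code}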

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> HDFS client tries to addBlock() to a file. If NameNode is busy the client can 
> timeout and will reissue the same request again. The two requests will race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client will know of only 
> one of them. This eventually results in {{NotReplicatedYetException}} because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and puts it in an invalid state with an empty block in the middle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync followed by closing that file

2013-02-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-3119:
-

Fix Version/s: 0.23.7

Committed to branch-0.23.

> Overreplicated block is not deleted even after the replication factor is 
> reduced after sync followed by closing that file
> 
>
> Key: HDFS-3119
> URL: https://issues.apache.org/jira/browse/HDFS-3119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.24.0
>Reporter: J.Andreina
>Assignee: Ashish Singhi
>Priority: Minor
>  Labels: patch
> Fix For: 0.24.0, 2.0.0-alpha, 0.23.7
>
> Attachments: HDFS-3119-1.patch, HDFS-3119-1.patch, HDFS-3119.patch
>
>
> cluster setup:
> --
> 1 NN, 2 DN, replication factor 2, block report interval 3 sec, block size 256MB
> step1: write a file "filewrite.txt" of size 90 bytes with sync (not closed) 
> step2: change the replication factor to 1  using the command: "./hdfs dfs 
> -setrep 1 /filewrite.txt"
> step3: close the file
> * At the NN side the log "Decreasing replication from 2 to 1 for 
> /filewrite.txt" has occurred, but the overreplicated blocks are not 
> deleted even after the block report is sent from the DN
> * while listing the file in the console using "./hdfs dfs -ls", the 
> replication factor for that file is shown as 1
> * the fsck report for that file displays that the file is replicated to 2 
> datanodes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot

2013-02-01 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4464:
-

Attachment: h4464_20120201.patch

h4464_20120201.patch: combines those methods into deleteSubtreeAndCollectBlocks.

> Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot
> 
>
> Key: HDFS-4464
> URL: https://issues.apache.org/jira/browse/HDFS-4464
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h4464_20120201.patch
>
>
> Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive 
> methods for deleting inodes and collecting blocks for further block 
> deletion/update.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4465) Optimize datanode ReplicasMap and ReplicaInfo

2013-02-01 Thread Suresh Srinivas (JIRA)
Suresh Srinivas created HDFS-4465:
-

 Summary: Optimize datanode ReplicasMap and ReplicaInfo
 Key: HDFS-4465
 URL: https://issues.apache.org/jira/browse/HDFS-4465
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas


In Hadoop a lot of optimization has been done in namenode data structures to be 
memory efficient. Similar optimizations are necessary for Datanode process. 
With the growth in storage per datanode and number of blocks hosted on 
datanode, this jira intends to optimize long lived ReplicasMap and ReplicaInfo 
objects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569198#comment-13569198
 ] 

Suresh Srinivas commented on HDFS-4461:
---

I think my earlier comments were perhaps not clear. Let me give it another try :)

+1 for optimizing the data structures in datanode.

bq. Suresh – we routinely see users with millions of replicas per DN now that 
48TB+ configurations have become commodity. Sure, we should also encourage 
users to use things like HAR to coalesce into larger blocks, but easy wins on 
DN memory usage are a no-brainer IMO.
Again, this is not the point I am making. I know and understand that the 
number of blocks in the DN is growing. Data structures in the datanode need to 
be optimized. At the same time, as the DNs support more storage, the DN heap 
also needs to be suitably increased.

My previous comments relate to the assertion that DirectoryScanner is causing 
the OOM. The OOM is not caused by the scanner. It is caused by incorrectly 
sizing the datanode JVM heap, unless one shows a leak in DirectoryScanner. So 
the comment was to edit the description to reflect that.

We also need to optimize the long lived data structures in the datanode. I 
thought one would start with those instead of DirectoryScanner, which creates 
short lived objects. I have created HDFS-4465 to track that.

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4444) Add space between total transaction time and number of transactions in FSEditLog#printStatistics

2013-02-01 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated HDFS-4444:


Fix Version/s: 0.23.7

> Add space between total transaction time and number of transactions in 
> FSEditLog#printStatistics
> 
>
> Key: HDFS-4444
> URL: https://issues.apache.org/jira/browse/HDFS-4444
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Trivial
> Fix For: 1.2.0, 2.0.3-alpha, 0.23.7
>
> Attachments: HDFS-4444.patch.001, HDFS-4444.patch.branch-1
>
>
> Currently, when we log statistics, we see something like
> {code}
> 13/01/25 23:16:59 INFO namenode.FSNamesystem: Number of transactions: 0 Total 
> time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number 
> of syncs: 0 SyncTimes(ms): 0
> {code}
> Notice how the value for total transaction time and "Number of transactions 
> batched in Syncs" need a space to separate them.
> FSEditLog#printStatistics:
> {code}
>   private void printStatistics(boolean force) {
> long now = now();
> if (lastPrintTime + 60000 > now && !force) {
>   return;
> }
> lastPrintTime = now;
> StringBuilder buf = new StringBuilder();
> buf.append("Number of transactions: ");
> buf.append(numTransactions);
> buf.append(" Total time for transactions(ms): ");
> buf.append(totalTimeTransactions);
> buf.append("Number of transactions batched in Syncs: ");
> buf.append(numTransactionsBatchedInSync);
> buf.append(" Number of syncs: ");
> buf.append(editLogStream.getNumSync());
> buf.append(" SyncTimes(ms): ");
> buf.append(journalSet.getSyncTimes());
> LOG.info(buf);
>   }
> {code}
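
A minimal sketch of the one-line fix implied above -- just the missing leading 
space in the third append:

{code}
buf.append(" Number of transactions batched in Syncs: "); // leading space added
{code}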

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4457) WebHDFS obtains/sets delegation token service hostname using wrong config leading to issues when NN is configured with 0.0.0.0 RPC IP

2013-02-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569183#comment-13569183
 ] 

Daryn Sharp commented on HDFS-4457:
---

Ignoring the issue of the client relying on the server to set the service - 
which is what I don't approve of since it's a big step backwards - you still 
have the problem if a proxy is between the client and the webhdfs server.  The 
token will contain the hostname that the proxy used to contact the server, not 
the hostname the client used to contact the proxy.  The proxy, or even some 
form of NAT may be redirecting the port.  The server doesn't know this, only 
the client knows what port it thinks it contacted.

The remote server also doesn't have the ability to know if the client has 
use_ip enabled or disabled.  

Basically, only the client that requested the token knows the exact host:port 
authority it used to request the token.  When it attempts to re-contact that 
service, it needs to match the service with the authority.

My second concern is that you must be assuming the key used to store the token 
in the credentials.  It currently happens to be the token's service, but it's a 
private implementation detail.  If the key format changes, and the passed-along 
token is added to the credentials with the old format, then job submission will 
attempt to reacquire the token and fail.  Fetchdt solves this by allowing you 
to acquire tokens and opaquely pass them along in binary form.

What error are you encountering with fetchdt?  It's working for me on a 
production cluster:
{noformat}
$ hdfs fetchdt -fs webhdfs://host /tmp/tokens
Fetched token for host:50070 into file:/tmp/tokens
{noformat}
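
A minimal sketch of the client-side alternative being argued for here -- the 
client, not the server, stamps the token with the exact authority it dialed. 
The class and method names are hypothetical; SecurityUtil.setTokenService is 
the existing Hadoop helper:

{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;

public class ClientSideTokenService {
  // Only the client knows the host:port it actually used; a proxy or NAT
  // can hide it from the server.
  static void stampService(Token<?> token, String authorityUsed) {
    InetSocketAddress addr = NetUtils.createSocketAddr(authorityUsed);
    SecurityUtil.setTokenService(token, addr); // honors the client's use_ip setting
  }
}
{code}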

> WebHDFS obtains/sets delegation token service hostname using wrong config 
> leading to issues when NN is configured with 0.0.0.0 RPC IP
> -
>
> Key: HDFS-4457
> URL: https://issues.apache.org/jira/browse/HDFS-4457
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 1.1.1, 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Attachments: HDFS_4457.patch, HDFS_4457.patch
>
>
> If the NameNode RPC address is configured with a wildcard IP 0.0.0.0, then 
> delegation tokens are configured with 0.0.0.0 as the service and this breaks 
> clients trying to use those tokens.
> Looking at NamenodeWebHdfsMethods#generateDelegationToken() the problem is 
> SecurityUtil.setTokenService(t, namenode.getHttpAddress()). Tracing back 
> what is being used to resolve getHttpAddress(), the NameNodeHttpServer is 
> resolving the httpAddress doing httpAddress = new 
> InetSocketAddress(bindAddress.getAddress(), httpServer.getPort()), and if 
> using "0.0.0.0" in the configuration, you get 0.0.0.0 from 
> bindAddress.getAddress().
> Normally (non webhdfs) this is not an issue because it is the responsibility 
> of the client, but in the case of WebHDFS, WebHDFS does it before returning 
> the string version of the token (it must be this way because the client may 
> not be a java client at all and cannot manipulate the DelegationToken as 
> such).
> The solution (thanks to Eric Sammer for helping figure this out) is for 
> WebHDFS to use the exact hostname that came in the HTTP request as the 
> service to set in the delegation tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4444) Add space between total transaction time and number of transactions in FSEditLog#printStatistics

2013-02-01 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated HDFS-4444:


Fix Version/s: (was: 0.23.7)

> Add space between total transaction time and number of transactions in 
> FSEditLog#printStatistics
> 
>
> Key: HDFS-4444
> URL: https://issues.apache.org/jira/browse/HDFS-4444
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Trivial
> Fix For: 1.2.0, 2.0.3-alpha
>
> Attachments: HDFS-4444.patch.001, HDFS-4444.patch.branch-1
>
>
> Currently, when we log statistics, we see something like
> {code}
> 13/01/25 23:16:59 INFO namenode.FSNamesystem: Number of transactions: 0 Total 
> time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number 
> of syncs: 0 SyncTimes(ms): 0
> {code}
> Notice how the value for total transaction time and "Number of transactions 
> batched in Syncs" need a space to separate them.
> FSEditLog#printStatistics:
> {code}
>   private void printStatistics(boolean force) {
> long now = now();
> if (lastPrintTime + 60000 > now && !force) {
>   return;
> }
> lastPrintTime = now;
> StringBuilder buf = new StringBuilder();
> buf.append("Number of transactions: ");
> buf.append(numTransactions);
> buf.append(" Total time for transactions(ms): ");
> buf.append(totalTimeTransactions);
> buf.append("Number of transactions batched in Syncs: ");
> buf.append(numTransactionsBatchedInSync);
> buf.append(" Number of syncs: ");
> buf.append(editLogStream.getNumSync());
> buf.append(" SyncTimes(ms): ");
> buf.append(journalSet.getSyncTimes());
> LOG.info(buf);
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4444) Add space between total transaction time and number of transactions in FSEditLog#printStatistics

2013-02-01 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated HDFS-4444:


Fix Version/s: 0.23.7

> Add space between total transaction time and number of transactions in 
> FSEditLog#printStatistics
> 
>
> Key: HDFS-4444
> URL: https://issues.apache.org/jira/browse/HDFS-4444
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Stephen Chu
>Assignee: Stephen Chu
>Priority: Trivial
> Fix For: 1.2.0, 2.0.3-alpha, 0.23.7
>
> Attachments: HDFS-4444.patch.001, HDFS-4444.patch.branch-1
>
>
> Currently, when we log statistics, we see something like
> {code}
> 13/01/25 23:16:59 INFO namenode.FSNamesystem: Number of transactions: 0 Total 
> time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number 
> of syncs: 0 SyncTimes(ms): 0
> {code}
> Notice how the value for total transaction time and "Number of transactions 
> batched in Syncs" need a space to separate them.
> FSEditLog#printStatistics:
> {code}
>   private void printStatistics(boolean force) {
> long now = now();
> if (lastPrintTime + 60000 > now && !force) {
>   return;
> }
> lastPrintTime = now;
> StringBuilder buf = new StringBuilder();
> buf.append("Number of transactions: ");
> buf.append(numTransactions);
> buf.append(" Total time for transactions(ms): ");
> buf.append(totalTimeTransactions);
> buf.append("Number of transactions batched in Syncs: ");
> buf.append(numTransactionsBatchedInSync);
> buf.append(" Number of syncs: ");
> buf.append(editLogStream.getNumSync());
> buf.append(" SyncTimes(ms): ");
> buf.append(journalSet.getSyncTimes());
> LOG.info(buf);
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4464) Combine collectSubtreeBlocksAndClear with deleteDiffsForSnapshot

2013-02-01 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-4464:


 Summary: Combine collectSubtreeBlocksAndClear with 
deleteDiffsForSnapshot
 Key: HDFS-4464
 URL: https://issues.apache.org/jira/browse/HDFS-4464
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


Both collectSubtreeBlocksAndClear and deleteDiffsForSnapshot are recursive 
methods for deleting inodes and collecting blocks for further block 
deletion/update.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569151#comment-13569151
 ] 

Chris Nauroth commented on HDFS-4462:
-

+1 for the new patch

Tests pass with the new patch too.

Thank you for addressing the extremely paranoid feedback.  :-)


> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e. the layout versions are 
> different) and in this case it will proceed with the checkpoint despite 
> storage signatures not matching precisely if the BP ID and Cluster ID do 
> match exactly. However, in situations where we're upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4404) Create file failure when the machine of first attempted NameNode is down

2013-02-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-4404:
--

Attachment: hdfs-4404.txt

Attached patch adds a unit test and addresses some of the feedback above.

Uma -- I didn't change the "Local Exception" wrapping case to use the new code, 
since that would be a behavioral change which I think is outside the scope of 
this bug fix.

> Create file failure when the machine of first attempted NameNode is down
> 
>
> Key: HDFS-4404
> URL: https://issues.apache.org/jira/browse/HDFS-4404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: liaowenrui
>Assignee: Todd Lipcon
>Priority: Critical
> Attachments: HDFS-4404.patch, hdfs-4404.txt, hdfs-4404.txt, 
> hdfs-4404.txt
>
>
> test Environment: NN1,NN2,DN1,DN2,DN3
> machine1:NN1,DN1
> machine2:NN2,DN2
> machine3:DN3
> machine1 is down.
> 2013-01-12 09:51:21,248 DEBUG ipc.Client (Client.java:setupIOstreams(562)) - 
> Connecting to /160.161.0.155:8020
> 2013-01-12 09:51:38,442 DEBUG ipc.Client (Client.java:close(932)) - closing 
> ipc connection to vm2/160.161.0.155:8020: 1 millis timeout while waiting 
> for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
> java.net.SocketTimeoutException: 1 millis timeout while waiting for 
> channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending remote=/160.161.0.155:8020]
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:524)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:474)
>  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:568)
>  at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:217)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1286)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1156)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>  at $Proxy9.create(Unknown Source)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:187)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
>  at $Proxy10.create(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1261)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1280)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1128)
>  at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1086)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:715)
>  at test.TestLease.main(TestLease.java:45)
> 2013-01-12 09:51:38,443 DEBUG ipc.Client (Client.java:close(940)) - IPC 
> Client (31594013) connection to /160.161.0.155:8020 from 
> hdfs/had...@hadoop.com: closed
> 2013-01-12 09:52:47,834 WARN  retry.RetryInvocationHandler 
> (RetryInvocationHandler.java:invoke(95)) - Exception while invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create. 
> Not retrying because the invoked method is not idempotent, and unable to 
> determine whether it was invoked
> java.net.SocketTimeoutException: Call From szxy1x001833091/172.0.0.13 to 
> vm2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 
> 1 millis timeout while waiting for channel to be ready for connect. ch : 
> java.nio.channels.SocketChannel[connection-pending 
> remote=/160.161.0.155:8020]; For more details see:  
> http://wiki.apache.org/hadoop/SocketTimeout
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:743)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1180)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:184)
>

[jira] [Updated] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-4462:
-

Attachment: HDFS-4462.patch

Thanks a lot for the review, Chris, and for running those additional tests.

Your suggestion does seem pretty paranoid (odds are 1 over 2^31), but better to 
be overly conservative in cases such as this. :)

Please take a look at the updated patch. This patch expressly checks to see if 
the local metadata's layout version supports federation or not, and only 
compares the namespace IDs if it doesn't support federation. If federation is 
supported, all three fields are compared.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e. the layout versions are 
> different) and in this case it will proceed with the checkpoint despite 
> storage signatures not matching precisely if the BP ID and Cluster ID do 
> match exactly. However, in situations where we're upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569083#comment-13569083
 ] 

Andy Isaacson commented on HDFS-4461:
-

The actual OOM backtrace is on the DN thread:
{noformat}
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:25)
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat()Lorg/apache/hadoop/hdfs/server/protocol/HeartbeatResponse;
 (BPServiceActor.java:434)
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService()V 
(BPServiceActor.java:520)
  at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run()V 
(BPServiceActor.java:673)
  at java.lang.Thread.run()V (Thread.java:662)
{noformat}


> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569063#comment-13569063
 ] 

Chris Nauroth commented on HDFS-4462:
-

Hi, Aaron.  The code looks good.  I applied the patch to branch-2 and ran 
multiple test suites related to checkpoints and 2NN.

{code}
-  boolean isSameCluster(FSImage si) {
-return namespaceID == si.getStorage().namespaceID &&
-  clusterID.equals(si.getClusterID()) &&
-  blockpoolID.equals(si.getBlockPoolID());
+  boolean namespaceIdMatches(FSImage si) {
+return namespaceID == si.getStorage().namespaceID;
   }
{code}

Considering that the namespace ID is an integer, whereas the cluster ID is 
based on a GUID, there is a higher likelihood of accidental collision.  Then, 
{{CheckpointSignature#validateStorageInfo}} could misidentify a match.  It's 
still highly unlikely (but non-zero).

I'm wondering if a safer change would be (pseudo-code):

{code}
if namespace ID + cluster ID + blockpool ID are defined on both
  compare all 3 fields
else if only namespace ID is defined on one of them
  compare only namespace ID
{code}

This would keep the logic the same for upgrades between 2 post-federation 
versions, and just change the logic for the case of pre-fed -> post-fed.

Or am I being too paranoid?  :-)
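
A hypothetical Java rendering of the pseudo-code above (the field and method 
names are illustrative, not the actual CheckpointSignature members):

{code}
boolean storageMatches(FSImage si) {
  boolean bothFederated = clusterID != null && !clusterID.isEmpty()
      && si.getClusterID() != null && !si.getClusterID().isEmpty();
  if (bothFederated) {
    // post-federation metadata on both sides: compare all 3 fields
    return namespaceID == si.getStorage().namespaceID
        && clusterID.equals(si.getClusterID())
        && blockpoolID.equals(si.getBlockPoolID());
  }
  // pre-federation metadata on one side: only the namespace ID exists
  return namespaceID == si.getStorage().namespaceID;
}
{code}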


> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e. the layout versions are 
> different) and in this case it will proceed with the checkpoint despite 
> storage signatures not matching precisely if the BP ID and Cluster ID do 
> match exactly. However, in situations where we're upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569018#comment-13569018
 ] 

Hudson commented on HDFS-4456:
--

Integrated in Hadoop-trunk-Commit #3311 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3311/])
HDFS-4456. Add concat to HttpFS and WebHDFS REST API docs. (plamenj2003 via 
tucu) (Revision 1441603)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441603
Files : 
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ConcatSourcesParam.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm


> Add concat to HttpFS and WebHDFS REST API docs
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569013#comment-13569013
 ] 

Andy Isaacson commented on HDFS-4461:
-

bq. A server generally has a lot of String objects. There are also file objects 
in ReplicasMap, string paths tracked in many other places as well.

The cluster in question has about 1.5 million blocks per DN, across 12 
datadirs.  This hprof shows 1,858,340 BlockScanInfo objects. MAT computed the 
"Retained Heap" of FsDatasetImpl at 980 MB and the "Retained Heap" of the 
DirectoryScanner thread at 1.4 GB.

bq. ScanInfo is a short lived object, unlike other data structures that are 
long lived.

It doesn't matter how narrow the peak is, if it exceeds the maximum permissible 
value.  In this case we seem to have a complete set of ScanInfo objects (for 
the entire dataset) active on the heap, with the DirectoryScanner thread in the 
process of reconcile()ing them when it OOMs.

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs

2013-02-01 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-4456:
-

   Resolution: Fixed
Fix Version/s: (was: 3.0.0)
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Plamen. Committed to trunk and branch-2.

> Add concat to HttpFS and WebHDFS REST API docs
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4459) command manual dfsadmin missing entry for restoreFailedStorage option

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569009#comment-13569009
 ] 

Hadoop QA commented on HDFS-4459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12567514/hdfs4459.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3936//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3936//console

This message is automatically generated.

> command manual dfsadmin missing entry for restoreFailedStorage option
> -
>
> Key: HDFS-4459
> URL: https://issues.apache.org/jira/browse/HDFS-4459
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Thomas Graves
>Assignee: Andy Isaacson
> Attachments: hdfs4459.txt
>
>
> When generating the latest site docs, the -restoreFailedStorage option does 
> not show up under the dfsadmin section of commands_manual.html
> Also it appears the table header is concatenated with the first row:
> COMMAND_OPTION -report

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4456) Add concat to HttpFS and WebHDFS REST API docs

2013-02-01 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-4456:
-

Summary: Add concat to HttpFS and WebHDFS REST API docs  (was: Add concat 
to WebHDFS REST API)

> Add concat to HttpFS and WebHDFS REST API docs
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API

2013-02-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569001#comment-13569001
 ] 

Alejandro Abdelnur commented on HDFS-4456:
--

Got it, +1 for 
https://issues.apache.org/jira/secure/attachment/12567454/HDFS-4456.trunk.patch 
then.

> Add concat to WebHDFS REST API
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API

2013-02-01 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568994#comment-13568994
 ] 

Plamen Jeliazkov commented on HDFS-4456:


The one that introduces the extra warning had no failing tests.

> Add concat to WebHDFS REST API
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568989#comment-13568989
 ] 

Todd Lipcon commented on HDFS-4461:
---

Looks like we can cut the memory usage in half again -- storing both the 
metafile path and the block file path is redundant, since you can always 
compute the block path from the meta path by chopping off the 
"_<genstamp>.meta" suffix (see the sketch at the end of this comment).

Suresh -- we routinely see users with millions of replicas per DN now that 
48TB+ configurations have become commodity. Sure, we should also encourage 
users to use things like HAR to coalesce into larger blocks, but easy wins on 
DN memory usage are a no-brainer IMO.
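
A minimal sketch of the meta-path idea, assuming the usual 
blk_<blockId>_<genstamp>.meta naming; the helper name is hypothetical:

{code}
// Hypothetical helper: derive the block file path from its meta file path.
class MetaPaths {
  // Meta files are named <blockfile>_<genstamp>.meta, e.g.
  // ".../blk_123_1001.meta" -> block file ".../blk_123"
  static String blockPathFromMetaPath(String metaPath) {
    int suffixStart = metaPath.lastIndexOf('_');
    return metaPath.substring(0, suffixStart);
  }
}
{code}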

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4461:
---

Description: In the {{DirectoryScanner}}, we create a class {{ScanInfo}} 
for every block.  This object contains two File objects-- one for the metadata 
file, and one for the block file.  Since those File objects contain full paths, 
users who pick a lengthy path for their volume roots will end up using an 
extra N_blocks * path_prefix bytes per block scanned.  We also don't really 
need to store File objects-- storing strings and then creating File objects as 
needed would be cheaper.  This would be a nice efficiency improvement.  (was: 
In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
This object contains two File objects-- one for the metadata file, and one for 
the block file.  Since those File objects contain full paths, users who pick a 
lengthly path for their volume roots will end up using an extra N_blocks * 
path_prefix bytes per block scanned.  We also don't really need to store File 
objects-- storing strings and then creating File objects as needed would be 
cheaper.  This has been causing out-of-memory conditions for users who pick 
such long volume paths.)

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API

2013-02-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568985#comment-13568985
 ] 

Alejandro Abdelnur commented on HDFS-4456:
--

No the other way around, the one that introduces an extra warning.

> Add concat to WebHDFS REST API
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API

2013-02-01 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568984#comment-13568984
 ] 

Plamen Jeliazkov commented on HDFS-4456:


The unit test failure associated with the Generics removal patch does not 
appear to be related, by the way, code-wise. I will verify with a full test run 
on my own local machine though and get back to you with those results.

> Add concat to WebHDFS REST API
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4463) ActiveStandbyElector can join election even before Service HEALTHY, and results in null data at ActiveBreadCrumb

2013-02-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568983#comment-13568983
 ] 

Todd Lipcon commented on HDFS-4463:
---

Good work figuring this one out. I've seen it once or twice but hadn't been 
able to track down the bug.

> ActiveStandbyElector can join election even before Service HEALTHY, and 
> results in null data at ActiveBreadCrumb
> 
>
> Key: HDFS-4463
> URL: https://issues.apache.org/jira/browse/HDFS-4463
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.2-alpha
>Reporter: Vinay
>Assignee: Vinay
>Priority: Critical
>
> ActiveStandbyElector can store null at ActiveBreadCrumb in the race 
> condition below. After that, all failovers will fail, resulting in an NPE.
> 1. ZKFC restarted.
> 2. Due to the machine being busy, the first ZK connection expired even 
> before the health monitoring returned the status.
> 3. On re-establishment, transitionToActive will be called; at this time 
> appData will be null.
> 4. So now ActiveBreadCrumb will have null.
> 5. After this any failovers will fail throwing 
> {noformat}java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:892)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:797)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:475)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:545)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497){noformat}
> The elector should not join the election before the service is HEALTHY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4456) Add concat to WebHDFS REST API

2013-02-01 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568982#comment-13568982
 ] 

Plamen Jeliazkov commented on HDFS-4456:


Are you saying you would like to push the Generics removal patch, then, rather 
than the other one? I will check the tests and make sure they pass normally 
with the Generics removal patch.

> Add concat to WebHDFS REST API
> --
>
> Key: HDFS-4456
> URL: https://issues.apache.org/jira/browse/HDFS-4456
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Plamen Jeliazkov
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-3598.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.docAndHttpFS.patch, HDFS-4456.trunk.docAndHttpFS.patch, 
> HDFS-4456.trunk.genericsRemoval.patch, HDFS-4456.trunk.patch, 
> HDFS-4456.trunk.patch
>
>
> HDFS-3598 adds the concat feature to WebHDFS.  The REST API should be updated 
> accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568979#comment-13568979
 ] 

Suresh Srinivas commented on HDFS-4461:
---

bq. If someone is running with around 200,000 blocks (a reasonable number), and 
a 50 to 80 character path, this change saves between 50 and 100 MB of heap 
space during the DirectoryScanner run. That's what we should be focusing on 
here-- the efficiency improvement. After all, that is why I marked this JIRA as 
"improvement" rather than "bug" 

I think you are missing the point I made earlier. In the description you say:
bq. This has been causing out-of-memory conditions for users who pick such long 
volume paths.
It is not correct to attribute the OOM to the memory inefficiency of the 
DirectoryScanner. So please update the description to say the DirectoryScanner 
can be made more memory-efficient.

bq. I saw more than 1 million ScanInfo objects
I am interested in seeing the number of blocks in this particular setup and if 
we are leaking these objects.

I am leaning more towards an incorrect datanode configuration in the setup 
where you saw the OOM. Can you provide details on the datanode heap size, the 
number of blocks on the datanode, etc.?

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This has been causing out-of-memory conditions for users 
> who pick such long volume paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4459) command manual dfsadmin missing entry for restoreFailedStorage option

2013-02-01 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4459:


Status: Patch Available  (was: Open)

> command manual dfsadmin missing entry for restoreFailedStorage option
> -
>
> Key: HDFS-4459
> URL: https://issues.apache.org/jira/browse/HDFS-4459
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Thomas Graves
>Assignee: Andy Isaacson
> Attachments: hdfs4459.txt
>
>
> When generating the latest site docs, the -restoreFailedStorage option does 
> not show up under the dfsadmin section of commands_manual.html.
> Also, it appears the table header is concatenated with the first row:
> COMMAND_OPTION -report

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4450) Duplicate data node on the name node after formatting data node

2013-02-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568973#comment-13568973
 ] 

Suresh Srinivas commented on HDFS-4450:
---

Also, please provide from your configuration what you have set the parameter 
"dfs.datanode.address" to.

> Duplicate data node on the name node after formatting data node
> ---
>
> Key: HDFS-4450
> URL: https://issues.apache.org/jira/browse/HDFS-4450
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: WenJin Ma
> Attachments: exception.bmp, normal.bmp
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Duplicate data node on the name node after formatting data node.
> When we register a data node, nodeReg.getXferPort() is used to find the 
> DatanodeDescriptor:
> {code}
>  DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr(
> nodeReg.getIpAddr(), nodeReg.getXferPort());
> {code}
> but adding a data node uses node.getIpAddr():
> {code}
> /** add node to the map 
>* return true if the node is added; false otherwise.
>*/
>   boolean add(DatanodeDescriptor node) {
> hostmapLock.writeLock().lock();
> try {
>   if (node==null || contains(node)) {
> return false;
>   }
>   
>   String ipAddr = node.getIpAddr();
>   DatanodeDescriptor[] nodes = map.get(ipAddr);
>   DatanodeDescriptor[] newNodes;
>   if (nodes==null) {
> newNodes = new DatanodeDescriptor[1];
> newNodes[0]=node;
>   } else { // rare case: more than one datanode on the host
> newNodes = new DatanodeDescriptor[nodes.length+1];
> System.arraycopy(nodes, 0, newNodes, 0, nodes.length);
> newNodes[nodes.length] = node;
>   }
>   map.put(ipAddr, newNodes);
>   return true;
> } finally {
>   hostmapLock.writeLock().unlock();
> }
>   }
> {code}
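
A sketch of one consistent alternative, keying both the add and the lookup by 
ip:xferPort (the map type and names here are illustrative, not the actual 
host2DatanodeMap code):

{code}
import java.util.HashMap;
import java.util.Map;

class XferAddrKeyedMap {
  private final Map<String, Object> byXferAddr = new HashMap<String, Object>();

  private static String key(String ip, int xferPort) {
    return ip + ":" + xferPort;
  }

  // add() and the lookup now use the same key, so a datanode that
  // re-registers after a format replaces its stale entry instead of
  // showing up twice on the namenode.
  void add(String ip, int xferPort, Object descriptor) {
    byXferAddr.put(key(ip, xferPort), descriptor);
  }

  Object getDatanodeByXferAddr(String ip, int xferPort) {
    return byXferAddr.get(key(ip, xferPort));
  }
}
{code}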

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568959#comment-13568959
 ] 

Colin Patrick McCabe commented on HDFS-4461:


If someone is running with around 200,000 blocks (a reasonable number), and a 
50 to 80 character path, this change saves between 50 and 100 MB of heap space 
during the DirectoryScanner run.  That's what we should be focusing on here-- 
the efficiency improvement.  After all, that is why I marked this JIRA as 
"improvement" rather than "bug" :)

bq. Or at least the number of ScanInfo objects you saw.

I saw more than 1 million {{ScanInfo}} objects.  This means that either the 
number of blocks on the DN is much higher than we recommend, or there is 
another leak in the {{DirectoryScanner}}.  I am trying to get confirmation that 
the number of blocks is really that high.  If it isn't, then we will start 
looking more closely for memory leaks in the scanner.

We've found that the block scanner often delivers the finishing blow to DNs 
that are already overloaded.  This makes sense-- if your heap is already near 
max size, asking you to allocate a few hundred megabytes might finish you off.
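
Spelling out the 50-100 MB estimate above (assuming 2-byte UCS-2 characters and 
the prefix stored in both of a block's two paths):

{code}
public class DirScannerSavings {
  public static void main(String[] args) {
    long blocks = 200000L;    // a reasonable DN block count
    int prefixChars = 80;     // upper end of a 50-80 character prefix
    // 2 bytes per char, and the prefix appears in both the block-file
    // path and the metadata-file path:
    long perBlock = prefixChars * 2L * 2L;
    System.out.println(blocks * perBlock);  // 64,000,000 bytes, ~61 MB
  }
}
{code}

File-object overhead pushes the real figure toward the higher end of that range.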

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This has been causing out-of-memory conditions for users 
> who pick such long volume paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1765) Block Replication should respect under-replication block priority

2013-02-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-1765:
-

Target Version/s: 0.23.3, 0.24.0  (was: 0.24.0, 0.23.3)
   Fix Version/s: 0.23.7

Committed to branch-0.23.

> Block Replication should respect under-replication block priority
> -
>
> Key: HDFS-1765
> URL: https://issues.apache.org/jira/browse/HDFS-1765
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Hairong Kuang
>Assignee: Uma Maheswara Rao G
> Fix For: 2.0.0-alpha, 0.23.7
>
> Attachments: HDFS-1765.patch, HDFS-1765.patch, HDFS-1765.patch, 
> HDFS-1765.patch, HDFS-1765.pdf, underReplicatedQueue.pdf
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, under-replicated blocks are assigned different priorities depending 
> on how many replicas a block has. However, the replication monitor works on 
> blocks in a round-robin fashion, so newly added high-priority blocks 
> won't get replicated until all low-priority blocks are done. For example, on 
> the decommissioning datanode WebUI we often observe that "blocks with 
> only decommissioning replicas" do not get scheduled to replicate before other 
> blocks, risking data availability if the node is shut down for repair 
> before decommissioning completes.
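
A sketch of the priority-respecting selection the description calls for, using 
illustrative names rather than the actual under-replicated-blocks code:

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class PriorityReplicationQueues {
  private final List<Queue<Long>> queues = new ArrayList<Queue<Long>>();

  PriorityReplicationQueues(int levels) {
    for (int i = 0; i < levels; i++) {
      queues.add(new ArrayDeque<Long>());
    }
  }

  void add(long blockId, int priority) {
    queues.get(priority).add(blockId);
  }

  // Level 0 is highest priority; blocks with only decommissioning
  // replicas would sit in a high-priority queue and get scheduled
  // first, instead of waiting behind lower-priority work in a
  // round-robin scan.
  Long pollNext() {
    for (Queue<Long> q : queues) {
      Long b = q.poll();
      if (b != null) {
        return b;
      }
    }
    return null;
  }
}
{code}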

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568946#comment-13568946
 ] 

Aaron T. Myers commented on HDFS-4462:
--

[~acmurthy] Blocker? Probably not. Pretty good to have? I think so. There's a 
pretty simple work-around: when upgrading from a pre-federation version of 
HDFS, blow away your 2NN checkpoint dirs before starting up your 2NN again. A 
problem will arise if an admin doesn't notice that all of their 2NN checkpoints 
are failing post-upgrade.

Regardless, it's a pretty simple change - I'm hoping it can get committed today.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e. the layout versions are 
> different) and in this case it will proceed with the checkpoint despite 
> storage signatures not matching precisely if the BP ID and Cluster ID do 
> match exactly. However, in situations where we're upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4463) ActiveStandbyElector can join election even before Service HEALTHY, and results in null data at ActiveBreadCrumb

2013-02-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568938#comment-13568938
 ] 

Colin Patrick McCabe commented on HDFS-4463:


Moving to HDFS, since it's about the ZKFC.

> ActiveStandbyElector can join election even before Service HEALTHY, and 
> results in null data at ActiveBreadCrumb
> 
>
> Key: HDFS-4463
> URL: https://issues.apache.org/jira/browse/HDFS-4463
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.2-alpha
>Reporter: Vinay
>Assignee: Vinay
>Priority: Critical
>
> ActiveStandbyElector can store null at ActiveBreadCrumb in the race 
> condition below. Thereafter, all failovers will fail, resulting in an NPE.
> 1. ZKFC is restarted.
> 2. Because the machine is busy, the first ZK connection expires even before 
> health monitoring has returned the status.
> 3. On re-establishment, transitionToActive will be called; at this time 
> appData will be null.
> 4. So now ActiveBreadCrumb will hold null.
> 5. After this, any failover will fail, throwing 
> {noformat}java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:892)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:797)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:475)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:545)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497){noformat}
> The elector should not join the election before the service is HEALTHY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Moved] (HDFS-4463) ActiveStandbyElector can join election even before Service HEALTHY, and results in null data at ActiveBreadCrumb

2013-02-01 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe moved HADOOP-9275 to HDFS-4463:


  Component/s: (was: ha)
   ha
Affects Version/s: (was: 2.0.2-alpha)
   2.0.2-alpha
  Key: HDFS-4463  (was: HADOOP-9275)
  Project: Hadoop HDFS  (was: Hadoop Common)

> ActiveStandbyElector can join election even before Service HEALTHY, and 
> results in null data at ActiveBreadCrumb
> 
>
> Key: HDFS-4463
> URL: https://issues.apache.org/jira/browse/HDFS-4463
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.0.2-alpha
>Reporter: Vinay
>Assignee: Vinay
>Priority: Critical
>
> ActiveStandbyElector can store null at ActiveBreadCrumb in the race 
> condition below. Thereafter, all failovers will fail, resulting in an NPE.
> 1. ZKFC is restarted.
> 2. Because the machine is busy, the first ZK connection expires even before 
> health monitoring has returned the status.
> 3. On re-establishment, transitionToActive will be called; at this time 
> appData will be null.
> 4. So now ActiveBreadCrumb will hold null.
> 5. After this, any failover will fail, throwing 
> {noformat}java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.StringUtils.byteToHexString(StringUtils.java:171)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:892)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:797)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:475)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:545)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497){noformat}
> The elector should not join the election before the service is HEALTHY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568937#comment-13568937
 ] 

Arun C Murthy commented on HDFS-4462:
-

[~atm] Is this a 2.0.3 blocker? Tx.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e. the layout versions are 
> different) and in this case it will proceed with the checkpoint despite 
> storage signatures not matching precisely if the BP ID and Cluster ID do 
> match exactly. However, in situations where we're upgrading from versions of 
> HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints 
> will always fail with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4457) WebHDFS obtains/sets delegation token service hostname using wrong config leading to issues when NN is configured with 0.0.0.0 RPC IP

2013-02-01 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568929#comment-13568929
 ] 

Aaron T. Myers commented on HDFS-4457:
--

Daryn, does Tucu's explanation address your concerns? I think Tucu's latest 
comment makes sense - you're right that the client should be setting the token 
service, and in this case the client is effectively doing just that since the 
server is using the host/port as sent by the client when creating the DT.

The patch looks good to me, but I don't want to commit it if you have more 
pending comments. Please let me know.
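
A sketch of that approach, deriving the token service from the host the client 
actually used (the surrounding method is hypothetical; 
SecurityUtil.setTokenService is the existing helper):

{code}
import java.net.InetSocketAddress;
import javax.servlet.http.HttpServletRequest;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.token.Token;

class TokenServiceFromRequest {
  static void setService(Token<?> token, HttpServletRequest request) {
    // request.getServerName() is whatever hostname the client connected
    // with, so even a non-Java client gets back a token whose service
    // it can use verbatim -- never the 0.0.0.0 bind address.
    InetSocketAddress addr = new InetSocketAddress(
        request.getServerName(), request.getServerPort());
    SecurityUtil.setTokenService(token, addr);
  }
}
{code}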

> WebHDFS obtains/sets delegation token service hostname using wrong config 
> leading to issues when NN is configured with 0.0.0.0 RPC IP
> -
>
> Key: HDFS-4457
> URL: https://issues.apache.org/jira/browse/HDFS-4457
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 1.1.1, 2.0.2-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Attachments: HDFS_4457.patch, HDFS_4457.patch
>
>
> If the NameNode RPC address is configured with a wildcard IP (0.0.0.0), then 
> delegation tokens are configured with 0.0.0.0 as their service, and this 
> breaks clients trying to use those tokens.
> Looking at NamenodeWebHdfsMethods#generateDelegationToken() the problem is 
> SecurityUtil.setTokenService(t, namenode.getHttpAddress());, tracing back 
> what is being used to resolve getHttpAddress() the NameNodeHttpServer is 
> resolving the httpAddress doing a httpAddress = new 
> InetSocketAddress(bindAddress.getAddress(), httpServer.getPort());
> , and if using "0.0.0.0" in the configuration, you get 0.0.0.0 from 
> bindAddress.getAddress().
> Normally (outside WebHDFS) this is not an issue because it is the 
> responsibility of the client, but WebHDFS does it before returning 
> the string version of the token (it must be this way because the client may 
> not be a Java client at all and cannot manipulate the DelegationToken as 
> such).
> The solution (thanks to Eric Sammer for helping figure this out) is for 
> WebHDFS to use the exact hostname that came in the HTTP request as the 
> service to set in the delegation tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568930#comment-13568930
 ] 

Suresh Srinivas commented on HDFS-4461:
---

bq. we analyzed a DN heap dump from a production cluster with eclipse memory 
analyzer and found that the memory was full of ScanInfo objects. The memory 
histogram showed that java.lang.String was the third-largest consumer of memory 
in the system. Unfortunately I can't share the heap dump.
A server generally has a lot of String objects. There are also File objects in 
the ReplicasMap, and string paths tracked in many other places as well.

This patch indeed saves a few bytes. However, I do not think it is either the 
cause of the OOME or likely to solve that issue. ScanInfo is a short-lived 
object, unlike other data structures that are long-lived.

Can you answer the following question I previously asked:
bq. How many blocks per storage directory do you have, when OOME happened?

Or at least the number of ScanInfo objects you saw.

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This has been causing out-of-memory conditions for users 
> who pick such long volume paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4450) Duplicate data node on the name node after formatting data node

2013-02-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568924#comment-13568924
 ] 

Suresh Srinivas commented on HDFS-4450:
---

bq. Can you post the lines from the logs that corresponds to datanode dn0 
registration corresponding to before format and after format?

I should have been clearer. What I asked for is: from the namenode logs, 
please get the two registration requests from dn0, one from before you shut it 
down and one from after you restart. The log lines should look like:

{noformat}
2013-02-01 10:11:10,522 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.registerDatanode: node registration from 
DatanodeRegistration(10.28.176.234, 
storageID=DS-685519412-10.28.176.234-50010-1359684666375, infoPort=50075, 
ipcPort=50020, 
storageInfo=lv=-40;cid=CID-fe3b5079-a34a-4912-b8a8-50443d038749;nsid=1321646662;c=0)
 storage DS-685519412-10.28.176.234-50010-1359684666375
{noformat}


> Duplicate data node on the name node after formatting data node
> ---
>
> Key: HDFS-4450
> URL: https://issues.apache.org/jira/browse/HDFS-4450
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: WenJin Ma
> Attachments: exception.bmp, normal.bmp
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Duplicate data node on the name node after formatting data node.
> When we register a data node, nodeReg.getXferPort() is used to find the 
> DatanodeDescriptor:
> {code}
>  DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr(
> nodeReg.getIpAddr(), nodeReg.getXferPort());
> {code}
> but adding a data node uses node.getIpAddr():
> {code}
> /** add node to the map 
>* return true if the node is added; false otherwise.
>*/
>   boolean add(DatanodeDescriptor node) {
> hostmapLock.writeLock().lock();
> try {
>   if (node==null || contains(node)) {
> return false;
>   }
>   
>   String ipAddr = node.getIpAddr();
>   DatanodeDescriptor[] nodes = map.get(ipAddr);
>   DatanodeDescriptor[] newNodes;
>   if (nodes==null) {
> newNodes = new DatanodeDescriptor[1];
> newNodes[0]=node;
>   } else { // rare case: more than one datanode on the host
> newNodes = new DatanodeDescriptor[nodes.length+1];
> System.arraycopy(nodes, 0, newNodes, 0, nodes.length);
> newNodes[nodes.length] = node;
>   }
>   map.put(ipAddr, newNodes);
>   return true;
> } finally {
>   hostmapLock.writeLock().unlock();
> }
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568920#comment-13568920
 ] 

Colin Patrick McCabe commented on HDFS-4461:


bq. I doubt that the directory scanner is the cause of OOM error. It is 
probably happening due to some other issue. How many blocks per storage 
directory do you have, when OOME happened?

We analyzed a DN heap dump from a production cluster with the Eclipse Memory 
Analyzer and found that the memory was full of ScanInfo objects.  The memory 
histogram showed that {{java.lang.String}} was the third-largest consumer of 
memory in the system.  Unfortunately I can't share the heap dump.

bq. I have hard time understanding the picture. How many bytes are we saving 
per ScanInfo?

In the particular case shown in memory-analysis.png, we save 86 characters in 
each string.  The volume prefix that we avoid storing is 
{{/home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/}}.
  Java uses 2 bytes per character (UCS-2 encoding), and we store both metaPath 
and blockPath, so multiply that by 4 to get 344.  Then add the overhead of 
using two File objects that contain the path string instead of just the string 
itself-- probably around an extra 16 bytes per object, for 376 bytes in total 
saved per {{ScanInfo}}.

You might think that 
{{/home/cmccabe/hadoop4/hadoop-hdfs-project/hadoop-hdfs/build//test/data/dfs/data/data1/}}
 is an unrealistically long volume path, but here is an example of a real 
volume path in use on a production cluster:

{{/mnt/hdfs/hdfs01/10769eef-a23a-4300-b45b-749221786109/dfs/dn}}.

Putting the disk UUID into the volume path is an obvious thing to do if you're a 
system administrator.
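
A minimal sketch of the suffix-based storage (illustrative names, not the patch 
itself): one shared prefix string per volume, per-block suffixes, and File 
objects materialized only on demand.

{code}
import java.io.File;

class SlimScanInfo {
  private final String volumeRoot;   // one shared String per volume
  private final String blockSuffix;  // e.g. "current/.../blk_123"
  private final String metaSuffix;   // e.g. "current/.../blk_123.meta"

  SlimScanInfo(String volumeRoot, String blockSuffix, String metaSuffix) {
    this.volumeRoot = volumeRoot;
    this.blockSuffix = blockSuffix;
    this.metaSuffix = metaSuffix;
  }

  // Materialize File objects only when a caller needs them, instead of
  // holding two full-path Files for every block being scanned.
  File getBlockFile() { return new File(volumeRoot, blockSuffix); }
  File getMetaFile()  { return new File(volumeRoot, metaSuffix); }
}
{code}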

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This has been causing out-of-memory conditions for users 
> who pick such long volume paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

2013-02-01 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568901#comment-13568901
 ] 

Suresh Srinivas commented on HDFS-4461:
---

bq. This has been causing out-of-memory conditions for users who pick such long 
volume paths.
I doubt that the directory scanner is the cause of the OOM error. It is 
probably happening due to some other issue. How many blocks per storage 
directory did you have when the OOME happened?

bq. here's a before vs. after picture of a memory analysis. you can see that in 
the "after" picture, we are no longer storing the path prefix twice per block 
in the ScanInfo class
I have a hard time understanding the picture. How many bytes are we saving per 
ScanInfo?

> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthly path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This has been causing out-of-memory conditions for users 
> who pick such long volume paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2013-02-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2476:
-

Target Version/s: 0.23.3, 0.24.0  (was: 0.24.0, 0.23.3)
   Fix Version/s: 0.23.7

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Fix For: 2.0.0-alpha, 0.23.7
>
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, 
> hashStructures.patch-6, hashStructures.patch-7, hashStructures.patch-8, 
> hashStructures.patch-9
>
>
> This patch introduces two hash data structures for storing under-replicated, 
> over-replicated and invalidated blocks. 
> 1. LightWeightHashSet
> 2. LightWeightLinkedSet
> Currently in all these cases we are using java.util.TreeSet which adds 
> unnecessary overhead.
> The main bottlenecks addressed by this patch are:
> - cluster instability times, when these queues (especially under-replicated) 
> tend to grow quite drastically,
> - initial cluster startup, when the queues are initialized after leaving 
> safemode,
> - block reports,
> - explicit acks for block addition and deletion.
> 1. The introduced structures are CPU-optimized.
> 2. They shrink and expand according to current capacity.
> 3. Add/contains/delete ops are performed in O(1) time (unlike the current 
> O(log n) for TreeSet).
> 4. The sets are equipped with fast access methods for polling a number of 
> elements (get+remove), which are used for handling the queues.
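
An illustrative stand-in for that "poll a number of elements" access pattern 
(built on java.util.LinkedHashSet here, not the LightWeightLinkedSet 
implementation itself):

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

class PollableSet<T> extends LinkedHashSet<T> {
  // Remove and return up to n elements in insertion order -- the bulk
  // get+remove handoff the replication queues need, with O(1) add,
  // contains, and delete instead of TreeSet's O(log n).
  List<T> pollN(int n) {
    List<T> out = new ArrayList<T>(n);
    Iterator<T> it = iterator();
    while (it.hasNext() && out.size() < n) {
      out.add(it.next());
      it.remove();
    }
    return out;
  }
}
{code}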

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2013-02-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568767#comment-13568767
 ] 

Kihwal Lee commented on HDFS-2476:
--

Committed to the current branch-0.23.

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Fix For: 2.0.0-alpha
>
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, 
> hashStructures.patch-6, hashStructures.patch-7, hashStructures.patch-8, 
> hashStructures.patch-9
>
>
> This patch introduces two hash data structures for storing under-replicated, 
> over-replicated and invalidated blocks. 
> 1. LightWeightHashSet
> 2. LightWeightLinkedSet
> Currently in all these cases we are using java.util.TreeSet which adds 
> unnecessary overhead.
> The main bottlenecks addressed by this patch are:
> - cluster instability times, when these queues (especially under-replicated) 
> tend to grow quite drastically,
> - initial cluster startup, when the queues are initialized after leaving 
> safemode,
> - block reports,
> - explicit acks for block addition and deletion.
> 1. The introduced structures are CPU-optimized.
> 2. They shrink and expand according to current capacity.
> 3. Add/contains/delete ops are performed in O(1) time (unlike the current 
> O(log n) for TreeSet).
> 4. The sets are equipped with fast access methods for polling a number of 
> elements (get+remove), which are used for handling the queues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4451) hdfs balancer command returns exit code 1 on success instead of 0

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568750#comment-13568750
 ] 

Hudson commented on HDFS-4451:
--

Integrated in Hadoop-Mapreduce-trunk #1331 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1331/])
Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to 
incompatible section. (Revision 1441123)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> hdfs balancer command returns exit code 1 on success instead of 0
> -
>
> Key: HDFS-4451
> URL: https://issues.apache.org/jira/browse/HDFS-4451
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.2-alpha
> Environment: Centos 6.3, JDK 1.6.0_25
>Reporter: Joshua Blatt
> Fix For: 2.0.3-alpha
>
> Attachments: HDFS-4451.patch, HDFS-4451.patch, HDFS-4451.patch
>
>
> Though the org.apache.hadoop.util.Tool interface javadocs indicate  
> implementations should return 0 on success, the 
> org.apache.hadoop.hdfs.server.balancer.Balancer.Cli implementation returns the 
> int values of this enum instead:
>   // Exit status
>   enum ReturnStatus {
> SUCCESS(1),
> IN_PROGRESS(0),
> ALREADY_RUNNING(-1),
> NO_MOVE_BLOCK(-2),
> NO_MOVE_PROGRESS(-3),
> IO_EXCEPTION(-4),
> ILLEGAL_ARGS(-5),
> INTERRUPTED(-6);
> This created an issue for us when we tried to run the hdfs balancer as a cron 
> job. Cron sends emails whenever an executable it runs exits non-zero. We'd 
> either have to disable all emails and miss real issues or fix this bug.
> I think both SUCCESS and IN_PROGRESS ReturnStatuses should lead to exit 0.
> Marking this change as incompatible because existing scripts which interpret 
> exit 1 as success will be broken (unless they defensively/liberally interpret 
> both exit 1 and exit 0 as success).
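
A sketch of the mapping the report proposes, with the enum mirroring the one 
quoted above and a hypothetical toExitCode() helper:

{code}
enum ReturnStatus {
  SUCCESS(1), IN_PROGRESS(0), ALREADY_RUNNING(-1), NO_MOVE_BLOCK(-2),
  NO_MOVE_PROGRESS(-3), IO_EXCEPTION(-4), ILLEGAL_ARGS(-5), INTERRUPTED(-6);

  private final int code;

  ReturnStatus(int code) { this.code = code; }

  // What main() should hand to System.exit(): 0 for the two
  // "nothing went wrong" outcomes, the raw status code otherwise.
  int toExitCode() {
    return (this == SUCCESS || this == IN_PROGRESS) ? 0 : code;
  }
}
{code}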

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568746#comment-13568746
 ] 

Hudson commented on HDFS-4151:
--

Integrated in Hadoop-Mapreduce-trunk #1331 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1331/])
Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to 
incompatible section. (Revision 1441123)
HDFS-4151. hdfs balancer command returns exit code 1 on success instead of 0. 
Contributed by Joshua Blatt. (Revision 1441113)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441113
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java


> Passing INodesInPath instead of INode[] in FSDirectory
> --
>
> Key: HDFS-4151
> URL: https://issues.apache.org/jira/browse/HDFS-4151
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: h4151_20121104.patch, h4151_20121105.patch
>
>
> Currently, many methods in FSDirectory pass INode[] as a parameter.  It is 
> better to pass INodesInPath so that we can add more path information later 
> on.  This is especially useful in Snapshot implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4189) rename getter method getMutableX and getXMutable to getXAndEnsureMutable

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568734#comment-13568734
 ] 

Hudson commented on HDFS-4189:
--

Integrated in Hadoop-Hdfs-Snapshots-Branch-build #88 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-Snapshots-Branch-build/88/])
HDFS-4189. Renames the getMutableXxx methods to getXxx4Write and fix a bug 
that some getExistingPathINodes calls should be getINodesInPath4Write. 
(Revision 1441193)

 Result = FAILURE
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441193
Files : 
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-2802.txt
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java


> rename getter method getMutableX and getXMutable to getXAndEnsureMutable
> 
>
> Key: HDFS-4189
> URL: https://issues.apache.org/jira/browse/HDFS-4189
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Brandon Li
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: h4189_20130130.patch, h4189_20130131.patch
>
>
> The method names with the form "getMutableXxx" may be misleading.  Let's 
> rename them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4361) When listing snapshottable directories, only return those where the user has permission to take snapshots

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568735#comment-13568735
 ] 

Hudson commented on HDFS-4361:
--

Integrated in Hadoop-Hdfs-Snapshots-Branch-build #88 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-Snapshots-Branch-build/88/])
HDFS-4361. When listing snapshottable directories, only return those where 
the user has permission to take snapshots.  Contributed by Jing Zhao (Revision 
1441202)

 Result = FAILURE
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441202
Files : 
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/CHANGES.HDFS-2802.txt
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java
* 
/hadoop/common/branches/HDFS-2802/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshottableDirListing.java


> When listing snapshottable directories, only return those where the user has 
> permission to take snapshots
> -
>
> Key: HDFS-4361
> URL: https://issues.apache.org/jira/browse/HDFS-4361
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: HDFS-4361.001.patch, HDFS-4361.002.patch, 
> HDFS-4361.003.patch, HDFS-4361.004.patch
>
>
> Currently, all snapshottable directories are returned for any user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568733#comment-13568733
 ] 

Hadoop QA commented on HDFS-4452:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12567572/getAdditionalBlock.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3935//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3935//console

This message is automatically generated.

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> The HDFS client tries to addBlock() to a file. If the NameNode is busy, the 
> client can time out and will reissue the same request. The two requests race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client will know of only 
> one of them. This eventually results in {{NotReplicatedYetException}} because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and puts it in an invalid state with an empty block in the middle.
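
A sketch of one way to make the retried call idempotent (simplified stand-ins 
for the NN internals, not the attached patch): if the client's "previous block" 
still matches the file's penultimate block, the last allocation already served 
this request.

{code}
import java.util.Objects;

class PendingFile {
  private Long penultimate;  // block before the last one
  private Long lastBlock;    // most recently allocated block id
  private long nextId = 1;

  // Retry detection: a duplicate addBlock() carries the same "previous
  // block" as the request that already allocated lastBlock, so return
  // lastBlock rather than creating a second, never-reported block.
  synchronized long getAdditionalBlock(Long clientPrevious) {
    if (lastBlock != null && Objects.equals(clientPrevious, penultimate)) {
      return lastBlock;
    }
    penultimate = lastBlock;
    lastBlock = nextId++;
    return lastBlock;
  }
}
{code}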

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4451) hdfs balancer command returns exit code 1 on success instead of 0

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568729#comment-13568729
 ] 

Hudson commented on HDFS-4451:
--

Integrated in Hadoop-Hdfs-trunk #1303 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1303/])
Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to 
incompatible section. (Revision 1441123)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> hdfs balancer command returns exit code 1 on success instead of 0
> -
>
> Key: HDFS-4451
> URL: https://issues.apache.org/jira/browse/HDFS-4451
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.2-alpha
> Environment: Centos 6.3, JDK 1.6.0_25
>Reporter: Joshua Blatt
> Fix For: 2.0.3-alpha
>
> Attachments: HDFS-4451.patch, HDFS-4451.patch, HDFS-4451.patch
>
>
> Though the org.apache.hadoop.util.Tool interface javadocs indicate  
> implementations should return 0 on success, the 
> org.apache.hadoop.hdfs.server.balancer.Balancer.Cli implementation returns the 
> int values of this enum instead:
>   // Exit status
>   enum ReturnStatus {
> SUCCESS(1),
> IN_PROGRESS(0),
> ALREADY_RUNNING(-1),
> NO_MOVE_BLOCK(-2),
> NO_MOVE_PROGRESS(-3),
> IO_EXCEPTION(-4),
> ILLEGAL_ARGS(-5),
> INTERRUPTED(-6);
> This created an issue for us when we tried to run the hdfs balancer as a cron 
> job. Cron sends emails whenever an executable it runs exits non-zero. We'd 
> either have to disable all emails and miss real issues or fix this bug.
> I think both SUCCESS and IN_PROGRESS ReturnStatuses should lead to exit 0.
> Marking this change as incompatible because existing scripts which interpret 
> exit 1 as success will be broken (unless they defensively/liberally interpret 
> both exit 1 and exit 0 as success).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568725#comment-13568725
 ] 

Hudson commented on HDFS-4151:
--

Integrated in Hadoop-Hdfs-trunk #1303 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1303/])
Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to 
incompatible section. (Revision 1441123)
HDFS-4151. hdfs balancer command returns exit code 1 on success instead of 0. 
Contributed by Joshua Blatt. (Revision 1441113)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441113
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java


> Passing INodesInPath instead of INode[] in FSDirectory
> --
>
> Key: HDFS-4151
> URL: https://issues.apache.org/jira/browse/HDFS-4151
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: h4151_20121104.patch, h4151_20121105.patch
>
>
> Currently, many methods in FSDirectory pass INode[] as a parameter.  It is 
> better to pass INodesInPath so that we can add more path information later 
> on.  This is especially useful in Snapshot implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2495) Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568711#comment-13568711
 ] 

Hudson commented on HDFS-2495:
--

Integrated in Hadoop-Hdfs-0.23-Build #512 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/512/])
merge -r 1199023:1199024 Merging from trunk to branch-0.23 to fix HDFS-2495 
(Revision 1441249)

 Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441249
Files : 
* /hadoop/common/branches/branch-0.23
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java


> Increase granularity of write operations in ReplicationMonitor thus reducing 
> contention for write lock
> --
>
> Key: HDFS-2495
> URL: https://issues.apache.org/jira/browse/HDFS-2495
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Fix For: 2.0.0-alpha, 0.23.7
>
> Attachments: replicationMon.patch, replicationMon.patch-1
>
>
> For processing blocks in ReplicationMonitor 
> (BlockManager.computeReplicationWork), we first obtain a list of blocks to be 
> replicated by calling chooseUnderReplicatedBlocks, and then for each block 
> which was found, we call computeReplicationWorkForBlock. The latter processes 
> a block in three stages, acquiring the writelock twice per call:
> 1. obtaining block related info (livenodes, srcnode, etc.) under lock
> 2. choosing target for replication
> 3. scheduling replication (under lock)
> We would like to change this behaviour and decrease contention for the write 
> lock by batching blocks and executing stages 1, 2, and 3 for sets of blocks, 
> rather than for each one separately. This would decrease the number of 
> writeLock acquisitions to 2, from 2 * number_of_blocks.
> Also, the info level logging can be pushed outside the writelock.
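
A sketch of that batched locking pattern (method names illustrative):

{code}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReplicationBatcher {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void computeReplicationWork(List<Long> batch) {
    lock.writeLock().lock();
    try {
      for (long block : batch) {
        gatherBlockInfo(block);       // stage 1: one lock for the batch
      }
    } finally {
      lock.writeLock().unlock();
    }

    for (long block : batch) {
      chooseTargets(block);           // stage 2: no lock required
    }

    lock.writeLock().lock();
    try {
      for (long block : batch) {
        scheduleReplication(block);   // stage 3: one lock for the batch
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  private void gatherBlockInfo(long block) { /* livenodes, srcnode, ... */ }
  private void chooseTargets(long block) { /* placement policy */ }
  private void scheduleReplication(long block) { /* enqueue work */ }
}
{code}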

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568710#comment-13568710
 ] 

Hudson commented on HDFS-2477:
--

Integrated in Hadoop-Hdfs-0.23-Build #512 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/512/])
Merging r1196676 and r1197801 from trunk to branch-0.23 to fix HDFS-2477 
(Revision 1441131)

 Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441131
Files : 
* /hadoop/common/branches/branch-0.23
* /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockInfo.java


> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 0.23.0
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Fix For: 2.0.0-alpha, 0.23.7
>
> Attachments: reportDiff.patch, reportDiff.patch-2, 
> reportDiff.patch-3, reportDiff.patch-4, reportDiff.patch-5
>
>
> When a block report is processed at the NN, BlockManager.reportDiff 
> traverses all blocks contained in the report, and each block that is also 
> present in the corresponding datanode descriptor is moved to the head of 
> that descriptor's block list.
> With HDFS-395 the huge majority of the blocks in the report are also present 
> in the datanode descriptor, which means that almost every block in the report 
> will have to be moved to the head of the list.
> Currently this operation is performed by DatanodeDescriptor.moveBlockToHead, 
> which removes a block from a list and then inserts it. In this process, we 
> call findDatanode several times (as far as I recall, 6 times for each 
> moveBlockToHead call). findDatanode is relatively expensive, since it goes 
> linearly through the triplets to locate the given datanode.
> With this patch, we memoize findDatanode results, saving 2 findDatanode 
> calls. Our experiments show that this can improve reportDiff (which is 
> executed under the write lock) by around 15%. Currently, with HDFS-395, 
> reportDiff is responsible for almost 100% of the block report processing time.
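A toy sketch of the memoization (these types and method names model, but are 
not, the real BlockInfo internals):
{code}
// Scan the triplets once per block and reuse the index for both the unlink
// and the re-insert, instead of paying the linear scan at every step.
class TripletsSketch {
  private final Object[] triplets;  // [datanode, prev, next] per replica

  TripletsSketch(Object[] triplets) { this.triplets = triplets; }

  int findDatanode(Object dn) {     // the expensive linear scan
    for (int i = 0; i * 3 < triplets.length; i++) {
      if (triplets[i * 3] == dn) return i;
    }
    return -1;
  }

  void moveBlockToHead(Object dn) {
    int idx = findDatanode(dn);     // memoized: one scan instead of several
    if (idx < 0) return;
    unlink(idx);                    // reuse idx here...
    insertAtHead(idx);              // ...and here
  }

  private void unlink(int idx) { /* pointer surgery using the saved index */ }
  private void insertAtHead(int idx) { /* relink at the head of the list */ }
}
{code}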

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-395) DFS Scalability: Incremental block reports

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568706#comment-13568706
 ] 

Hudson commented on HDFS-395:
-

Integrated in Hadoop-Hdfs-0.23-Build #512 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/512/])
merge -r 1161991:1161992 Merging from trunk to branch-0.23 to fix HDFS-395 
(Revision 1441117)

 Result = SUCCESS
kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441117
Files : 
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetAsyncDiskService.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/BlockCommand.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReceivedDeletedBlockInfo.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java


> DFS Scalability: Incremental block reports
> --
>
> Key: HDFS-395
> URL: https://issues.apache.org/jira/browse/HDFS-395
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: dhruba borthakur
>Assignee: Tomasz Nykiel
> Fix For: 2.0.0-alpha, 0.23.7
>
> Attachments: blockReportPeriod.patch, explicitAcks.patch-3, 
> explicitAcks.patch-4, explicitAcks.patch-5, explicitAcks.patch-6, 
> explicitDeleteAcks.patch
>
>
> I have a cluster that has 1800 datanodes. Each datanode has around 5 
> blocks and sends a block report to the namenode once every hour. This means 
> that the namenode processes a block report once every 2 seconds. Each block 
> report contains all blocks that the datanode currently hosts. This makes the 
> namenode compare a huge number of blocks that practically remain the same 
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make the block reports that follow a successful full 
> block report incremental. The namenode would then process only those blocks 
> that were added or deleted in the last period.
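A hedged sketch of the incremental idea (toy types; this is not the real 
DatanodeProtocol):
{code}
import java.util.ArrayList;
import java.util.List;

// Between full reports, the datanode accumulates only the blocks it
// received or deleted, and ships that delta to the namenode.
class IncrementalReportSketch {
  static class Block {}
  static class Delta {
    final List<Block> received, deleted;
    Delta(List<Block> r, List<Block> d) { received = r; deleted = d; }
  }

  private final List<Block> received = new ArrayList<>();
  private final List<Block> deleted  = new ArrayList<>();

  synchronized void blockReceived(Block b) { received.add(b); }
  synchronized void blockDeleted(Block b)  { deleted.add(b); }

  // Called on each report interval after the initial full report succeeds.
  synchronized Delta nextIncrementalReport() {
    Delta d = new Delta(new ArrayList<>(received), new ArrayList<>(deleted));
    received.clear();  // the next delta starts empty, so the namenode only
    deleted.clear();   // ever re-processes what actually changed
    return d;
  }
}
{code}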

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4452) getAdditionalBlock() can create multiple blocks if the client times out and retries.

2013-02-01 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-4452:
--

Attachment: getAdditionalBlock.patch

The same changes, now with a test case that succeeds with the patch and fails 
on trunk with the expected error:
{code}
java.lang.AssertionError: Must be one block expected:<1> but was:<2>
{code}

> getAdditionalBlock() can create multiple blocks if the client times out and 
> retries.
> 
>
> Key: HDFS-4452
> URL: https://issues.apache.org/jira/browse/HDFS-4452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Critical
> Attachments: getAdditionalBlock.patch, getAdditionalBlock.patch, 
> TestAddBlockRetry.java
>
>
> HDFS client tries to addBlock() to a file. If the NameNode is busy, the 
> client can time out and will reissue the same request. The two requests will 
> race 
> with each other in {{FSNamesystem.getAdditionalBlock()}}, which can result in 
> creating two new blocks on the NameNode while the client will know of only 
> one of them. This eventually results in {{NotReplicatedYetException}} because 
> the extra block is never reported by any DataNode, which stalls file creation 
> and puts the file in an invalid state with an empty block in the middle.
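A hedged sketch of a retry-safe allocation (toy types; the field and helper 
names are assumptions, not the committed FSNamesystem fix):
{code}
// Both the original request and its timeout-driven retry funnel through one
// synchronized method; the retry sees the block the first call added and
// returns it instead of allocating a duplicate.
class AddBlockSketch {
  static class Block { final long id; Block(long id) { this.id = id; } }
  static class INodeFile {
    Block lastBlock;
    Block getLastBlock() { return lastBlock; }
  }

  private long nextId = 1;

  synchronized Block getAdditionalBlock(INodeFile file, Block previous) {
    Block last = file.getLastBlock();
    if (last != null && last != previous) {
      return last;                   // idempotent retry: reuse, don't re-add
    }
    Block b = new Block(nextId++);   // first request: allocate a new block
    file.lastBlock = b;
    return b;
  }
}
{code}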

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2013-02-01 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568655#comment-13568655
 ] 

Liang Xie commented on HDFS-347:


Hi [~cmccabe], would you mind providing a patch against branch-2 if possible? 
It'll be appreciated :) 
I can volunteer to run a simple performance test on our HBase test cluster, 
which is built on branch-2, to see whether there is an application-side 
performance improvement. Thanks in advance.

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, performance
>Reporter: George Porter
>Assignee: Colin Patrick McCabe
> Attachments: 2013.01.28.design.pdf, 2013.01.31.consolidated2.patch, 
> 2013.01.31.consolidated.patch, all.tsv, BlockReaderLocal1.txt, full.patch, 
> HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, 
> HDFS-347-016_cleaned.patch, HDFS-347.016.patch, HDFS-347.017.clean.patch, 
> HDFS-347.017.patch, HDFS-347.018.clean.patch, HDFS-347.018.patch2, 
> HDFS-347.019.patch, HDFS-347.020.patch, HDFS-347.021.patch, 
> HDFS-347.022.patch, HDFS-347.024.patch, HDFS-347.025.patch, 
> HDFS-347.026.patch, HDFS-347.027.patch, HDFS-347.029.patch, 
> HDFS-347.030.patch, HDFS-347.033.patch, HDFS-347.035.patch, 
> HDFS-347-branch-20-append.txt, hdfs-347-merge.txt, hdfs-347-merge.txt, 
> hdfs-347-merge.txt, hdfs-347.png, hdfs-347.txt, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk with a separate I/O 
> operation.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, each 64-KB chunk 
> turns into a large number of 1500-byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> the answer is providing a mechanism for a DFSClient to directly open data 
> blocks that happen to be on the same machine.  It could do this by examining 
> the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.
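An illustrative sketch of the proposal (all names here are assumptions, not 
an existing HDFS API):
{code}
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// If the replica is local, open the block file directly with a large buffer
// instead of streaming 64-KB chunks from the DataNode.
class ShortCircuitReadSketch {
  static final int LOCAL_BUFFER = 4 * 1024 * 1024;  // much larger than 64 KB

  InputStream openBlock(String replicaHost, File localBlockFile)
      throws IOException {
    if (isLocalHost(replicaHost)) {
      // Direct file read: no DataNode round trips, no per-packet context
      // switches, and far fewer iops per 64-MB block.
      return new BufferedInputStream(
          new FileInputStream(localBlockFile), LOCAL_BUFFER);
    }
    return openOverNetwork(replicaHost);  // remote replica: normal path
  }

  boolean isLocalHost(String host) {
    return false;  // stub: compare against the local addresses
  }

  InputStream openOverNetwork(String host) {
    return null;   // stub: the existing DataNode streaming protocol
  }
}
{code}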

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4151) Passing INodesInPath instead of INode[] in FSDirectory

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568647#comment-13568647
 ] 

Hudson commented on HDFS-4151:
--

Integrated in Hadoop-Yarn-trunk #114 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/114/])
Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to 
incompatible section. (Revision 1441123)
HDFS-4151. hdfs balancer command returns exit code 1 on success instead of 0. 
Contributed by Joshua Blatt. (Revision 1441113)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441113
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java


> Passing INodesInPath instead of INode[] in FSDirectory
> --
>
> Key: HDFS-4151
> URL: https://issues.apache.org/jira/browse/HDFS-4151
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: h4151_20121104.patch, h4151_20121105.patch
>
>
> Currently, many methods in FSDirectory pass INode[] as a parameter.  It is 
> better to pass INodesInPath so that we can add more path information later 
> on.  This is especially useful in Snapshot implementation.
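A toy sketch of the refactoring (not the real INodesInPath class):
{code}
// Wrapping the resolved INode[] in a small object lets later work, e.g. the
// snapshot feature, attach extra path state without touching every
// FSDirectory method signature again.
class INodesInPathSketch {
  interface INode {}

  private final INode[] inodes;  // what FSDirectory methods used to take
  // Future fields slot in here: snapshot root, path components, ...

  INodesInPathSketch(INode[] inodes) { this.inodes = inodes; }

  INode getLastINode() {         // the target of the operation
    return inodes[inodes.length - 1];
  }
}
{code}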

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4451) hdfs balancer command returns exit code 1 on success instead of 0

2013-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568651#comment-13568651
 ] 

Hudson commented on HDFS-4451:
--

Integrated in Hadoop-Yarn-trunk #114 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/114/])
Change incorrect jira number HDFS-4151 to HDFS-4451 and move it to 
incompatible section. (Revision 1441123)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441123
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> hdfs balancer command returns exit code 1 on success instead of 0
> -
>
> Key: HDFS-4451
> URL: https://issues.apache.org/jira/browse/HDFS-4451
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.2-alpha
> Environment: Centos 6.3, JDK 1.6.0_25
>Reporter: Joshua Blatt
> Fix For: 2.0.3-alpha
>
> Attachments: HDFS-4451.patch, HDFS-4451.patch, HDFS-4451.patch
>
>
> Though the org.apache.hadoop.util.Tool interface javadocs indicate  
> implementations should return 0 on success, the 
> org.apache.hadoop.hdfs.server.balance.Balancer.Cli implementation returns the 
> int values of this enum instead:
>   // Exit status
>   enum ReturnStatus {
>     SUCCESS(1),
>     IN_PROGRESS(0),
>     ALREADY_RUNNING(-1),
>     NO_MOVE_BLOCK(-2),
>     NO_MOVE_PROGRESS(-3),
>     IO_EXCEPTION(-4),
>     ILLEGAL_ARGS(-5),
>     INTERRUPTED(-6);
> This created an issue for us when we tried to run the hdfs balancer as a 
> cron job. Cron sends emails whenever an executable it runs exits non-zero. 
> We'd either have to disable all emails and miss real issues, or fix this bug.
> I think both SUCCESS and IN_PROGRESS ReturnStatuses should lead to exit 0.
> Marking this change as incompatible because existing scripts that interpret 
> exit 1 as success will break (unless they defensively interpret both exit 1 
> and exit 0 as success).
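A minimal sketch of the proposed mapping (not the committed patch; the 
getValue() accessor is a hypothetical stand-in for however the enum exposes 
its int code):
{code}
// Fold both SUCCESS and IN_PROGRESS into the conventional exit code 0.
static int toExitCode(ReturnStatus status) {
  switch (status) {
    case SUCCESS:
    case IN_PROGRESS:
      return 0;                  // Tool convention: 0 means success
    default:
      return status.getValue();  // keep distinct non-zero codes for failures
  }
}
{code}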

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568560#comment-13568560
 ] 

Hadoop QA commented on HDFS-4462:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12567534/HDFS-4462.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3934//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3934//console

This message is automatically generated.

> 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation 
> version of HDFS
> ---
>
> Key: HDFS-4462
> URL: https://issues.apache.org/jira/browse/HDFS-4462
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-4462.patch, HDFS-4462.patch
>
>
> The 2NN currently has logic to detect when its on-disk FS metadata needs an 
> upgrade with respect to the NN's metadata (i.e. the layout versions are 
> different). In this case it will proceed with the checkpoint, despite the 
> storage signatures not matching precisely, as long as the BP ID and Cluster 
> ID do match exactly. However, when upgrading from a version of HDFS prior to 
> federation, which had no BP IDs or Cluster IDs, checkpoints will always fail 
> with an error like the following:
> {noformat}
> 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
> checkpoint fields.
> LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
> CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
> BP-1520616013-172.21.3.106-1359680537136.
> Expecting respectively: -19; 403832480; 0; ; .
> {noformat}
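A hedged sketch of the relaxed comparison (field names are assumptions, not 
the actual CheckpointSignature code):
{code}
// When layout versions differ (an upgrade), tolerate the empty
// cluster/blockpool IDs that pre-federation images carry, instead of
// demanding an exact signature match.
static boolean compatibleForCheckpoint(
    int lvLocal, int lvRemote, int nsLocal, int nsRemote,
    String cidLocal, String cidRemote, String bpLocal, String bpRemote) {
  if (lvLocal == lvRemote) {  // same layout version: require an exact match
    return nsLocal == nsRemote
        && cidLocal.equals(cidRemote) && bpLocal.equals(bpRemote);
  }
  // Upgrade path: pre-federation metadata has no cluster or blockpool ID,
  // so an empty local ID is compatible rather than a mismatch.
  return nsLocal == nsRemote
      && (cidLocal.isEmpty() || cidLocal.equals(cidRemote))
      && (bpLocal.isEmpty() || bpLocal.equals(bpRemote));
}
{code}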

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira