[jira] [Updated] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-6606: - Attachment: HDFS-6606.006.patch Rebased the patch against the latest trunk again. Optimize HDFS Encrypted Transport performance - Key: HDFS-6606 URL: https://issues.apache.org/jira/browse/HDFS-6606 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, security Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, HDFS-6606.003.patch, HDFS-6606.004.patch, HDFS-6606.005.patch, HDFS-6606.006.patch, OptimizeHdfsEncryptedTransportperformance.pdf In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; it was great work. It uses the SASL {{Digest-MD5}} mechanism (with QOP auth-conf) and supports three security strengths: * high: 3des or rc4 (128 bits) * medium: des or rc4 (56 bits) * low: rc4 (40 bits) 3des and rc4 are slow, only *tens of MB/s*: http://www.javamex.com/tutorials/cryptography/ciphers.shtml http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ I will give more detailed performance data in the future. This is clearly a bottleneck and will vastly affect end-to-end performance. AES (Advanced Encryption Standard) is recommended as a replacement for DES and is more secure; with AES-NI support, throughput can reach nearly *2 GB/s*, so encryption will no longer be the bottleneck. AES and CryptoCodec work is covered in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add a new mode for AES). This JIRA will use AES with AES-NI support as the encryption algorithm for the DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
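As an illustration of the direction the description proposes (not the HDFS-6606 patch itself), here is a minimal sketch of AES in CTR mode using the standard JCE API; the class name {{AesCtrSketch}} is hypothetical. On modern JVMs, these {{Cipher}} calls are dispatched to AES-NI instructions when the CPU supports them, which is where the large throughput gain over 3des/rc4 comes from.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCtrSketch {
    // Encrypt a buffer with AES/CTR, the kind of stream-friendly AES mode
    // the CryptoCodec work builds on; hardware-accelerated where AES-NI exists.
    public static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(plain);
    }

    // CTR decryption is the same keystream XORed back out.
    public static byte[] decrypt(byte[] key, byte[] iv, byte[] enc) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(enc);
    }
}
```

A real DataTransferProtocol integration would wrap the socket streams rather than encrypt whole buffers, but the cipher setup is the same.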
[jira] [Commented] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144432#comment-14144432 ] Hadoop QA commented on HDFS-6581: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670595/HDFS-6581.merge.10.patch against trunk revision 7b8df93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-httpfs: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8159//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8159//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8159//console This message is automatically generated. 
Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144517#comment-14144517 ] Hadoop QA commented on HDFS-6606: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670626/HDFS-6606.006.patch against trunk revision a9a55db. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.crypto.random.TestOsSecureRandom org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8161//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8161//console This message is automatically generated. 
[jira] [Commented] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144516#comment-14144516 ] Hadoop QA commented on HDFS-7128: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670602/HDFS-7128.patch against trunk revision 7b8df93. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancer org.apache.hadoop.hdfs.server.datanode.fsdataset.TestAvailableSpaceVolumeChoosingPolicy org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8160//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8160//console This message is automatically generated. 
Decommission slows way down when it gets towards the end Key: HDFS-7128 URL: https://issues.apache.org/jira/browse/HDFS-7128 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7128.patch When we decommission nodes across different racks, the decommission process becomes really slow at the end, hardly making any progress. The problem is that some blocks are on 3 decomm-in-progress DNs, and the way replications are scheduled causes unnecessary delay. Here is the analysis. When BlockManager schedules replication work from neededReplication, it first needs to pick the source node for each replication via chooseSourceDatanode. The core policies for picking the source node are: 1. Prefer a decomm-in-progress node. 2. Only pick nodes whose outstanding replication counts are below the thresholds dfs.namenode.replication.max-streams or dfs.namenode.replication.max-streams-hard-limit, based on the replication priority. When we decommission nodes: 1. All the decommissioning nodes' blocks are added to neededReplication. 2. BM picks X blocks from neededReplication in each iteration. X is based on cluster size and a configurable multiplier, so if the cluster has 2000 nodes, X will be around 4000. 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up being chosen as the source node for all 4000 blocks. The reason the outstanding replication thresholds don't kick in is the implementation of BlockManager.computeReplicationWorkForBlocks: node.getNumberOfBlocksToBeReplicated() remains zero because node.addBlockToBeReplicated is called only after the source-node iteration. {noformat} ... synchronized (neededReplications) { for (int priority = 0; priority < blocksToReplicate.size(); priority++) { ... chooseSourceDatanode ... } for (ReplicationWork rw : work) { ... rw.srcNode.addBlockToBeReplicated(block, targets); ... } {noformat} 4. So several decomm-in-progress nodes A, B, C end up with node.getNumberOfBlocksToBeReplicated() of 4000 each. 5. If we assume each node can replicate 5 blocks per minute, it will take 800 minutes to finish replicating these blocks. 6. The pending replication timeout kicks in after 5 minutes. The items are removed from the pending replication queue and added back to neededReplication, and the replications are then handled by other source nodes of these blocks. But the blocks still remain in nodes A, B, C's replication queues, DatanodeDescriptor.replicateBlocks, so A, B, C continue the replications of these blocks, although these blocks might
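The selection/accounting gap described in steps 2-4 can be modeled with a toy sketch (hypothetical classes, not the actual BlockManager code): because the per-node counter is only incremented after all sources have been chosen, the max-streams threshold check in phase 1 always sees a counter of zero and never rejects the decommissioning node.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two-phase loop in computeReplicationWorkForBlocks.
// Node and schedule() are hypothetical simplifications for illustration.
public class ReplicationSchedulingSketch {
    public static class Node {
        public int blocksToBeReplicated = 0;
    }

    static final int MAX_STREAMS = 2; // stands in for dfs.namenode.replication.max-streams

    public static int schedule(Node decommissioningNode, int blockCount) {
        List<Node> chosenSources = new ArrayList<>();
        // Phase 1: pick a source per block. The counter is still 0 for every
        // check, so the decommissioning node passes the threshold every time.
        for (int i = 0; i < blockCount; i++) {
            if (decommissioningNode.blocksToBeReplicated < MAX_STREAMS) {
                chosenSources.add(decommissioningNode);
            }
        }
        // Phase 2: only now are the per-node counters incremented.
        for (Node src : chosenSources) {
            src.blocksToBeReplicated++;
        }
        return decommissioningNode.blocksToBeReplicated;
    }
}
```

With 4000 blocks, the node ends phase 2 with 4000 queued replications even though the threshold was 2, matching the behavior the analysis describes.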
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144534#comment-14144534 ] Yi Liu commented on HDFS-6606: -- Test failures are unrelated. [~atm] and [~tucu00], do you have further comments? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6633) Support reading new data in a being written file until the file is closed
[ https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6633: Attachment: HDFS-6633-001.patch Attached the patch. Changes: 1. Added 2 new APIs, {{pollNewData()}} and {{isFileUnderConstruction()}}, to DFSInputStream and HdfsDataInputStream. 2. {{pollNewData()}} should be called after hitting EOF on a being-written file. 3. Once it returns true, reading can continue again. I tried changing the data transfer protocol to continue reading from the existing stream itself, but I ran into problems in BlockSender. Support reading new data in a being written file until the file is closed - Key: HDFS-6633 URL: https://issues.apache.org/jira/browse/HDFS-6633 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Vinayakumar B Attachments: HDFS-6633-001.patch, h6633_20140707.patch, h6633_20140708.patch When a file is being written, the file length keeps increasing. If the file is opened for read, the reader first gets the file length and then reads only up to that length. The reader will not be able to read the new data written afterward. We propose adding a new feature so that readers will be able to read all the data until the writer closes the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
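As a sketch of how a reader might use the two proposed APIs, here is a tail-style loop against a stand-in interface; {{TailableStream}} is a hypothetical simplification of HdfsDataInputStream, since the patch's exact signatures may differ, and a real reader would wait and re-poll rather than give up when no new data has arrived yet.

```java
import java.util.ArrayList;
import java.util.List;

public class TailReaderSketch {
    // Hypothetical stand-in for the APIs proposed for HdfsDataInputStream.
    public interface TailableStream {
        int read();                        // next byte, or -1 at the current EOF
        boolean pollNewData();             // after EOF: did the writer append more data?
        boolean isFileUnderConstruction(); // is the writer still holding the file open?
    }

    // Drain to EOF, then keep reading as long as the file is still under
    // construction and pollNewData() reports fresh bytes.
    public static List<Integer> tail(TailableStream in) {
        List<Integer> out = new ArrayList<>();
        while (true) {
            int b = in.read();
            if (b >= 0) {
                out.add(b);
                continue;
            }
            if (!in.isFileUnderConstruction()) {
                return out; // writer closed the file: this EOF is final
            }
            if (!in.pollNewData()) {
                return out; // sketch only: a real reader would wait and re-poll
            }
        }
    }
}
```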
[jira] [Created] (HDFS-7134) Replication count for a block should not update till the blocks have settled on Datanodes
gurmukh singh created HDFS-7134: --- Summary: Replication count for a block should not update till the blocks have settled on Datanodes Key: HDFS-7134 URL: https://issues.apache.org/jira/browse/HDFS-7134 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.2.1 Environment: Linux nn1.cluster1.com 2.6.32-431.20.3.el6.x86_64 #1 SMP Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux [hadoop@nn1 conf]$ cat /etc/redhat-release CentOS release 6.5 (Final) Reporter: gurmukh singh The count of replicas for a block should not change till the blocks have settled on the datanodes. Test case: Hadoop cluster with 1 namenode and 3 datanodes. nn1.cluster1.com(192.168.1.70) dn1.cluster1.com(192.168.1.72) dn2.cluster1.com(192.168.1.73) dn3.cluster1.com(192.168.1.74) Cluster is up and running fine with replication set to 1 via the dfs.replication parameter on all nodes: <property> <name>dfs.replication</name> <value>1</value> </property> To reduce the wait time, the dfs.heartbeat and recheck parameters have been reduced. On datanode2 (192.168.1.72): [hadoop@dn2 ~]$ hadoop fs -Ddfs.replication=2 -put from_dn2 / [hadoop@dn2 ~]$ hadoop fs -ls /from_dn2 Found 1 items -rw-r--r-- 2 hadoop supergroup 17 2014-09-23 13:33 /from_dn2 On Namenode === As expected, the copy was done from datanode2; one copy goes locally. [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 13:53:16 IST 2014 /from_dn2 17 bytes, 1 block(s): OK 0. blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 192.168.1.73:50010] The blocks can also be seen on the datanodes' disks under the current directory. Now, shut down datanode2 (192.168.1.73), and as expected the block moves to another datanode to maintain a replication of 2: [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 13:54:21 IST 2014 /from_dn2 17 bytes, 1 block(s): OK 0.
blk_8132629811771280764_1175 len=17 repl=2 [192.168.1.74:50010, 192.168.1.72:50010] But now, if I bring back datanode2, although the namenode sees that this block is in 3 places and fires an invalidate command for datanode1 (192.168.1.72), the replication count on the namenode is bumped to 3 immediately. [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 13:56:12 IST 2014 /from_dn2 17 bytes, 1 block(s): OK 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 192.168.1.72:50010, 192.168.1.73:50010] On datanode1, the invalidate command was fired immediately and the block deleted. = 2014-09-23 13:54:17,483 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: /192.168.1.72:50010 2014-09-23 13:54:17,502 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received blk_8132629811771280764_1175 src: /192.168.1.74:38099 dest: /192.168.1.72:50010 size 17 2014-09-23 13:55:28,720 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling blk_8132629811771280764_1175 file /space/disk1/current/blk_8132629811771280764 for deletion 2014-09-23 13:55:28,721 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted blk_8132629811771280764_1175 at file /space/disk1/current/blk_8132629811771280764 The namenode still shows 3 replicas even though one has been deleted, even after more than 30 minutes. [hadoop@nn1 conf]$ hadoop fsck /from_dn2 -files -blocks -locations FSCK started by hadoop from /192.168.1.70 for path /from_dn2 at Tue Sep 23 14:21:27 IST 2014 /from_dn2 17 bytes, 1 block(s): OK 0. blk_8132629811771280764_1175 len=17 repl=3 [192.168.1.74:50010, 192.168.1.72:50010, 192.168.1.73:50010] This could be dangerous if someone removes a replica or the other 2 datanodes fail.
On Datanode 1 = Before datanode1 is brought back: [hadoop@dn1 conf]$ ls -l /space/disk*/current /space/disk1/current: total 28 -rw-rw-r-- 1 hadoop hadoop 13 Sep 21 09:09 blk_2278001646987517832 -rw-rw-r-- 1 hadoop hadoop 11 Sep 21 09:09 blk_2278001646987517832_1171.meta -rw-rw-r-- 1 hadoop hadoop 17 Sep 23 13:54 blk_8132629811771280764 -rw-rw-r-- 1 hadoop hadoop 11 Sep 23 13:54 blk_8132629811771280764_1175.meta -rw-rw-r-- 1 hadoop hadoop 5299 Sep 21 10:04 dncp_block_verification.log.curr -rw-rw-r-- 1 hadoop hadoop 157 Sep 23 13:51 VERSION After starting the datanode daemon: [hadoop@dn1 conf]$ ls -l /space/disk*/current /space/disk1/current: total 20 -rw-rw-r-- 1 hadoop hadoop 13 Sep 21 09:09 blk_2278001646987517832 -rw-rw-r-- 1 hadoop hadoop 11 Sep 21 09:09 blk_2278001646987517832_1171.meta -rw-rw-r-- 1 hadoop hadoop 5299 Sep 21 10:04 dncp_block_verification.log.curr -rw-rw-r-- 1 hadoop
[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node
[ https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144584#comment-14144584 ] Vinayakumar B commented on HDFS-7097: - Thanks Kihwal, the changes look good. I have the same question as [~mingma]. 1. Though it's not frequent, a saveNamespace RPC on the standby will create the same problem as mentioned in the description. Can we have different locks for saveNamespace based on HA state? Allow block reports to be processed during checkpointing on standby name node - Key: HDFS-7097 URL: https://issues.apache.org/jira/browse/HDFS-7097 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Attachments: HDFS-7097.patch On a reasonably busy HDFS cluster, there is a stream of creates, causing data nodes to generate incremental block reports. When a standby name node is checkpointing, RPC handler threads trying to process a full or incremental block report are blocked on the name system's {{fsLock}}, because the checkpointer acquires the read lock on it. This can create a serious problem if the name space is big and checkpointing takes a long time. All available RPC handlers can be tied up very quickly. If you have 100 handlers, it only takes 34 file creates. If a separate service RPC port is not used, HA transition will have to wait in the call queue for minutes. Even if a separate service RPC port is configured, heartbeats from datanodes will be blocked. A standby NN with a big name space can lose all data nodes after checkpointing. The RPC calls will also be retransmitted by data nodes many times, filling up the call queue and potentially causing listen queue overflow. Since block reports do not modify any state that is being saved to the fsimage, I propose letting them through during checkpointing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
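The contention described above can be demonstrated with a minimal sketch, using {{ReentrantReadWriteLock}} as a stand-in for the namesystem's {{fsLock}} (FSNamesystem's actual locking is more involved): while the checkpointer holds the read lock, a handler needing the write lock cannot get it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FsLockSketch {
    // Returns whether a block-report handler could take the write lock while
    // the checkpointer holds the read lock. ReentrantReadWriteLock does not
    // allow upgrading read -> write, so tryLock() fails here.
    public static boolean handlerCanProceedDuringCheckpoint() {
        ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
        fsLock.readLock().lock();                 // checkpointer saving the namespace
        try {
            return fsLock.writeLock().tryLock();  // block-report RPC handler
        } finally {
            fsLock.readLock().unlock();
        }
    }
}
```

The proposed fix sidesteps this by letting block reports bypass the lock entirely during checkpointing, since they do not touch state being written to the fsimage.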
[jira] [Commented] (HDFS-7102) Null dereference in PacketReceiver#receiveNextPacket()
[ https://issues.apache.org/jira/browse/HDFS-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144593#comment-14144593 ] Vinayakumar B commented on HDFS-7102: - Are you seeing any potential problem here? Null dereference in PacketReceiver#receiveNextPacket() -- Key: HDFS-7102 URL: https://issues.apache.org/jira/browse/HDFS-7102 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} public void receiveNextPacket(ReadableByteChannel in) throws IOException { doRead(in, null); {code} doRead() passes null as the second parameter to (line 134): {code} doReadFully(ch, in, curPacketBuf); {code} which dereferences it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
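For context on why the null second argument may be harmless (the question the comment raises), here is a sketch of the either-or read pattern such code commonly uses; the class and method below are hypothetical, not the actual PacketReceiver source. Exactly one of the two inputs is non-null, and only the non-null one is dereferenced, which static analyzers often flag anyway.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class EitherOrReadSketch {
    // Fill buf completely from whichever input is non-null. Passing null for
    // one of ch/in is by design, not a null-dereference bug.
    public static void readFully(ReadableByteChannel ch, InputStream in, ByteBuffer buf)
            throws IOException {
        if (ch != null) {
            while (buf.remaining() > 0) {
                if (ch.read(buf) < 0) throw new IOException("premature EOF");
            }
        } else {
            // ch is null: read from the stream instead, never touching ch.
            while (buf.remaining() > 0) {
                int n = in.read(buf.array(), buf.position(), buf.remaining());
                if (n < 0) throw new IOException("premature EOF");
                buf.position(buf.position() + n);
            }
        }
    }
}
```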
[jira] [Updated] (HDFS-7113) Add DFSAdmin Command to Recover Lease
[ https://issues.apache.org/jira/browse/HDFS-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7113: Fix Version/s: (was: 2.5.1) Add DFSAdmin Command to Recover Lease - Key: HDFS-7113 URL: https://issues.apache.org/jira/browse/HDFS-7113 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Miklos Christine Priority: Minor Attachments: HDFS-7113.2.patch, HDFS-7113.patch In certain conditions, a lease may be left around if an error occurs while writing to HDFS and the file is left open. Having a DFSAdmin command would allow administrators to recover the lease and close the file easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7115) TestEncryptionZones assumes Unix path separator for KMS key store path
[ https://issues.apache.org/jira/browse/HDFS-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144663#comment-14144663 ] Hudson commented on HDFS-7115: -- FAILURE: Integrated in Hadoop-Yarn-trunk #689 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/689/]) HDFS-7115. TestEncryptionZones assumes Unix path separator for KMS key store path. Contributed by Xiaoyu Yao. (cnauroth: rev 26cba7f35ff24262afa5d8f9ed22f3a7f01d9a71) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestEncryptionZones assumes Unix path separator for KMS key store path -- Key: HDFS-7115 URL: https://issues.apache.org/jira/browse/HDFS-7115 Project: Hadoop HDFS Issue Type: Test Components: encryption Affects Versions: 2.5.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Fix For: 2.6.0 Attachments: HDFS-7115.0.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7001) Tests in TestTracing should not depend on the order of execution
[ https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144660#comment-14144660 ] Hudson commented on HDFS-7001: -- FAILURE: Integrated in Hadoop-Yarn-trunk #689 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/689/]) HDFS-7001. Tests in TestTracing should not depend on the order of execution. (iwasakims via cmccabe) (cmccabe: rev 7b8df93ce1b7204a247e64b394d57eef748e73aa) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Tests in TestTracing should not depend on the order of execution Key: HDFS-7001 URL: https://issues.apache.org/jira/browse/HDFS-7001 Project: Hadoop HDFS Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7001-0.patch, HDFS-7001-1.patch o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed first. It should be done in BeforeClass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7106) Reconfiguring DataNode volumes does not release the lock files in removed volumes.
[ https://issues.apache.org/jira/browse/HDFS-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144669#comment-14144669 ] Hudson commented on HDFS-7106: -- FAILURE: Integrated in Hadoop-Yarn-trunk #689 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/689/]) HDFS-7106. Reconfiguring DataNode volumes does not release the lock files in removed volumes. (cnauroth via cmccabe) (cmccabe: rev 912ad32b03c1e023ab88918bfa8cb356d1851545) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java Reconfiguring DataNode volumes does not release the lock files in removed volumes. -- Key: HDFS-7106 URL: https://issues.apache.org/jira/browse/HDFS-7106 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.6.0 Attachments: HDFS-7106.1.patch, HDFS-7106.2.patch, HDFS-7106.3.patch After reconfiguring a DataNode to remove volumes without restarting the DataNode, the process still holds lock files exclusively in all of the volumes that were removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6633) Support reading new data in a being written file until the file is closed
[ https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144692#comment-14144692 ] Hadoop QA commented on HDFS-6633: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670663/HDFS-6633-001.patch against trunk revision f557820. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8162//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8162//console This message is automatically generated. 
[jira] [Commented] (HDFS-7115) TestEncryptionZones assumes Unix path separator for KMS key store path
[ https://issues.apache.org/jira/browse/HDFS-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144818#comment-14144818 ] Hudson commented on HDFS-7115: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1880 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1880/]) HDFS-7115. TestEncryptionZones assumes Unix path separator for KMS key store path. Contributed by Xiaoyu Yao. (cnauroth: rev 26cba7f35ff24262afa5d8f9ed22f3a7f01d9a71) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7115) TestEncryptionZones assumes Unix path separator for KMS key store path
[ https://issues.apache.org/jira/browse/HDFS-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144795#comment-14144795 ] Hudson commented on HDFS-7115: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1905 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1905/]) HDFS-7115. TestEncryptionZones assumes Unix path separator for KMS key store path. Contributed by Xiaoyu Yao. (cnauroth: rev 26cba7f35ff24262afa5d8f9ed22f3a7f01d9a71) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7106) Reconfiguring DataNode volumes does not release the lock files in removed volumes.
[ https://issues.apache.org/jira/browse/HDFS-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144824#comment-14144824 ] Hudson commented on HDFS-7106: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1880 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1880/]) HDFS-7106. Reconfiguring DataNode volumes does not release the lock files in removed volumes. (cnauroth via cmccabe) (cmccabe: rev 912ad32b03c1e023ab88918bfa8cb356d1851545) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Reconfiguring DataNode volumes does not release the lock files in removed volumes. -- Key: HDFS-7106 URL: https://issues.apache.org/jira/browse/HDFS-7106 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.6.0 Attachments: HDFS-7106.1.patch, HDFS-7106.2.patch, HDFS-7106.3.patch After reconfiguring a DataNode to remove volumes without restarting the DataNode, the process still holds lock files exclusively in all of the volumes that were removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7106) Reconfiguring DataNode volumes does not release the lock files in removed volumes.
[ https://issues.apache.org/jira/browse/HDFS-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144801#comment-14144801 ] Hudson commented on HDFS-7106: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1905 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1905/]) HDFS-7106. Reconfiguring DataNode volumes does not release the lock files in removed volumes. (cnauroth via cmccabe) (cmccabe: rev 912ad32b03c1e023ab88918bfa8cb356d1851545) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java Reconfiguring DataNode volumes does not release the lock files in removed volumes. -- Key: HDFS-7106 URL: https://issues.apache.org/jira/browse/HDFS-7106 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.6.0 Attachments: HDFS-7106.1.patch, HDFS-7106.2.patch, HDFS-7106.3.patch After reconfiguring a DataNode to remove volumes without restarting the DataNode, the process still holds lock files exclusively in all of the volumes that were removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144833#comment-14144833 ] Konstantin Shvachko commented on HDFS-3107: --- Plamen, could you please update the JavaDocs per the design doc? I found one thing: * Fails if truncate is not to a block boundary and someone has a lease. This should be removed, because the block boundary is irrelevant and you already said earlier that truncate fails if the file is not closed. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), the reverse operation of append, which forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7001) Tests in TestTracing should not depend on the order of execution
[ https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144792#comment-14144792 ] Hudson commented on HDFS-7001: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1905 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1905/]) HDFS-7001. Tests in TestTracing should not depend on the order of execution. (iwasakims via cmccabe) (cmccabe: rev 7b8df93ce1b7204a247e64b394d57eef748e73aa) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java Tests in TestTracing should not depend on the order of execution Key: HDFS-7001 URL: https://issues.apache.org/jira/browse/HDFS-7001 Project: Hadoop HDFS Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7001-0.patch, HDFS-7001-1.patch o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed first. It should be done in BeforeClass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-3107: -- Attachment: HDFS_truncate.pdf Here is the design doc. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), the reverse operation of append, which forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7001) Tests in TestTracing should not depend on the order of execution
[ https://issues.apache.org/jira/browse/HDFS-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144815#comment-14144815 ] Hudson commented on HDFS-7001: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1880 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1880/]) HDFS-7001. Tests in TestTracing should not depend on the order of execution. (iwasakims via cmccabe) (cmccabe: rev 7b8df93ce1b7204a247e64b394d57eef748e73aa) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java Tests in TestTracing should not depend on the order of execution Key: HDFS-7001 URL: https://issues.apache.org/jira/browse/HDFS-7001 Project: Hadoop HDFS Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7001-0.patch, HDFS-7001-1.patch o.a.h.tracing.TestTracing#testSpanReceiverHost is assumed to be executed first. It should be done in BeforeClass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144865#comment-14144865 ] Kihwal Lee edited comment on HDFS-7128 at 9/23/14 3:00 PM: --- This is not just about decommissioning. If nodes die and a large number of blocks need to be replicated, the replication monitor can schedule a large number of blocks in one run, and it can over-schedule far beyond the hard limit on certain nodes, since {{getNumberOfBlocksToBeReplicated()}} is not updated. As you pointed out, gross over-scheduling should be avoided, as it causes replication timeouts and potentially duplicate replications and invalidations. In my experience, multiple node deaths are commonly caused by DNS or network outages. When such an outage causes a big cluster to lose a large proportion of its nodes, recovery can be very slow because almost every node is over-scheduled with replication work that is no longer necessary. This patch will also help in that case. I think the proposed approach is reasonable. If I were to change one thing, I would call {{decrementPendingReplicationWithoutTargets()}} in the finally block of the try block surrounding {{chooseTarget()}}. Do you think the default soft limit and hard limit are reasonable? was (Author: kihwal): This is not just about decommissioning. If nodes die and a large number of blocks need to be replicated, the replication monitor can schedule a large number of blocks in one run, and it can over-schedule far beyond the hard limit on certain nodes, since {{getNumberOfBlocksToBeReplicated()}} is not updated. As you pointed out, gross over-scheduling should be avoided, as it causes replication timeouts and potentially duplicate replications and invalidations. In my experience, multiple node deaths are commonly caused by DNS or network outages. 
When such an outage causes a big cluster to lose a large proportion of its nodes, recovery can be very slow because almost every node is over-scheduled with replication work that is no longer necessary. This patch will also help in that case. I think the proposed approach is reasonable. If I were to change one thing, I would call {{decrementPendingReplicationWithoutTargets()}} in a finally block surrounding {{chooseTarget()}}. Do you think the default soft limit and hard limit are reasonable? Decommission slows way down when it gets towards the end Key: HDFS-7128 URL: https://issues.apache.org/jira/browse/HDFS-7128 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7128.patch When we decommission nodes across different racks, the decommission process becomes really slow at the end, hardly making any progress. The problem is that some blocks are on 3 decomm-in-progress DNs, and the way replications are scheduled causes unnecessary delay. Here is the analysis. When BlockManager schedules replication work from neededReplication, it first needs to pick the source node for replication via chooseSourceDatanode. The core policies for picking the source node are: 1. Prefer a decomm-in-progress node. 2. Only pick nodes whose outstanding replication counts are below the thresholds dfs.namenode.replication.max-streams or dfs.namenode.replication.max-streams-hard-limit, based on the replication priority. When we decommission nodes: 1. All the decommissioned nodes' blocks will be added to neededReplication. 2. BM will pick X blocks from neededReplication in each iteration. X is based on cluster size and a configurable multiplier. So if the cluster has 2000 nodes, X will be around 4000. 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up being chosen as the source node for all these 4000 blocks. 
The reason the outstanding replication thresholds don't kick in lies in the implementation of BlockManager.computeReplicationWorkForBlocks; node.getNumberOfBlocksToBeReplicated() remains zero because node.addBlockToBeReplicated is called after the source node iteration. {noformat} ... synchronized (neededReplications) { for (int priority = 0; priority < blocksToReplicate.size(); priority++) { ... chooseSourceDatanode ... } for (ReplicationWork rw : work) { ... rw.srcNode.addBlockToBeReplicated(block, targets); ... } {noformat} 4. So several decomm-in-progress nodes A, B, C end up with node.getNumberOfBlocksToBeReplicated() counts of 4000. 5. If we assume each node can replicate 5 blocks per minute, it is going to take 800 minutes to finish replicating these blocks. 6. The pending replication timeout kicks in after 5 minutes. The items will be removed from the pending replication queue and added back to neededReplication.
[jira] [Commented] (HDFS-7114) Secondary NameNode failed to rollback from 2.4.1 to 2.2.0
[ https://issues.apache.org/jira/browse/HDFS-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144882#comment-14144882 ] Kihwal Lee commented on HDFS-7114: -- The secondary NameNode does not persist any state needed for its own startup. Clean up its temporary storage and restart. Secondary NameNode failed to rollback from 2.4.1 to 2.2.0 - Key: HDFS-7114 URL: https://issues.apache.org/jira/browse/HDFS-7114 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: sam liu Priority: Blocker Upgrading from 2.2.0 to 2.4.1 works, but rolling back the secondary NameNode fails with the following issue. 2014-09-22 10:41:28,358 FATAL org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start secondary namenode org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /var/hadoop/tmp/hdfs/dfs/namesecondary. Reported: -56. Expecting = -47. at org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1082) at org.apache.hadoop.hdfs.server.common.Storage.setFieldsFromProperties(Storage.java:890) at org.apache.hadoop.hdfs.server.namenode.NNStorage.setFieldsFromProperties(NNStorage.java:585) at org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:921) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:913) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:249) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.init(SecondaryNameNode.java:199) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:652) 2014-09-22 10:41:28,360 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 2014-09-22 10:41:28,363 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144865#comment-14144865 ] Kihwal Lee edited comment on HDFS-7128 at 9/23/14 2:59 PM: --- This is not just about decommissioning. If nodes die and a large number of blocks need to be replicated, the replication monitor can schedule a large number of blocks in one run, and it can over-schedule far beyond the hard limit on certain nodes, since {{getNumberOfBlocksToBeReplicated()}} is not updated. As you pointed out, gross over-scheduling should be avoided, as it causes replication timeouts and potentially duplicate replications and invalidations. In my experience, multiple node deaths are commonly caused by DNS or network outages. When such an outage causes a big cluster to lose a large proportion of its nodes, recovery can be very slow because almost every node is over-scheduled with replication work that is no longer necessary. This patch will also help in that case. I think the proposed approach is reasonable. If I were to change one thing, I would call {{decrementPendingReplicationWithoutTargets()}} in a finally block surrounding {{chooseTarget()}}. Do you think the default soft limit and hard limit are reasonable? was (Author: kihwal): This is not just about decommissioning. If nodes die and a large number of blocks need to be replicated, the replication monitor can schedule a large number of blocks in one run, and it can over-schedule far beyond the hard limit on certain nodes, since {{getNumberOfBlocksToBeReplicated()}} is not updated. As you pointed out, gross over-scheduling should be avoided, as it causes replication timeouts and potentially duplicate replication work and invalidation. In my experience, multiple node deaths are commonly caused by DNS or network outages. 
When such an outage causes a big cluster to lose a large proportion of its nodes, recovery can be very slow because almost every node is over-scheduled with replication work that is no longer necessary. This patch will also help in this case. I think the proposed approach is reasonable. If I were to change one thing, I would call {{decrementPendingReplicationWithoutTargets()}} in a finally block surrounding {{chooseTarget()}}. Do you think the default soft limit and hard limit are reasonable? Decommission slows way down when it gets towards the end Key: HDFS-7128 URL: https://issues.apache.org/jira/browse/HDFS-7128 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7128.patch When we decommission nodes across different racks, the decommission process becomes really slow at the end, hardly making any progress. The problem is that some blocks are on 3 decomm-in-progress DNs, and the way replications are scheduled causes unnecessary delay. Here is the analysis. When BlockManager schedules replication work from neededReplication, it first needs to pick the source node for replication via chooseSourceDatanode. The core policies for picking the source node are: 1. Prefer a decomm-in-progress node. 2. Only pick nodes whose outstanding replication counts are below the thresholds dfs.namenode.replication.max-streams or dfs.namenode.replication.max-streams-hard-limit, based on the replication priority. When we decommission nodes: 1. All the decommissioned nodes' blocks will be added to neededReplication. 2. BM will pick X blocks from neededReplication in each iteration. X is based on cluster size and a configurable multiplier. So if the cluster has 2000 nodes, X will be around 4000. 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up being chosen as the source node for all these 4000 blocks. 
The reason the outstanding replication thresholds don't kick in lies in the implementation of BlockManager.computeReplicationWorkForBlocks; node.getNumberOfBlocksToBeReplicated() remains zero because node.addBlockToBeReplicated is called after the source node iteration. {noformat} ... synchronized (neededReplications) { for (int priority = 0; priority < blocksToReplicate.size(); priority++) { ... chooseSourceDatanode ... } for (ReplicationWork rw : work) { ... rw.srcNode.addBlockToBeReplicated(block, targets); ... } {noformat} 4. So several decomm-in-progress nodes A, B, C end up with node.getNumberOfBlocksToBeReplicated() counts of 4000. 5. If we assume each node can replicate 5 blocks per minute, it is going to take 800 minutes to finish replicating these blocks. 6. The pending replication timeout kicks in after 5 minutes. The items will be removed from the pending replication queue and added back to neededReplication.
[jira] [Commented] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144865#comment-14144865 ] Kihwal Lee commented on HDFS-7128: -- This is not just about decommissioning. If nodes die and a large number of blocks need to be replicated, the replication monitor can schedule a large number of blocks in one run, and it can over-schedule far beyond the hard limit on certain nodes, since {{getNumberOfBlocksToBeReplicated()}} is not updated. As you pointed out, gross over-scheduling should be avoided, as it causes replication timeouts and potentially duplicate replication work and invalidation. In my experience, multiple node deaths are commonly caused by DNS or network outages. When such an outage causes a big cluster to lose a large proportion of its nodes, recovery can be very slow because almost every node is over-scheduled with replication work that is no longer necessary. This patch will also help in this case. I think the proposed approach is reasonable. If I were to change one thing, I would call {{decrementPendingReplicationWithoutTargets()}} in a finally block surrounding {{chooseTarget()}}. Do you think the default soft limit and hard limit are reasonable? Decommission slows way down when it gets towards the end Key: HDFS-7128 URL: https://issues.apache.org/jira/browse/HDFS-7128 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7128.patch When we decommission nodes across different racks, the decommission process becomes really slow at the end, hardly making any progress. The problem is that some blocks are on 3 decomm-in-progress DNs, and the way replications are scheduled causes unnecessary delay. Here is the analysis. When BlockManager schedules replication work from neededReplication, it first needs to pick the source node for replication via chooseSourceDatanode. The core policies for picking the source node are: 1. Prefer a decomm-in-progress node. 
2. Only pick nodes whose outstanding replication counts are below the thresholds dfs.namenode.replication.max-streams or dfs.namenode.replication.max-streams-hard-limit, based on the replication priority. When we decommission nodes: 1. All the decommissioned nodes' blocks will be added to neededReplication. 2. BM will pick X blocks from neededReplication in each iteration. X is based on cluster size and a configurable multiplier. So if the cluster has 2000 nodes, X will be around 4000. 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up being chosen as the source node for all these 4000 blocks. The reason the outstanding replication thresholds don't kick in lies in the implementation of BlockManager.computeReplicationWorkForBlocks; node.getNumberOfBlocksToBeReplicated() remains zero because node.addBlockToBeReplicated is called after the source node iteration. {noformat} ... synchronized (neededReplications) { for (int priority = 0; priority < blocksToReplicate.size(); priority++) { ... chooseSourceDatanode ... } for (ReplicationWork rw : work) { ... rw.srcNode.addBlockToBeReplicated(block, targets); ... } {noformat} 4. So several decomm-in-progress nodes A, B, C end up with node.getNumberOfBlocksToBeReplicated() counts of 4000. 5. If we assume each node can replicate 5 blocks per minute, it is going to take 800 minutes to finish replicating these blocks. 6. The pending replication timeout kicks in after 5 minutes. The items will be removed from the pending replication queue and added back to neededReplication. The replications will then be handled by other source nodes for these blocks. But the blocks still remain in nodes A, B, and C's pending replication queues, DatanodeDescriptor.replicateBlocks, so A, B, and C continue the replications of these blocks, although the blocks might already have been replicated by other DNs after the replication timeout. 7. Some blocks' replicas exist on A, B, and C, and they sit at the end of A's pending replication queue. 
Even though such a block's replication times out, no source node can be chosen, given that A, B, and C all have high pending replication counts. So we have to wait until A drains its pending replication queue. Meanwhile, the items in A's pending replication queue have been taken care of by other nodes and are no longer under-replicated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
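The scheduling fix discussed in the comments above can be sketched as a toy model: count replication work against the source node as soon as it is scheduled, and undo the count in a finally block when target selection fails, so the max-streams check sees accurate numbers within a single scheduling run. The names here mirror the JIRA comment ({{DatanodeDescriptor}}, {{chooseTarget()}}, the max-streams limit), but this is a simplified illustration under those assumptions, not the actual BlockManager code.

```java
// Toy model of the over-scheduling fix; not actual Hadoop code.
public class ReplicationSchedulingSketch {
    static final int MAX_STREAMS = 2; // stand-in for dfs.namenode.replication.max-streams

    static class DatanodeDescriptor {
        private int pending = 0;
        int getNumberOfBlocksToBeReplicated() { return pending; }
        void incrementPendingReplicationWithoutTargets() { pending++; }
        void decrementPendingReplicationWithoutTargets() { pending--; }
    }

    /** Pretend target chooser; returns null when no target can be found. */
    static String chooseTarget(boolean targetsAvailable) {
        return targetsAvailable ? "someTarget" : null;
    }

    /** Returns true if a replication was scheduled on the source node. */
    static boolean scheduleReplication(DatanodeDescriptor src, boolean targetsAvailable) {
        // Respect the outstanding-streams threshold before taking on more work.
        if (src.getNumberOfBlocksToBeReplicated() >= MAX_STREAMS) {
            return false; // over the limit; a later run must pick another source
        }
        // Count the work immediately, so later blocks in the same scheduling
        // run see an up-to-date number (the reported bug: this stayed zero).
        src.incrementPendingReplicationWithoutTargets();
        String target = null;
        try {
            target = chooseTarget(targetsAvailable);
        } finally {
            // Undo the increment when chooseTarget() produced nothing, so
            // failed attempts do not inflate the node's pending count.
            if (target == null) {
                src.decrementPendingReplicationWithoutTargets();
            }
        }
        return target != null;
    }

    public static void main(String[] args) {
        DatanodeDescriptor a = new DatanodeDescriptor();
        int scheduled = 0;
        for (int i = 0; i < 10; i++) {
            if (scheduleReplication(a, true)) scheduled++;
        }
        // Only MAX_STREAMS blocks land on node A in one pass, not all 10.
        System.out.println(scheduled); // prints 2
    }
}
```

With the counter updated inside the scheduling loop, a single decomm-in-progress node can no longer absorb thousands of blocks in one iteration of the replication monitor.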
[jira] [Updated] (HDFS-7126) TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path
[ https://issues.apache.org/jira/browse/HDFS-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7126: Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) The test failures are unrelated. I committed this to trunk and branch-2. Xiaoyu, thank you for your help cleaning up these tests. TestEncryptionZonesWithHA assumes Unix path separator for KMS key store path Key: HDFS-7126 URL: https://issues.apache.org/jira/browse/HDFS-7126 Project: Hadoop HDFS Issue Type: Test Components: security, test Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7126.0.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.
[ https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144900#comment-14144900 ] Chris Nauroth commented on HDFS-7130: - It looks like the pre-commit job's process got killed somehow. I've submitted a new pre-commit run. TestDataTransferKeepalive fails intermittently on Windows. -- Key: HDFS-7130 URL: https://issues.apache.org/jira/browse/HDFS-7130 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7130.1.patch {{TestDataTransferKeepalive}} has failed intermittently on Windows. These tests rely on a 1 ms thread sleep to wait for a cache expiration. This is likely too short on Windows, which has been observed to have a less granular clock interrupt period compared to typical Linux machines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
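The pattern behind a fix like HDFS-7130 can be sketched as follows: instead of a single short sleep followed by a hard assertion, poll the condition until a generous deadline, which tolerates a coarse clock interrupt period. The {{waitFor}} helper below is illustrative, not the actual patch (Hadoop's test utilities provide a similar helper in GenericTestUtils).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Sketch of a deadline-based polling wait for timing-sensitive tests.
public class PollingWaitSketch {
    /** Polls cond every intervalMs until it is true or timeoutMs elapses. */
    static boolean waitFor(BooleanSupplier cond, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() < deadline) {
            if (cond.getAsBoolean()) {
                return true;
            }
            // Sleep granularity no longer matters: an oversleep only delays
            // the next poll, it cannot make the test fail spuriously.
            Thread.sleep(intervalMs);
        }
        return cond.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Condition becomes true ~50 ms after start; the wait tolerates a
        // clock interrupt period far coarser than 1 ms.
        boolean ok = waitFor(
            () -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(50),
            10, 5000);
        System.out.println(ok); // prints true
    }
}
```

The key design choice is asserting on an eventual state with a large timeout rather than on a precise delay, which keeps the test fast on Linux and correct on Windows.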
[jira] [Created] (HDFS-7135) Add trace command to FsShell
Masatake Iwasaki created HDFS-7135: -- Summary: Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7135: --- Component/s: hdfs-client Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7135: --- Component/s: (was: datanode) (was: namenode) Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7135: --- Attachment: HDFS-7135-0.patch attaching patch. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7135: --- Attachment: HDFS-7135-1.patch I updated the patch. Trace does not need to use ToolRunner because generic option parsing is done before invoking Trace#run. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7114) Secondary NameNode failed to rollback from 2.4.1 to 2.2.0
[ https://issues.apache.org/jira/browse/HDFS-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HDFS-7114. -- Resolution: Invalid Secondary NameNode failed to rollback from 2.4.1 to 2.2.0 - Key: HDFS-7114 URL: https://issues.apache.org/jira/browse/HDFS-7114 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: sam liu Priority: Blocker Upgrading from 2.2.0 to 2.4.1 works, but rolling back the secondary NameNode fails with the following issue. 2014-09-22 10:41:28,358 FATAL org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start secondary namenode org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /var/hadoop/tmp/hdfs/dfs/namesecondary. Reported: -56. Expecting = -47. at org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1082) at org.apache.hadoop.hdfs.server.common.Storage.setFieldsFromProperties(Storage.java:890) at org.apache.hadoop.hdfs.server.namenode.NNStorage.setFieldsFromProperties(NNStorage.java:585) at org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:921) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:913) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:249) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.init(SecondaryNameNode.java:199) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:652) 2014-09-22 10:41:28,360 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 2014-09-22 10:41:28,363 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7135: --- Status: Patch Available (was: Open) Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144968#comment-14144968 ] Tsz Wo Nicholas Sze commented on HDFS-6799: --- TestBalancer.testUnknownDatanode failed in [build #8162|https://builds.apache.org/job/PreCommit-HDFS-Build/8162//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testUnknownDatanode/]. It might be related to the change here, since there were a lot of ReplicaNotFoundExceptions resulting from SimulatedFSDataset. For example, {noformat} 2014-09-23 11:20:05,974 ERROR datanode.DataNode (DataXceiver.java:run(243)) - host1.foo.com:47137:DataXceiver error processing COPY_BLOCK operation src: /127.0.0.1:36294 dst: /127.0.0.1:47137 org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1049218722-67.195.81.148-1411471192218:blk_1073741850_1026 at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:419) at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:228) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:918) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:241) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {noformat} The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system. --- Key: HDFS-6799 URL: https://issues.apache.org/jira/browse/HDFS-6799 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 2.4.1 Reporter: Megasthenis Asteris Assignee: Megasthenis Asteris Priority: Minor Fix For: 2.6.0 Attachments: HDFS-6799.patch The invalidate(String bpid, Block[] invalidBlks) method in SimulatedFSDataset.java should remove all invalidBlks from the simulated file system. 
It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
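The reported bug can be illustrated with a simplified, hypothetical model (class and method names below are illustrative, not taken from SimulatedFSDataset itself): invalidate must actually remove each listed block from the per-block-pool map, otherwise later readers find replicas that should be gone.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the fix: invalidate removes each listed block
// from the simulated per-block-pool map instead of leaving it behind.
public class SimulatedDatasetSketch {
    private final Map<String, Map<Long, byte[]>> blockMap = new HashMap<>();

    public void addBlock(String bpid, long blockId, byte[] data) {
        blockMap.computeIfAbsent(bpid, k -> new HashMap<>()).put(blockId, data);
    }

    public void invalidate(String bpid, long[] invalidBlks) {
        Map<Long, byte[]> pool = blockMap.get(bpid);
        if (pool == null) {
            return;
        }
        for (long id : invalidBlks) {
            pool.remove(id);  // the reported bug: blocks were not removed
        }
    }

    public boolean contains(String bpid, long blockId) {
        Map<Long, byte[]> pool = blockMap.get(bpid);
        return pool != null && pool.containsKey(blockId);
    }

    public static void main(String[] args) {
        SimulatedDatasetSketch ds = new SimulatedDatasetSketch();
        ds.addBlock("bp-1", 1026L, new byte[0]);
        ds.invalidate("bp-1", new long[]{1026L});
        System.out.println(ds.contains("bp-1", 1026L)); // prints "false"
    }
}
```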
[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144979#comment-14144979 ] Benoy Antony commented on HDFS-6799: Thanks for letting me know [~szetszwo]. I will take a look. The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system. --- Key: HDFS-6799 URL: https://issues.apache.org/jira/browse/HDFS-6799 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 2.4.1 Reporter: Megasthenis Asteris Assignee: Megasthenis Asteris Priority: Minor Fix For: 2.6.0 Attachments: HDFS-6799.patch The invalidate(String bpid, Block[] invalidBlks) method in SimulatedFSDataset.java should remove all invalidBlks from the simulated file system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144990#comment-14144990 ] Tsz Wo Nicholas Sze commented on HDFS-6799: --- Thanks Benoy. The patch here was correct. There might be more bugs in SimulatedFSDataset or TestBalancer.testUnknownDatanode. The fix here simply triggered them. The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system. --- Key: HDFS-6799 URL: https://issues.apache.org/jira/browse/HDFS-6799 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 2.4.1 Reporter: Megasthenis Asteris Assignee: Megasthenis Asteris Priority: Minor Fix For: 2.6.0 Attachments: HDFS-6799.patch The invalidate(String bpid, Block[] invalidBlks) method in SimulatedFSDataset.java should remove all invalidBlks from the simulated file system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145040#comment-14145040 ] Allen Wittenauer commented on HDFS-7135: This is effectively a dupe of HDFS-6956. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs
[ https://issues.apache.org/jira/browse/HDFS-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145039#comment-14145039 ] Charles Lamb commented on HDFS-7132: TestEncryptionZonesWithKMS and TestPipelinesFailover passed on my local machine with the patch applied. TestWebHdfsFileSystemContract failed with and without the patch. hdfs namenode -metadataVersion command does not honor configured name dirs -- Key: HDFS-7132 URL: https://issues.apache.org/jira/browse/HDFS-7132 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7132.001.patch The hdfs namenode -metadataVersion command does not honor dfs.namenode.name.dir.nameservice.namenode configuration parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7135: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7104) Fix and clarify INodeInPath getter functions
[ https://issues.apache.org/jira/browse/HDFS-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7104: Status: Patch Available (was: Open) Fix and clarify INodeInPath getter functions Key: HDFS-7104 URL: https://issues.apache.org/jira/browse/HDFS-7104 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor inodes is initialized with the number of path components. After resolve, it contains both non-null and null elements (introduced by dot-snapshot dirs). When getINodes is called, an array is returned excluding all null elements, which is the correct behavior. Meanwhile, the inodes array is trimmed too, which shouldn't be done by a getter. Because of the above, the behavior of getINodesInPath depends on whether getINodes has been called, which is not correct. The name of getLastINodeInPath is confusing – it actually returns the last non-null inode in the path. Also, shouldn't the return type be a single INode? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
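The getter problem described above can be sketched in isolation (this is a hypothetical model using strings for inodes, not the real INodesInPath code): a getter should return the filtered view without trimming the backing array, so repeated calls behave identically.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical model of the HDFS-7104 issue: the getter filters out the
// null elements left by dot-snapshot components but never mutates state.
public class INodePathSketch {
    private final String[] inodes;  // may contain nulls for ".snapshot" dirs

    public INodePathSketch(String[] inodes) {
        this.inodes = inodes;
    }

    // Returns only the non-null elements; the field is left untouched,
    // so calling this twice yields the same result.
    public String[] getINodes() {
        return Arrays.stream(inodes)
            .filter(Objects::nonNull)
            .toArray(String[]::new);
    }

    // Returns the last non-null inode as a single element, matching the
    // suggestion that getLastINodeInPath should return one INode.
    public String getLastINode() {
        for (int i = inodes.length - 1; i >= 0; i--) {
            if (inodes[i] != null) {
                return inodes[i];
            }
        }
        return null;
    }
}
```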
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145043#comment-14145043 ] stack commented on HDFS-7135: - +1 LGTM. Needs nice release note. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7117) Not all datanodes are displayed on the namenode http tab
[ https://issues.apache.org/jira/browse/HDFS-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145045#comment-14145045 ] Kihwal Lee commented on HDFS-7117: -- I assume you saw this in 2.4. Have you tried branch-2 or release-2.5.*? There have been multiple fixes around the UI since 2.4. Not all datanodes are displayed on the namenode http tab Key: HDFS-7117 URL: https://issues.apache.org/jira/browse/HDFS-7117 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Jean-Baptiste Onofré Fix For: 2.4.0 On a single machine, I have three fake nodes (each node uses a different dfs.datanode.address, dfs.datanode.ipc.address, and dfs.datanode.http.address) - node1 starts the namenode and a datanode - node2 starts a datanode - node3 starts a datanode In the namenode http console, on the overview, I can see 3 live nodes: {code} http://localhost:50070/dfshealth.html#tab-overview {code} but, when clicking on the Live Nodes: {code} http://localhost:50070/dfshealth.html#tab-datanode {code} I can see only one node row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7104) Fix and clarify INodeInPath getter functions
[ https://issues.apache.org/jira/browse/HDFS-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7104: Attachment: HDFS-7104-20140923-v1.patch [~jingzhao] Thanks very much for the clarification. I also double checked and it seems {{capacity}} is only used to eliminate dot-snapshot elements in {{inodes}}. This patch basically removed {{capacity}} as a field and made it a counter inside {{resolve}}. Fix and clarify INodeInPath getter functions Key: HDFS-7104 URL: https://issues.apache.org/jira/browse/HDFS-7104 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-7104-20140923-v1.patch inodes is initialized with the number of path components. After resolve, it contains both non-null and null elements (introduced by dot-snapshot dirs). When getINodes is called, an array is returned excluding all null elements, which is the correct behavior. Meanwhile, the inodes array is trimmed too, which shouldn't be done by a getter. Because of the above, the behavior of getINodesInPath depends on whether getINodes has been called, which is not correct. The name of getLastINodeInPath is confusing – it actually returns the last non-null inode in the path. Also, shouldn't the return type be a single INode? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7123) Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint
[ https://issues.apache.org/jira/browse/HDFS-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145065#comment-14145065 ] Lohit Vijayarenu commented on HDFS-7123: If we drop the lock then users of these images will have to know that both of them could be out of sync. Parallel checkpointing takes more CPU, but at that point it is holding the big lock, so no other CPU-intensive task is going on anyway. Changes look good. +1 on the patch. Run legacy fsimage checkpoint in parallel with PB fsimage checkpoint Key: HDFS-7123 URL: https://issues.apache.org/jira/browse/HDFS-7123 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7123.patch HDFS-7097 will address the checkpoint and BR issue. In addition, it might still be useful to reduce the overall checkpoint duration, given it blocks edit log replay. If there is large volume of edit log to catch up and NN fails over, it will impact the availability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6971) Bounded staleness of EDEK caches on the NN
[ https://issues.apache.org/jira/browse/HDFS-6971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145094#comment-14145094 ] Zhe Zhang commented on HDFS-6971: - The {{ValueQueue}} ({{encKeyVersionQueue}}) uses an underlying cache {{keyQueues}}, which is constructed with {{expireAfterAccess}} too. I'll add a test in {{TestKMS}} to verify the time-boundedness. Bounded staleness of EDEK caches on the NN -- Key: HDFS-6971 URL: https://issues.apache.org/jira/browse/HDFS-6971 Project: Hadoop HDFS Issue Type: Sub-task Components: encryption Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Zhe Zhang The EDEK cache on the NN can hold onto keys after the admin has rolled the key. It'd be good to time-bound the caches, perhaps also providing an explicit flush command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
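The comment above relies on Guava's {{expireAfterAccess}} behavior to bound staleness. A minimal stdlib model of that semantics (illustrative only; the real cache is the Guava-backed {{ValueQueue}}) shows what the proposed test would verify:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of expire-after-access semantics: an entry not touched
// within the TTL is dropped, bounding how stale a cached EDEK can be.
public class ExpiringCache<K, V> {
    private static final class Entry<V> {
        V value;
        long lastAccess;
        Entry(V value, long now) { this.value = value; this.lastAccess = now; }
    }

    private final long ttlNanos;
    private final Map<K, Entry<V>> map = new HashMap<>();

    public ExpiringCache(long ttlNanos) {
        this.ttlNanos = ttlNanos;
    }

    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.nanoTime()));
    }

    // Expired entries are evicted on lookup; a hit refreshes the access time.
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        long now = System.nanoTime();
        if (now - e.lastAccess > ttlNanos) {
            map.remove(key);
            return null;
        }
        e.lastAccess = now;
        return e.value;
    }
}
```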
[jira] [Reopened] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reopened HDFS-7135: Not a duplicate. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145103#comment-14145103 ] Colin Patrick McCabe commented on HDFS-7135: bq. This is effectively a dupe of HDFS-6956. HDFS-6956 is about servers, this is about clients. It's not possible to do an RPC to clients to change their tracing configuration because clients normally don't expose a port and a listening service to the world. Plus clients are often short-lived, so by the time you could even do an RPC, the client might have exited. bq. Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. I sort of appreciate the simplicity of this interface, but I'm not sure how I feel about this patch. It seems more consistent to configure client-side tracing by setting a configuration key. I suppose the \-trace command added here can simply override that configuration, though. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-6957) Allow DFSClient to manually specify that a request should be traced
[ https://issues.apache.org/jira/browse/HDFS-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-6957. Resolution: Duplicate Allow DFSClient to manually specify that a request should be traced --- Key: HDFS-6957 URL: https://issues.apache.org/jira/browse/HDFS-6957 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Allow the DFSClient to manually specify that a request should be traced. One easy way to do this might be to have a configuration property that the DFSClient reads which causes it to make all its requests traced. This will allow us to more easily diagnose performance problems with a specific file or client request type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145113#comment-14145113 ] Colin Patrick McCabe commented on HDFS-7055: Jenkins says that there is a new findbugs warning, but looking at: https://builds.apache.org/job/PreCommit-HDFS-Build/8150//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html It says there are 0? Meanwhile {{diffJavacWarnings.txt}} is missing, so I can't evaluate whether there is an additional warning or not. Jenkins has been frustrating lately. I will re-trigger this build. Add tracing to DFSInputStream - Key: HDFS-7055 URL: https://issues.apache.org/jira/browse/HDFS-7055 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7055.002.patch Add tracing to DFSInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145125#comment-14145125 ] Colin Patrick McCabe commented on HDFS-7135: [~stack], [~iwasakims]: When you get a chance, can you take a look at my patch on HDFS-7055? It adds a new configuration key, {{dfs.client.trace.sampler}}, which the HDFS client can set to control tracing. It's a little more flexible than the approach here because {{dfs.client.trace.sampler}} can be set for arbitrary clients, and it can do probabilistic sampling rather than always-on. Given that we already have ways to inject extra hadoop config keys into the shell via command-line arguments (via \-D and friends), it might be easier just to tell people to use that. Especially given that they will need other configuration to use tracing, like the location of the trace file (if they're using the file sink), and etc. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
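The {{\-D}} approach Colin describes would look something like the following. This is a hypothetical invocation: the sampler key comes from the HDFS-7055 discussion and the span-receiver path key is an assumed name from the same thread, so both may differ in the final patches.

```shell
# Hypothetical: enable client-side tracing for a single shell command by
# injecting config keys with -D, instead of a dedicated -trace command.
# Key names are taken from the discussion and may not match the final code.
hdfs dfs -Ddfs.client.trace.sampler=AlwaysSampler \
         -Dlocal-file-span-receiver.path=/tmp/htrace.out \
         -ls /
```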
[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145139#comment-14145139 ] Colin Patrick McCabe commented on HDFS-6988: It seems like these configuration keys should be longs, not ints, since we might want values bigger than 2 gigabytes. This scheme seems to be getting a little too complex to easily understand. Rather than having three configuration keys (min, max, percentage), how about a single configuration key that is interpreted differently based on its value? So if it is 10% (has a percent sign at the end) we interpret it as a percentage... if it's 128MB we interpret that as the amount of space to keep free. Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: HDFS-6581 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
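The single-key scheme proposed above can be sketched as follows. The method name and accepted unit suffixes are illustrative, not from any attached patch: a trailing percent sign means a fraction of capacity, anything else is parsed as an absolute amount of space to keep free.

```java
// Sketch of interpreting one config value two ways: "10%" as a
// percentage of capacity, "128MB"/"1GB" as an absolute byte count.
public class EvictionThreshold {
    public static long parseFreeSpace(String value, long capacityBytes) {
        value = value.trim();
        if (value.endsWith("%")) {
            double pct = Double.parseDouble(value.substring(0, value.length() - 1));
            return (long) (capacityBytes * pct / 100.0);
        }
        long multiplier = 1L;
        if (value.endsWith("GB")) {
            multiplier = 1024L * 1024L * 1024L;
            value = value.substring(0, value.length() - 2);
        } else if (value.endsWith("MB")) {
            multiplier = 1024L * 1024L;
            value = value.substring(0, value.length() - 2);
        }
        // long, not int, so values over 2 GB are representable
        return Long.parseLong(value.trim()) * multiplier;
    }
}
```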
[jira] [Comment Edited] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145144#comment-14145144 ] Allen Wittenauer edited comment on HDFS-7135 at 9/23/14 6:04 PM: - Why do we need two commands? Why can't one work for both client and server, just use different options? was (Author: aw): Why do we need two commands? Why can't both work for both client and server, just use different options? Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145144#comment-14145144 ] Allen Wittenauer commented on HDFS-7135: Why do we need two commands? Why can't both work for both client and server, just use different options? Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145152#comment-14145152 ] stack commented on HDFS-7135: - Yeah, not a duplicate. HDFS-6956 adds enable/disable/listing and configuration of server-side tracing sinks. This patch turns on tracing while a dfs command runs. Adds a simple, low-threshold means of poking at suspected problematic areas in HDFS. This patch is also nice because it hoists tracing up to be a first-class option. It could be argued that tracing doesn't yet have enough meat on it so it may not yet be ready for the spotlight, but hopefully that'll be soon addressed. bq. I suppose the -trace command added here can simply override that configuration, though. Sounds good [~cmccabe] Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-3107: --- Attachment: HDFS-3107.patch Attaching patch with updated Javadoc. # Changed ClientProtocol JavaDoc to match what Konstantin pointed out. # Also added an @return JavaDoc for ClientProtocol. # Modified the FSDirectory unprotectedTruncate() JavaDoc and comments to remove any notion of 'schedule block for truncate'. The scheduling logic lives in FSNamesystem. More tests are necessary to show behavior when dealing with competing appends / creates. I'll include some in the next patch. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145161#comment-14145161 ] Colin Patrick McCabe commented on HDFS-7135: My worry about adding this option is that it won't be useful without setting configuration keys such as {{local-file-span-receiver.path}}. So if you have to tweak the configuration anyway, why not just tweak {{dfs.client.trace.sampler}} while you're at it? Then we don't need this command. Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.
[ https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145163#comment-14145163 ] Hadoop QA commented on HDFS-7130: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670544/HDFS-7130.1.patch against trunk revision a1fd804. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8163//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8163//console This message is automatically generated. TestDataTransferKeepalive fails intermittently on Windows. -- Key: HDFS-7130 URL: https://issues.apache.org/jira/browse/HDFS-7130 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7130.1.patch {{TestDataTransferKeepalive}} has failed intermittently on Windows. 
These tests rely on a 1 ms thread sleep to wait for a cache expiration. This is likely too short on Windows, which has been observed to have a less granular clock interrupt period compared to typical Linux machines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
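The usual fix for this kind of clock-granularity flakiness is to poll for the condition with a deadline instead of a single short sleep. A stdlib sketch of such a helper follows (Hadoop's test code has a comparable utility, {{GenericTestUtils.waitFor}}; the class below is illustrative, not the actual patch):

```java
import java.util.function.BooleanSupplier;

// Poll a condition at a fixed interval until it holds or a deadline
// passes, instead of relying on a single sleep of a hard-coded duration.
public class WaitUtil {
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("Timed out waiting for condition");
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

This stays robust on platforms like Windows where the clock interrupt period makes millisecond-scale sleeps unreliable.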
[jira] [Commented] (HDFS-7125) Report failures during adding or removing volumes
[ https://issues.apache.org/jira/browse/HDFS-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145168#comment-14145168 ] Colin Patrick McCabe commented on HDFS-7125: I guess this should happen through the status-listing RPC, since reconfiguration is asynchronous. Report failures during adding or removing volumes - Key: HDFS-7125 URL: https://issues.apache.org/jira/browse/HDFS-7125 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu The details of the failures during hot swapping volumes should be reported through RPC to the user who issues the reconfiguration CLI command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.
[ https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145171#comment-14145171 ] Chris Nauroth commented on HDFS-7130: - The test failures are unrelated. TestDataTransferKeepalive fails intermittently on Windows. -- Key: HDFS-7130 URL: https://issues.apache.org/jira/browse/HDFS-7130 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7130.1.patch {{TestDataTransferKeepalive}} has failed intermittently on Windows. These tests rely on a 1 ms thread sleep to wait for a cache expiration. This is likely too short on Windows, which has been observed to have a less granular clock interrupt period compared to typical Linux machines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6956) Allow dynamically changing the tracing level in Hadoop servers
[ https://issues.apache.org/jira/browse/HDFS-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145188#comment-14145188 ] Colin Patrick McCabe commented on HDFS-6956: Not sure what the issue is here. All the tests pass for me locally, and the jenkins log is puzzling. I am going to re-trigger the build. Allow dynamically changing the tracing level in Hadoop servers -- Key: HDFS-6956 URL: https://issues.apache.org/jira/browse/HDFS-6956 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6956.002.patch, HDFS-6956.003.patch, HDFS-6956.004.patch We should allow users to dynamically change the tracing level in Hadoop servers. The easiest way to do this is probably to have an RPC accessible only to the superuser that changes tracing settings. This would allow us to turn on and off tracing on the NameNode, DataNode, etc. at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145202#comment-14145202 ] Arpit Agarwal commented on HDFS-6988: - Thanks for taking a look at the patch. They are integers as they are replica counts - to be multiplied by the default block length at runtime. A single default simply won't work for a range of ram disk sizes. It will force every administrator to configure one more setting. This way we have reasonable default behavior for most drive sizes, from a few GB up to 100GB. Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: HDFS-6581 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7113) Add DFSAdmin Command to Recover Lease
[ https://issues.apache.org/jira/browse/HDFS-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145200#comment-14145200 ] Colin Patrick McCabe commented on HDFS-7113: Yeah, this is a duplicate. Add DFSAdmin Command to Recover Lease - Key: HDFS-7113 URL: https://issues.apache.org/jira/browse/HDFS-7113 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Miklos Christine Priority: Minor Attachments: HDFS-7113.2.patch, HDFS-7113.patch In certain conditions, a lease may be left around if an error occurs while writing to HDFS and the file is left open. Having a DFSAdmin command would allow administrators to recover the lease and close the file easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145221#comment-14145221 ] stack commented on HDFS-7055: - bq. One thing to keep in mind here is that if you call Trace.startSpan with Sampler.NEVER, and there is an existing thread trace span, a subspan will always be created. Thanks for mentioning this up front... first thing I stumbled on looking in code. It's a little confusing, and having to add a comment explaining NEVER at every span open will get annoying fast. Nit: This exception, if it is possible to ask trace for the list of options, should list the possible options (I can see folks typing in sampler with wrong case or missing a piece... listing the possible options will let them quickly see what they have done wrong): + throw new RuntimeException("Can't create sampler " + samplerStr); Nit: Should we have a convention for naming spans [~cmccabe]? For example, method name followed by arg types all in camel case? +dfsClient.getTraceScope("byteBufferRead", src); ... would become readByteBuffer and + dfsClient.getTraceScope("byteArrayRead", src); would be readByteArrayIntInt? Patch looks great to me. You gotten any spans out of it? I can try it if you'd like, no problem. Add tracing to DFSInputStream - Key: HDFS-7055 URL: https://issues.apache.org/jira/browse/HDFS-7055 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7055.002.patch Add tracing to DFSInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
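Stack's nit about the sampler exception can be sketched like this (the sampler names below are assumptions for illustration, not the set the patch actually accepts): when the name doesn't match, list the valid options in the message so a case typo is obvious.

```java
// Sketch of a sampler factory whose error message enumerates the valid
// options, per the review nit above. The KNOWN list is illustrative.
public class SamplerFactory {
    private static final String[] KNOWN =
        {"AlwaysSampler", "NeverSampler", "ProbabilitySampler"};

    public static String create(String samplerStr) {
        for (String s : KNOWN) {
            if (s.equalsIgnoreCase(samplerStr)) {
                return s;  // tolerate case mismatches
            }
        }
        throw new RuntimeException("Can't create sampler " + samplerStr
            + "; valid samplers are: " + String.join(", ", KNOWN));
    }
}
```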
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145224#comment-14145224 ] stack commented on HDFS-7135: - bq. So if you have to tweak the configuration anyway, why not just tweak dfs.client.trace.sampler while you're at it? Then we don't need this command. Sounds reasonable to me [~cmccabe] Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7125) Report failures during adding or removing volumes
[ https://issues.apache.org/jira/browse/HDFS-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145235#comment-14145235 ] Lei (Eddy) Xu commented on HDFS-7125: - [~cmccabe] Yes, this is for refining the information and formats from {{-reconfig status}} command. Report failures during adding or removing volumes - Key: HDFS-7125 URL: https://issues.apache.org/jira/browse/HDFS-7125 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu The details of the failures during hot swapping volumes should be reported through RPC to the user who issues the reconfiguration CLI command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6917) Add an hdfs debug command to validate blocks, call recoverlease, etc.
[ https://issues.apache.org/jira/browse/HDFS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145250#comment-14145250 ] Colin Patrick McCabe commented on HDFS-6917: bq. Can clean up the imports in DebugAdmin, I can tell where it was copy pasted from ok bq. Missing some indentation in the super() calls in each command I wanted to keep it this way so that what was shown in the source file corresponded to what was displayed on the command-line. At the same time, I didn't want to exceed 79 columns as per our coding standard. You can see the dilemma here... if I use normal indentation, I either have to accept less than 79 columns for the command-line output, or exceed the limit for the source code. bq. Need tests  Can we do this in a follow-up? bq. Hardcoding 7 is okay, but slightly better would be 2 + DataChecksum.HEADER_LEN. ok bq. CHECKSUMS_PER_BUF seems kinda large. With 512B per checksum, we're allocating a 64MB data buffer. I figure 8MB would be enough to still get good disk perf. ok bq. metaDidRead is unused removed bq. Could print the current retry count when sleeping/looping ok bq. I expected the default # of retries to be 0, so the command by default tries to do a single recoverLease ok Add an hdfs debug command to validate blocks, call recoverlease, etc. - Key: HDFS-6917 URL: https://issues.apache.org/jira/browse/HDFS-6917 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6917.001.patch, HDFS-6917.002.patch HDFS should have a debug command which could validate HDFS block files, call recoverLease, and have some other functionality. These commands would be purely for debugging and would appear under a separate command hierarchy inside the hdfs command. There would be no guarantee of API stability for these commands and the debug submenu would not be listed by just typing the hdfs command. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
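The buffer-sizing point in the review above (512 B of data per checksum, so the CHECKSUMS_PER_BUF choice determines the data-buffer size) can be checked with a little arithmetic. The constant values here are inferred from the comment (64 MB originally, 8 MB proposed) and are assumptions, not the patch's actual constants.

```java
public class ChecksumBufSizing {
    static final int BYTES_PER_CHECKSUM = 512; // bytes of data covered by one checksum

    // Data-buffer bytes implied by a given CHECKSUMS_PER_BUF value.
    static long dataBufferBytes(int checksumsPerBuf) {
        return (long) checksumsPerBuf * BYTES_PER_CHECKSUM;
    }

    public static void main(String[] args) {
        int original = 128 * 1024; // 131072 checksums -> 64 MB data buffer
        int proposed = 16 * 1024;  // 16384 checksums  -> 8 MB data buffer
        System.out.println(dataBufferBytes(original)); // 67108864
        System.out.println(dataBufferBytes(proposed)); // 8388608
    }
}
```

An 8 MB buffer is still large enough for sequential disk reads while allocating one eighth of the memory per invocation.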
[jira] [Updated] (HDFS-6917) Add an hdfs debug command to validate blocks, call recoverlease, etc.
[ https://issues.apache.org/jira/browse/HDFS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6917: --- Attachment: HDFS-6917.003.patch Add an hdfs debug command to validate blocks, call recoverlease, etc. - Key: HDFS-6917 URL: https://issues.apache.org/jira/browse/HDFS-6917 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6917.001.patch, HDFS-6917.002.patch, HDFS-6917.003.patch HDFS should have a debug command which could validate HDFS block files, call recoverLease, and have some other functionality. These commands would be purely for debugging and would appear under a separate command hierarchy inside the hdfs command. There would be no guarantee of API stability for these commands and the debug submenu would not be listed by just typing the hdfs command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7129) Metrics to track usage of memory for writes
[ https://issues.apache.org/jira/browse/HDFS-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-7129: Description: A few metrics to evaluate feature usage and suggest improvements. Thanks to [~sureshms] for some of these suggestions. # Number of times a block in memory was read (before being ejected) # Average block size for data written to memory tier # Time the block was in memory before being ejected # Number of blocks written to memory # Number of memory writes requested but not satisfied (failed-over to disk) # Number of blocks evicted without ever being read from memory # Average delay between memory write and disk write (window where a node restart could cause data loss). # Replicas written to disk by lazy writer # Bytes written to disk by lazy writer # Replicas deleted by application before being persisted to disk was: A few metrics to evaluate feature usage and suggest improvements. Thanks to [~sureshms] for some of these suggestions. # Number of times a block in memory was read (before being ejected) # Average block size for data written to memory tier # Time the block was in memory before being ejected # Number of blocks written to memory # Number of memory writes requested but not satisfied (failed-over to disk) # Number of blocks evicted without ever being read from memory # Average delay between memory write and disk write (window where a node restart could cause data loss). Metrics to track usage of memory for writes --- Key: HDFS-7129 URL: https://issues.apache.org/jira/browse/HDFS-7129 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal A few metrics to evaluate feature usage and suggest improvements. Thanks to [~sureshms] for some of these suggestions. 
# Number of times a block in memory was read (before being ejected) # Average block size for data written to memory tier # Time the block was in memory before being ejected # Number of blocks written to memory # Number of memory writes requested but not satisfied (failed-over to disk) # Number of blocks evicted without ever being read from memory # Average delay between memory write and disk write (window where a node restart could cause data loss). # Replicas written to disk by lazy writer # Bytes written to disk by lazy writer # Replicas deleted by application before being persisted to disk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7104) Fix and clarify INodeInPath getter functions
[ https://issues.apache.org/jira/browse/HDFS-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145285#comment-14145285 ] Hadoop QA commented on HDFS-7104: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670747/HDFS-7104-20140923-v1.patch against trunk revision a1fd804. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestSnapshotCommands org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8165//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8165//console This message is automatically generated. 
Fix and clarify INodeInPath getter functions Key: HDFS-7104 URL: https://issues.apache.org/jira/browse/HDFS-7104 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-7104-20140923-v1.patch inodes is initialized with the number of path components. After resolve, it contains both non-null and null elements (introduced by dot-snapshot dirs). When getINodes is called, an array is returned excluding all null elements, which is the correct behavior. Meanwhile, the inodes array is trimmed too, which shouldn't be done by a getter. Because of the above, the behavior of getINodesInPath depends on whether getINodes has been called, which is not correct. The name of getLastINodeInPath is confusing – it actually returns the last non-null inode in the path. Also, shouldn't the return type be a single INode? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7135) Add trace command to FsShell
[ https://issues.apache.org/jira/browse/HDFS-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145284#comment-14145284 ] Hadoop QA commented on HDFS-7135: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670725/HDFS-7135-1.patch against trunk revision a1fd804. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.crypto.random.TestOsSecureRandom org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.tracing.TestTracing org.apache.hadoop.hdfs.TestDatanodeBlockScanner org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8164//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8164//console This message is automatically generated. 
Add trace command to FsShell Key: HDFS-7135 URL: https://issues.apache.org/jira/browse/HDFS-7135 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Attachments: HDFS-7135-0.patch, HDFS-7135-1.patch Adding manual tracing command which can be used like 'hdfs dfs -trace -ls /'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7128) Decommission slows way down when it gets towards the end
[ https://issues.apache.org/jira/browse/HDFS-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145289#comment-14145289 ] Ming Ma commented on HDFS-7128: --- Thanks, Kihwal. After we finish some cluster level testing, I will update the patch with some unit tests and address your comment. Regarding dfs.namenode.replication.max-streams or dfs.namenode.replication.max-streams-hard-limit, my understanding is the current values are ok, based on block size, heartbeat interval, and DN balance bandwidth. If we increase them, it might not help with performance. For example, say block size is 512MB, heartbeat interval is 6s, and DN balance bandwidth is 40MB/s; then it takes a minimum of around 12.8s to replicate one block. DN heartbeat is frequent enough to get the next items in the queue to maintain the max throughput. We can do some evaluation on this. Decommission slows way down when it gets towards the end Key: HDFS-7128 URL: https://issues.apache.org/jira/browse/HDFS-7128 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7128.patch When we decommission nodes across different racks, the decommission process becomes really slow at the end, hardly making any progress. The problem is some blocks are on 3 decomm-in-progress DNs and the way replications are scheduled causes unnecessary delay. Here is the analysis. When BlockManager schedules the replication work from neededReplication, it first needs to pick the source node for replication via chooseSourceDatanode. The core policies to pick the source node are: 1. Prefer decomm-in-progress node. 2. Only pick the nodes whose outstanding replication counts are below the thresholds dfs.namenode.replication.max-streams or dfs.namenode.replication.max-streams-hard-limit, based on the replication priority. When we decommission nodes, 1. All the decommission nodes' blocks will be added to neededReplication. 2.
BM will pick X number of blocks from neededReplication in each iteration. X is based on cluster size and some configurable multiplier. So if the cluster has 2000 nodes, X will be around 4000. 3. Given these 4000 blocks are on the same decomm-in-progress node A, A ends up being chosen as the source node for all these 4000 blocks. The reason the outstanding replication thresholds don't kick in is due to the implementation of BlockManager.computeReplicationWorkForBlocks; node.getNumberOfBlocksToBeReplicated() remains zero given node.addBlockToBeReplicated is called after the source node iteration. {noformat} ... synchronized (neededReplications) { for (int priority = 0; priority < blocksToReplicate.size(); priority++) { ... chooseSourceDatanode ... } for (ReplicationWork rw : work) { ... rw.srcNode.addBlockToBeReplicated(block, targets); ... } {noformat} 4. So several decomm-in-progress nodes A, B, and C each end up with node.getNumberOfBlocksToBeReplicated() around 4000. 5. If we assume each node can replicate 5 blocks per minute, it is going to take 800 minutes to finish replication of these blocks. 6. The pending replication timeout kicks in after 5 minutes. The items will be removed from the pending replication queue and added back to neededReplication. The replications will then be handled by other source nodes of these blocks. But the blocks still remain in nodes A, B, and C's pending replication queue, DatanodeDescriptor.replicateBlocks, so A, B, and C continue the replications of these blocks, although these blocks might already have been replicated by other DNs after the replication timeout. 7. Some blocks' replicas exist on A, B, and C, and such a block is at the end of A's pending replication queue. Even after the block's replication times out, no source node can be chosen given A, B, and C all have high pending replication counts. So we have to wait until A drains its pending replication queue. Meanwhile, the items in A's pending replication queue have been taken care of by other nodes and are no longer under-replicated.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
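The back-of-envelope numbers in the two comments above can be reproduced directly. The inputs (512 MB blocks, 40 MB/s balance bandwidth, a 4000-block queue, 5 blocks replicated per minute) are the assumed figures from the discussion, not measured values.

```java
public class DecommissionEstimate {
    // Minimum seconds to replicate one block at the given balance bandwidth.
    static double secondsPerBlock(double blockMB, double bandwidthMBps) {
        return blockMB / bandwidthMBps;
    }

    // Minutes to drain a per-node pending replication queue.
    static int minutesToDrain(int queuedBlocks, int blocksPerMinute) {
        return queuedBlocks / blocksPerMinute;
    }

    public static void main(String[] args) {
        System.out.println(secondsPerBlock(512, 40));  // 12.8 s per block
        System.out.println(minutesToDrain(4000, 5));   // 800 minutes
    }
}
```

The 800-minute figure is why a 5-minute pending-replication timeout fires thousands of times before node A's queue drains.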
[jira] [Commented] (HDFS-6894) Add XDR parser method for each NFS response
[ https://issues.apache.org/jira/browse/HDFS-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145370#comment-14145370 ] Hadoop QA commented on HDFS-6894: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670129/HDFS-6894.001.patch against trunk revision 3dc28e2. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8169//console This message is automatically generated. Add XDR parser method for each NFS response --- Key: HDFS-6894 URL: https://issues.apache.org/jira/browse/HDFS-6894 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-6894.001.patch This can be an abstract method in NFS3Response to force the subclasses to implement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7055) Add tracing to DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145389#comment-14145389 ] Hadoop QA commented on HDFS-7055: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670363/HDFS-7055.002.patch against trunk revision 5338ac4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1266 javac compiler warnings (more than the trunk's current 1264 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8166//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8166//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8166//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8166//console This message is automatically generated. Add tracing to DFSInputStream - Key: HDFS-7055 URL: https://issues.apache.org/jira/browse/HDFS-7055 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7055.002.patch Add tracing to DFSInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7035) Make adding volume an atomic operation.
[ https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7035: Summary: Make adding volume an atomic operation. (was: Refactor DataStorage and BlockSlicePoolStorage ) Make adding volume an atomic operation. --- Key: HDFS-7035 URL: https://issues.apache.org/jira/browse/HDFS-7035 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch {{DataStorage}} and {{BlockPoolSliceStorage}} share many similar code paths. This jira extracts the common part of these two classes to simplify the logic for both. This is the ground work for handling partial failures during hot swapping volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7035) Refactor DataStorage and BlockSlicePoolStorage
[ https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7035: Attachment: HDFS-7035.004.patch Make addVolume() an atomic operation. The volume metadata in {{DataStorage}} and {{FsDataset}} is first loaded into a local copy. After all I/O finishes, if nothing has failed, the {{DataNode}} commits the loaded volume metadata to {{DataStorage}} and {{FsDataset}} respectively. Therefore, if any error happens while loading a volume, the metadata belonging to this volume will not be visible to the service. It also captures the error message for {{IOExceptions}} in {{DataStorage#removeVolumes()}}. Refactor DataStorage and BlockSlicePoolStorage --- Key: HDFS-7035 URL: https://issues.apache.org/jira/browse/HDFS-7035 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch {{DataStorage}} and {{BlockPoolSliceStorage}} share many similar code paths. This jira extracts the common part of these two classes to simplify the logic for both. This is the ground work for handling partial failures during hot swapping volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
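The load-then-commit pattern described in the update above can be sketched as follows. This is a minimal standalone illustration, not the patch: the class, the `String` stand-in for volume metadata, and the helper method are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class AtomicAddVolume {
    private final List<String> volumes = new ArrayList<>(); // committed, service-visible metadata

    public void addVolume(String dir) throws IOException {
        // All failure-prone I/O happens against a local copy first...
        List<String> staged = new ArrayList<>();
        staged.add(loadStorageDirectory(dir)); // may throw; nothing is visible yet
        // ...and the result becomes visible only after everything succeeded.
        volumes.addAll(staged);
    }

    public List<String> getVolumes() {
        return new ArrayList<>(volumes);
    }

    // Stand-in for the real directory-loading I/O.
    private String loadStorageDirectory(String dir) throws IOException {
        if (dir == null || dir.isEmpty()) {
            throw new IOException("failed to load volume " + dir);
        }
        return dir;
    }
}
```

If `loadStorageDirectory` throws, the committed list is untouched, which is exactly the atomicity property the update describes.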
[jira] [Updated] (HDFS-7035) Make adding volume an atomic operation.
[ https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7035: Description: It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce the duplicate code and supports atomic adding volume operations. (was: {{DataStorage}} and {{BlockPoolSliceStorage}} share many similar code path. This jira extracts the common part of these two classes to simplify the logic for both. This is the ground work for handling partial failures during hot swapping volumes.) Make adding volume an atomic operation. --- Key: HDFS-7035 URL: https://issues.apache.org/jira/browse/HDFS-7035 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce the duplicate code and supports atomic adding volume operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6917) Add an hdfs debug command to validate blocks, call recoverlease, etc.
[ https://issues.apache.org/jira/browse/HDFS-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145438#comment-14145438 ] Hadoop QA commented on HDFS-6917: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670775/HDFS-6917.003.patch against trunk revision 3dc28e2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8168//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8168//console This message is automatically generated. Add an hdfs debug command to validate blocks, call recoverlease, etc. 
- Key: HDFS-6917 URL: https://issues.apache.org/jira/browse/HDFS-6917 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6917.001.patch, HDFS-6917.002.patch, HDFS-6917.003.patch HDFS should have a debug command which could validate HDFS block files, call recoverLease, and have some other functionality. These commands would be purely for debugging and would appear under a separate command hierarchy inside the hdfs command. There would be no guarantee of API stability for these commands and the debug submenu would not be listed by just typing the hdfs command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs
[ https://issues.apache.org/jira/browse/HDFS-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7132: -- Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) +1 LGTM, committed to trunk and branch-2. Thanks Charles. hdfs namenode -metadataVersion command does not honor configured name dirs -- Key: HDFS-7132 URL: https://issues.apache.org/jira/browse/HDFS-7132 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7132.001.patch The hdfs namenode -metadataVersion command does not honor dfs.namenode.name.dir.nameservice.namenode configuration parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs
[ https://issues.apache.org/jira/browse/HDFS-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7132: -- Issue Type: Bug (was: Sub-task) Parent: (was: HDFS-6891) hdfs namenode -metadataVersion command does not honor configured name dirs -- Key: HDFS-7132 URL: https://issues.apache.org/jira/browse/HDFS-7132 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7132.001.patch The hdfs namenode -metadataVersion command does not honor dfs.namenode.name.dir.nameservice.namenode configuration parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-316) Balancer should run for a configurable # of iterations
[ https://issues.apache.org/jira/browse/HDFS-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145456#comment-14145456 ] Allen Wittenauer commented on HDFS-316: --- Please don't use camel case options. I know the rest of the system does, but they are extremely user unfriendly and something we should start actively avoiding. Balancer should run for a configurable # of iterations -- Key: HDFS-316 URL: https://issues.apache.org/jira/browse/HDFS-316 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.4.1 Reporter: Brian Bockelman Assignee: Xiaoyu Yao Priority: Minor Labels: newbie Attachments: HDFS-316.0.patch The balancer currently exits if nothing has changed after 5 iterations. Our site would like to constantly balance a stream of incoming data; we would like to be able to set the number of iterations it does nothing for before exiting; even better would be if we set it to a negative number and could continuously run this as a daemon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6932) Balancer and Mover tools should ignore replicas on RAM_DISK
[ https://issues.apache.org/jira/browse/HDFS-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6932: - Attachment: HDFS-6932.1.patch Attach a patch to skip move/balance to/from transient storage and add unit tests. Balancer and Mover tools should ignore replicas on RAM_DISK --- Key: HDFS-6932 URL: https://issues.apache.org/jira/browse/HDFS-6932 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Attachments: HDFS-6932.0.patch, HDFS-6932.1.patch Per title, balancer and mover should just ignore replicas on RAM disk instead of attempting to move them to other nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7104) Fix and clarify INodeInPath getter functions
[ https://issues.apache.org/jira/browse/HDFS-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145463#comment-14145463 ] Zhe Zhang commented on HDFS-7104: - {{TestSnapshotCommands}} failure reveals that {{FSDirectory#isDir()}} also relies on {{getLastINodeInPath()}} to return a null inode to indicate the path (e.g. /dir/.snapshot) is _not_ a directory. So I don't think we can simply eliminate null elements in {{resolve()}}. I think we need to keep {{resolve()}} as is and rename {{getINodes()}} to {{getNonNullINodes()}}, while making it a real getter (without changing the {{inodes}} array). It might also be useful to write another {{getAllINodes()}} method to return both null and non-null inodes. Thoughts? Fix and clarify INodeInPath getter functions Key: HDFS-7104 URL: https://issues.apache.org/jira/browse/HDFS-7104 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-7104-20140923-v1.patch inodes is initialized with the number of path components. After resolve, it contains both non-null and null elements (introduced by dot-snapshot dirs). When getINodes is called, an array is returned excluding all null elements, which is the correct behavior. Meanwhile, the inodes array is trimmed too, which shouldn't be done by a getter. Because of the above, the behavior of getINodesInPath depends on whether getINodes has been called, which is not correct. The name of getLastINodeInPath is confusing – it actually returns the last non-null inode in the path. Also, shouldn't the return type be a single INode? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
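The getter split proposed in the comment above can be sketched like this. The class here is a hypothetical stand-in (a `String[]` in place of `INode[]`), not the real INodesInPath API; it only illustrates a getter that filters without trimming the backing array.

```java
import java.util.Arrays;
import java.util.Objects;

public class INodePathView {
    // Stand-in for INode[]; null entries correspond to dot-snapshot components.
    private final String[] inodes;

    INodePathView(String[] inodes) { this.inodes = inodes; }

    // A real getter: returns a filtered copy and never trims `inodes` itself,
    // so repeated calls see the same state.
    String[] getNonNullINodes() {
        return Arrays.stream(inodes).filter(Objects::nonNull).toArray(String[]::new);
    }

    // Returns both null and non-null elements, also without mutation.
    String[] getAllINodes() {
        return inodes.clone();
    }
}
```

Because neither method mutates the array, the behavior no longer depends on which getter was called first, which is the bug described in the issue.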
[jira] [Commented] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.
[ https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145464#comment-14145464 ] Jing Zhao commented on HDFS-7130: - The patch looks good to me. +1 TestDataTransferKeepalive fails intermittently on Windows. -- Key: HDFS-7130 URL: https://issues.apache.org/jira/browse/HDFS-7130 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-7130.1.patch {{TestDataTransferKeepalive}} has failed intermittently on Windows. These tests rely on a 1 ms thread sleep to wait for a cache expiration. This is likely too short on Windows, which has been observed to have a less granular clock interrupt period compared to typical Linux machines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
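One common way to make such tests robust to a coarse clock interrupt period is to poll for the expected condition up to a deadline instead of relying on a fixed short sleep. Hadoop's GenericTestUtils has a similar waitFor helper; the standalone version below is a hedged sketch, not the actual patch.

```java
public class WaitFor {
    interface Condition { boolean holds(); }

    // Poll `c` every `checkEveryMs` until it holds or `timeoutMs` elapses.
    static boolean waitFor(Condition c, long checkEveryMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!c.holds()) {
            if (System.currentTimeMillis() > deadline) {
                return false; // timed out; the caller decides how to fail the test
            }
            Thread.sleep(checkEveryMs);
        }
        return true;
    }
}
```

A deadline loop only slows the test down when the condition is genuinely late, rather than failing whenever a 1 ms sleep stretches to a full 15 ms timer tick.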
[jira] [Updated] (HDFS-6888) Remove audit logging of getFIleInfo()
[ https://issues.apache.org/jira/browse/HDFS-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6888: -- Attachment: HDFS-6888-4.patch Patch updated. Remove audit logging of getFIleInfo() - Key: HDFS-6888 URL: https://issues.apache.org/jira/browse/HDFS-6888 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Labels: log Attachments: HDFS-6888-2.patch, HDFS-6888-3.patch, HDFS-6888-4.patch, HDFS-6888.patch The audit logging of getFileInfo() was added in HDFS-3733. Since this is one of the most frequently called methods, users have noticed that the audit log is now filled with it. Since we now have HTTP request logging, this seems unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-3107: --- Status: Patch Available (was: Open) HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation), the reverse operation of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
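For reference, the Posix-style semantics the issue asks HDFS to support are what local file systems already provide: shrinking a file discards everything past the new length. A minimal illustration using the standard Java API (not HDFS code):

```java
// Illustration of Posix-style truncate semantics on a local file,
// using java.io.RandomAccessFile#setLength. Not HDFS code.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class TruncateDemo {
    /** Truncates the file to newLength bytes and returns its remaining content. */
    static String truncate(Path file, long newLength) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(newLength); // discard everything past newLength
        }
        return Files.readString(file);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("truncate", ".dat");
        Files.write(f, "hello world".getBytes());
        System.out.println(truncate(f, 5)); // hello
        Files.delete(f);
    }
}
```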
[jira] [Commented] (HDFS-6956) Allow dynamically changing the tracing level in Hadoop servers
[ https://issues.apache.org/jira/browse/HDFS-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145487#comment-14145487 ] Hadoop QA commented on HDFS-6956: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670549/HDFS-6956.004.patch against trunk revision 5338ac4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8167//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8167//console This message is automatically generated. 
Allow dynamically changing the tracing level in Hadoop servers -- Key: HDFS-6956 URL: https://issues.apache.org/jira/browse/HDFS-6956 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6956.002.patch, HDFS-6956.003.patch, HDFS-6956.004.patch We should allow users to dynamically change the tracing level in Hadoop servers. The easiest way to do this is probably to have an RPC accessible only to the superuser that changes tracing settings. This would allow us to turn on and off tracing on the NameNode, DataNode, etc. at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
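The proposal above is an RPC, accessible only to the superuser, that changes tracing settings at runtime. A minimal sketch of that access-control idea follows; the class name, level constants, and the superuser check are all illustrative, not the actual Hadoop API:

```java
// Hypothetical sketch of a superuser-gated runtime tracing toggle.
// TraceAdmin, the level constants, and SUPERUSER are illustrative names.
import java.util.concurrent.atomic.AtomicInteger;

public class TraceAdmin {
    public static final int OFF = 0, SAMPLED = 1, ALWAYS = 2;
    private static final String SUPERUSER = "hdfs";
    // Atomic so concurrent RPC handler threads see a consistent level.
    private final AtomicInteger level = new AtomicInteger(OFF);

    /** RPC entry point: only the superuser may change the tracing level. */
    public void setTraceLevel(String caller, int newLevel) {
        if (!SUPERUSER.equals(caller)) {
            throw new SecurityException("only the superuser may change tracing");
        }
        level.set(newLevel);
    }

    public int getTraceLevel() {
        return level.get();
    }

    public static void main(String[] args) {
        TraceAdmin admin = new TraceAdmin();
        admin.setTraceLevel("hdfs", ALWAYS);
        System.out.println(admin.getTraceLevel()); // 2
    }
}
```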
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145497#comment-14145497 ] Hadoop QA commented on HDFS-3107: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670760/HDFS-3107.patch against trunk revision f48686a. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8172//console This message is automatically generated. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation), the reverse operation of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7130) TestDataTransferKeepalive fails intermittently on Windows.
[ https://issues.apache.org/jira/browse/HDFS-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7130: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thank you for the code review, Jing. I committed this to trunk and branch-2. TestDataTransferKeepalive fails intermittently on Windows. -- Key: HDFS-7130 URL: https://issues.apache.org/jira/browse/HDFS-7130 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.6.0 Attachments: HDFS-7130.1.patch {{TestDataTransferKeepalive}} has failed intermittently on Windows. These tests rely on a 1 ms thread sleep to wait for a cache expiration. This is likely too short on Windows, which has been observed to have a less granular clock interrupt period compared to typical Linux machines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7104) Fix and clarify INodeInPath getter functions
[ https://issues.apache.org/jira/browse/HDFS-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7104: Attachment: HDFS-7104-20140923-v2.patch This patch reflects the above comment on refactoring {{getINodes()}}. Fix and clarify INodeInPath getter functions Key: HDFS-7104 URL: https://issues.apache.org/jira/browse/HDFS-7104 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-7104-20140923-v1.patch, HDFS-7104-20140923-v2.patch inodes is initialized with the number of path components. After resolve, it contains both non-null and null elements (introduced by dot-snapshot dirs). When getINodes is called, an array is returned excluding all null elements, which is the correct behavior. Meanwhile, the inodes array is trimmed too, which shouldn't be done by a getter. Because of the above, the behavior of getINodesInPath depends on whether getINodes has been called, which is not correct. The name of getLastINodeInPath is confusing – it actually returns the last non-null inode in the path. Also, shouldn't the return type be a single INode? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7132) hdfs namenode -metadataVersion command does not honor configured name dirs
[ https://issues.apache.org/jira/browse/HDFS-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145528#comment-14145528 ] Charles Lamb commented on HDFS-7132: Thanks for the review [~andrew.wang]. hdfs namenode -metadataVersion command does not honor configured name dirs -- Key: HDFS-7132 URL: https://issues.apache.org/jira/browse/HDFS-7132 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7132.001.patch The hdfs namenode -metadataVersion command does not honor dfs.namenode.name.dir.nameservice.namenode configuration parameters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7121) For JournalNode operations that must succeed on all nodes, attempt to undo the operation on all nodes if it fails on one node.
[ https://issues.apache.org/jira/browse/HDFS-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145535#comment-14145535 ] Colin Patrick McCabe commented on HDFS-7121: I think it's probably good enough to just check if all JournalNodes are present before sending out the {{doPreUpgrade}} message. This guards against the administrative misconfiguration case, or the case where one or more journal nodes are down. It's true that we could experience a failure in between that check and the pre-upgrade operation, but the chances of that happening are very low. If it does happen, it will simply result in a JN being dropped out of the quorum later, which monitoring tools will pick up, and admins will fix. I'm pretty sure that there isn't a complete solution to this problem because it can be reduced to the Two Generals Problem. For JournalNode operations that must succeed on all nodes, attempt to undo the operation on all nodes if it fails on one node. -- Key: HDFS-7121 URL: https://issues.apache.org/jira/browse/HDFS-7121 Project: Hadoop HDFS Issue Type: Sub-task Components: journal-node Reporter: Chris Nauroth Several JournalNode operations are not satisfied by a quorum. They must succeed on every JournalNode in the cluster. If the operation succeeds on some nodes, but fails on others, then this may leave the nodes in an inconsistent state and require operations to do manual recovery steps. For example, if {{doPreUpgrade}} succeeds on 2 nodes and fails on 1 node, then the operator will need to correct the problem on the failed node and also manually restore the previous.tmp directory to current on the 2 successful nodes before reattempting the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7121) For JournalNode operations that must succeed on all nodes, attempt to undo the operation on all nodes if it fails on one node.
[ https://issues.apache.org/jira/browse/HDFS-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145556#comment-14145556 ] Chris Nauroth commented on HDFS-7121: - bq. I think it's probably good enough to just check if all JournalNodes are present before sending out the doPreUpgrade message. Hi Colin. This is coming out of a production support issue in which some invalid file system permissions caused the rename from current to previous.tmp to fail on 1 out of 3 JournalNodes. There weren't any nodes down. A pre-check like you suggested wouldn't have helped protect against this, because the failure wouldn't show up until actually attempting to do the work. For JournalNode operations that must succeed on all nodes, attempt to undo the operation on all nodes if it fails on one node. -- Key: HDFS-7121 URL: https://issues.apache.org/jira/browse/HDFS-7121 Project: Hadoop HDFS Issue Type: Sub-task Components: journal-node Reporter: Chris Nauroth Several JournalNode operations are not satisfied by a quorum. They must succeed on every JournalNode in the cluster. If the operation succeeds on some nodes, but fails on others, then this may leave the nodes in an inconsistent state and require operations to do manual recovery steps. For example, if {{doPreUpgrade}} succeeds on 2 nodes and fails on 1 node, then the operator will need to correct the problem on the failed node and also manually restore the previous.tmp directory to current on the 2 successful nodes before reattempting the upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
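The pre-check Colin proposes (verify every JournalNode is reachable before sending {{doPreUpgrade}}) can be sketched as below. As the discussion notes, this only narrows the failure window, it cannot close it, and it would not have caught the permissions failure Chris describes. All names here are illustrative, not the actual JournalNode API:

```java
// Hypothetical sketch of the "check all JournalNodes first" idea.
// The liveness probe is abstracted as a Predicate; real code would make
// an RPC to each JournalNode. Illustrative names throughout.
import java.util.List;
import java.util.function.Predicate;

public class PreUpgradeCheck {
    /** Returns true only if every journal node passes the liveness probe. */
    public static boolean allPresent(List<String> journalNodes,
                                     Predicate<String> isAlive) {
        return journalNodes.stream().allMatch(isAlive);
    }

    public static void main(String[] args) {
        List<String> jns = List.of("jn1:8485", "jn2:8485", "jn3:8485");
        // Simulate jn2 being unreachable: the pre-check fails and
        // doPreUpgrade would not be sent to any node.
        System.out.println(allPresent(jns, jn -> !jn.equals("jn2:8485")));
    }
}
```

Failures that occur during the operation itself (like the permissions problem above) still require either the undo mechanism this JIRA proposes or manual recovery.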
[jira] [Created] (HDFS-7136) libhdfs doesn't compile on OS X
Allen Wittenauer created HDFS-7136: -- Summary: libhdfs doesn't compile on OS X Key: HDFS-7136 URL: https://issues.apache.org/jira/browse/HDFS-7136 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer vecsum uses clock_gettime which isn't supported on OS X. Like Windows, we just need to ignore that bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7136) libhdfs doesn't compile on OS X
[ https://issues.apache.org/jira/browse/HDFS-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7136: --- Attachment: HDFS-7136.patch -00: Change the cmakefile to only build vecsum if the clock_gettime routine exists. libhdfs doesn't compile on OS X --- Key: HDFS-7136 URL: https://issues.apache.org/jira/browse/HDFS-7136 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer Attachments: HDFS-7136.patch vecsum uses clock_gettime which isn't supported on OS X. Like Windows, we just need to ignore that bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7136) libhdfs fails compile step on OS X
[ https://issues.apache.org/jira/browse/HDFS-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7136: --- Summary: libhdfs fails compile step on OS X (was: libhdfs doesn't compile on OS X) libhdfs fails compile step on OS X -- Key: HDFS-7136 URL: https://issues.apache.org/jira/browse/HDFS-7136 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer Attachments: HDFS-7136.patch vecsum uses clock_gettime which isn't supported on OS X. Like Windows, we just need to ignore that bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7104) Fix and clarify INodeInPath getter functions
[ https://issues.apache.org/jira/browse/HDFS-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145567#comment-14145567 ] Jing Zhao commented on HDFS-7104: - Thanks for the analysis, [~zhz]. For all the callers of {{getINodes}}, we have the following two cases: # The path is a non-snapshot path. In this case {{getINodes}} returns inodes directly, which includes null elements. # The path is a snapshot path (including paths ending with dot-snapshot). In this case, the current {{getINodes}} trims elements to make the length of {{inodes}} equal to the value of {{capacity}}. Note that in this case null elements may still be contained in {{inodes}} (otherwise we cannot identify non-existing files/directories in snapshots). So I can see three options here. The first option is to keep the current {{getINodes}} unchanged, but add more javadoc to explain the logic behind it. The second option is to make it a real getter, but we cannot rename it to {{getNonNullINodes}} since null elements can still be included. The third option is to also create an extra method {{INodesInPath#getINodesForWrite}} for the above case #1, which first does a sanity check to make sure {{capacity == inodes.length}}, and then returns {{inodes}} directly. This method can be called by write ops like mkdir, concat, delete, etc. Since we do not usually call {{getINodes}} multiple times for the same INodesInPath instance, I think we may consider starting from option 2. Fix and clarify INodeInPath getter functions Key: HDFS-7104 URL: https://issues.apache.org/jira/browse/HDFS-7104 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Priority: Minor Attachments: HDFS-7104-20140923-v1.patch, HDFS-7104-20140923-v2.patch inodes is initialized with the number of path components. After resolve, it contains both non-null and null elements (introduced by dot-snapshot dirs).
When getINodes is called, an array is returned excluding all null elements, which is the correct behavior. Meanwhile, the inodes array is trimmed too, which shouldn't be done by a getter. Because of the above, the behavior of getINodesInPath depends on whether getINodes has been called, which is not correct. The name of getLastINodeInPath is confusing – it actually returns the last non-null inode in the path. Also, shouldn't the return type be a single INode? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
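The core complaint in this issue is a getter with a side effect: the current {{getINodes}} both returns a filtered view and mutates the backing array. The "real getter" being discussed can be sketched as below; this is a simplified illustration (String stands in for INode, and the method name is illustrative), not the actual patch:

```java
// Hypothetical sketch of a side-effect-free getter: filter nulls into a
// fresh array and leave the backing array untouched, so repeated calls
// behave identically. String stands in for INode.
import java.util.Arrays;
import java.util.Objects;

public class INodesSketch {
    private final String[] inodes; // stand-in for the INode[] built by resolve()

    public INodesSketch(String[] inodes) {
        this.inodes = inodes;
    }

    /** Returns the non-null elements without modifying the internal array. */
    public String[] getNonNullINodes() {
        return Arrays.stream(inodes)
                     .filter(Objects::nonNull)
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // A path with a dot-snapshot component that resolved to null.
        INodesSketch path = new INodesSketch(new String[]{"dir", null, "file"});
        System.out.println(Arrays.toString(path.getNonNullINodes()));
        // A second call returns the same result: no hidden state change.
        System.out.println(Arrays.toString(path.getNonNullINodes()));
    }
}
```

Because nothing is trimmed in place, callers that need the raw, possibly-null-containing array (such as the snapshot checks in {{FSDirectory#isDir()}}) are unaffected by whether the getter has already been called.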
[jira] [Updated] (HDFS-7136) libhdfs fails compile step on OS X
[ https://issues.apache.org/jira/browse/HDFS-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7136: --- Fix Version/s: (was: HDFS-6534) libhdfs fails compile step on OS X -- Key: HDFS-7136 URL: https://issues.apache.org/jira/browse/HDFS-7136 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer Attachments: HDFS-7136.patch vecsum uses clock_gettime which isn't supported on OS X. Like Windows, we just need to ignore that bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7136) libhdfs fails compile step on OS X
[ https://issues.apache.org/jira/browse/HDFS-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-7136. Resolution: Duplicate Fix Version/s: HDFS-6534 HDFS-6534 is a better fix. Closing as a dupe. libhdfs fails compile step on OS X -- Key: HDFS-7136 URL: https://issues.apache.org/jira/browse/HDFS-7136 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer Fix For: HDFS-6534 Attachments: HDFS-7136.patch vecsum uses clock_gettime which isn't supported on OS X. Like Windows, we just need to ignore that bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)