[jira] [Updated] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5428:


Attachment: HDFS-5428.001.patch

Uploaded a new patch that replaces the block but does not replace the INodeFile. 

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
 HDFS-5428.001.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till a checkpoint happens, or do saveNamespace
 6. restart NN.
 NN enters safemode.
 Analysis:
 Snapshot inodes loaded from the fsimage are always complete, and all of their 
 blocks will be in COMPLETE state. 
 So when a Datanode reports RBW blocks, those will not be updated in the 
 blocksMap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.
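The analysis above can be illustrated with a toy model of the safemode accounting (a deliberate simplification; the real BlockManager logic is more involved): replicas whose state or length disagrees with what the fsimage recorded never count toward the safemode threshold, so the NameNode cannot leave safemode.

```python
# Toy model of NameNode safemode accounting (illustrative sketch only).
# Blocks loaded from a snapshot in the fsimage are expected to be
# COMPLETE with a recorded length; DataNode reports that disagree in
# state or length never count as "safe".

def safe_block_count(expected, reports):
    """expected/reports: {block_id: (state, length)}."""
    safe = 0
    for block_id, (exp_state, exp_len) in expected.items():
        rep = reports.get(block_id)
        if rep is None:
            continue                      # block never reported
        rep_state, rep_len = rep
        if exp_state == "COMPLETE" and rep_state == "RBW":
            continue                      # RBW replica not added to blocksMap
        if rep_len != exp_len:
            continue                      # length mismatch -> marked corrupt
        safe += 1
    return safe

def in_safemode(expected, reports, threshold=0.999):
    total = len(expected)
    return total > 0 and safe_block_count(expected, reports) < threshold * total

# A snapshot-only under-construction block: the fsimage says COMPLETE/1024,
# but the DataNode still holds an RBW replica with a shorter length.
expected = {"blk_1": ("COMPLETE", 1024)}
reports = {"blk_1": ("RBW", 512)}
assert in_safemode(expected, reports)     # the NN never leaves safemode
```

A healthy FINALIZED replica with the matching length would count as safe and let the toy NN exit safemode.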



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815766#comment-13815766
 ] 

Vinay commented on HDFS-5428:
-

bq. So here my question is whether it's possible that we just replace the last 
block of the snapshot INode with a BlockInfoUC (but without replacing the 
INodeFile with an INodeFileUC)?
If we replace it, the problem is that the same INode may also refer to a 
completed file in the normal path (for example, due to a rename plus lease 
recovery), and replacing the last block in that INode may not be correct.

One more problem here is that the snapshotUCMap will not always contain the 
latest snapshot inode, which should be written to the fsimage as the 
under-construction file.
For example:
1. While the file is being written, after allocating block b1, take snapshot 
s1.
2. The file is renamed.
3. The file is closed by lease recovery, then one more block b2 is appended, 
and before closing, one more snapshot s2 is taken.
4. Finally the file is deleted.
5. Now, while writing the inode tree to the fsimage, the inode in s2 comes 
first and then s1, so only the INode in s1 will be marked as under 
construction; but the actually under-construction one is the INode in the s2 
snapshot.
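The ordering problem in step 5 can be sketched as a last-write-wins map keyed by inode id (the names below are illustrative, not the actual FSImage serializer):

```python
# If snapshot copies of the same file are recorded in a map keyed by
# inode id, whichever copy is recorded last overwrites the earlier one.
snapshot_uc_map = {}

def record(inode_id, snapshot_copy):
    snapshot_uc_map[inode_id] = snapshot_copy  # last write wins

record(1001, "inode-in-s2")  # truly under construction (has block b2)
record(1001, "inode-in-s1")  # older copy, already closed at block b1

# The serializer would mark whatever the map now points at as UC:
assert snapshot_uc_map[1001] == "inode-in-s1"  # wrong copy gets flagged
```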



[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815775#comment-13815775
 ] 

Jing Zhao commented on HDFS-5428:
-

bq. if the same INode is referring to a completed file [ might be due to rename 
and leaserecovery ] in normal path 

We will replace the whole INode if it is in the normal path. We only replace 
its last block if the file exists only in a snapshot. But the next time we do 
a checkpoint, we may need to check a file's last block to decide whether it is 
a file under construction.

Another option here is to replace the inode in all cases. To work around the 
challenge that we cannot get the full snapshot path, we can look up the inode 
by its id first, then scan the diff list of its parent to do the replacement. 
This will be inefficient, but might be OK as long as we do not have a lot of 
snapshots and under-construction inodes.
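That lookup-then-scan option might look roughly like the following sketch (class and method names are hypothetical, not the real FSDirectory/diff-list API):

```python
# Hypothetical sketch: resolve the inode by id, then linearly scan the
# parent's snapshot diff list and replace every stale copy. Cost is
# O(number of snapshots), acceptable when snapshots and UC inodes are few.

class Diff:
    def __init__(self, snapshot, inodes):
        self.snapshot = snapshot
        self.inodes = inodes          # {inode_id: inode_object}

def replace_in_diffs(parent_diffs, inode_id, new_inode):
    replaced = 0
    for diff in parent_diffs:
        if inode_id in diff.inodes:   # stale copy recorded in this snapshot
            diff.inodes[inode_id] = new_inode
            replaced += 1
    return replaced

diffs = [Diff("s1", {1001: "old-copy"}),
         Diff("s2", {1001: "old-copy"}),
         Diff("s3", {2002: "other-file"})]
assert replace_in_diffs(diffs, 1001, "uc-copy") == 2
assert diffs[2].inodes == {2002: "other-file"}   # unrelated diff untouched
```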

bq. Now while writing the inode tree to fsimage, inode in s2 comes first and 
then s1 , then only INode in s1 will be marked as underconstruction. but actual 
underconstruction is INode in S2 snapshot

For rename, we will only have one INode here, which is referenced by two 
INodeReference instances stored in s1 and s2. And since we only record the 
inode id in the snapshotUCMap, this scenario might be fine?



[jira] [Commented] (HDFS-5411) Update Bookkeeper dependency to 4.2.1

2013-11-07 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815819#comment-13815819
 ] 

Rakesh R commented on HDFS-5411:


Thanks a lot for giving 4.2.2 a try. I'll take a look at this.

 Update Bookkeeper dependency to 4.2.1
 -

 Key: HDFS-5411
 URL: https://issues.apache.org/jira/browse/HDFS-5411
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Robert Rati
Priority: Minor
 Attachments: HDFS-5411.patch


 Update the bookkeeper dependency to 4.2.1.  This eases compilation on Fedora 
 platforms





[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815821#comment-13815821
 ] 

Vinay commented on HDFS-5428:
-

bq. We will replace the whole Inode if it is in normal path. 
Here we will replace the whole INode only if it is under construction. What 
if the same file is closed and present in some other path?
bq.  Another option here is that we replace the inode for all the cases. To 
cover the challenge that we cannot get the full snapshot path, we can use the 
inode id to get the inode first, then scan the diff list of its parent to do 
the replacement. This will be inefficient but might be ok in case that we do 
not have a lot of snapshots and inodeUC.
To what level of scanning can we go? And how can we find all the previous 
locations of the inode? The same INode might have been renamed to different 
locations across snapshots.

bq. For rename, we will only have one INode here, which is referenced by two 
INodeReference instances stored in s1 and s2. And since we only record inode id 
in snapshotUCMap, this scenario might be fine?
I am not sure about this. As far as I have seen while debugging, if any 
modification (such as adding one more block) is done on a snapshotted node, a 
new inode instance will be saved inside the snapshot diffs, not an 
INodeReference. An INodeReference will be used only if there is no 
modification to the two inodes' attributes other than the name. 
Actually, I know this because I already faced these problems while preparing 
my patch. 



[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815827#comment-13815827
 ] 

Hadoop QA commented on HDFS-5428:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612548/HDFS-5428.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5353//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5353//console

This message is automatically generated.



[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

2013-11-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815873#comment-13815873
 ] 

Vinay commented on HDFS-5443:
-

The patch will not clear the blocks in this case:
1. rename an under-construction file/directory with 0-sized blocks after a 
snapshot
2. delete the renamed directory.

Because the INode is saved into the snapshot during the rename itself, the 
update will not happen during the deletion.

 Namenode can stuck in safemode on restart if it crashes just after addblock 
 logsync and after taking snapshot for such file.
 

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Attachments: 5443-test.patch, HDFS-5443.000.patch


 This issue is reported by Prakash and Sathish.
 On looking into the issue, the following things are happening:
 1) Client added a block at the NN and just did logSync, so the NN has the 
 block ID persisted.
 2) Before the addBlock response is returned to the client, take a snapshot 
 of the root or a parent directory of that file.
 3) Delete the parent directory of that file.
 4) Now crash the NN without responding success to the client for that 
 addBlock call.
 Now on restart, the Namenode will get stuck in safemode.
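The crash sequence can be replayed as a toy edit-log model (a simplification assuming only the addBlock op was synced before the crash): the block id survives the restart with length 0, and no DataNode ever reports it.

```python
# Toy edit-log replay of the sequence above (illustrative, not real NN
# code): addBlock is logged and synced, the NN crashes before replying,
# and on restart the file carries a 0-sized block no DataNode has.

edit_log = []

def log_add_block(path, block_id):
    edit_log.append(("ADD_BLOCK", path, block_id))  # logSync already done

log_add_block("/foo/test/bar", "blk_42")
# ... snapshot taken, parent directory deleted, NN crashes here ...

# On restart, replay reconstructs the block with length 0:
blocks_after_replay = {bid: 0 for op, path, bid in edit_log
                       if op == "ADD_BLOCK"}
reported_by_datanodes = set()  # the client never finished writing data

missing = [b for b in blocks_after_replay if b not in reported_by_datanodes]
assert missing == ["blk_42"]   # this block keeps the NN in safemode
```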





[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

2013-11-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815876#comment-13815876
 ] 

Vinay commented on HDFS-5443:
-

But if you want to go ahead with committing without making this patch more 
complex, I have no objection, as this will be covered anyway in HDFS-5428. 



[jira] [Commented] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark

2013-11-07 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816175#comment-13816175
 ] 

Arpit Agarwal commented on HDFS-5472:
-

+1 for the patch. I will commit it shortly.

 Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark
 ---

 Key: HDFS-5472
 URL: https://issues.apache.org/jira/browse/HDFS-5472
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5472_20131106.patch


 - DatanodeDescriptor should be initialized with updateHeartbeat for updating 
 the timestamps.
 - NNThroughputBenchmark should create DatanodeRegistrations with real 
 datanode UUIDs.





[jira] [Resolved] (HDFS-5472) Fix TestDatanodeManager, TestSafeMode and TestNNThroughputBenchmark

2013-11-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-5472.
-

   Resolution: Fixed
Fix Version/s: Heterogeneous Storage (HDFS-2832)
 Hadoop Flags: Reviewed

Committed this to branch HDFS-2832. Thanks Nicholas.



[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace

2013-11-07 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816184#comment-13816184
 ] 

Brandon Li commented on HDFS-5252:
--

Thank you, Jing, for the review. I've committed the patch.

 Stable write is not handled correctly in someplace
 --

 Key: HDFS-5252
 URL: https://issues.apache.org/jira/browse/HDFS-5252
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5252.001.patch, HDFS-5252.002.patch


 When the client asks for a stable write but the prerequisite writes are not 
 transferred to NFS gateway, the stableness can't be honored. NFS gateway has 
 to treat the write as unstable write and set the flag to UNSTABLE in the 
 write response.
 One bug was found during test with Ubuntu client when copying one 1KB file. 
 For small files like 1KB file, Ubuntu client does one stable write (with 
 FILE_SYNC flag). However, NFS gateway missed one place 
 where(OpenFileCtx#doSingleWrite) it sends response with the flag NOT updated 
 to UNSTABLE.
 With this bug, the client thinks the write is on disk and thus doesn't send 
 COMMIT anymore. The following test tries to read the data back and of course 
 fails to do so since the data was not synced. 





[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace

2013-11-07 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)



[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: H2832_20131107.patch

 Enable support for heterogeneous storages in HDFS
 -

 Key: HDFS-2832
 URL: https://issues.apache.org/jira/browse/HDFS-2832
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
 h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, 
 h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, 
 h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch


 HDFS currently supports a configuration where storages are a list of 
 directories. Typically each of these directories corresponds to a volume with 
 its own file system. All these directories are homogeneous and therefore 
 identified as a single storage at the namenode. I propose changing the 
 current model, where a Datanode *is a* storage, to one where a Datanode 
 *is a collection of* storages. 





[jira] [Commented] (HDFS-5252) Stable write is not handled correctly in someplace

2013-11-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816196#comment-13816196
 ] 

Hudson commented on HDFS-5252:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4700 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4700/])
HDFS-5252. Stable write is not handled correctly in someplace. Contributed by 
Brandon Li (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539740)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/request/READ3Request.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/Nfs3Utils.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt




[jira] [Updated] (HDFS-5252) Stable write is not handled correctly in someplace

2013-11-07 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5252:
-

Fix Version/s: 2.2.1



[jira] [Updated] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-11-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5443:


Summary: Delete 0-sized block when deleting an under-construction file that 
is included in snapshot  (was: Namenode can stuck in safemode on restart if it 
crashes just after addblock logsync and after taking snapshot for such file.)



[jira] [Updated] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

2013-11-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5443:


Description: 
Namenode can stuck in safemode on restart if it crashes just after addblock 
logsync and after taking snapshot for such file. This issue is reported by 
Prakash and Sathish.

On looking into the issue, the following things are happening:
1) Client added a block at the NN and just did logSync, so the NN has the 
block ID persisted.
2) Before the addBlock response is returned to the client, take a snapshot of 
the root or a parent directory of that file.
3) Delete the parent directory of that file.
4) Now crash the NN without responding success to the client for that 
addBlock call.

Now on restart, the Namenode will get stuck in safemode.


  was:
This issue is reported by Prakash and Sathish.

On looking into the issue following things are happening.
.
1) Client added block at NN and just did logsync
   So, NN has block ID persisted.
2)Before returning addblock response to client take a snapshot for root or 
parent directories for that file
3) Delete parent directory for that file
4) Now crash the NN with out responding success to client for that addBlock call

Now on restart of the Namenode, it will stuck in safemode.





[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816211#comment-13816211
 ] 

Andrew Wang commented on HDFS-5326:
---

bq. reordering methods

I think you missed one reordering in FSEditLog :)

bq. Let's do this as part of HDFS-5471 if it looks good... similarly with 
refactoring pc#checkPermission.

OK, I'll cross post my cleanup comments there.

+1 once addressed (and the Findbugs warning), thanks Colin.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.





[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-11-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816218#comment-13816218
 ] 

Jing Zhao commented on HDFS-5443:
-

bq. because INode is saved to snapshot while renaming itself. so updation will 
not happen during deletion.

Thanks for the comments Vinay! I still think our current rename implementation 
will not lead to this scenario. But let's continue this discussion in HDFS-5428 
and add possible fix there. I will commit the current patch shortly.

 Delete 0-sized block when deleting an under-construction file that is 
 included in snapshot
 --

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: sathish
 Attachments: 5443-test.patch, HDFS-5443.000.patch


 The Namenode can get stuck in safemode on restart if it crashes just after 
 the addBlock logsync and after a snapshot was taken of such a file. This 
 issue was reported by Prakash and Sathish.
 On looking into the issue, the following things are happening:
 1) Client added a block at the NN, which just did a logsync
So, the NN has the block ID persisted.
 2) Before returning the addBlock response to the client, take a snapshot of 
 the root or a parent directory of that file
 3) Delete the parent directory of that file
 4) Now crash the NN without responding success to the client for that 
 addBlock call
 Now on restart, the Namenode will get stuck in safemode.





[jira] [Commented] (HDFS-5471) CacheAdmin -listPools fails when pools exist that user does not have permissions to

2013-11-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816220#comment-13816220
 ] 

Andrew Wang commented on HDFS-5471:
---

Colin asked to bump some CacheManager cleanup work from HDFS-5326 to this JIRA, 
cross-posting:

* Add and modify aren't that different besides the difference in required, 
optional, and default fields. I just first validate all present fields in the 
directive, then enforce required fields, then fill in default values.
* Modify and remove have the same checks for an existing entry
* Add and modify have the same checks for an existing cache pool
* All three do write checks to a cache pool, moving this into 
FSPermissionChecker or a method was an easy savings
* success/fail logs are inconsistently formatted. I'd like something like 
"methodName: successfully <verb> directive <directive>" and "methodName: 
failed to <verb> <noun> <parameters>", e.g.
{code}
  LOG.warn("addDirective " + directive + ": failed", e);
    LOG.info("addDirective " + directive + ": succeeded.");
...
  LOG.warn("modifyDirective " + idString + ": error", e);
    LOG.info("modifyDirective " + idString + ": applied " + directive);
...
  LOG.warn("removeDirective " + id + " failed", e);
    LOG.info("removeDirective " + id + ": removed");
{code}
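A minimal sketch of the single naming convention suggested above; the class and 
method names here are invented for illustration and are not the actual 
CacheManager code:

```java
// Illustrative helper enforcing one "<method>: <outcome>, <subject>" shape
// for both success and failure messages. Not the real CacheManager code.
public class LogMessages {
  static String succeeded(String method, Object subject) {
    return method + ": succeeded, " + subject;
  }

  static String failed(String method, Object subject) {
    return method + ": failed, " + subject;
  }

  public static void main(String[] args) {
    System.out.println(succeeded("addDirective", "directive 1"));
    System.out.println(failed("removeDirective", "id 7"));
  }
}
```

Centralizing the format this way means a later grep for `"<method>: failed"` 
finds every failure path at once.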

 CacheAdmin -listPools fails when pools exist that user does not have 
 permissions to
 ---

 Key: HDFS-5471
 URL: https://issues.apache.org/jira/browse/HDFS-5471
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0
Reporter: Stephen Chu

 When a user does not have read permissions on a cache pool and executes hdfs 
 cacheadmin -listPools, the command will fail, complaining about missing 
 required fields with something like:
 {code}
 [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
 Exception in thread "main" 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): 
 Message missing required fields: ownerName, groupName, mode, weight
   at 
 com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ListCachePoolsResponseElementProto$Builder.build(ClientNamenodeProtocolProtos.java:51722)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCachePools(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2057)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1515)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2051)
   at 
 org.apache.hadoop.hdfs.tools.CacheAdmin$ListCachePoolsCommand.run(CacheAdmin.java:675)
   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:85)
   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:90)
 [schu@hdfs-nfs ~]$ 
 {code}
 In this example, the pool root has 750 permissions, and the root superuser 
 is able to successfully -listPools:
 {code}
 [root@hdfs-nfs ~]# hdfs cacheadmin -listPools
 Found 4 results.
 NAME  OWNER  GROUP  MODE   WEIGHT 
 bar   root   root   rwxr-xr-x  100
 foo   root   root   rwxr-xr-x  100
 root  root   root   rwxr-x---  100
 schu  root   root   rwxr-xr-x  100
 [root@hdfs-nfs ~]# 
 {code}
 When we modify the root pool to mode 755, schu user can now -listPools 
 successfully without error.
 {code}
 [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
 Found 4 results.
 NAME  OWNER  GROUP  MODE   WEIGHT 
 bar   root   root   rwxr-xr-x  100
 foo   root   root   rwxr-xr-x  100
 root  root   root   rwxr-xr-x  100
 schu  root   root   rwxr-xr-x  100
 [schu@hdfs-nfs ~]$ 
 {code}





[jira] [Updated] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-11-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5443:


   Resolution: Fixed
Fix Version/s: 2.3.0
 Assignee: Jing Zhao  (was: sathish)
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2.

 Delete 0-sized block when deleting an under-construction file that is 
 included in snapshot
 --

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: 5443-test.patch, HDFS-5443.000.patch


 The Namenode can get stuck in safemode on restart if it crashes just after 
 the addBlock logsync and after a snapshot was taken of such a file. This 
 issue was reported by Prakash and Sathish.
 On looking into the issue, the following things are happening:
 1) Client added a block at the NN, which just did a logsync
So, the NN has the block ID persisted.
 2) Before returning the addBlock response to the client, take a snapshot of 
 the root or a parent directory of that file
 3) Delete the parent directory of that file
 4) Now crash the NN without responding success to the client for that 
 addBlock call
 Now on restart, the Namenode will get stuck in safemode.





[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot

2013-11-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816235#comment-13816235
 ] 

Hudson commented on HDFS-5443:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4701 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4701/])
HDFS-5443. Delete 0-sized block when deleting an under-construction file that 
is included in snapshot. Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539754)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfoUnderConstruction.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFileUnderConstruction.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java


 Delete 0-sized block when deleting an under-construction file that is 
 included in snapshot
 --

 Key: HDFS-5443
 URL: https://issues.apache.org/jira/browse/HDFS-5443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: 5443-test.patch, HDFS-5443.000.patch


 The Namenode can get stuck in safemode on restart if it crashes just after 
 the addBlock logsync and after a snapshot was taken of such a file. This 
 issue was reported by Prakash and Sathish.
 On looking into the issue, the following things are happening:
 1) Client added a block at the NN, which just did a logsync
So, the NN has the block ID persisted.
 2) Before returning the addBlock response to the client, take a snapshot of 
 the root or a parent directory of that file
 3) Delete the parent directory of that file
 4) Now crash the NN without responding success to the client for that 
 addBlock call
 Now on restart, the Namenode will get stuck in safemode.





[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: (was: HDFS-5326.008.patch)

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.





[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: HDFS-5326.008.patch

fix findbugs warning

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.





[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Attachment: HDFS-5326.008.patch

complete reordering

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch, HDFS-5326.008.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.





[jira] [Updated] (HDFS-5471) CacheAdmin -listPools fails when pools exist that user does not have permissions to

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5471:
---

Assignee: Andrew Wang

 CacheAdmin -listPools fails when pools exist that user does not have 
 permissions to
 ---

 Key: HDFS-5471
 URL: https://issues.apache.org/jira/browse/HDFS-5471
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0
Reporter: Stephen Chu
Assignee: Andrew Wang

 When a user does not have read permissions on a cache pool and executes hdfs 
 cacheadmin -listPools, the command will fail, complaining about missing 
 required fields with something like:
 {code}
 [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
 Exception in thread "main" 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): 
 Message missing required fields: ownerName, groupName, mode, weight
   at 
 com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ListCachePoolsResponseElementProto$Builder.build(ClientNamenodeProtocolProtos.java:51722)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCachePools(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2057)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1515)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2051)
   at 
 org.apache.hadoop.hdfs.tools.CacheAdmin$ListCachePoolsCommand.run(CacheAdmin.java:675)
   at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:85)
   at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:90)
 [schu@hdfs-nfs ~]$ 
 {code}
 In this example, the pool root has 750 permissions, and the root superuser 
 is able to successfully -listPools:
 {code}
 [root@hdfs-nfs ~]# hdfs cacheadmin -listPools
 Found 4 results.
 NAME  OWNER  GROUP  MODE   WEIGHT 
 bar   root   root   rwxr-xr-x  100
 foo   root   root   rwxr-xr-x  100
 root  root   root   rwxr-x---  100
 schu  root   root   rwxr-xr-x  100
 [root@hdfs-nfs ~]# 
 {code}
 When we modify the root pool to mode 755, schu user can now -listPools 
 successfully without error.
 {code}
 [schu@hdfs-nfs ~]$ hdfs cacheadmin -listPools
 Found 4 results.
 NAME  OWNER  GROUP  MODE   WEIGHT 
 bar   root   root   rwxr-xr-x  100
 foo   root   root   rwxr-xr-x  100
 root  root   root   rwxr-xr-x  100
 schu  root   root   rwxr-xr-x  100
 [schu@hdfs-nfs ~]$ 
 {code}





[jira] [Created] (HDFS-5475) NN should not allow more than one replica per storage

2013-11-07 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5475:
---

 Summary: NN should not allow more than one replica per storage
 Key: HDFS-5475
 URL: https://issues.apache.org/jira/browse/HDFS-5475
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


The NN chooses a provisional target storage when allocating a new block and 
records that block in the blockList of that storage. However, the datanode is 
free to choose a different storage for the block. On the next block report, the 
NN ends up with two blockList entries for the same replica+DN combination.





[jira] [Updated] (HDFS-5475) NN incorrectly tracks more than one replica per DN

2013-11-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5475:


Summary: NN incorrectly tracks more than one replica per DN  (was: NN 
should not allow more than one replica per storage)

 NN incorrectly tracks more than one replica per DN
 --

 Key: HDFS-5475
 URL: https://issues.apache.org/jira/browse/HDFS-5475
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal

 The NN chooses a provisional target storage when allocating a new block and 
 records that block in the blockList of that storage. However, the datanode is 
 free to choose a different storage for the block. On the next block report, 
 the NN ends up with two blockList entries for the same replica+DN combination.





[jira] [Commented] (HDFS-5468) CacheAdmin help command does not recognize commands

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816340#comment-13816340
 ] 

Colin Patrick McCabe commented on HDFS-5468:


+1.

The audit warning is bogus; it comes from the check not finding an Apache 
release header on some pid files that were left over from a previous Jenkins 
job.

 CacheAdmin help command does not recognize commands
 ---

 Key: HDFS-5468
 URL: https://issues.apache.org/jira/browse/HDFS-5468
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.3.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor
 Attachments: HDFS-5468.patch


 Currently, the hdfs cacheadmin -help command will not recognize correct 
 command inputs:
 {code}
 [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools
 Sorry, I don't know the command 'listPools'.
 Valid command names are:
 -addDirective, -removeDirective, -removeDirectives, -listDirectives, 
 -addPool, -modifyPool, -removePool, -listPools, -help
 [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools
 Sorry, I don't know the command 'listPools'.
 Valid command names are:
 -addDirective, -removeDirective, -removeDirectives, -listDirectives, 
 -addPool, -modifyPool, -removePool, -listPools, -help
 {code}
 In the code, we strip the input command of leading hyphens, but then compare 
 it to the command names, which are all prefixed by a hyphen.
 Also, cacheadmin -removeDirectives requires specifying a path with -path but 
 -path is not shown in the usage. We should fix this as well.
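The hyphen mismatch described above can be sketched as follows; the class and 
method names are hypothetical, not the actual CacheAdmin code:

```java
// Hypothetical model of the lookup bug: the user's input is stripped of
// leading hyphens but registered command names keep theirs, so the
// comparison never matches. Normalizing both sides fixes it.
public class CommandLookup {
  static String stripHyphens(String s) {
    int i = 0;
    while (i < s.length() && s.charAt(i) == '-') i++;
    return s.substring(i);
  }

  // Buggy: compares "listPools" against "-listPools".
  static boolean matchesBuggy(String input, String registeredName) {
    return stripHyphens(input).equals(registeredName);
  }

  // Fixed: strip leading hyphens from both sides before comparing.
  static boolean matchesFixed(String input, String registeredName) {
    return stripHyphens(input).equals(stripHyphens(registeredName));
  }

  public static void main(String[] args) {
    System.out.println(matchesBuggy("-listPools", "-listPools"));  // false
    System.out.println(matchesFixed("-listPools", "-listPools"));  // true
  }
}
```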





[jira] [Updated] (HDFS-5468) CacheAdmin help command does not recognize commands

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5468:
---

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

 CacheAdmin help command does not recognize commands
 ---

 Key: HDFS-5468
 URL: https://issues.apache.org/jira/browse/HDFS-5468
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.3.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5468.patch


 Currently, the hdfs cacheadmin -help command will not recognize correct 
 command inputs:
 {code}
 [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools
 Sorry, I don't know the command 'listPools'.
 Valid command names are:
 -addDirective, -removeDirective, -removeDirectives, -listDirectives, 
 -addPool, -modifyPool, -removePool, -listPools, -help
 [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools
 Sorry, I don't know the command 'listPools'.
 Valid command names are:
 -addDirective, -removeDirective, -removeDirectives, -listDirectives, 
 -addPool, -modifyPool, -removePool, -listPools, -help
 {code}
 In the code, we strip the input command of leading hyphens, but then compare 
 it to the command names, which are all prefixed by a hyphen.
 Also, cacheadmin -removeDirectives requires specifying a path with -path but 
 -path is not shown in the usage. We should fix this as well.





[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5364:
-

Attachment: HDFS-5364.008.patch

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch


 The NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel. Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway will eventually run out of memory by creating too many threads.
 The NFS gateway should have an OpenFileCtx cache to limit the total number of 
 open files. 





[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816367#comment-13816367
 ] 

Brandon Li commented on HDFS-5364:
--

{quote}2 and 3 are optimizations of the eviction method. As we discussed 
offline, I will file a follow-up JIRA for that.{quote}
The new patch adds the optimization of the eviction method. Also, the scan() 
method no longer holds the lock the whole time. A unit test is added for the 
scan() method.
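The capping idea behind such a cache can be sketched with a small LRU map; this 
is not the HDFS-5364 implementation, and OpenFileCache/maxEntries are invented 
names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical OpenFileCtx-style cache bounding the number of open
// streams via LRU eviction on LinkedHashMap's access order.
public class OpenFileCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  public OpenFileCache(int maxEntries) {
    super(16, 0.75f, true);  // access-order: iteration order is LRU
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // In a real gateway the evicted context would also be closed here,
    // releasing its DataStreamer thread.
    return size() > maxEntries;
  }

  public static void main(String[] args) {
    OpenFileCache<String, String> cache = new OpenFileCache<>(2);
    cache.put("/a", "ctx-a");
    cache.put("/b", "ctx-b");
    cache.put("/c", "ctx-c");  // evicts "/a", the least recently used
    System.out.println(cache.keySet());  // [/b, /c]
  }
}
```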

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch


 The NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel. Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway will eventually run out of memory by creating too many threads.
 The NFS gateway should have an OpenFileCtx cache to limit the total number of 
 open files. 





[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816378#comment-13816378
 ] 

Chris Nauroth commented on HDFS-5394:
-

I just tested with patch version 7, and the datanode didn't uncache previously 
cached blocks after receiving the DNA_CACHE message.  Debug logging shows that 
it's due to the following logic in {{FsDatasetCache#uncacheBlock}}.  I assume 
{{case CACHED}} should be doing the same as the {{default}} block and 
submitting an {{UncachingTask}}.

{code}
case CACHED:
  if (LOG.isDebugEnabled()) {
    LOG.debug("Block with id " + blockId + ", pool " + bpid + " " +
        "does not need to be uncached, because it is " +
        "in state " + prevValue.state + ".");
  }
  break;
{code}
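The suggested fix can be modeled in a self-contained sketch; the enum and the 
task list below are invented for illustration and are not the FsDatasetCache 
types. In the CACHED state the switch falls through to schedule an uncaching 
task instead of logging and bailing out:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained model of the suggested switch behaviour. State and
// "scheduled" are illustrative stand-ins, not the real DN code.
public class UncacheSwitchDemo {
  enum State { CACHING, CACHED, UNCACHING }

  static final List<String> scheduled = new ArrayList<>();

  static void uncacheBlock(long blockId, State prev) {
    switch (prev) {
      case UNCACHING:
        break;  // already being uncached; nothing to do
      case CACHED:  // fix: fall through and schedule an uncaching task
      default:
        scheduled.add("UncachingTask-" + blockId);
    }
  }

  public static void main(String[] args) {
    uncacheBlock(1, State.CACHED);
    uncacheBlock(2, State.UNCACHING);
    System.out.println(scheduled);  // [UncachingTask-1]
  }
}
```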


 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.





[jira] [Commented] (HDFS-5468) CacheAdmin help command does not recognize commands

2013-11-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816384#comment-13816384
 ] 

Hudson commented on HDFS-5468:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4702 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4702/])
HDFS-5468. CacheAdmin help command does not recognize commands  (Stephen Chu 
via Colin Patrick McCabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539786)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testCacheAdminConf.xml


 CacheAdmin help command does not recognize commands
 ---

 Key: HDFS-5468
 URL: https://issues.apache.org/jira/browse/HDFS-5468
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 3.0.0, 2.3.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-5468.patch


 Currently, the hdfs cacheadmin -help command will not recognize correct 
 command inputs:
 {code}
 [hdfs@hdfs-cache ~]# hdfs cacheadmin -help listPools
 Sorry, I don't know the command 'listPools'.
 Valid command names are:
 -addDirective, -removeDirective, -removeDirectives, -listDirectives, 
 -addPool, -modifyPool, -removePool, -listPools, -help
 [hdfs@hdfs-cache ~]# hdfs cacheadmin -help -listPools
 Sorry, I don't know the command 'listPools'.
 Valid command names are:
 -addDirective, -removeDirective, -removeDirectives, -listDirectives, 
 -addPool, -modifyPool, -removePool, -listPools, -help
 {code}
 In the code, we strip the input command of leading hyphens, but then compare 
 it to the command names, which are all prefixed by a hyphen.
 Also, cacheadmin -removeDirectives requires specifying a path with -path but 
 -path is not shown in the usage. We should fix this as well.





[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816394#comment-13816394
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2832:
--

 With a billion nodes the probability of a collision in a 128-bit space is 
 less than 1 in 10^20. ...

Let n be the number of possible IDs.
Let m be the number of nodes.
The probability of no collision is P = n!/((n-m)! n^m).

Put n=2^128 and m=10^9, and we have
* P ~= 0.99999999999999999999853063206294150856

The probability of collision is
* 1-P ~= 1.4693679370584914464 * 10^(-21) < 10^(-20).

However, randomly generated UUIDs only have 122 random bits according to 
[Wikipedia|http://en.wikipedia.org/wiki/UUID#Random_UUID_probability_of_duplicates].
Now put n=2^122 and m=10^9, and we have
* P ~= 0.99999999999999999990596045202825654743

The probability of collision is
* 1-P ~= 9.403954797174345257 * 10^(-20) < 10^(-19)

A similar result can be obtained using the approximation P ~= exp(-m^2/(2n)).
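The figures above can be spot-checked with plain arithmetic using the 
approximation 1 - P ~= m^2/(2n); this is a numeric sanity check, not HDFS code:

```java
// Spot-check of the collision probabilities above via 1 - P ~= m^2 / (2n),
// where n = 2^randomBits is the ID space and m is the number of nodes.
public class CollisionProbability {
  static double collisionProbability(int randomBits, double m) {
    double n = Math.pow(2.0, randomBits);  // size of the ID space
    return (m * m) / (2.0 * n);
  }

  public static void main(String[] args) {
    double m = 1e9;  // one billion nodes
    System.out.println(collisionProbability(128, m));  // ~1.47 * 10^(-21)
    System.out.println(collisionProbability(122, m));  // ~9.40 * 10^(-20)
  }
}
```

Both values agree with the exact factorial computation to the precision shown 
in the comment.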


 Enable support for heterogeneous storages in HDFS
 -

 Key: HDFS-2832
 URL: https://issues.apache.org/jira/browse/HDFS-2832
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
 h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, 
 h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, 
 h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch


 HDFS currently supports a configuration where storages are a list of 
 directories. Typically each of these directories corresponds to a volume with 
 its own file system. All these directories are homogeneous and therefore 
 identified as a single storage at the namenode. I propose changing the 
 current model, where a Datanode *is a* storage, to one where a Datanode *is a 
 collection of* storages. 





[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816397#comment-13816397
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2832:
--

 ... Even though unlikely, a collision if it happens creates a serious problem 
 for the system integrity.  Does it concern you?

It depends on how small the probability is - certainly not for 10^(-19).

- Below is quoted from 
[Wikipedia|http://en.wikipedia.org/wiki/UUID#Random_UUID_probability_of_duplicates]
{quote}
To put these numbers into perspective, the annual risk of someone being hit by 
a meteorite is estimated to be one chance in 17 billion, which means the 
probability is about 0.00000000006 (6 × 10^(−11)), equivalent to the odds of 
creating a few tens of trillions of UUIDs in a year and having one duplicate. 
In other words, only after generating 1 billion UUIDs every second for the next 
100 years, the probability of creating just one duplicate would be about 50%. 
The probability of one duplicate would be about 50% if every person on earth 
owns 600 million UUIDs.
{quote}

- I bet you have heard the [risk of cosmic 
rays|http://stackoverflow.com/questions/2580933/cosmic-rays-what-is-the-probability-they-will-affect-a-program]
 argument.


 Enable support for heterogeneous storages in HDFS
 -

 Key: HDFS-2832
 URL: https://issues.apache.org/jira/browse/HDFS-2832
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
 h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, 
 h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, 
 h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch


 HDFS currently supports a configuration where storages are a list of 
 directories. Typically each of these directories corresponds to a volume with 
 its own file system. All these directories are homogeneous and therefore 
 identified as a single storage at the namenode. I propose changing the 
 current model, where a Datanode *is a* storage, to one where a Datanode *is a 
 collection of* storages. 





[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816399#comment-13816399
 ] 

Hadoop QA commented on HDFS-5364:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612683/HDFS-5364.008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5356//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5356//console

This message is automatically generated.

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch


 NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel.  Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway will eventually run out of memory by creating too many threads.
 NFS gateway should have an OpenFileCtx cache to limit the total opened files. 
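The cap described above could be sketched as an access-ordered LRU map that closes the eldest context when the limit is exceeded. This is an illustrative sketch only; the class and method names below are assumptions, not the API from the attached patches:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: bound the number of open file contexts with an
// access-ordered LRU map. Evicting (and closing) the eldest entry releases
// its DataStreamer thread, keeping memory use bounded.
final class OpenFileCtxCacheSketch<K, V extends AutoCloseable> {
  private final Map<K, V> cache;

  OpenFileCtxCacheSketch(final int maxOpenFiles) {
    // accessOrder = true makes iteration order least-recently-used first.
    cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxOpenFiles) {
          try {
            eldest.getValue().close(); // release the underlying stream
          } catch (Exception ignored) {
            // best-effort close on eviction
          }
          return true;
        }
        return false;
      }
    };
  }

  synchronized void put(K key, V ctx) { cache.put(key, ctx); }
  synchronized V get(K key) { return cache.get(key); }
  synchronized int size() { return cache.size(); }
}
```

With a limit of 2, inserting a third context evicts and closes the least-recently-used one.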



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816403#comment-13816403
 ] 

Colin Patrick McCabe commented on HDFS-5394:


Good catch.  That was a bug introduced by the latest round of shuffling 
everything around: the default and CACHED cases were switched.

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5394:
---

Attachment: HDFS-5394.008.patch

Fix uncaching issue discovered by Chris.

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5364:
-

Attachment: HDFS-5364.009.patch

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, 
 HDFS-5364.009.patch


 NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel.  Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway will eventually run out of memory by creating too many threads.
 NFS gateway should have an OpenFileCtx cache to limit the total opened files. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816421#comment-13816421
 ] 

Jing Zhao commented on HDFS-5364:
-

Thanks for addressing all the comments, Brandon! The new patch looks good to 
me. +1 pending Jenkins.

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, 
 HDFS-5364.009.patch


 NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel.  Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway will eventually run out of memory by creating too many threads.
 NFS gateway should have an OpenFileCtx cache to limit the total opened files. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5376) Incremental rescanning of cached blocks and cache entries

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5376:
---

Issue Type: Wish  (was: Sub-task)
Parent: (was: HDFS-4949)

 Incremental rescanning of cached blocks and cache entries
 -

 Key: HDFS-5376
 URL: https://issues.apache.org/jira/browse/HDFS-5376
 Project: Hadoop HDFS
  Issue Type: Wish
  Components: namenode
Affects Versions: HDFS-4949
Reporter: Andrew Wang
Assignee: Andrew Wang

 {{CacheReplicationMonitor#rescan}} is invoked whenever a new cache entry is 
 added or removed. This involves a complete rescan of all cache entries and 
 cached blocks, which is potentially expensive. It'd be better to do an 
 incremental scan instead. This would also let us incrementally re-scan on 
 namespace changes like rename and create for better caching latency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816457#comment-13816457
 ] 

Jing Zhao commented on HDFS-5428:
-

From HDFS-5443:
bq. Patch will not clear the blocks in this case.

So I checked the rename case. Looks like we have a bug there and we fail to 
clean the blocks for INodeFile/INodeFileUnderConstruction in some cases after 
rename. I will fix it in a new jira. 

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
 HDFS-5428.001.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks, those will not be updated in the 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816455#comment-13816455
 ] 

Hadoop QA commented on HDFS-5326:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612666/HDFS-5326.008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5355//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5355//console

This message is automatically generated.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch, HDFS-5326.008.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816474#comment-13816474
 ] 

Colin Patrick McCabe commented on HDFS-5326:


As described earlier, the test failure is just the fact that Jenkins failed to 
apply the binary diff to the editsStored file.  Eclipse:eclipse has been 
failing today in several other jobs... it seems to be an environment issue.

Thanks for the +1.  Will commit shortly.

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch, HDFS-5326.008.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816492#comment-13816492
 ] 

Andrew Wang commented on HDFS-5394:
---

Thanks for bumping, Colin; this is basically just a rollup review:

bq. Could this be written with value.state == State.CACHING_CANCELLED instead?

My point here was about the logic, since I did a find-usages on 
CACHING_CANCELLED in Eclipse and only saw it being set. Right now it checks 
"not CACHED", which should be equivalent to "is CACHING_CANCELLED" because of 
the state transition invariants; ideally, with this kind of logic, we 
transition based on being *in* a state rather than *not being* in a state.

bq. I would rather not do that, since right now we can look at entries in the 
map and instantly know that anything in state UNCACHING has an associated 
Runnable scheduled in the Executor.

I guess this makes sense in light of HDFS-5182, since uncaching might require 
waiting for clients while cancelling caching shouldn't. In either case though, 
something needs to happen, it's just that instead of deferring the work to an 
UncachingTask, it's deferred to the end of the CachingTask.

bq. waitFor

Makes sense, though I'll note that 6,000,000 is 100 minutes, not ten minutes :) 
Overkill.

bq. catching FileNotFoundException

This is better, thanks. As a general comment, I'd like to avoid relying on NN 
retries if possible, but I guess it's okay for now.

Test:
* Do we need that {{Preconditions}} check in {{setUp}}? There's already an 
assumeTrue for the same thing right above it, so I don't think it'll do 
anything.
* I'd like to see the {{LogVerificationAppender}} used in 
{{testUncachingBlocksBeforeCachingFinishes}} too. This seems like it might be 
flaky though. What was wrong with the old approach that used a barrier to force 
ordering?

Also need to run through the Jenkins stuff still. The javac warning is fine 
(the new usage of Unsafe to get the page size) but the rest needs to be touched 
up. Not sure about the test failure.

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Work stopped] (HDFS-5166) caching PB cleanups

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5166 stopped by Colin Patrick McCabe.

 caching PB cleanups
 ---

 Key: HDFS-5166
 URL: https://issues.apache.org/jira/browse/HDFS-5166
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: HDFS-4949
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

 Once we have a better idea of what we need in the RPCs, let's do some 
 protobuf cleanups on the caching RPCs.  We may want to factor some fields out 
 into a common type, for example.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HDFS-5166) caching PB cleanups

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HDFS-5166.


Resolution: Duplicate

 caching PB cleanups
 ---

 Key: HDFS-5166
 URL: https://issues.apache.org/jira/browse/HDFS-5166
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: HDFS-4949
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

 Once we have a better idea of what we need in the RPCs, let's do some 
 protobuf cleanups on the caching RPCs.  We may want to factor some fields out 
 into a common type, for example.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5166) caching PB cleanups

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816496#comment-13816496
 ] 

Colin Patrick McCabe commented on HDFS-5166:


we did this as part of HDFS-5326

 caching PB cleanups
 ---

 Key: HDFS-5166
 URL: https://issues.apache.org/jira/browse/HDFS-5166
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: HDFS-4949
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

 Once we have a better idea of what we need in the RPCs, let's do some 
 protobuf cleanups on the caching RPCs.  We may want to factor some fields out 
 into a common type, for example.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5326) add modifyDirective to cacheAdmin

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5326:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed to trunk

 add modifyDirective to cacheAdmin
 -

 Key: HDFS-5326
 URL: https://issues.apache.org/jira/browse/HDFS-5326
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5326.003.patch, HDFS-5326.004.patch, 
 HDFS-5326.006.patch, HDFS-5326.007.patch, HDFS-5326.008.patch


 We should add a way of modifying cache directives on the command-line, 
 similar to how modifyCachePool works.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Work started] (HDFS-5166) caching PB cleanups

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5166 started by Colin Patrick McCabe.

 caching PB cleanups
 ---

 Key: HDFS-5166
 URL: https://issues.apache.org/jira/browse/HDFS-5166
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: HDFS-4949
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

 Once we have a better idea of what we need in the RPCs, let's do some 
 protobuf cleanups on the caching RPCs.  We may want to factor some fields out 
 into a common type, for example.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816735#comment-13816735
 ] 

Colin Patrick McCabe commented on HDFS-5394:


bq. CACHING_CANCELLED discussion

Yeah, it does make more sense to explicitly check for the states we expect to 
be in, rather than having a catch-all.  I have changed this to use 
{{Preconditions}} to assert that we are in the correct state, since that seemed 
more appropriate, and also to be clearer about needing to be in the {{CACHING}} 
or {{CACHING_CANCELLED}} state there.
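As a rough illustration of the explicit-state check discussed above (the enum values match the discussion, but the class and method here are hypothetical stand-ins, not the actual DataNode code):

```java
// States a cached replica can be in on the DN, per the discussion above.
enum State { CACHING, CACHING_CANCELLED, CACHED, UNCACHING }

final class CachingTaskSketch {
  // When a CachingTask finishes, the entry must be in CACHING (keep the
  // mmap/mlock) or CACHING_CANCELLED (drop it). Asserting the expected states
  // explicitly, instead of a catch-all "not CACHED" check, surfaces
  // state-machine bugs immediately.
  static boolean shouldKeepMapping(State state) {
    if (state != State.CACHING && state != State.CACHING_CANCELLED) {
      throw new IllegalStateException("unexpected state " + state);
    }
    return state == State.CACHING; // cancelled tasks roll back the caching
  }
}
```

The point of the precondition is that an impossible state fails loudly rather than being silently folded into one branch.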

bq. Makes sense, though I'll note that 6,000,000 is 100 minutes, not ten 
minutes  Overkill.

Noted.  Reduced this to 10 minutes, which should be ample.

bq. Do we need that Preconditions check in setUp? There's already an assumeTrue 
for the same thing right above it, so I don't think it'll do anything.

No, it's a repeat of the previous one.  Removed.

bq. I'd like to see the LogVerificationAppender used in 
testUncachingBlocksBeforeCachingFinishes too. This seems like it might be flaky 
though. What was wrong with the old approach that used a barrier to force 
ordering?

The problem is we don't have a barrier in all the places we would need it.  
We'd need to know that the DN had received the DN_CACHE heartbeat response and 
initiated caching during the 3-second window it has to do so, in order to know 
that we would later see a log message about cancellation.  To check for the log 
message would be, as you guessed, flaky and we don't need another flaky test.

I'd like to keep a LogVerificationAppender for this test in mind as a future 
improvement, but still get this fix committed soon since HDFS-5366, HDFS-5320, 
HDFS-5451, and HDFS-5431 all depend on this patch to some extent.  Perhaps we 
can roll a test improvement for this into HDFS-5451, since that JIRA is all 
about debuggability and logging.

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5394:
---

Attachment: HDFS-5394.009.patch

rebase on trunk

reduce test timeouts

add preconditions

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch, HDFS-5394.009.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5451) add more debugging for cache rescan

2013-11-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816750#comment-13816750
 ] 

Andrew Wang commented on HDFS-5451:
---

Cross-posting my comment from HDFS-5394 as a follow on for here:

bq. I'd like to see the LogVerificationAppender used in 
testUncachingBlocksBeforeCachingFinishes too. This seems like it might be flaky 
though. What was wrong with the old approach that used a barrier to force 
ordering?

 add more debugging for cache rescan
 ---

 Key: HDFS-5451
 URL: https://issues.apache.org/jira/browse/HDFS-5451
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang

 It would be nice to have message at DEBUG level that described all the 
 decisions we made for cache entries.  That way we could turn on this 
 debugging to get more information.  We should also store the number of bytes 
 each PBCE wanted, and the number of bytes it got, plus the number of inodes 
 it got, and output those in {{listDirectives}}.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816749#comment-13816749
 ] 

Andrew Wang commented on HDFS-5394:
---

+1 pending Jenkins, thanks Colin. I'll cross-post the LogVerificationAppender 
improvement to HDFS-5451; agree we should get rolling on the rest.

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch, HDFS-5394.009.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816753#comment-13816753
 ] 

Hadoop QA commented on HDFS-5364:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612696/HDFS-5364.009.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5358//console

This message is automatically generated.

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, 
 HDFS-5364.009.patch


 NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel.  Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway will eventually run out of memory by creating too many threads.
 NFS gateway should have an OpenFileCtx cache to limit the total opened files. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5451) add more debugging for cache rescan

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816774#comment-13816774
 ] 

Colin Patrick McCabe commented on HDFS-5451:


One way to support using the {{LogVerificationAppender}} in 
{{testUncachingBlocksBeforeCachingFinishes}} would be to use {{Mockito}} to 
detect when we had started caching on the DN, and only have the test proceed 
after that.
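The synchronization idea could look like the following sketch. With Mockito this would be a doAnswer(...) on a spied dataset object; here a plain hook class stands in for it, and all names are hypothetical rather than taken from the actual test:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: release a latch when the DN starts caching, so the
// test only removes the cache directive after caching has demonstrably begun.
// This gives the deterministic ordering the log-based check alone lacks.
final class CachingStartedHook {
  private final CountDownLatch started = new CountDownLatch(1);

  // Would be invoked from the spied cache-block call (e.g. via Mockito's
  // doAnswer on the dataset spy).
  void onCacheBlock() {
    started.countDown();
  }

  // The test calls this before issuing the uncache, then verifies the
  // cancellation log message with the LogVerificationAppender.
  boolean awaitCachingStarted(long timeoutMs) throws InterruptedException {
    return started.await(timeoutMs, TimeUnit.MILLISECONDS);
  }
}
```

Once the latch has fired, the test knows any subsequent uncache must cancel an in-progress caching, so checking for the cancellation log message is no longer racy.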

 add more debugging for cache rescan
 ---

 Key: HDFS-5451
 URL: https://issues.apache.org/jira/browse/HDFS-5451
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Andrew Wang

 It would be nice to have message at DEBUG level that described all the 
 decisions we made for cache entries.  That way we could turn on this 
 debugging to get more information.  We should also store the number of bytes 
 each PBCE wanted, and the number of bytes it got, plus the number of inodes 
 it got, and output those in {{listDirectives}}.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5366) recaching improvements

2013-11-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13816802#comment-13816802
 ] 

Chris Nauroth commented on HDFS-5366:
-

I tested this patch and found that blocks were never uncaching.  The NameNode 
never sent DNA_UNCACHE messages to the DataNode.  The reason is that there are 
separate calls to {{DatanodeManager#getCacheCommand}} to get the DNA_CACHE set 
followed by the DNA_UNCACHE set.  The method internally resets the last message 
time for the DataNode.  This means that when it's time to send messages, the 
first call for the DNA_CACHE messages succeeds and resets the clock for that 
DataNode to right now.  Then, the second call for the DNA_UNCACHE messages 
always returns null, because it looks like it's not time to send messages.

To solve this, we need to set the DataNode's last caching directive sent time 
just once, after calculating both the DNA_CACHE and DNA_UNCACHE commands.  I 
changed the code as follows to do this.  Feel free to incorporate it into the 
next patch.  (I'm not uploading a new patch right now, because I don't want to 
detangle it out of the HDFS-5394 patch applied in my environment.)

In {{DatanodeManager#handleHeartbeat}}:

{code}
long monoTimeMs = Time.monotonicNow();
if (sendCachingCommands) {
  if ((monoTimeMs - nodeinfo.getLastCachingDirectiveSentTimeMs()) >=
  timeBetweenResendingCachingDirectivesMs) {
DatanodeCommand pendingCacheCommand = getCacheCommand(
nodeinfo.getPendingCached(), nodeinfo,
DatanodeProtocol.DNA_CACHE, blockPoolId);
if (pendingCacheCommand != null) {
  cmds.add(pendingCacheCommand);
}
DatanodeCommand pendingUncacheCommand = getCacheCommand(
nodeinfo.getPendingUncached(), nodeinfo,
DatanodeProtocol.DNA_UNCACHE, blockPoolId);
if (pendingUncacheCommand != null) {
  cmds.add(pendingUncacheCommand);
}
nodeinfo.setLastCachingDirectiveSentTimeMs(monoTimeMs);
  }
}
{code}

And {{DatanodeManager#getCacheCommand}}:

{code}

  /**
   * Convert a CachedBlockList into a DatanodeCommand with a list of blocks.
   *
   * @param list   The {@link CachedBlocksList}.  This function 
   *   clears the list.
   * @param datanode   The datanode.
   * @param action The action to perform in the command.
   * @param poolId The block pool id.
   * @return   A DatanodeCommand to be sent back to the DN, or null if
   *   there is nothing to be done.
   */
  private DatanodeCommand getCacheCommand(CachedBlocksList list,
  DatanodeDescriptor datanode, int action, String poolId) {
int length = list.size();
if (length == 0) {
  return null;
}
// Read and clear the existing cache commands.
long[] blockIds = new long[length];
int i = 0;
    for (Iterator<CachedBlock> iter = list.iterator();
iter.hasNext(); ) {
  CachedBlock cachedBlock = iter.next();
  blockIds[i++] = cachedBlock.getBlockId();
  iter.remove();
}
return new BlockIdCommand(action, poolId, blockIds);
  }
{code}

I re-tested with these changes, and it worked.
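The timing fix can be demonstrated in isolation. Below is a minimal, self-contained simulation (the class, method, and constant names are made up for illustration, not the real DatanodeManager types): the buggy variant resets the last-sent timestamp inside each per-command check, so the DNA_UNCACHE branch is always suppressed within the same heartbeat; the fixed variant updates the timestamp once, after both commands are computed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation of the heartbeat timing bug (hypothetical names).
public class CachingDirectiveTiming {
    static final long RESEND_INTERVAL_MS = 1000;

    // Buggy variant: timestamp reset after each command is built.
    static List<String> heartbeatBuggy(long now, long[] lastSent) {
        List<String> cmds = new ArrayList<>();
        if (now - lastSent[0] >= RESEND_INTERVAL_MS) {
            cmds.add("DNA_CACHE");
            lastSent[0] = now;          // reset too early
        }
        if (now - lastSent[0] >= RESEND_INTERVAL_MS) {
            cmds.add("DNA_UNCACHE");    // never reached in the same heartbeat
            lastSent[0] = now;
        }
        return cmds;
    }

    // Fixed variant: timestamp updated once, after both commands.
    static List<String> heartbeatFixed(long now, long[] lastSent) {
        List<String> cmds = new ArrayList<>();
        if (now - lastSent[0] >= RESEND_INTERVAL_MS) {
            cmds.add("DNA_CACHE");
            cmds.add("DNA_UNCACHE");
            lastSent[0] = now;
        }
        return cmds;
    }

    public static void main(String[] args) {
        long[] t1 = {0}, t2 = {0};
        System.out.println(heartbeatBuggy(5000, t1));  // only DNA_CACHE
        System.out.println(heartbeatFixed(5000, t2));  // both commands
    }
}
```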


 recaching improvements
 --

 Key: HDFS-5366
 URL: https://issues.apache.org/jira/browse/HDFS-5366
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-4949
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5366-caching.001.patch


 There are a few things about our HDFS-4949 recaching strategy that could be 
 improved.
 * We should monitor the DN's maximum and current mlock'ed memory consumption 
 levels, so that we don't ask the DN to do stuff it can't.
 * We should not try to initiate caching on stale or decommissioning DataNodes 
 (although we should not recache things stored on such nodes until they're 
 declared dead).
 * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few 
 times before giving up.  Currently, we only send it once.
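The last bullet (resending a directive a few times before giving up) could be sketched roughly as follows; DirectiveResender and all names here are hypothetical, not existing HDFS code. The idea is simply to count sends per block and stop retrying after a small limit, clearing the counter once the DataNode acknowledges.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of "resend a few times before giving up"
// (hypothetical names, not actual HDFS code).
public class DirectiveResender {
    static final int MAX_ATTEMPTS = 3;
    private final Map<Long, Integer> attempts = new HashMap<>();

    // Returns true if the directive for this block should be (re)sent.
    boolean shouldSend(long blockId) {
        int n = attempts.getOrDefault(blockId, 0);
        if (n >= MAX_ATTEMPTS) {
            return false;  // give up after MAX_ATTEMPTS sends
        }
        attempts.put(blockId, n + 1);
        return true;
    }

    // Called when the DN reports the directive took effect.
    void acknowledge(long blockId) {
        attempts.remove(blockId);
    }

    public static void main(String[] args) {
        DirectiveResender r = new DirectiveResender();
        System.out.println(r.shouldSend(1L)); // 1st send
        System.out.println(r.shouldSend(1L)); // 2nd send
        System.out.println(r.shouldSend(1L)); // 3rd send
        System.out.println(r.shouldSend(1L)); // gave up
    }
}
```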



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5475) NN incorrectly tracks more than one replica per DN

2013-11-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-5475:


Attachment: h5475.01.patch

Patch to update target storage in {{BlockInfo}} and 
{{BlockInfoUnderConstruction}} from block reports. This fixes {{TestGetBlocks}}.

 NN incorrectly tracks more than one replica per DN
 --

 Key: HDFS-5475
 URL: https://issues.apache.org/jira/browse/HDFS-5475
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: h5475.01.patch


 NN chooses a provisional target storage when allocating a new block and 
 records that block in the blockList of that storage. However the datanode is 
 free to choose a different storage for the block. On the next block report 
 the NN ends up with two blockList entries for the same replica+DN combination.
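The intended invariant (at most one tracked storage per replica+DN pair) can be illustrated with a standalone sketch. ReplicaTracker and all names below are hypothetical, not the actual BlockInfo/BlockInfoUnderConstruction code: keying replicas by (block, datanode) and overwriting the provisional storage when the block report arrives leaves exactly one entry per replica+DN combination.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (hypothetical names): track one storage per
// (block, datanode). When a block report names a different storage on
// the same DN, replace the provisional entry instead of adding a second.
public class ReplicaTracker {
    // block id -> (datanode id -> storage id)
    private final Map<Long, Map<String, String>> replicas = new HashMap<>();

    void recordReplica(long blockId, String dnId, String storageId) {
        // put() overwrites any provisional storage chosen at allocation
        replicas.computeIfAbsent(blockId, k -> new HashMap<>())
                .put(dnId, storageId);
    }

    int replicaCount(long blockId) {
        Map<String, String> m = replicas.get(blockId);
        return m == null ? 0 : m.size();
    }

    String storageOf(long blockId, String dnId) {
        Map<String, String> m = replicas.get(blockId);
        return m == null ? null : m.get(dnId);
    }

    public static void main(String[] args) {
        ReplicaTracker t = new ReplicaTracker();
        t.recordReplica(1L, "dn1", "provisional-storage");  // NN's guess
        t.recordReplica(1L, "dn1", "reported-storage");     // block report
        System.out.println(t.replicaCount(1L));             // one entry, not two
        System.out.println(t.storageOf(1L, "dn1"));
    }
}
```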



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5366) recaching improvements

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816843#comment-13816843
 ] 

Colin Patrick McCabe commented on HDFS-5366:


Good find, Chris.  We definitely should update the 
{{lastCachingDirectiveSentTimeMs}} just once in that function.  As you said, 
I'm waiting for HDFS-5394 to land before rebasing this.  I kicked the Jenkins 
build, but it's still pending.

 recaching improvements
 --

 Key: HDFS-5366
 URL: https://issues.apache.org/jira/browse/HDFS-5366
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-4949
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5366-caching.001.patch


 There are a few things about our HDFS-4949 recaching strategy that could be 
 improved.
 * We should monitor the DN's maximum and current mlock'ed memory consumption 
 levels, so that we don't ask the DN to do stuff it can't.
 * We should not try to initiate caching on stale or decommissioning DataNodes 
 (although we should not recache things stored on such nodes until they're 
 declared dead).
 * We might want to resend the {{DNA_CACHE}} or {{DNA_UNCACHE}} command a few 
 times before giving up.  Currently, we only send it once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5475) NN incorrectly tracks more than one replica per DN

2013-11-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5475:
-

Hadoop Flags: Reviewed

+1 patch looks good.

 NN incorrectly tracks more than one replica per DN
 --

 Key: HDFS-5475
 URL: https://issues.apache.org/jira/browse/HDFS-5475
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Attachments: h5475.01.patch


 NN chooses a provisional target storage when allocating a new block and 
 records that block in the blockList of that storage. However the datanode is 
 free to choose a different storage for the block. On the next block report 
 the NN ends up with two blockList entries for the same replica+DN combination.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HDFS-5475) NN incorrectly tracks more than one replica per DN

2013-11-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-5475.
-

   Resolution: Fixed
Fix Version/s: Heterogeneous Storage (HDFS-2832)

Thanks for the quick review Nicholas!

I committed it to branch HDFS-2832.

 NN incorrectly tracks more than one replica per DN
 --

 Key: HDFS-5475
 URL: https://issues.apache.org/jira/browse/HDFS-5475
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: Heterogeneous Storage (HDFS-2832)

 Attachments: h5475.01.patch


 NN chooses a provisional target storage when allocating a new block and 
 records that block in the blockList of that storage. However the datanode is 
 free to choose a different storage for the block. On the next block report 
 the NN ends up with two blockList entries for the same replica+DN combination.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-11-07 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5476:
---

 Summary: Snapshot: clean the blocks/files/directories under a 
renamed file/directory while deletion
 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao


Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
under the DstReference node for file/directory/snapshot deletion.

Use case 1:
# rename under-construction file with 0-sized blocks after snapshot.
# delete the renamed directory.

We need to make sure we delete the 0-sized block.

Use case 2:
# create snapshot s0 for /
# create a new file under /foo/bar/
# rename foo -> foo2
# create snapshot s1
# delete bar and foo2
# delete snapshot s1

We need to make sure we delete the file under /foo/bar since it is not included 
in snapshot s0.
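The two use cases above reduce to the same requirement: deleting through the reference node must recursively collect blocks from the renamed subtree. A toy sketch of that traversal (hypothetical types, not the actual INode/DstReference classes):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical types): a destroyAndCollectBlocks-style
// traversal must descend through the renamed subtree, otherwise blocks of
// files not captured by an earlier snapshot are never collected for deletion.
public class SubtreeCleaner {
    static class Node {
        final String name;
        final List<Long> blockIds = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Collect every block id in the subtree rooted at node.
    static void destroyAndCollectBlocks(Node node, List<Long> collected) {
        collected.addAll(node.blockIds);
        for (Node child : node.children) {
            destroyAndCollectBlocks(child, collected);
        }
    }

    public static void main(String[] args) {
        Node foo2 = new Node("foo2");     // foo renamed to foo2
        Node bar = new Node("bar");
        bar.blockIds.add(42L);            // block of a file created post-snapshot
        foo2.children.add(bar);

        List<Long> collected = new ArrayList<>();
        destroyAndCollectBlocks(foo2, collected);
        System.out.println(collected);    // the nested block is collected
    }
}
```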



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-11-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5476:


Status: Patch Available  (was: Open)

 Snapshot: clean the blocks/files/directories under a renamed file/directory 
 while deletion
 --

 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-5476.001.patch


 Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
 under the DstReference node for file/directory/snapshot deletion.
 Use case 1:
 # rename under-construction file with 0-sized blocks after snapshot.
 # delete the renamed directory.
 We need to make sure we delete the 0-sized block.
 Use case 2:
 # create snapshot s0 for /
 # create a new file under /foo/bar/
 # rename foo -> foo2
 # create snapshot s1
 # delete bar and foo2
 # delete snapshot s1
 We need to make sure we delete the file under /foo/bar since it is not 
 included in snapshot s0.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-11-07 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5476:


Attachment: HDFS-5476.001.patch

Upload the initial patch including two unit tests to reproduce the two use 
cases described above.

 Snapshot: clean the blocks/files/directories under a renamed file/directory 
 while deletion
 --

 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-5476.001.patch


 Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
 under the DstReference node for file/directory/snapshot deletion.
 Use case 1:
 # rename under-construction file with 0-sized blocks after snapshot.
 # delete the renamed directory.
 We need to make sure we delete the 0-sized block.
 Use case 2:
 # create snapshot s0 for /
 # create a new file under /foo/bar/
 # rename foo -> foo2
 # create snapshot s1
 # delete bar and foo2
 # delete snapshot s1
 We need to make sure we delete the file under /foo/bar since it is not 
 included in snapshot s0.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

2013-11-07 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816882#comment-13816882
 ] 

Jing Zhao commented on HDFS-5428:
-

Created HDFS-5476 for the rename issue.

 under construction files deletion after snapshot+checkpoint+nn restart leads 
 nn safemode
 

 Key: HDFS-5428
 URL: https://issues.apache.org/jira/browse/HDFS-5428
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
 HDFS-5428.001.patch, HDFS-5428.patch


 1. allow snapshots under dir /foo
 2. create a file /foo/test/bar and start writing to it
 3. create a snapshot s1 under /foo after block is allocated and some data has 
 been written to it
 4. Delete the directory /foo/test
 5. wait till checkpoint or do saveNameSpace
 6. restart NN.
 NN enters to safemode.
 Analysis:
 Snapshot nodes loaded from fsimage are always complete and all blocks will be 
 in COMPLETE state. 
 So when the Datanode reports RBW blocks those will not be updated in 
 blocksmap.
 Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5477) Block manager as a service

2013-11-07 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-5477:
-

 Summary: Block manager as a service
 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The block manager needs to evolve towards having the ability to run as a 
standalone service to improve NN vertical and horizontal scalability.  The goal 
is reducing the memory footprint of the NN proper to support larger namespaces, 
and improve overall performance by decoupling the block manager from the 
namespace and its lock.  Ideally, a distinct BM will be transparent to clients 
and DNs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5477) Block manager as a service

2013-11-07 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816886#comment-13816886
 ] 

Daryn Sharp commented on HDFS-5477:
---

This is a joint effort between researchers and coworkers.  I will post preliminary 
design docs soon.

 Block manager as a service
 --

 Key: HDFS-5477
 URL: https://issues.apache.org/jira/browse/HDFS-5477
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

 The block manager needs to evolve towards having the ability to run as a 
 standalone service to improve NN vertical and horizontal scalability.  The 
 goal is reducing the memory footprint of the NN proper to support larger 
 namespaces, and improve overall performance by decoupling the block manager 
 from the namespace and its lock.  Ideally, a distinct BM will be transparent 
 to clients and DNs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816897#comment-13816897
 ] 

Hadoop QA commented on HDFS-5394:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612725/HDFS-5394.009.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1545 javac 
compiler warnings (more than the trunk's current 1544 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5357//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5357//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5357//console

This message is automatically generated.

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch, HDFS-5394.009.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.
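The two requirements above (idempotent caching and cancellation before completion) can be sketched as a small state machine. All names here are hypothetical, not the FsDatasetCache API: a block moves CACHING to CACHED, a duplicate request is a no-op (so the same replica is never mmap'ed and mlock'ed twice), and an uncache that arrives mid-flight prevents the completion from being recorded.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of idempotent, cancellable DN caching
// (hypothetical names, not the actual FsDatasetCache code).
public class IdempotentCache {
    enum State { CACHING, CACHED }
    private final Map<Long, State> blocks = new HashMap<>();

    // True only for the first request; duplicates are no-ops, so we
    // never mmap/mlock the same replica twice.
    synchronized boolean startCaching(long blockId) {
        return blocks.putIfAbsent(blockId, State.CACHING) == null;
    }

    // Completion only applies if the block is still in CACHING state.
    synchronized boolean finishCaching(long blockId) {
        return blocks.replace(blockId, State.CACHING, State.CACHED);
    }

    // Uncache cancels an in-flight request or removes a cached one.
    synchronized void uncache(long blockId) {
        blocks.remove(blockId);
    }

    synchronized boolean isCached(long blockId) {
        return blocks.get(blockId) == State.CACHED;
    }

    public static void main(String[] args) {
        IdempotentCache c = new IdempotentCache();
        System.out.println(c.startCaching(7L));  // first request accepted
        System.out.println(c.startCaching(7L));  // duplicate ignored
        c.uncache(7L);                           // cancelled before finish
        System.out.println(c.finishCaching(7L)); // nothing left to finish
    }
}
```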



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-11-07 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: h2832_20131107b.patch

 Enable support for heterogeneous storages in HDFS
 -

 Key: HDFS-2832
 URL: https://issues.apache.org/jira/browse/HDFS-2832
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.24.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Attachments: 20130813-HeterogeneousStorage.pdf, H2832_20131107.patch, 
 h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, 
 h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, 
 h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, 
 h2832_20131107b.patch


 HDFS currently supports configuration where storages are a list of 
 directories. Typically each of these directories corresponds to a volume with 
 its own file system. All these directories are homogeneous and therefore 
 identified as a single storage at the namenode. I propose a change to the 
 current model where Datanode *is a* storage, to Datanode *is a collection 
 of* storages. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5364:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.2.1

 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, 
 HDFS-5364.009.patch


 NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel.  Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway eventually runs out of memory by creating too many threads.
 NFS gateway should have an OpenFileCtx cache to limit the total opened files. 
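One simple way to bound the number of open streams, sketched with hypothetical names (not the actual gateway code), is an access-ordered LinkedHashMap acting as an LRU cache: once the configured limit is exceeded, the least recently used entry is evicted, which is where the real gateway would close the evicted stream and its DataStreamer thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of a bounded OpenFileCtx-style cache
// (hypothetical types, not the actual NFS gateway code).
public class OpenFileCtxCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxOpenFiles;

    public OpenFileCtxCache(int maxOpenFiles) {
        super(16, 0.75f, true);  // true = access order, i.e. LRU
        this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // In the real gateway this is where the evicted stream's
        // resources (output stream, DataStreamer thread) would be closed.
        return size() > maxOpenFiles;
    }

    public static void main(String[] args) {
        OpenFileCtxCache<String, String> cache = new OpenFileCtxCache<>(2);
        cache.put("/a", "ctxA");
        cache.put("/b", "ctxB");
        cache.get("/a");         // touch /a so /b becomes eldest
        cache.put("/c", "ctxC"); // evicts /b
        System.out.println(cache.containsKey("/b")); // evicted
        System.out.println(cache.containsKey("/a")); // retained
    }
}
```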



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816912#comment-13816912
 ] 

Brandon Li commented on HDFS-5364:
--

Thank you, Jing. I've committed the patch.

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.2.1

 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, 
 HDFS-5364.009.patch


 NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel.  Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway eventually runs out of memory by creating too many threads.
 NFS gateway should have an OpenFileCtx cache to limit the total opened files. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5364) Add OpenFileCtx cache

2013-11-07 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5364:
-

Fix Version/s: 2.2.1

 Add OpenFileCtx cache
 -

 Key: HDFS-5364
 URL: https://issues.apache.org/jira/browse/HDFS-5364
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: nfs
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.2.1

 Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch, 
 HDFS-5364.003.patch, HDFS-5364.004.patch, HDFS-5364.005.patch, 
 HDFS-5364.006.patch, HDFS-5364.007.patch, HDFS-5364.008.patch, 
 HDFS-5364.009.patch


 NFS gateway can run out of memory when the stream timeout is set to a 
 relatively long period (e.g., 1 minute) and a user uploads thousands of files 
 in parallel.  Each stream's DFSClient creates a DataStreamer thread, and the 
 gateway eventually runs out of memory by creating too many threads.
 NFS gateway should have an OpenFileCtx cache to limit the total opened files. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816945#comment-13816945
 ] 

Colin Patrick McCabe commented on HDFS-5394:


Thanks for the +1, will commit shortly.

As mentioned earlier, the javac and javadoc warnings are about the use of 
{{sun.misc.Unsafe}}, which we can't really avoid here.  I will increase 
{{OK_JAVADOC_WARNINGS}} in {{test-patch.sh}} to prevent warning spew.  The 
javac warning will be ignored automatically on the next build after submission.

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch, HDFS-5394.009.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5394:
---

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0

 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch, HDFS-5394.009.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-11-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816954#comment-13816954
 ] 

Hadoop QA commented on HDFS-5476:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12612754/HDFS-5476.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5359//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5359//console

This message is automatically generated.

 Snapshot: clean the blocks/files/directories under a renamed file/directory 
 while deletion
 --

 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-5476.001.patch


 Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
 under the DstReference node for file/directory/snapshot deletion.
 Use case 1:
 # rename under-construction file with 0-sized blocks after snapshot.
 # delete the renamed directory.
 We need to make sure we delete the 0-sized block.
 Use case 2:
 # create snapshot s0 for /
 # create a new file under /foo/bar/
 # rename foo -> foo2
 # create snapshot s1
 # delete bar and foo2
 # delete snapshot s1
 We need to make sure we delete the file under /foo/bar since it is not 
 included in snapshot s0.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

2013-11-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816964#comment-13816964
 ] 

Hudson commented on HDFS-5394:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4704 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4704/])
HDFS-5394: Fix race conditions in DN caching and uncaching (cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1539909)
* /hadoop/common/trunk/dev-support/test-patch.sh
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmap.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java


 fix race conditions in DN caching and uncaching
 ---

 Key: HDFS-5394
 URL: https://issues.apache.org/jira/browse/HDFS-5394
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0

 Attachments: HDFS-5394-caching.001.patch, 
 HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
 HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
 HDFS-5394.007.patch, HDFS-5394.008.patch, HDFS-5394.009.patch


 The DN needs to handle situations where it is asked to cache the same replica 
 more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
 also needs to handle the situation where caching a replica is cancelled 
 before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5478) File size reports as zero after writing and calling FSDataOutputStream#hsync()

2013-11-07 Thread Brett Randall (JIRA)
Brett Randall created HDFS-5478:
---

 Summary: File size reports as zero after writing and calling 
FSDataOutputStream#hsync()
 Key: HDFS-5478
 URL: https://issues.apache.org/jira/browse/HDFS-5478
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
 Environment: RHEL/OEL 6u3
Reporter: Brett Randall


Using a Java client to write to an FSDataOutputStream.  After some data is 
written and hsync() is called, {{hdfs dfs -get /path/to/file}} gets a file 
containing the data written so far, all good.

{{hdfs dfs -ls /path/to/file}} however reports a zero-byte file, presumably 
until the stream is closed (it then shows the correct size).  Hue File Browser 
(running CDH4) also shows zero bytes until the stream is closed.

See also 
http://grokbase.com/t/hadoop/hdfs-user/113j63nrce/zero-file-size-after-hsync 
which discusses the same problem.

After the buffer is flushed it would be good if the reported file size was 
updated.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

2013-11-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5476:
-

Hadoop Flags: Reviewed

+1 patch looks good.

 Snapshot: clean the blocks/files/directories under a renamed file/directory 
 while deletion
 --

 Key: HDFS-5476
 URL: https://issues.apache.org/jira/browse/HDFS-5476
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-5476.001.patch


 Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree 
 under the DstReference node for file/directory/snapshot deletion.
 Use case 1:
 # rename under-construction file with 0-sized blocks after snapshot.
 # delete the renamed directory.
 We need to make sure we delete the 0-sized block.
 Use case 2:
 # create snapshot s0 for /
 # create a new file under /foo/bar/
 # rename foo -> foo2
 # create snapshot s1
 # delete bar and foo2
 # delete snapshot s1
 We need to make sure we delete the file under /foo/bar since it is not 
 included in snapshot s0.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5479) Fix test failures in Balancer.

2013-11-07 Thread Junping Du (JIRA)
Junping Du created HDFS-5479:


 Summary: Fix test failures in Balancer.
 Key: HDFS-5479
 URL: https://issues.apache.org/jira/browse/HDFS-5479
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer
Affects Versions: Heterogeneous Storage (HDFS-2832)
Reporter: Junping Du


Many test failures w.r.t. the balancer, as 
https://builds.apache.org/job/PreCommit-HDFS-Build/5360/#showFailuresLink 
shows. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)