[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Jeremy Carroll (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532457#comment-13532457
 ] 

Jeremy Carroll commented on HDFS-3912:
--

FYI: This ticket is missing a branch-2 patch. After applying HDFS-3703 to 
branch-2, the DFS_NAMENODE_CHECK_STALE_DATANODE_DEFAULT settings, etc. are 
still missing.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 1.2.0, 2.0.3-alpha

 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch, 
 HDFS-3912-branch-1.patch, HDFS-3912.branch-1.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.
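
 As an illustration of the two points above, here is a minimal sketch. It is 
 not the code from the attached patches. It assumes a datanode counts as stale 
 when its last heartbeat is older than a configurable interval, and that stale 
 nodes are skipped for writes only while their share of the cluster stays at 
 or below a configurable ratio. The class name, method names, and parameters 
 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the HDFS-3912 implementation. */
final class StaleWriteSketch {

    /** A node is treated as stale when its last heartbeat is older than the interval. */
    static boolean isStale(long lastHeartbeatMs, long nowMs, long staleIntervalMs) {
        return nowMs - lastHeartbeatMs > staleIntervalMs;
    }

    /**
     * Skip stale nodes for writes only while they are a small enough share of
     * the cluster (assumed rule); otherwise writes would pile up on few nodes.
     */
    static boolean shouldAvoidStaleForWrite(int staleCount, int totalCount, float maxStaleRatio) {
        if (totalCount <= 0) {
            return false;
        }
        return staleCount <= totalCount * maxStaleRatio;
    }

    /** Returns the indices of the nodes that remain eligible as write targets. */
    static List<Integer> writeCandidates(long[] lastHeartbeatsMs, long nowMs,
                                         long staleIntervalMs, float maxStaleRatio) {
        int stale = 0;
        for (long hb : lastHeartbeatsMs) {
            if (isStale(hb, nowMs, staleIntervalMs)) {
                stale++;
            }
        }
        boolean avoid = shouldAvoidStaleForWrite(stale, lastHeartbeatsMs.length, maxStaleRatio);
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < lastHeartbeatsMs.length; i++) {
            // Keep every node when avoidance is off; otherwise keep only fresh nodes.
            if (!avoid || !isStale(lastHeartbeatsMs[i], nowMs, staleIntervalMs)) {
                candidates.add(i);
            }
        }
        return candidates;
    }
}
```

 Keeping the read-side and write-side switches separate, as point 2 suggests, 
 would let an operator enable one without the other; the ratio check here is 
 only one possible way to make the behavior depend on how many nodes are stale.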

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Jeremy Carroll (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532482#comment-13532482
 ] 

Jeremy Carroll commented on HDFS-3912:
--

Basically, this patch requires HDFS-3601 (version 3.0), so there is no branch 
2.0 patch on the ticket.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532488#comment-13532488
 ] 

nkeywal commented on HDFS-3912:
---

Are you sure? It's committed in branch-1?



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532493#comment-13532493
 ] 

Harsh J commented on HDFS-3912:
---

bq. FYI: This patch is missing the branch-2 patch. After applying HDFS-3703 for 
branch-2, it's missing the DFS_NAMENODE_CHECK_STALE_DATANODE_DEFAULT settings, 
etc..

The diff may depend on the JIRA you mention, but perhaps not the patch 
itself. We merged the trunk commit directly into branch-2. You can view it at 
http://svn.apache.org/viewvc?view=revision&revision=1397219 and download the 
file at 
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java?revision=1397219&view=co

If you use git locally, you can also add a remote and cherry-pick it, I 
guess.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-12-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532495#comment-13532495
 ] 

Harsh J commented on HDFS-3912:
---

bq. Are you sure? It's committed in branch-1?

Yes, branch-1 has this as a backport commit; its separate patch is attached 
as well.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477166#comment-13477166
 ] 

Suresh Srinivas commented on HDFS-3912:
---

Jing, the branch-1 patch does not apply cleanly. Can you please upload a new patch?



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477403#comment-13477403
 ] 

Hadoop QA commented on HDFS-3912:
-

-1 overall. Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12549392/HDFS-3912-branch-1.patch
against trunk revision .

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3348//console

This message is automatically generated.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-16 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477534#comment-13477534
 ] 

Suresh Srinivas commented on HDFS-3912:
---

Typo in the patch: ecxludedNodes. This also needs to be fixed in trunk.

With that change, +1 for the patch.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474976#comment-13474976
 ] 

Hudson commented on HDFS-3912:
--

Integrated in Hadoop-Hdfs-trunk #1193 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1193/])
HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing 
Zhao (Revision 1397211)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java




[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475002#comment-13475002
 ] 

Hudson commented on HDFS-3912:
--

Integrated in Hadoop-Mapreduce-trunk #1224 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1224/])
HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing 
Zhao (Revision 1397211)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211




[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-11 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473910#comment-13473910
 ] 

nkeywal commented on HDFS-3912:
---

@Suresh, @Jing, Thanks a lot for doing the backport! I'm giving it a try today. 
With the jet lag, if everything goes well you will have the result when you 
wake up :-)



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-11 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474272#comment-13474272
 ] 

nkeywal commented on HDFS-3912:
---

Very good news: it works as expected :-). I no longer get any write or read 
errors or timeouts during the HBase recovery, so we can now have an MTTR of 
under a minute in HBase.




[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474392#comment-13474392
 ] 

Hudson commented on HDFS-3912:
--

Integrated in Hadoop-Hdfs-trunk-Commit #2909 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2909/])
HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing 
Zhao (Revision 1397211)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211




[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474393#comment-13474393
 ] 

Hudson commented on HDFS-3912:
--

Integrated in Hadoop-Common-trunk-Commit #2847 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2847/])
HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing 
Zhao (Revision 1397211)

 Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211




[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-11 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474415#comment-13474415
 ] 

Suresh Srinivas commented on HDFS-3912:
---

I committed the patch to trunk and branch-2. I will review the branch-1 patch 
soon.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474443#comment-13474443
 ] 

Hudson commented on HDFS-3912:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #2872 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2872/])
HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing 
Zhao (Revision 1397211)

 Result = FAILURE
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211




[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-10 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473246#comment-13473246
 ] 

nkeywal commented on HDFS-3912:
---

Hi,

I'm OK with the new logic for the warning; 3 times the heartbeat is quite a 
common rule. I would like to test the patch on branch 1.1, which differs quite 
a lot from branch 3.0 regarding block placement policy and so on. Jing, do you 
want to do the port? If you don't have time, I will do it. I've already tested 
HBase trunk with branch 1.1 without the patch; it works (but with write errors, 
because writes are still sent to the dead box).

Thanks!

Nicolas
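
The rule of thumb mentioned above (a stale interval of at least three 
heartbeat periods) can be sketched as a simple configuration check. This is an 
illustration only: the class name, method name, and constant are hypothetical, 
and the warning logic in the actual patch may differ.

```java
/** Hypothetical sketch of a "3 times the heartbeat" sanity check. */
final class StaleIntervalCheck {

    /** Assumed minimum number of heartbeat periods before a node may be called stale. */
    static final int MIN_HEARTBEATS = 3;

    /** Returns a warning message, or null when the stale interval is large enough. */
    static String check(long staleIntervalMs, long heartbeatIntervalMs) {
        long minimumMs = MIN_HEARTBEATS * heartbeatIntervalMs;
        if (staleIntervalMs < minimumMs) {
            return "Stale interval " + staleIntervalMs + " ms is below "
                + MIN_HEARTBEATS + " heartbeat periods (" + minimumMs + " ms)";
        }
        return null;
    }
}
```

With a 3-second heartbeat, for example, this check would accept a 30-second 
stale interval and flag a 5-second one.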



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13473459#comment-13473459
 ] 

Jing Zhao commented on HDFS-3912:
-

Hi Nicolas, 
   I will work on the branch 1.1 patch. Hopefully I can upload the patch today 
or tomorrow.
Thanks,
-Jing

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-10 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13473771#comment-13473771
 ] 

Jing Zhao commented on HDFS-3912:
-

For the 1.1 patch, I've run local tests and all the testcases passed.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-10 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13473858#comment-13473858
 ] 

Suresh Srinivas commented on HDFS-3912:
---

Nicolas, when you get some time can you please give 1.x version of the patch a 
go?

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch, 
 HDFS-3912.009.patch, HDFS-3912-010.patch, HDFS-3912-branch-1.1-001.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470080#comment-13470080
 ] 

Suresh Srinivas commented on HDFS-3912:
---

Patch looks good. Nicolas, I will wait for HBase validation and your +1 to 
commit this patch.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470211#comment-13470211
 ] 

Hadoop QA commented on HDFS-3912:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12547883/HDFS-3912.006.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3271//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3271//console

This message is automatically generated.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470412#comment-13470412
 ] 

nkeywal commented on HDFS-3912:
---

I will try the patch on HBase 0.96 next week (hopefully).
I had a look at the patch; it seems OK to me. The only point is this one:
{code}
+  LOG.warn("The given interval for marking stale datanode = "
+  + staleInterval + ", which is smaller than the default value "
+  + DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT
+  + ".");
{code}

I think we should not have a warning if we're below the default, because:
- usually the defaults are just the most common harmless settings, i.e. it 
should be possible to go below them without being in danger.
- a reasonable setting for HBase would be around 20s (so less than the HDFS 
default), to be sure that the datanode is not used when we start the HBase 
recovery. So when used with HBase we will have a warning when using the 
recommended setting.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470486#comment-13470486
 ] 

Suresh Srinivas commented on HDFS-3912:
---

bq. I think we should not have a warning if we're below the default, because:
I think we should have a warning because someone could set this to a much 
smaller value than what even HBase could be set up with. That said, we could 
print the warning only if the stale period is less than, say, 3 times the 
heartbeat period. We also need to document in hdfs-default.xml the pros and 
cons of the stale period choices.
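
The threshold suggested in this comment can be sketched roughly as follows. 
This is only an illustration: the class, the method names and the constant are 
hypothetical and not taken from the patch, and it assumes the warning should 
fire when the stale interval is below 3 heartbeat intervals.

```java
// Hypothetical helper, not code from the HDFS-3912 patch.
final class StaleIntervalCheck {
  // Assumed multiplier, from "say, 3 times the heartbeat period".
  static final int MIN_HEARTBEATS = 3;

  /** Returns true when the configured stale interval looks too small. */
  static boolean shouldWarn(long staleIntervalMs, long heartbeatIntervalMs) {
    return staleIntervalMs < MIN_HEARTBEATS * heartbeatIntervalMs;
  }

  /** Builds the warning text, or returns null when no warning is needed. */
  static String warning(long staleIntervalMs, long heartbeatIntervalMs) {
    if (!shouldWarn(staleIntervalMs, heartbeatIntervalMs)) {
      return null;
    }
    return "The given interval for marking stale datanode = " + staleIntervalMs
        + " ms, which is smaller than " + MIN_HEARTBEATS
        + " heartbeat intervals (" + heartbeatIntervalMs + " ms each).";
  }
}
```

Assuming a 3-second heartbeat, the 20s value mentioned for HBase stays above 
the resulting 9s threshold, so it would not trigger the warning.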

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470545#comment-13470545
 ] 

Hadoop QA commented on HDFS-3912:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12548005/HDFS-3912.007.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3273//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3273//console

This message is automatically generated.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470615#comment-13470615
 ] 

Hadoop QA commented on HDFS-3912:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12548007/HDFS-3912.008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
  org.apache.hadoop.hdfs.TestPersistBlocks

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3274//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3274//console

This message is automatically generated.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch, 
 HDFS-3912.006.patch, HDFS-3912.007.patch, HDFS-3912.008.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-03 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468535#comment-13468535
 ] 

nkeywal commented on HDFS-3912:
---

I haven't kept the log files, but it was in the small test categories (the ones 
executed first when you do a mvn test in HBase).

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-03 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468542#comment-13468542
 ] 

nkeywal commented on HDFS-3912:
---

Actually it seems it's HBASE-6928; so it's not related to the branch-1.1, I've 
just been unlucky when I've compared the test results on branch 1.1 vs. branch 
1.0...

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-02 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467864#comment-13467864
 ] 

nkeywal commented on HDFS-3912:
---

I like this approach, it's deterministic.
I had issues trying branch 1.1 on HBase 0.96. Some (hbase) unit tests were not 
working with this branch. I was lacking time to understand why, but I will have 
a look again later (hopefully it will get fixed by just waiting...)

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468015#comment-13468015
 ] 

Suresh Srinivas commented on HDFS-3912:
---

Nicolas, did you mean to assign this to yourself?

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-02 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468045#comment-13468045
 ] 

Suresh Srinivas commented on HDFS-3912:
---

# Remove HeartbeatManager#checkStaleNodes and use 
DatanodeManager#checkStaleNodes instead.
# What happens when the ratio is configured with an invalid value?
# When calculating the ratio in HeartbeatManager, you are accessing 
datanodes.size() outside the synchronization block.
# Can we introduce a method in FSClusterStats that reports whether the cluster 
is avoiding writes to stale nodes, and so avoid having to add DatanodeManager 
to BlockPlacementPolicy? This way, custom placement policy implementations are 
not affected.
# I think we should create a separate jira to move some relevant methods such 
as getLiveNodes, stale nodes etc. into the DatanodeStatistics interface.
# We should also add metrics related to stale datanodes.
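
To illustrate the second and third review points (ratio validation, and 
reading the node count under the lock), here is a minimal sketch. It is not 
the real HeartbeatManager: the class, its fields and the fallback-to-default 
behaviour are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the heartbeat bookkeeping; not the real HeartbeatManager.
final class StaleRatioTracker {
  private final List<Boolean> datanodes = new ArrayList<>(); // true = stale
  private final float ratio;

  StaleRatioTracker(float configuredRatio, float defaultRatio) {
    // Assumed policy: fall back to the default when the configured ratio
    // is outside (0, 1]. NaN also fails this check.
    boolean valid = configuredRatio > 0f && configuredRatio <= 1f;
    this.ratio = valid ? configuredRatio : defaultRatio;
  }

  float getRatio() {
    return ratio;
  }

  void addDatanode(boolean stale) {
    synchronized (datanodes) {
      datanodes.add(stale);
    }
  }

  /** Reads size() and counts stale nodes under the same lock. */
  float staleFraction() {
    synchronized (datanodes) {
      int total = datanodes.size();
      if (total == 0) {
        return 0f;
      }
      int stale = 0;
      for (boolean s : datanodes) {
        if (s) {
          stale++;
        }
      }
      return (float) stale / total;
    }
  }
}
```

Taking the size and the stale count inside one synchronized block keeps the 
two numbers consistent with each other even if nodes register concurrently.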


 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-02 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468131#comment-13468131
 ] 

nkeywal commented on HDFS-3912:
---

@Suresh
I was echoing my message from the 21st: I had issues (not yet analyzed) with 
branch 1.1 on HBase, but I definitely want to try Jing's patch, so I will 
give it another try later.


 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-10-02 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13468207#comment-13468207
 ] 

Devaraj Das commented on HDFS-3912:
---

bq. I had issues trying branch 1.1 on HBase 0.96. Some (hbase) unit tests were 
not working with this branch. I was lacking time to understand why, but I will 
have a look again later (hopefully it will get fixed by just waiting...)

Hey Nicolas, can you please enumerate the failing tests?

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: nkeywal
 Attachments: HDFS-3912.001.patch, HDFS-3912.002.patch, 
 HDFS-3912.003.patch, HDFS-3912.004.patch, HDFS-3912.005.patch


 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-09-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460811#comment-13460811
 ] 

nkeywal commented on HDFS-3912:
---

Hi Jing,

Are you working on it currently? I would like to try HDFS-3703 branch 1.1 on 
HBase, but I also need the write path: without it, most of the time is spent on 
the write errors...

Thanks!

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460830#comment-13460830
 ] 

Jing Zhao commented on HDFS-3912:
-

Hi Nicolas, I'm currently working on this. Will post something today.


 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-09-21 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460833#comment-13460833
 ] 

nkeywal commented on HDFS-3912:
---

Great! So I will test it beginning of next week then. Thanks a lot Jing.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-09-21 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460910#comment-13460910
 ] 

Jing Zhao commented on HDFS-3912:
-

Nicolas:

Based on your prior comments, we rethought the strategy that dynamically 
changes the stale interval for writing. One problem with that strategy is that 
after a datanode is marked as stale, the stale interval may increase as the 
number of stale datanodes increases, so the same datanode may immediately be 
marked as healthy (i.e., non-stale) again. 

In the current patch, we try to provide a simpler solution. The stale 
interval is now a fixed value after loading from the configuration. For reads, 
the strategy is the same as in HDFS-3703. For writes, we add a switch flag 
(only for writes) so that when a certain proportion of datanodes are marked as 
stale, the stale datanodes can also be included as write targets. Users can 
specify this proportion through configuration. For example, if the proportion 
is set to 0.5, then when more than half of the datanodes in the cluster have 
been marked as stale, we stop avoiding stale nodes for writing. When some of 
the datanodes come back, we resume avoiding stale nodes for writing.
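
The write-side switch described in this comment could look roughly like the 
sketch below. The class and method names are hypothetical and are not taken 
from the patch; only the behaviour (stop avoiding stale nodes once more than 
the configured proportion is stale) follows the description.

```java
// Hypothetical sketch of the write-side switch; names are illustrative.
final class StaleWritePolicy {
  private final float maxStaleRatio; // e.g. 0.5, read from configuration

  StaleWritePolicy(float maxStaleRatio) {
    this.maxStaleRatio = maxStaleRatio;
  }

  /** True while stale datanodes should be skipped as write targets. */
  boolean avoidStaleNodesForWrite(int staleNodes, int totalNodes) {
    if (totalNodes <= 0) {
      return false;
    }
    // Keep avoiding stale nodes until MORE than the configured
    // proportion of the cluster is stale.
    return staleNodes <= maxStaleRatio * totalNodes;
  }
}
```

Because the stale interval itself stays fixed, a node's stale/healthy status 
no longer depends on how many other nodes are stale; only the decision to 
skip stale nodes for writes flips.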

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-09-13 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13455262#comment-13455262
 ] 

nkeywal commented on HDFS-3912:
---

Some thoughts, with an HBase bias:
- if the datanode is too busy and cannot heartbeat within a minute, we will also 
get timeouts when writing the blocks (if the datanode is dead: 20s connect 
timeout. If it's not dead, or if we previously had a connection, we will fail 
on the read timeout for the ack, which is around 1 minute by default).
- the recovery is on the critical path, so going to a suspicious node is not 
something you want to do.
- things are already quite complicated, so I think I would end up with the same 
value for read and write to keep them simple.

Then there is the case when many nodes are stale. I think we're in really 
bad shape at this stage... I feel that just throwing an exception is the best 
solution. HBase would wait a few seconds and retry. That's better for the 
cluster than trying a node that is unlikely to execute the write. But it's a 
kind of change vs. today's behavior.

To summarize, this could make sense imho:
- there are enough fully alive nodes: let's use them, whatever the number of 
stale nodes.
- there are not enough fully alive nodes, but there are some stale nodes that 
we could use: let's use them; at least the behavior will be 
backward compatible.
- there are not enough live nodes: as today.
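
The three cases in this summary can be sketched as a simple fallback. This is 
an illustration only: the class is hypothetical, nodes are plain strings, and 
real placement (racks, load, exclusions) is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three cases; not the real BlockPlacementPolicy.
final class WriteTargetChooser {
  /**
   * Picks up to `replicas` targets, preferring fully alive nodes and
   * falling back to stale ones only when the alive nodes are too few.
   */
  static List<String> choose(List<String> alive, List<String> stale, int replicas) {
    List<String> targets = new ArrayList<>();
    // Case 1: enough fully alive nodes, stale nodes are never touched.
    for (String node : alive) {
      if (targets.size() == replicas) {
        return targets;
      }
      targets.add(node);
    }
    // Case 2: top up with stale nodes (backward-compatible behavior).
    for (String node : stale) {
      if (targets.size() == replicas) {
        return targets;
      }
      targets.add(node);
    }
    // Case 3: fewer than `replicas` nodes overall; the caller handles
    // the short list as it does today.
    return targets;
  }
}
```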


 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-09-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453551#comment-13453551
 ] 

Jing Zhao commented on HDFS-3912:
-

Suresh's comments in HDFS-3703:
bq. However, for the write side, not picking the stale node could result in an 
issue, especially for small clusters. That is the reason why I think we should 
do the write side changes in a related jira. We should consider making stale 
timeout adaptive to the number of nodes marked stale in the cluster as 
discussed in the previous comments. Additionally we should consider having a 
separate configuration for write skipping the stale nodes.

The more detailed proposal for handling writes is: 
For writes, do not use stale datanodes (if possible). To avoid the scenario 
where a small T for judging stale state generates new hotspots on the cluster, 
T is proposed to be calculated as: 
T = t_c + (number of nodes already marked as stale) / (total number of nodes) * 
(T_d - t_c),
where t_c is a constant value initially set in the configuration, and T_d is 
the time for marking as dead (i.e., 10.5 min).

E.g., t_c can be set to 30s; then, when no or only a few nodes are marked as 
stale, we get a small T that satisfies the HBase requirement. When a large 
number of nodes are marked as stale, e.g., nearly all the nodes in the 
cluster, T will be almost T_d (i.e., ~10min), and the workload can still be 
distributed across all the live nodes.
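
As a rough illustration of the formula above (not code from any patch), the 
calculation could look like the sketch below. AdaptiveStaleInterval and 
staleIntervalMs are made-up names, and all times are assumed to be in 
milliseconds.

```java
// Hypothetical sketch of T = t_c + (stale / total) * (T_d - t_c).
final class AdaptiveStaleInterval {

    // baseMs     : t_c, the configured minimum stale interval
    // deadMs     : T_d, the interval after which a node is marked dead
    // staleNodes : number of nodes already marked as stale
    // totalNodes : total number of nodes in the cluster
    static long staleIntervalMs(long baseMs, long deadMs,
                                int staleNodes, int totalNodes) {
        if (totalNodes <= 0) {
            throw new IllegalArgumentException("totalNodes must be positive");
        }
        if (staleNodes < 0 || staleNodes > totalNodes) {
            throw new IllegalArgumentException("staleNodes out of range");
        }
        if (deadMs < baseMs) {
            throw new IllegalArgumentException("deadMs must be >= baseMs");
        }
        // Fraction of the cluster that is currently stale, in [0, 1].
        double staleFraction = (double) staleNodes / totalNodes;
        // Linear interpolation between t_c and T_d.
        return baseMs + Math.round(staleFraction * (deadMs - baseMs));
    }
}
```

With t_c = 30s and T_d = 10.5min (630s), 0 stale nodes out of 100 gives 
T = 30s, 50 out of 100 gives T = 330s (5.5min), and 100 out of 100 gives 
T = 630s.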

When almost all nodes are marked as stale, include stale nodes as write 
target candidates if the number of remaining non-stale live nodes is less than 
the replica number.


 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.



[jira] [Commented] (HDFS-3912) Detecting and avoiding stale datanodes for writing

2012-09-11 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453552#comment-13453552
 ] 

Jing Zhao commented on HDFS-3912:
-

Moved and summarized part of the comments from HDFS-3703 here to highlight the 
existing thoughts on the write side.

 Detecting and avoiding stale datanodes for writing
 --

 Key: HDFS-3912
 URL: https://issues.apache.org/jira/browse/HDFS-3912
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 1. Make stale timeout adaptive to the number of nodes marked stale in the 
 cluster.
 2. Consider having a separate configuration for write skipping the stale 
 nodes.
