[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226067#comment-14226067
 ] 

Hudson commented on HDFS-7097:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #17 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/17/])
HDFS-7097. Allow block reports to be processed during checkpointing on standby 
name node. (kihwal via wang) (wang: rev 
f43a20c529ac3f104add95b222de6580757b3763)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
 HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch


 On a reasonably busy HDFS cluster, there is a stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report are blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, heartbeats from datanodes will 
 be blocked. A standby NN with a big name space can lose all data nodes after 
 checkpointing.  The RPC calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports do not modify any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 
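
 The figure of 34 is not derived in the description. One plausible reading, 
 assuming the default replication factor of 3 and single-block files (both 
 are assumptions, not stated above), is that each create leads three data 
 nodes to send an incremental block report, and each report occupies one 
 blocked handler. The class and method names below are illustrative only.

```java
public class HandlerSaturation {
    // Number of file creates needed to occupy every RPC handler, ASSUMING
    // each create produces one incremental block report per replica and
    // each report ties up exactly one handler.
    static int createsToSaturate(int handlers, int replication) {
        // Ceiling division: the last create may only partly be needed.
        return (handlers + replication - 1) / replication;
    }

    public static void main(String[] args) {
        // 100 handlers with the assumed replication factor of 3.
        System.out.println(createsToSaturate(100, 3));
    }
}
```

 Under these assumptions, 100 handlers and 3 replicas give ceil(100 / 3) = 34, 
 which matches the number quoted in the description.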



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226087#comment-14226087
 ] 

Hudson commented on HDFS-7097:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #755 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/755/])
HDFS-7097. Allow block reports to be processed during checkpointing on standby 
name node. (kihwal via wang) (wang: rev 
f43a20c529ac3f104add95b222de6580757b3763)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java



[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226238#comment-14226238
 ] 

Hudson commented on HDFS-7097:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1945 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1945/])
HDFS-7097. Allow block reports to be processed during checkpointing on standby 
name node. (kihwal via wang) (wang: rev 
f43a20c529ac3f104add95b222de6580757b3763)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java



[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226248#comment-14226248
 ] 

Hudson commented on HDFS-7097:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #17 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/17/])
HDFS-7097. Allow block reports to be processed during checkpointing on standby 
name node. (kihwal via wang) (wang: rev 
f43a20c529ac3f104add95b222de6580757b3763)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java



[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226302#comment-14226302
 ] 

Hudson commented on HDFS-7097:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1969 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1969/])
HDFS-7097. Allow block reports to be processed during checkpointing on standby 
name node. (kihwal via wang) (wang: rev 
f43a20c529ac3f104add95b222de6580757b3763)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java



[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226319#comment-14226319
 ] 

Hudson commented on HDFS-7097:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #17 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/17/])
HDFS-7097. Allow block reports to be processed during checkpointing on standby 
name node. (kihwal via wang) (wang: rev 
f43a20c529ac3f104add95b222de6580757b3763)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java



[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Akira AJISAKA (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226450#comment-14226450
]

Akira AJISAKA commented on HDFS-7097:
-

The patch was committed to branch-2 and trunk. Updating the status.

Allow block reports to be processed during checkpointing on standby name node
-

Key: HDFS-7097
URL: https://issues.apache.org/jira/browse/HDFS-7097
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
Fix For: 2.7.0

Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch,
HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-25 Thread Andrew Wang (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225393#comment-14225393
]

Andrew Wang commented on HDFS-7097:
---

+1 LGTM as well, thanks Kihwal for the patch, ATM, Vinay, and Ming for
reviewing. I'll commit this shortly.


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225424#comment-14225424
 ] 

Hudson commented on HDFS-7097:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6605 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6605/])
HDFS-7097. Allow block reports to be processed during checkpointing on standby 
name node. (kihwal via wang) (wang: rev 
f43a20c529ac3f104add95b222de6580757b3763)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyCheckpoints.java



[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225476#comment-14225476
 ] 

Hadoop QA commented on HDFS-7097:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683658/HDFS-7097.ultimate.trunk.patch
  against trunk revision 78f7cdb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.TestEnhancedByteBufferAccess

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8835//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8835//console

This message is automatically generated.


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-24 Thread Andrew Wang (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223893#comment-14223893
]

Andrew Wang commented on HDFS-7097:
---

This patch has unfortunately gone a bit stale due to the RetryCache refactor.
Kihwal, do you mind rebasing? I promise to +1/commit in a timely fashion :)


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-20 Thread Aaron T. Myers (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220234#comment-14220234
]

Aaron T. Myers commented on HDFS-7097:
--

Sorry for the delay, Kihwal. Yes, the latest patch looks good to me.


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-31 Thread Kihwal Lee (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191884#comment-14191884
]

Kihwal Lee commented on HDFS-7097:
--

[~atm], are you okay with the latest patch?


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-29 Thread Kihwal Lee (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188409#comment-14188409
]

Kihwal Lee commented on HDFS-7097:
--

Thanks for the review, Aaron.
The checkpoint lock is only meaningful on the standby namenode. Some of the
namenode methods that can be called on both active and standby do acquire the
lock, but the locking has no correctness implications on the active namenode.
On the active namenode, the regular namesystem lock (fsLock) takes care of
everything.

1. {{FSNamesystem#rollEditLog}} : This is only allowed on active namenodes.
2. {{FSNamesystem#(start|end)Checkpoint}} : This is only for backup node.
3. I am renaming it to {{cpLock}}.
4. I believe [~daryn] is eager to do that once this is in. :)
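
The effect of a separate checkpoint lock can be sketched with plain
java.util.concurrent locks. This is a simplified model, not the patch itself:
it assumes that block report processing needs the write side of fsLock, and
that after the change the checkpointer holds cpLock where it used to hold the
fsLock read lock. The class, the method names, and the 200 ms timeout are
hypothetical and exist only so that the demo terminates.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockSketch {
    // Stand-ins for the namesystem lock and the checkpoint lock.
    // These are NOT the real FSNamesystem fields.
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    static final ReentrantLock cpLock = new ReentrantLock();

    // Models an RPC handler processing a block report: it needs the fsLock
    // write lock and gives up after timeoutMs instead of blocking forever.
    static boolean processBlockReport(long timeoutMs) {
        try {
            if (!fsLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // a real handler would still be stuck here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return true; // the block report would be applied here
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    // The calling thread plays the checkpointer: it holds `held` while a
    // separate handler thread tries to process one block report.
    static boolean reportGetsThroughWhileHolding(Lock held) {
        final boolean[] result = new boolean[1];
        held.lock();
        try {
            Thread handler = new Thread(() -> result[0] = processBlockReport(200));
            handler.start();
            handler.join(); // join makes the handler's write to result visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            held.unlock();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("checkpointer holds fsLock read lock: "
            + reportGetsThroughWhileHolding(fsLock.readLock()));
        System.out.println("checkpointer holds cpLock: "
            + reportGetsThroughWhileHolding(cpLock));
    }
}
```

In this model the first call returns false, because a writer cannot acquire a
ReentrantReadWriteLock while another thread holds its read lock, and the
second returns true, because cpLock and fsLock are independent.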


[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188700#comment-14188700
 ] 

Hadoop QA commented on HDFS-7097:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677893/HDFS-7097.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager

  The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestPread

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8580//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8580//console

This message is automatically generated.

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
 HDFS-7097.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-29 Thread Kihwal Lee (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188946#comment-14188946
]

Kihwal Lee commented on HDFS-7097:
--

The test case failed because of {{java.net.BindException: Port in use:
localhost:40123}}. We are getting this sort of failure more often nowadays
from precommit.
Both test cases pass when run on my machine.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-09 Thread Kihwal Lee (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165215#comment-14165215
]

Kihwal Lee commented on HDFS-7097:
--

I am not sure about calling triggerBlockReports(). The block finalization causes
the IBR to be sent out right away, so this extra call will actually do nothing.
It's a bug if NNs are not getting it. But your concern is valid. Depending
on the testing environment, things can slow down and the delay of 1 second
may not be enough. I can make it check periodically over a longer period of
time. The test will terminate sooner when it succeeds, but will take extra
time when it fails.
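
The periodic check described above could look roughly like the following. This is a minimal sketch, not the code from the patch: the class {{PollUntil}}, the method {{waitFor}} and its parameters are hypothetical names, and the condition passed in stands for whatever the test uses to decide that the standby NN has seen the block report.

```java
import java.util.function.BooleanSupplier;

// Polls a condition at a fixed interval until it holds or a timeout expires.
// Returns early on success, so only a failing test pays the full timeout.
final class PollUntil {
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;               // condition met, stop waiting
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;              // timed out without success
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A test would then assert on the return value instead of sleeping for a fixed second, for example {{assertTrue(PollUntil.waitFor(check, 100, 10000))}} with a hypothetical {{check}} condition.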

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-09 Thread Ming Ma (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165342#comment-14165342
]

Ming Ma commented on HDFS-7097:
---

Thanks, Kihwal. LGTM.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-09 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165527#comment-14165527
]

Hadoop QA commented on HDFS-7097:
-

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12673911/HDFS-7097.patch
against trunk revision 8d7c549.

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new
or modified test files.

{color:red}-1 javac{color}. The applied patch generated 1272 javac
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}. The applied patch generated 1
release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
org.apache.hadoop.hdfs.TestDecommission
org.apache.hadoop.hdfs.server.balancer.TestBalancer

The following test timeouts occurred in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOTests
org.apache.hadoop.hdfs.Tests
org.apache.hadoop.hdfs.server.blockmanagement.TestCorruptReplicaInfo
org.apache.hadoop.hTests
org.apache.hadoopTests
org.apache.hadoop.hdfs.server.Tests
org.apache.hadoop.hdfs.sTests
org.apache.hadooTests
org.apache.hadoop.hdfs.qjournal.client.TestIPCLoggerChanTests
org.apache.hadoop.traciTests
org.apache.hadoop.hdfs.TestFileCreationClient

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/8378//testReport/
Release audit warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/8378//artifact/patchprocess/patchReleaseAuditProblems.txt
Javac warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/8378//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8378//console

This message is automatically generated.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-09 Thread Aaron T. Myers (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165623#comment-14165623
]

Aaron T. Myers commented on HDFS-7097:
--

The patch looks pretty good, and in thinking about it a fair bit I think it
won't regress the issue I was trying to address in HDFS-5064, though Kihwal I
would appreciate if you could confirm that as well.

A few small comments:

# Does {{FSNamesystem#rollEditLog}} need to take the nsLock as well? Seems like
it might, given that tailing edits no longer is taking the normal FSNS rw lock.
# Similarly for {{FSNamesystem#(start|end)Checkpoint}}, though that's less
obvious to me.
# Seems a little strange to me to be calling this new lock the nsLock, when
that's also what we've been calling the main FSNS rw lock all this time. I'd
suggest renaming this to the checkpoint lock or something, to more clearly
distinguish its purpose.
# I think you can now remove some of the other stuff added as part of
HDFS-5064, e.g. the entire {{longReadLock}} I believe was only actually being
locked for read during checkpointing.

Thanks a lot for working on this, Kihwal.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-07 Thread Ming Ma (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162180#comment-14162180
]

Ming Ma commented on HDFS-7097:
---

In the unit test, to avoid a possibly flaky situation where the SBN might not
have received the incremental BR after the file has been closed, maybe it can call
MiniDFSCluster.triggerBlockReports? Otherwise, the patch looks good.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-02 Thread Kihwal Lee (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156884#comment-14156884
]

Kihwal Lee commented on HDFS-7097:
--

bq. Does FSNameSystem.saveNamespace still need to take its readLock if it is
called in Standby?
If we want external invocations of {{saveNamespace()}} to remain
blocking, then yes. I don't think we want it like this, but then it becomes a
bit complicated and something beyond the scope of this jira.

- External invocations of saveNamespace should be cancellable.
- Maybe block report processing should be allowed even on active when
saveNamespace is going on. Otherwise the rpc handlers will all be tied up
quickly and even if it acquires the read lock, no read ops (e.g. heartbeat)
will go through. Incremental BRs are not much of a problem on active NN, since
it needs to be in safe mode first. Incremental BR will settle down in a few
seconds after going into safe mode. Full block reports can be a problem
depending on the number of nodes, time it takes to save the name space and the
full block report interval. The rpc does timeout on DataNode, so full block
reports can be resent multiple times if saveNamespace takes long. They will all
take up a different handler or a call queue slot.
- Ongoing external invocation of saveNamespace should be cancelled during HA
transition. Queued (in call queue or waiting on the lock) saveNamespace
requests should also be rejected for a while just like checkpointing is not
allowed for some time after a cancellation.
- Locking will be a bit tricky.

I agree that we need to address the saveNamespace issue, but prefer doing it in
a separate jira.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157172#comment-14157172
 ] 

Hadoop QA commented on HDFS-7097:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672615/HDFS-7097.patch
  against trunk revision a56f3ec.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8303//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8303//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8303//console

This message is automatically generated.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-09-23 Thread Vinayakumar B (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144584#comment-14144584
]

Vinayakumar B commented on HDFS-7097:
-

Thanks Kihwal, the changes look good.

I too have the same question as [~mingma].
1. Though it's not frequent, a saveNamespace RPC on standby will create the same
problem as mentioned in the description.
Can we have different locks for saveNamespace based on HA state?

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-09-22 Thread Ming Ma (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143585#comment-14143585
]

Ming Ma commented on HDFS-7097:
---

Thanks, Kihwal. This is quite useful. Good idea to use another lock for
coordination.

1. Does FSNameSystem.saveNamespace still need to take its readLock if it is
called in Standby?
2. It will be useful to add some unit test to verify BR can still be processed
when checkpoint is in progress.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-09-19 Thread Kihwal Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140717#comment-14140717
 ] 

Kihwal Lee commented on HDFS-7097:
--

On a standby name node, the persistent state (i.e. things saved to fsimage) is 
updated only by replaying edits once it is started.  
- During checkpointing do not allow any state changes that will be persisted. 
(I.e. edit log replaying)
- During checkpointing do not allow any RPC calls that may affect checkpointing 
itself. (saveNamespace, restoreFailedStorage, etc.)

A new {{ReentrantLock}}, {{nsLock}}, is introduced to coordinate checkpointing 
and other activities.  For anything that requires both {{nsLock}} and {{fsLock}}, 
{{nsLock}} is to be locked first. Otherwise it can block other RPC calls that 
do not require {{nsLock}}, not to mention deadlock.  These locks are all locked 
interruptibly, so that the threads can be stopped during HA transition.
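
The locking rule above can be sketched as follows. This is an illustration of the ordering only, not the patch itself: {{LockOrderSketch}} and its methods are hypothetical, and it assumes that the checkpointer holds only {{nsLock}} while block report processing takes only {{fsLock}}, which is one reading of the proposal in the description.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Two-lock coordination: nsLock serializes checkpointing against changes to
// persisted state; fsLock guards in-memory state such as block reports.
final class LockOrderSketch {
    private final ReentrantLock nsLock = new ReentrantLock();
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private long persistedChanges = 0;
    private long processedReports = 0;

    // Checkpoint (assumed): holds only nsLock, so fsLock stays available.
    void checkpoint(Runnable saveImage) throws InterruptedException {
        nsLock.lockInterruptibly();
        try {
            saveImage.run();
        } finally {
            nsLock.unlock();
        }
    }

    // Hypothetical activity needing both locks: nsLock first, then fsLock.
    void changePersistentState() throws InterruptedException {
        nsLock.lockInterruptibly();
        try {
            fsLock.writeLock().lockInterruptibly();
            try {
                persistedChanges++;
            } finally {
                fsLock.writeLock().unlock();
            }
        } finally {
            nsLock.unlock();
        }
    }

    // Block report (assumed): needs only fsLock, never waits on nsLock.
    void processBlockReport() throws InterruptedException {
        fsLock.writeLock().lockInterruptibly();
        try {
            processedReports++;
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    long persistedChanges() { return persistedChanges; }
    long processedReports() { return processedReports; }
}
```

If the order were reversed, a thread holding {{fsLock}} while waiting for {{nsLock}} would stall every block report for the duration of a checkpoint, which is the situation the issue sets out to remove.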

Also improved {{FSImageFormat.Saver}} on cancellation checking, by making the 
check interval counter a class-level variable and following the PB format's 
threshold variable name. This only matters if the legacy oiv image saving is 
enabled.
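
A rough illustration of why the counter's scope matters for cancellation checking is given below. It is a sketch under assumptions, not the {{FSImageFormat.Saver}} code: the class, the field names, the threshold value and the exception type are all hypothetical. The counter is a field of the saver, so it keeps accumulating across per-directory calls; a counter local to each call would never reach the threshold when every directory is small.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Saver that checks for cancellation once every CHECK_INTERVAL items.
final class CancellableSaver {
    static final int CHECK_INTERVAL = 4096;   // hypothetical threshold
    private final AtomicBoolean cancelled;
    private long itemsSinceCheck = 0;          // field, shared by all directory calls
    private long saved = 0;

    CancellableSaver(AtomicBoolean cancelled) {
        this.cancelled = cancelled;
    }

    void saveAll(int[][] directories) {
        for (int[] dir : directories) {
            saveDirectory(dir);
        }
    }

    private void saveDirectory(int[] children) {
        for (int child : children) {
            saved++;                           // stand-in for writing one entry
            if (++itemsSinceCheck >= CHECK_INTERVAL) {
                itemsSinceCheck = 0;
                if (cancelled.get()) {
                    throw new IllegalStateException("save cancelled");
                }
            }
        }
    }

    long saved() { return saved; }
}
```

With ten directories of 1000 entries each and the flag already set, this saver stops after 4096 entries, whereas a per-call counter would run through all 10000 without ever checking.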

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-09-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140735#comment-14140735
 ] 

Hadoop QA commented on HDFS-7097:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12670009/HDFS-7097.patch
  against trunk revision bf27b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs-nfs:

org.apache.hadoop.hdfs.nfs.nfs3.TTests
org.apache.hadoop.hdfs.nfs.nfs3.TestOpenFilTests

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8106//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8106//console

This message is automatically generated.

[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-09-19 Thread Kihwal Lee (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140861#comment-14140861
]

Kihwal Lee commented on HDFS-7097:
--

The pre-commit didn't seem to do the right thing. It did apply the patch, but
somehow thought the change is in hadoop-hdfs-nfs.

{panel}
Running tests in hadoop-hdfs-project/hadoop-hdfs-nfs
/home/jenkins/tools/maven/latest/bin/mvn clean install -fn -Pnative
-Drequire.test.libhadoop -DHadoopPatchProcess
{panel}
