[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2016-02-01 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127552#comment-15127552
 ] 

Junping Du commented on HDFS-8676:
--

Move it out of 2.6.4 to 2.6.5 as this jira haven't update for a while.

> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2016-01-03 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080557#comment-15080557
 ] 

Junping Du commented on HDFS-8676:
--

Hi [~walter.k.su] and [~kihwal], I tried to cherry-pick this patch to 
branch-2.6 but it seems to have conflicts. Would you help to commit the patch 
to branch-2.6 or provide a patch against branch-2.6? Thanks!

> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-11-23 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023269#comment-15023269
 ] 

Sangjin Lee commented on HDFS-8676:
---

Thanks for the reply [~kihwal]. Does this patch depend on HDFS-7645 in any way, 
or can it be committed as a standalone fix?

It sounds like HDFS-7645 should be included along with HDFS-8656 and HDFS-9426 
(which is still open).

> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-11-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023052#comment-15023052
 ] 

Kihwal Lee commented on HDFS-8676:
--

bq. Does this issue exist in 2.6.x? Should this be backported to branch-2.6?
yes & yes, imo.

> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-11-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019118#comment-15019118
 ] 

Sangjin Lee commented on HDFS-8676:
---

Does this issue exist in 2.6.x? Should this be backported to branch-2.6?

> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955102#comment-14955102
 ] 

Hudson commented on HDFS-8676:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8619 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8619/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: 
rev 5b43db47a313decccdcca8f45c5708aab46396df)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955114#comment-14955114
 ] 

Kihwal Lee commented on HDFS-8676:
--

To fix this in 2.7, we need to bring in HDFS-7645 and HDFS-8656.

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955081#comment-14955081
 ] 

Kihwal Lee commented on HDFS-8676:
--

+1

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955135#comment-14955135
 ] 

Hudson commented on HDFS-8676:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #520 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/520/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: 
rev 5b43db47a313decccdcca8f45c5708aab46396df)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java


> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955157#comment-14955157
 ] 

Kihwal Lee commented on HDFS-8676:
--

We have come to know that this bug not only causes heartbeat expiration, but 
fails writes. Since the deletion is executed by the actor thread synchronously, 
incremental block reports are blocked while deletion is in progress. Flie 
closures or adding blocks fail, if deletion takes a long time.

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955259#comment-14955259
 ] 

Hudson commented on HDFS-8676:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #531 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/531/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: 
rev 5b43db47a313decccdcca8f45c5708aab46396df)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java


> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955197#comment-14955197
 ] 

Hudson commented on HDFS-8676:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2466 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2466/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: 
rev 5b43db47a313decccdcca8f45c5708aab46396df)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955196#comment-14955196
 ] 

Hudson commented on HDFS-8676:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1256 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1256/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: 
rev 5b43db47a313decccdcca8f45c5708aab46396df)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java


> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-10-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955671#comment-14955671
 ] 

Walter Su commented on HDFS-8676:
-

Thanks [~kihwal] for reviewing the patch, reporting this, and good analysis!

> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955638#comment-14955638
 ] 

Hudson commented on HDFS-8676:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #490 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/490/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: 
rev 5b43db47a313decccdcca8f45c5708aab46396df)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java


> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration and write failures

2015-10-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955775#comment-14955775
 ] 

Hudson commented on HDFS-8676:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2428 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2428/])
HDFS-8676. Delayed rolling upgrade finalization can cause heartbeat (kihwal: 
rev 5b43db47a313decccdcca8f45c5708aab46396df)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceStorage.java


> Delayed rolling upgrade finalization can cause heartbeat expiration and write 
> failures
> --
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952163#comment-14952163
 ] 

Hadoop QA commented on HDFS-8676:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  25m 11s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   9m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 54s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 17s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 28s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 224m 30s | Tests passed in hadoop-hdfs. 
|
| | | 280m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765752/HDFS-8676.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 7e2c971 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12923/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12923/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12923/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12923/console |


This message was automatically generated.

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-09 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951046#comment-14951046
 ] 

Kihwal Lee commented on HDFS-8676:
--

pre-commit is having problem cleaning up the garbage from previous builds.
We fixed a similar issue before... I guess it's broken again.

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-10-01 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940045#comment-14940045
 ] 

Walter Su commented on HDFS-8676:
-

Thanks Lee for reviewing. I'm on holiday. I'll update it soon.

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-09-30 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936917#comment-14936917
 ] 

Kihwal Lee commented on HDFS-8676:
--

The patch looks good in general. There is a potential problem, which was part 
of the original code.  There is a precondition check that can throw 
{{IllegalStateException}}, which is a {{RuntimeException}}.  This can cause 
offerService() to blow up in the middle of heartbeat response processing. For 
example, important command like {{DNA_ACCESSKEYUPDATE}} can be dropped.  
Instead of blowing up in the middle, it should log {{ERROR}} and move on.  I 
suggest changing it to a combination of {{assert}} and conditional statement. 
{{assert}} will make sure it blows up in testing, so we will know if something 
is obviously broken. In production, the conditional statement will log the 
error message and simply skip the deletion.

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization can cause heartbeat expiration

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934926#comment-14934926
 ] 

Hadoop QA commented on HDFS-8676:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m  3s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  1s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 14s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 12s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 149m 38s | Tests failed in hadoop-hdfs. |
| | | 195m 29s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
| Timed out tests | org.apache.hadoop.hdfs.server.mover.TestStorageMover |
|   | org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764188/HDFS-8676.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 151fca5 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12735/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12735/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12735/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/12735/console |


This message was automatically generated.

> Delayed rolling upgrade finalization can cause heartbeat expiration
> ---
>
> Key: HDFS-8676
> URL: https://issues.apache.org/jira/browse/HDFS-8676
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Walter Su
>Priority: Critical
> Attachments: HDFS-8676.01.patch
>
>
> In big busy clusters where the deletion rate is also high, a lot of blocks 
> can pile up in the datanode trash directories until an upgrade is finalized.  
> When it is finally finalized, the deletion of trash is done in the service 
> actor thread's context synchronously.  This blocks the heartbeat and can 
> cause heartbeat expiration.  
> We have seen a namenode losing hundreds of nodes after a delayed upgrade 
> finalization.  The deletion of trash directories should be made asynchronous.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)