[jira] [Commented] (HDFS-7815) Loop on 'blocks does not belong to any file'

2017-01-17 Thread Liang Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825984#comment-15825984
 ] 

Liang Chen commented on HDFS-7815:
----------------------------------

I met the same problem; turning the StateChange log level to WARN works.
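
For reference, a minimal log4j.properties sketch of that workaround, combining the two logger names from Chris Nauroth's comments further down (which one applies depends on your Hadoop version):

{code}
# Suppress the per-block "does not belong to any file" INFO logging.
# On most releases these messages go to the BlockStateChange logger:
log4j.logger.BlockStateChange=WARN

# On Apache Hadoop 2.6.0 (before the HDFS-7425 fix) they were routed
# to org.apache.hadoop.hdfs.StateChange instead:
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
{code}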

> Loop on 'blocks does not belong to any file'
> --------------------------------------------
>
> Key: HDFS-7815
> URL: https://issues.apache.org/jira/browse/HDFS-7815
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.6.0
> Environment: small cluster on Red Hat. 2 namenodes (HA), 6 datanodes
> with 19TB of disk for HDFS.
>Reporter: Frode Halvorsen
>
> I am currently experiencing a looping situation:
> The namenode spends approximately 1:50 (min:sec) logging a massive number of
> lines stating that some blocks do not belong to any file. During this time,
> it is unresponsive to any requests from datanodes, and if ZooKeeper had been
> running, it would have taken the namenode down (ssh-fencing: kill).
> When it has finished the 'round', it starts to do some normal work and, among
> other things, tells the datanode to delete the blocks. But before the
> datanode has gotten around to deleting the blocks and is about to report back
> to the namenode, the namenode has started the next round of reporting the
> same blocks that don't belong to any file. Thus, the datanode gets a timeout
> when reporting block updates for the deleted blocks, and this, of course,
> repeats itself over and over again...
> There are actually two issues here, I think:
> 1 - the namenode becomes totally unresponsive while reporting the blocks
> (could this be a DEBUG line instead of an INFO line?)
> 2 - the namenode seems to 'forget' that it has already reported those blocks
> just 2-3 minutes ago...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7815) Loop on 'blocks does not belong to any file'

2015-03-19 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368961#comment-14368961
 ] 

Frode Halvorsen commented on HDFS-7815:
---------------------------------------

Could you please let me know what I have to put in log4j.properties to tune 
down the logger level for this?



[jira] [Commented] (HDFS-7815) Loop on 'blocks does not belong to any file'

2015-03-19 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369836#comment-14369836
 ] 

Chris Nauroth commented on HDFS-7815:
-------------------------------------

Hi, [~frha].  You can add this line to your log4j.properties to suppress the 
block state change logging:

{code}
log4j.logger.BlockStateChange=WARN
{code}

However, if you're running a distro based on Apache Hadoop 2.6.0, then that 
version has a bug that accidentally changed the routing of these log messages.  
This was fixed in HDFS-7425, so subsequent versions won't have this problem.  
If you're running that version and the above doesn't work, then you can do this 
instead:

{code}
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
{code}




[jira] [Commented] (HDFS-7815) Loop on 'blocks does not belong to any file'

2015-03-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345735#comment-14345735
 ] 

Chris Nauroth commented on HDFS-7815:
-------------------------------------

Hi, [~frha].  In the cases we've seen of this bug, the fact that the heavy 
logging occurred while holding the namesystem lock turned out to be the root 
cause.  This seems to agree with your observation that a DataNode experienced a 
timeout during a subsequent block report.  That block report would not have 
been able to make progress on the NameNode side, since the lock would have been 
held by the thread doing the logging.  By moving the logging outside of the 
lock, we give other RPC handler threads the opportunity to acquire the lock and 
make progress.
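
To make the locking point concrete, here is a minimal Java sketch of the pattern that fix applies: record the messages while holding the namesystem write lock, then emit them after releasing it. Class, method, and message names here are illustrative placeholders, not the actual HDFS-7503 code.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BlockReportSketch {
  private final ReentrantReadWriteLock namesystemLock =
      new ReentrantReadWriteLock();

  // Before the fix, each orphaned block was logged while the write lock
  // was held, so a huge report starved every other RPC handler thread.
  // The fixed pattern, sketched here: remember the messages under the
  // lock, release it, then do the slow logging outside.
  public void processReport(List<String> reportedBlocks) {
    List<String> orphaned = new ArrayList<>();
    namesystemLock.writeLock().lock();
    try {
      for (String block : reportedBlocks) {
        if (!belongsToAnyFile(block)) { // cheap in-memory check
          orphaned.add(block);          // defer the logging
        }
      }
    } finally {
      namesystemLock.writeLock().unlock(); // other handlers can proceed
    }
    // The slow, I/O-bound part now runs without blocking the namesystem.
    for (String block : orphaned) {
      System.out.println("BLOCK* " + block + " does not belong to any file");
    }
  }

  private boolean belongsToAnyFile(String block) {
    return false; // stand-in for the real blocks-map lookup
  }
}
{code}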

We have also used the workaround of tuning log4j.properties to suppress this 
logging if you don't need or want those log messages. This workaround can be 
applied now, with no code changes required.



[jira] [Commented] (HDFS-7815) Loop on 'blocks does not belong to any file'

2015-02-23 Thread Frode Halvorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333659#comment-14333659
 ] 

Frode Halvorsen commented on HDFS-7815:
---------------------------------------

How can moving the logging keep the server from repeating the processing of 
the blocks after only 2 minutes? I would think that the root problem is that 
the namenode actually repeats the processing of the list of blocks. If it 
didn't repeat the process, there would be enough resources to deal with the 
reports from the datanodes.
Even if you move the logging outside the namesystem's write lock, I would very 
soon get into a situation where the namenode has so many blocks to process 
that it would not finish before starting over. It's enough for me to delete a 
directory with 5 million files in it, and the server would loop an endless 
number of times before the datanodes are finished with the job.



[jira] [Commented] (HDFS-7815) Loop on 'blocks does not belong to any file'

2015-02-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329249#comment-14329249
 ] 

Chris Nauroth commented on HDFS-7815:
-------------------------------------

Hello, [~frha].

This bug was fixed in HDFS-7503 by moving this logging outside of the 
namesystem write lock, so even if there is a large volume of this logging, 
other NameNode threads can still make progress.  The fix is targeted to Apache 
Hadoop 2.6.1 and 2.7.0, both still awaiting release.  In the meantime, a known 
workaround is to edit log4j.properties to tune down the logger level to WARN.  
Of course, this will have the side effect of suppressing these log messages 
entirely.

I'm resolving this issue as duplicate.
