[ https://issues.apache.org/jira/browse/HADOOP-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525633 ]

Torsten Curdt commented on HADOOP-1845:
---------------------------------------

We can try to gather a (new) log, but if I remember correctly even the unit test 
showed the exception (see the thread on dev). With my still limited knowledge of 
the codebase, and comparing to what is going on in the cluster, it does not seem 
to cause huge problems. So it was mainly disconcerting. But - at some stage we 
had so many of these exceptions in the logs that they were essentially spamming 
the logs. (More than a hundred exceptions per minute!) ...which made it a bit 
more than just disconcerting. Maybe the word "panic" was more suitable ;) 
At least OPS stopped believing that this is not a problem, which doesn't add to 
a good reputation for Hadoop. It also makes you lose sight of real problems, as 
it just drowns them in a sea of information.

So yeah ...we should really fix this.

> Datanodes get error message "is valid, and cannot be written to" 
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1845
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1845
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.1
>            Reporter: Hairong Kuang
>             Fix For: 0.15.0
>
>
> >> Copy from dev list:
> Our cluster has 4 nodes and I set the mapred.submit.replication parameter to 
> 2 on all nodes and the master. Everything has been restarted.
> Unfortunately, we still have the same exception:
> 2007-09-05 17:01:59,623 ERROR org.apache.hadoop.dfs.DataNode:
> DataXceiver: java.io.IOException: Block blk_-5969983648201186681 is valid, 
> and cannot be written to.
>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:515)
>         at
> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:822)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
>         at java.lang.Thread.run(Thread.java:595)
> >> end of copy
> The message shows that the namenode schedules a block to be replicated to a 
> datanode that already holds the block. The namenode block placement algorithm 
> makes sure that it does not schedule a block to a datanode that is confirmed 
> to hold a replica of the block. But it is not aware of any in-transit block 
> placements (i.e. the scheduled but not yet confirmed block placements), so 
> occasionally we may still see "is valid, and cannot be written to" errors.
> A fix to the problem is to keep track of all in-transit block placements, 
> and have the block placement algorithm consider these to-be-confirmed 
> replicas as well.
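The proposed fix above can be sketched as follows. This is a hypothetical illustration, not the actual HADOOP-1845 patch: the class, method names, and datanode identifiers are invented for the example. It tracks scheduled-but-unconfirmed ("in-transit") replicas per block, and excludes both confirmed holders and in-transit targets when choosing a new replication target.

```java
import java.util.*;

// Hypothetical sketch of the proposed fix (names invented, not Hadoop's API):
// remember which datanodes have a scheduled but unconfirmed replica of each
// block, so the placement algorithm never re-targets one of them.
public class PendingPlacementTracker {
    // block id -> datanodes holding an in-transit (scheduled, unconfirmed) replica
    private final Map<Long, Set<String>> pending = new HashMap<>();

    // Called when the namenode schedules a replication to a datanode.
    public synchronized void schedule(long blockId, String datanode) {
        pending.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    // Called when the datanode confirms the replica (e.g. via a block report).
    public synchronized void confirm(long blockId, String datanode) {
        Set<String> nodes = pending.get(blockId);
        if (nodes != null) {
            nodes.remove(datanode);
            if (nodes.isEmpty()) pending.remove(blockId);
        }
    }

    // Pick a target that holds neither a confirmed nor an in-transit replica.
    public synchronized Optional<String> chooseTarget(long blockId,
            Collection<String> liveNodes, Set<String> confirmedHolders) {
        Set<String> inTransit = pending.getOrDefault(blockId, Collections.emptySet());
        return liveNodes.stream()
                .filter(n -> !confirmedHolders.contains(n) && !inTransit.contains(n))
                .findFirst();
    }
}
```

With this bookkeeping, a datanode that was just scheduled to receive the block is skipped even before its block report arrives, which is exactly the window in which the "is valid, and cannot be written to" error occurs.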

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
