Hi Torsten,

We occasionally see this too, though it is more likely to show up on a small cluster. I filed a JIRA issue for it: https://issues.apache.org/jira/browse/HADOOP-1845.
Cheers,
Hairong

-----Original Message-----
From: Torsten Curdt [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 06, 2007 3:25 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: still getting "is valid, and cannot be written to"

We are still seeing a bunch of these, even with a reduced submit
replication. Are we the only ones seeing them? If not, I'll go ahead
and file a bug.

cheers
--
Torsten

On 30.08.2007, at 19:47, Hairong Kuang wrote:

> The namenode does not schedule a block to a datanode that is confirmed
> to hold a replica of the block. But it is not aware of in-transit
> block placements (i.e. placements that have been scheduled but not yet
> confirmed), so occasionally we may still see "is valid, and cannot be
> written to" errors.
>
> A fix is to keep track of all in-transit block placements and have
> the block placement algorithm consider these to-be-confirmed replicas
> as well.
>
> Hairong
>
> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 30, 2007 10:28 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: still getting "is valid, and cannot be written to"
>
> Raghu Angadi wrote:
>> Torsten Curdt wrote:
>>> I just checked our mapred.submit.replication and it is higher than
>>> the number of nodes in the cluster - maybe that's the problem?
>>
>> That pretty much guarantees at least a few of these exceptions.
>
> So we have a workaround: lower mapred.submit.replication. And it's
> arguably not a bug, just a misfeature, since it only causes spurious
> warnings.
>
> One fix might be to derive mapred.submit.replication from the cluster
> size. But that was contentious when the feature was added, and I'd
> rather not reopen that argument now.
>
>> You can argue that the namenode should not schedule a block to a node
>> twice, and I agree.
>
> That sounds like a good thing to fix. Should we file a bug?
>
> Doug
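The fix Hairong describes (remember in-transit placements and let the placement step count them) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Hadoop's NameNode code: the class PendingAwarePlacement, its methods, and the string datanode IDs are invented for the sketch. It reproduces the case Raghu points out, a replication target (10) higher than the number of datanodes (3), and shows that a second scheduling round before any copy is confirmed picks no node twice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of in-transit-aware replica placement.
// Names are illustrative only; this is not Hadoop's NameNode API.
class PendingAwarePlacement {
    // blockId -> datanodes that have confirmed holding a replica
    private final Map<Long, Set<String>> confirmed = new HashMap<>();
    // blockId -> datanodes scheduled to receive a replica, not yet confirmed
    private final Map<Long, Set<String>> pending = new HashMap<>();

    private static Set<String> nodesFor(Map<Long, Set<String>> m, long blockId) {
        return m.getOrDefault(blockId, Collections.emptySet());
    }

    // A datanode reports that it now holds the block:
    // its replica moves from "in transit" to "confirmed".
    void confirm(long blockId, String node) {
        confirmed.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
        Set<String> p = pending.get(blockId);
        if (p != null) {
            p.remove(node);
        }
    }

    // Replicas still missing, counting in-transit copies as already present.
    int stillNeeded(long blockId, int replication) {
        int known = nodesFor(confirmed, blockId).size() + nodesFor(pending, blockId).size();
        return Math.max(0, replication - known);
    }

    // Chooses new targets, skipping nodes that hold the block or are already
    // scheduled to receive it, and records the chosen nodes as pending.
    List<String> chooseTargets(long blockId, int replication, List<String> liveNodes) {
        int needed = stillNeeded(blockId, replication);
        Set<String> have = nodesFor(confirmed, blockId);
        Set<String> inTransit = nodesFor(pending, blockId);
        List<String> targets = new ArrayList<>();
        for (String node : liveNodes) {
            if (targets.size() >= needed) {
                break;
            }
            if (!have.contains(node) && !inTransit.contains(node)) {
                targets.add(node);
            }
        }
        pending.computeIfAbsent(blockId, k -> new HashSet<>()).addAll(targets);
        return targets;
    }

    public static void main(String[] args) {
        PendingAwarePlacement placement = new PendingAwarePlacement();
        List<String> nodes = List.of("dn1", "dn2", "dn3");
        long block = 42L;
        placement.confirm(block, "dn1");
        // Replication 10 on a 3-node cluster: first round schedules dn2 and dn3.
        System.out.println(placement.chooseTargets(block, 10, nodes)); // prints [dn2, dn3]
        // Second round before dn2/dn3 confirm: no duplicate transfer is scheduled.
        System.out.println(placement.chooseTargets(block, 10, nodes)); // prints []
    }
}
```

Without the pending map, the second round would see only dn1's replica and schedule dn2 and dn3 again, which is the duplicate transfer that triggers "is valid, and cannot be written to". A real implementation would also need to expire pending entries whose transfer never completes; otherwise a failed copy would keep excluding its node and suppress re-replication. The sketch leaves that out.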