Re: still getting "is valid, and cannot be written to"

Torsten Curdt Thu, 06 Sep 2007 03:25:49 -0700

We are still seeing a bunch of these, even with a reduced submit
replication. Are we the only ones seeing those? If not, I'd be running
off filing a bug.


cheers
--
Torsten


On 30.08.2007, at 19:47, Hairong Kuang wrote:

Namenode does not schedule a block to a datanode that is confirmed to
hold a replica of the block. But it is not aware of any in-transit block
placement (i.e. the scheduled but not confirmed block placement), so
occasionally we may still see "is valid, and cannot be written to"
errors.

A fix to the problem is to keep track of all in-transit block
placements, and have the block placement algorithm consider these
to-be-confirmed replicas as well.
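
A minimal sketch of that idea, assuming a much simplified model of the Namenode's bookkeeping. The class PlacementTracker and its methods are hypothetical names chosen for illustration and do not come from the Hadoop code base; the point is only that target selection skips datanodes with a scheduled but unconfirmed replica as well as datanodes with a confirmed one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Hadoop.
public class PlacementTracker {
    // blockId -> datanodes confirmed to hold a replica of the block
    private final Map<String, Set<String>> confirmed = new HashMap<>();
    // blockId -> datanodes a replica was scheduled to but not yet confirmed on
    private final Map<String, Set<String>> inTransit = new HashMap<>();

    /** Picks up to `wanted` datanodes that neither hold nor are about to receive the block. */
    public List<String> chooseTargets(String blockId, List<String> liveNodes, int wanted) {
        Set<String> held = confirmed.computeIfAbsent(blockId, k -> new HashSet<>());
        Set<String> pending = inTransit.computeIfAbsent(blockId, k -> new HashSet<>());
        List<String> targets = new ArrayList<>();
        for (String node : liveNodes) {
            if (targets.size() >= wanted) {
                break;
            }
            if (held.contains(node) || pending.contains(node)) {
                continue; // already has, or is about to get, a replica
            }
            targets.add(node);
        }
        pending.addAll(targets); // remember the scheduled placements
        return targets;
    }

    /** Called when a datanode reports the replica: move it from in-transit to confirmed. */
    public void confirm(String blockId, String node) {
        Set<String> pending = inTransit.get(blockId);
        if (pending != null) {
            pending.remove(node);
        }
        confirmed.computeIfAbsent(blockId, k -> new HashSet<>()).add(node);
    }
}
```

A real fix would presumably also need to expire in-transit entries when a transfer fails or times out; the sketch leaves that out.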

Hairong

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2007 10:28 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: still getting "is valid, and cannot be written to"

Raghu Angadi wrote:
> Torsten Curdt wrote:
>> I just checked our mapred.submit.replication and it is higher than
>> the nodes in the cluster - maybe that's the problem?
>
> This pretty much assures at least a few of these exceptions.

So we have a workaround: lower mapred.submit.replication. And it's
arguably not a bug, but just a misfeature, since it only causes spurious
warnings.

One fix might be to try to determine mapred.submit.replication based on
the cluster size. But that was contentious when that feature was added,
and I'd rather not re-open that argument again now.

> You can argue that Namenode should not schedule a block to a node
> twice.. and I agree.

That sounds like a good thing to fix.  Should we file a bug?
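
For the cluster-size idea raised earlier in this message, one possible reading is to cap the configured value at the number of live datanodes. The helper below is a hypothetical sketch of that arithmetic only; SubmitReplication and effectiveReplication are made-up names, not Hadoop APIs, and the thread does not say how such a fix would actually be implemented.

```java
// Hypothetical helper: not part of Hadoop.
public class SubmitReplication {
    /**
     * Caps the configured submit replication at the number of live
     * datanodes, never going below 1.
     */
    public static int effectiveReplication(int configured, int liveDatanodes) {
        return Math.max(1, Math.min(configured, liveDatanodes));
    }
}
```

With this rule, a configured value larger than the cluster is reduced to the cluster size, while a smaller value is left unchanged.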

Doug
