We are still seeing a bunch of these, even with a reduced submit
replication.
Are we the only ones seeing those? If not, I'll go ahead and file a
bug.
cheers
--
Torsten
On 30.08.2007, at 19:47, Hairong Kuang wrote:
The Namenode does not schedule a block to a datanode that is confirmed
to hold a replica of the block. But it is not aware of any in-transit
block placement (i.e. a placement that is scheduled but not yet
confirmed), so occasionally we may still see "is valid, and cannot be
written to" errors.
A fix for the problem is to keep track of all in-transit block
placements and have the block placement algorithm consider these
to-be-confirmed replicas as well.
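A rough sketch of that idea, not the actual Namenode code: the class
PendingPlacementTracker and its methods are hypothetical names, and
blocks and datanodes are reduced to plain strings. It only shows the
bookkeeping, i.e. excluding both confirmed and in-transit holders when
choosing targets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names are real Namenode classes.
class PendingPlacementTracker {
    // block id -> datanodes confirmed to hold a replica
    private final Map<String, Set<String>> confirmed = new HashMap<>();
    // block id -> datanodes scheduled to receive a replica, not yet confirmed
    private final Map<String, Set<String>> inTransit = new HashMap<>();

    /** Record that a replica of the block was scheduled to the datanode. */
    void schedule(String block, String datanode) {
        inTransit.computeIfAbsent(block, k -> new HashSet<>()).add(datanode);
    }

    /** The datanode reported the replica: move it from in-transit to confirmed. */
    void confirm(String block, String datanode) {
        Set<String> pending = inTransit.get(block);
        if (pending != null) {
            pending.remove(datanode);
            if (pending.isEmpty()) {
                inTransit.remove(block);
            }
        }
        confirmed.computeIfAbsent(block, k -> new HashSet<>()).add(datanode);
    }

    /** Choose up to `wanted` targets, skipping confirmed and in-transit holders. */
    List<String> chooseTargets(String block, List<String> liveDatanodes, int wanted) {
        Set<String> excluded =
            new HashSet<>(confirmed.getOrDefault(block, Collections.<String>emptySet()));
        excluded.addAll(inTransit.getOrDefault(block, Collections.<String>emptySet()));

        List<String> targets = new ArrayList<>();
        for (String dn : liveDatanodes) {
            if (targets.size() >= wanted) {
                break;
            }
            if (!excluded.contains(dn)) {
                targets.add(dn);
            }
        }
        // Remember the new placements so the next call does not pick them again.
        for (String dn : targets) {
            schedule(block, dn);
        }
        return targets;
    }
}
```

A real version would presumably also have to expire in-transit entries
that never get confirmed, which this sketch leaves out.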
Hairong
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 30, 2007 10:28 AM
To: hadoop-dev@lucene.apache.org
Subject: Re: still getting "is valid, and cannot be written to"
Raghu Angadi wrote:
Torsten Curdt wrote:
I just checked our mapred.submit.replication and it is higher than
the number of nodes in the cluster - maybe that's the problem?
This pretty much assures at least a few of these exceptions.
So we have a workaround: lower mapred.submit.replication. And it's
arguably not a bug, but just a misfeature, since it only causes
spurious warnings.
One fix might be to try to determine mapred.submit.replication based
on the cluster size. But that was contentious when that feature was
added, and I'd rather not re-open that argument now.
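For illustration only, deriving the value from the cluster size could
be as simple as capping the configured replication at the number of
live datanodes. The class and method names below are made up, and this
is not how Hadoop computes it.

```java
// Hypothetical helper, not part of Hadoop.
class SubmitReplication {
    /** Cap the configured replication at the live datanode count, but never below 1. */
    static int effective(int configured, int liveDatanodes) {
        return Math.max(1, Math.min(configured, liveDatanodes));
    }
}
```

With a configured value of 10 on a 4-node cluster this yields 4, so no
datanode would be asked to hold the same block twice.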
You can argue that the Namenode should not schedule a block to a node
twice, and I agree.
That sounds like a good thing to fix. Should we file a bug?
Doug