I hope this helps. See below.
<>< Lance.
> my questions are:
>
> 2. the disk seems to be "cured" by re-enabling DMA . . . but what is the state
> of my array likely to be after the errors above? Can I safely assume this was
> harmless? I mean, they WERE write errors after all, yes? Is my array still in
> sync? Is there any way to tell other than by unmounting the array and fscking?
> 3. is the failure simply not sufficiently severe to trigger removal from the
> array and hot reconstruction onto the hot spare which is available?
>
The md driver passes each request to the block driver for the specific member device. It is there (or lower) that all media error detection and retries are performed. If a request made by the md driver fails (the buffer is not marked uptodate), the md driver assumes the device is bad and stops communicating with it; no retries are attempted. There is one exception: the md driver will do some retries while doing a resync, but no retries are attempted under normal working conditions.
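To make that policy concrete, here is a toy model in Python. It is not kernel code. `Member`, `md_request` and `RESYNC_RETRIES` are hypothetical names, and the retry count during resync is an assumption, since all I can say is that md does "some" retries there. The lower-level driver is modelled as a function that returns True when the buffer ends up uptodate.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: extra attempts md makes while resyncing.
# The real number is not given here; 3 is only for illustration.
RESYNC_RETRIES = 3


@dataclass
class Member:
    name: str
    # Stand-in for the lower-level block driver: True means "buffer uptodate".
    submit: Callable[[int], bool]
    faulty: bool = False


def md_request(member: Member, block: int, resyncing: bool = False) -> bool:
    """Forward one block request to a member device (toy model of md)."""
    if member.faulty:
        return False  # md no longer talks to a member it has failed
    # Normal operation: exactly one attempt. Resync: a few extra attempts.
    attempts = 1 + (RESYNC_RETRIES if resyncing else 0)
    for _ in range(attempts):
        if member.submit(block):
            return True
    member.faulty = True  # request failed, so md treats the device as bad
    return False
```

With a device whose first request fails and second succeeds, a normal request marks it faulty immediately, while the same device survives a request issued during resync.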
So, if the lower-level device drivers for the IDE devices are working correctly, doing the retries that are sometimes needed and delivering the data as requested, the md driver never knows about any hiccups along the way. This is both good and bad: good in that the md driver doesn't need to worry about different types of devices and their peculiar behavior, but bad in that the md driver cannot predict device failures due to flaky or deteriorating hardware.
So, if the md driver doesn't fail a drive, that is because the lower levels have taken care of all the nitty-gritty details and have supposedly performed the requested data transfer correctly. As long as the actual device drivers complete the requests, the md driver won't know about any problems.
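A second toy sketch shows why the errors stay hidden. Again this is illustrative Python, not driver code: `make_lower_driver`, `max_retries` and the error log are hypothetical. The lower layer retries internally and only reports the final outcome, so a caller like md sees success even though errors occurred underneath.

```python
from typing import Callable, List, Tuple


def make_lower_driver(
    medium: Callable[[int], bool], max_retries: int = 4
) -> Tuple[Callable[[int], bool], List[int]]:
    """Toy IDE-style driver that retries internally (hypothetical model).

    Returns the submit function the upper layer calls, plus a log of every
    failed attempt. Only this layer ever sees that log.
    """
    hidden_errors: List[int] = []

    def submit(block: int) -> bool:
        for _ in range(1 + max_retries):
            if medium(block):
                return True  # caller sees success; earlier failures are invisible
            hidden_errors.append(block)
        return False  # only now does the upper layer learn of a problem

    return submit, hidden_errors
```

A medium that fails twice before succeeding yields a plain True to the caller, with the two failures recorded only inside the driver. That is the trade-off described above: the upper layer stays simple, but it gets no early warning of a deteriorating disk.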
>
> 4. is there some way to mark this disk bad right now, so that reconstruction
> is carried out from the disks I trust? I do have a hot spare . . .
>
You can use the 'raidhotremove' utility.