Re: raid5:md3: read error corrected, followed by Machine Check Exception

2007-07-14 Thread Alan Cox
On Sat, 14 Jul 2007 17:08:27 -0700 (PDT) Mr. James W. Laferriere [EMAIL PROTECTED] wrote: Hello All, I was under the impression that a 'machine check' would be caused by some hardware failure near the CPU, not a bad disk? It indicates a hardware failure Jul 14 23:00:26

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Alan Stern
are available here: http://www.usb.org/developers/devclass_docs Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Alan Stern
use it, although it would take a long time to build because it includes so many drivers. Whittling it down to just the drivers you need would be tedious but not very difficult. Alan Stern

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-17 Thread Alan Stern
with a hardware issue, not a kernel issue, just because it is so consistent. People have reported problems in which the hardware fails when it encounters a certain pattern of bytes in the data stream. Maybe you're seeing the same sort of thing. Alan Stern

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-17 Thread Alan Stern
khubd running. That in itself is a very bad sign. You need to look at the dmesg log. Alan Stern

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-17 Thread Alan Stern
is liable to miss bits and pieces of the kernel log when a lot of information comes along all at once. You're much better off getting the stack trace data directly from dmesg. (And when you do, you don't end up with 30 columns of wasted data added to the beginning of each line.) Alan Stern

Re: end to end error recovery musings

2007-02-27 Thread Alan
with the same upper bits. More problematic is losing indirect blocks, and being able to keep some kind of [inode low bits/block index] would help put stuff back together. Alan

Re: end to end error recovery musings

2007-02-27 Thread Alan
(and a workaround) which is SATA capable 8) Alan

Re: end to end error recovery musings

2007-02-26 Thread Alan
write a sector on a device with physical sector size larger than logical block size (as allowed by say ATA7) then it's less clear what happens. I don't know if the drive firmware implements multiple tails in this case. On a read error it is worth trying the other parts of the I/O. Alan
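The "try the other parts of the I/O" idea can be sketched as follows. This is a hypothetical illustration in Python (the kernel does the real work in C at the block layer); `read_sector` stands in for the device read and is an assumption of the sketch:

```python
def read_with_partial_retry(read_sector, lba_start, count):
    """Attempt each logical sector of a large read separately, so one
    bad sector does not fail the whole I/O.  Returns the data that
    could be recovered plus the list of sectors that still fail."""
    good, bad = {}, []
    for lba in range(lba_start, lba_start + count):
        try:
            good[lba] = read_sector(lba)
        except IOError:
            bad.append(lba)
    return good, bad
```

The point is simply that a media error is usually localized: salvaging the readable sectors gives upper layers (md, the filesystem) far more to work with than reporting the entire request as failed.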

Re: end to end error recovery musings

2007-02-26 Thread Alan
I think that this is mostly true, but we also need to balance this against the need for higher levels to get a timely response. In a really large IO, a naive retry of a very large write could lead to a non-responsive system for a very long time... And losing the I/O could result in a
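The latency concern can be made concrete with a sketch (hypothetical Python, not kernel code): if a huge write is only ever retried as a whole, one flaky region stalls everything; splitting it into bounded chunks caps what each retry costs. The chunk size here is illustrative, not a real kernel limit:

```python
MAX_RETRY_CHUNK = 64 * 1024  # illustrative cap on the retry unit

def write_with_bounded_retry(write_chunk, buf, retries=1):
    """Submit a large write as bounded chunks, so a failure re-submits
    at most MAX_RETRY_CHUNK bytes instead of the entire buffer."""
    for off in range(0, len(buf), MAX_RETRY_CHUNK):
        chunk = buf[off:off + MAX_RETRY_CHUNK]
        for attempt in range(retries + 1):
            try:
                write_chunk(off, chunk)
                break
            except IOError:
                if attempt == retries:
                    raise  # give the upper layer a timely error
```

A bounded retry unit is what keeps the worst-case stall proportional to the chunk size rather than to the size of the original I/O.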

Re: end to end error recovery musings

2007-02-26 Thread Alan
One interesting counter example is a smaller write than a full page - say 512 bytes out of 4k. If we need to do a read-modify-write and it just so happens that 1 of the 7 sectors we need to read is flaky, will this look like a write failure? The current core kernel code can't handle
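The read-modify-write case described above can be sketched numerically (hypothetical Python; sector and page sizes are the usual 512 B and 4 KiB). The function computes which sectors of the page must be read before the write can proceed, which is exactly where a flaky sector turns a write into a read failure:

```python
LOGICAL = 512   # logical sector size in bytes
PAGE = 4096     # page / physical block size in bytes

def rmw_read_sectors(offset, length):
    """Return the 512 B sectors (indices within one 4 KiB page) that a
    read-modify-write must fetch for a sub-page write at
    [offset, offset + length).  Fully overwritten sectors need no read;
    everything else does."""
    assert 0 <= offset and offset + length <= PAGE
    first = offset // LOGICAL
    last = (offset + length - 1) // LOGICAL
    written = set(range(first, last + 1))
    # Partially covered edge sectors must be read as well.
    need_read = set()
    if offset % LOGICAL:
        need_read.add(first)
    if (offset + length) % LOGICAL:
        need_read.add(last)
    all_sectors = set(range(PAGE // LOGICAL))
    return sorted((all_sectors - written) | need_read)
```

For an aligned 512-byte write this yields the 7 other sectors of the page, matching the scenario above: if any one of those 7 reads is flaky, the write cannot complete even though the sector being written is perfectly healthy.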

Re: sata badness in 2.6.20-rc1? [Was: Re: md patches in -mm]

2006-12-15 Thread Alan
On Fri, 15 Dec 2006 13:39:27 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Fri, 15 Dec 2006 13:05:52 -0800 Andrew Morton [EMAIL PROTECTED] wrote: Jeff, I shall send all the sata patches which I have at you one single time and I shall then drop the lot. So please don't flub them.

Re: Linux: Why software RAID?

2006-08-24 Thread Alan Cox
On Thu, 2006-08-24 at 09:07 -0400, Adam Kropelin wrote: Jeff Garzik [EMAIL PROTECTED] wrote: with sw RAID of course if the builder is careful to use multiple PCI cards, etc. Sw RAID over your motherboard's onboard controllers leaves you vulnerable. Generally speaking the channels on

Re: Linux: Why software RAID?

2006-08-24 Thread Alan Cox
On Thu, 2006-08-24 at 07:31 -0700, Marc Perkel wrote: So - the bottom line answer to my question is that unless you are running raid 5 and you have a high-powered raid card with cache and battery backup, there is no significant speed increase from using hardware raid. For raid 0 there

Re: [PATCH 000 of 5] md: Introduction

2006-01-18 Thread Alan Cox
On Wed, 2006-01-18 at 09:14 +0100, Sander wrote: If the (harddisk internal) remap succeeded, the OS doesn't see the bad sector at all I believe. True for ATA; in the SCSI case you may be told about the remap having occurred, but it's a by-the-way type message, not an error proper. If you (the

Re: [PATCH] RAID5 NULL Checking Bug Fix

2001-05-16 Thread Alan Cox
On Wednesday May 16, [EMAIL PROTECTED] wrote: (more patches to come. They will go to Linus, Alan, and linux-raid only). This is the next one, which actually addresses the NULL Checking Bug. Thanks. As Linus merges I'll switch over to match his tree. Less diff is good 8

Re: Proposed RAID5 design changes.

2001-03-21 Thread Alan Cox
1) Read and write errors should be retried at least once before kicking the drive out of the array. This doesn't seem unreasonable on the face of it. Device-level retries are the job of the device-level driver. 2) On more persistent read errors, the failed block (or whatever unit is

Re: Proposed RAID5 design changes.

2001-03-21 Thread Alan Cox
any data, but under normal default drive setup the sector will not be reallocated. If testing the failing sector is too much effort, a simple overwrite with the corrected data, at worst, improves the chances of the drive firmware being able to reallocate the sector. This works just fine
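The "overwrite with the corrected data" step is exactly what RAID5 is positioned to do: rebuild the unreadable chunk from the surviving chunks plus parity, then write it back so the drive firmware gets its chance to reallocate the sector. A hypothetical sketch (Python; real md does this in C with optimized XOR, and `write_sector` here is an assumed stand-in for the device write):

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5 parity math)."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def rewrite_bad_sector(stripe, bad_idx, write_sector):
    """Rebuild the unreadable chunk of a stripe from the XOR of the
    surviving chunks (data + parity), then write it back in place so
    the firmware can remap the underlying sector if it needs to."""
    rebuilt = xor_blocks([c for i, c in enumerate(stripe)
                          if i != bad_idx])
    write_sector(bad_idx, rebuilt)
    return rebuilt
```

Because parity is the XOR of all data chunks, XOR-ing every surviving chunk (including parity) cancels out everything except the missing one, which is why the rewrite can carry correct data rather than zeros.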

Re: Proposed RAID5 design changes.

2001-03-21 Thread Alan Cox
Umm. Isn't RAID implemented as the md device? That implies that it is responsible for some kind of error management. Bluntly, the file systems don't declare a file system kaput until they've retried the critical I/O operations. Why should RAID5 be any less tolerant? File systems give up the