Hi James,

Quoting "A. James Lewis" <[email protected]>:
OK, but in that case bcache is not between your MD RAID and its disks, so if your disks are dropping out of the MD array, that has to be either an independent problem or a very complex bug.

My guess is that it's a rather simple timeout / locking problem, which leads to an expiring timer in the MD code. And according to this mailing list, bcache has a well-known history of locking problems.
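If it really is a plain command timeout, one quick check that might rule it out is comparing the drives' error-recovery setting against the kernel's per-device command timeout (sdX below is just a placeholder, adjust to your drives):

  # drive-side error recovery control, in tenths of a second;
  # "disabled" means the drive may retry far longer than the
  # kernel's default 30 s command timeout
  smartctl -l scterc /dev/sdX

  # kernel-side SCSI command timeout for that device, in seconds
  cat /sys/block/sdX/device/timeout

  # either cap the drive's recovery time (7 s here) ...
  smartctl -l scterc,70,70 /dev/sdX
  # ... or raise the kernel timeout instead
  echo 180 > /sys/block/sdX/device/timeout

If the drives take longer to recover from an error than the kernel is willing to wait, MD will kick them even though they are otherwise healthy.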

Regards,
Jens



On 07/08/15 16:36, Jens-U. Mozdzen wrote:
Hi James,

Quoting "A. James Lewis" <[email protected]>:
That's interesting; are you putting your MD on top of multiple bcache devices... rather than bcache on top of an MD device? I wonder what the rationale behind this is.

Hi James, no such thing here...

bcache is running on top of two MD-RAIDs - RAID6 with 7 spinning drives and RAID1 with two SSDs.

The stack is, from bottom to top:

- MD-RAID6 data, MD-RAID1 cache
- bcache (/dev/bcache0, used as an LVM PV)
- LVM
- many LVs
- DRBD on top of most of the LVs
- Ext4 on each of the DRBD devices
- SCST / NFS / SMB sharing these file systems
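For illustration, this is roughly how the bottom of that stack is assembled (device names and the VG name below are only placeholders for this sketch):

  # backing store: RAID6 over the seven spinning drives
  mdadm --create /dev/md0 --level=6 --raid-devices=7 /dev/sd[a-g]
  # cache: RAID1 over the two SSDs
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdh /dev/sdi

  # turn md0 into the bcache backing device and md1 into the cache,
  # then attach the cache set (UUID as printed by make-bcache -C)
  make-bcache -B /dev/md0
  make-bcache -C /dev/md1
  echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

  # the resulting /dev/bcache0 is the LVM PV
  pvcreate /dev/bcache0
  vgcreate vg0 /dev/bcache0

So MD talks to the physical disks directly; bcache only ever sees the two assembled arrays.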

In the referenced incidents, SCST reports that (many) writes failed due to timeouts, and MD marks a single disk as faulty. There are no other traces in syslog, in particular no stalled processes, locking problems or kernel bugs.

The I/O pattern is highly parallel reads and writes, mostly via SCST.

Regards,
Jens
