Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On 02/05/2011 00:06, Jameson Graef Rollins wrote:
> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings b...@decadent.org.uk wrote:
>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>>> I run what I imagine is a fairly unusual disk setup on my laptop, consisting of:
>>>
>>> ssd - raid1 - dm-crypt - lvm - ext4
>>>
>>> I use the raid1 as a backup. The raid1 operates normally in degraded mode. For backups I then hot-add a usb hdd, let the raid1 sync, and then fail/remove the external hdd.

This is not directly related to your issues here, but it is possible to make a 1-disk raid1 set so that you are not normally degraded. When you want to do the backup, you can grow the raid1 set with the usb disk, wait for the resync, then fail it and remove it, then grow the raid1 back to 1 disk. That way you don't feel you are always living in a degraded state.
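The grow/sync/shrink cycle described in this message can be sketched as a list of mdadm invocations. This is only an illustration under assumptions: the device names `/dev/md0` and `/dev/sdb1` are hypothetical, the helper name `backup_cycle_commands` is made up for this note, and the script only prints the commands rather than running them. As the rest of the thread points out, this procedure is not expected to avoid the "bio too big" problem.

```python
import shlex


def backup_cycle_commands(array, backup_dev):
    """Return the mdadm command lines for one grow/sync/shrink backup cycle.

    Both arguments are placeholders chosen by the caller; nothing is executed.
    """
    return [
        # Add the USB disk; on a 1-device raid1 it first becomes a spare.
        ["mdadm", array, "--add", backup_dev],
        # Grow the mirror to two devices, which starts the resync.
        ["mdadm", "--grow", array, "--raid-devices=2"],
        # Block until the resync has finished.
        ["mdadm", "--wait", array],
        # Mark the USB disk as failed, then remove it from the array.
        ["mdadm", array, "--fail", backup_dev],
        ["mdadm", array, "--remove", backup_dev],
        # Shrink back to a single-device mirror (mdadm wants --force for this).
        ["mdadm", "--grow", array, "--raid-devices=1", "--force"],
    ]


if __name__ == "__main__":
    for argv in backup_cycle_commands("/dev/md0", "/dev/sdb1"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run on any machine; an admin would review each step before issuing it by hand.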
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On Mon, 02 May 2011 11:11:25 +0200, David Brown da...@westcontrol.com wrote:
> This is not directly related to your issues here, but it is possible to make a 1-disk raid1 set so that you are not normally degraded. When you want to do the backup, you can grow the raid1 set with the usb disk, wait for the resync, then fail it and remove it, then grow the raid1 back to 1 disk. That way you don't feel you are always living in a degraded state.

Hi, David. I appreciate the concern, but I am not at all concerned about living in a degraded state. I'm far more concerned about data loss and the fact that this bug has seemingly revealed that some commonly held assumptions and uses of software raid are wrong, with potentially far-reaching effects.

I also don't see how the setup you're describing will avoid this bug. If this bug is triggered by having a layer between md and the filesystem and then changing the raid configuration by adding or removing a disk, then I don't see how there's a difference between hot-adding to a degraded array and growing a single-disk raid1. In fact, I would suspect that your suggestion would be more problematic because it involves *two* raid reconfigurations (grow and then shrink) rather than one (hot-add) to achieve the same result. I imagine that each raid reconfiguration could potentially trigger the bug. But I still don't have a clear enough understanding of what is going on here to be sure.

jamie.
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On 02/05/11 18:38, Jameson Graef Rollins wrote:
> On Mon, 02 May 2011 11:11:25 +0200, David Brown da...@westcontrol.com wrote:
>> This is not directly related to your issues here, but it is possible to make a 1-disk raid1 set so that you are not normally degraded. When you want to do the backup, you can grow the raid1 set with the usb disk, wait for the resync, then fail it and remove it, then grow the raid1 back to 1 disk. That way you don't feel you are always living in a degraded state.
>
> Hi, David. I appreciate the concern, but I am not at all concerned about living in a degraded state. I'm far more concerned about data loss and the fact that this bug has seemingly revealed that some commonly held assumptions and uses of software raid are wrong, with potentially far-reaching effects.
>
> I also don't see how the setup you're describing will avoid this bug. If this bug is triggered by having a layer between md and the filesystem and then changing the raid configuration by adding or removing a disk, then I don't see how there's a difference between hot-adding to a degraded array and growing a single-disk raid1. In fact, I would suspect that your suggestion would be more problematic because it involves *two* raid reconfigurations (grow and then shrink) rather than one (hot-add) to achieve the same result. I imagine that each raid reconfiguration could potentially trigger the bug. But I still don't have a clear enough understanding of what is going on here to be sure.

I didn't mean to suggest this as a way around these issues - I was just making a side point. Like you and others in this thread, I am concerned about failures that could be caused by having the sort of layered and non-homogeneous raid you describe.

I merely mentioned single-disk raid1 mirrors as an interesting feature you can get with md raid. Many people don't like to have their system in a continuous error state - it can make it harder to notice when you have a /real/ problem. And single-disk mirrors give you the same features, but no degraded state.

As you say, it is conceivable that adding or removing disks to the raid could make matters worse.

From what I have read so far, it looks like you can get around problems here if the usb disk is attached when the block layers are built up (i.e., when the dm-crypt is activated, and the lvm and filesystems on top of it). It should then be safe to remove it, and re-attach it later. Of course, it's hardly ideal to have to attach your backup device every time you boot the machine!
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings b...@decadent.org.uk wrote:
> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>> I run what I imagine is a fairly unusual disk setup on my laptop, consisting of:
>>
>> ssd - raid1 - dm-crypt - lvm - ext4
>>
>> I use the raid1 as a backup. The raid1 operates normally in degraded mode. For backups I then hot-add a usb hdd, let the raid1 sync, and then fail/remove the external hdd.
>
> Well, this is not expected to work. Possibly the hot-addition of a disk with different bio restrictions should be rejected. But I'm not sure, because it is safe to do that if there is no mounted filesystem or stacking device on top of the RAID.

Hi, Ben. Can you explain why this is not expected to work? Which part exactly is not expected to work and why?

> I would recommend using filesystem-level backup (e.g. dirvish or backuppc). Aside from this bug, if the SSD fails during a RAID resync you will be left with an inconsistent and therefore useless 'backup'.

I appreciate your recommendation, but it doesn't really have anything to do with this bug report. Unless I am doing something that is *expressly* not supposed to work, then it should work, and if it doesn't then it's either a bug or a documentation failure (i.e. if this setup is not supposed to work then it should be clearly documented somewhere what exactly the problem is).

> The block layer correctly returns an error after logging this message. If it's due to a read operation, the error should be propagated up to the application that tried to read. If it's due to a write operation, I would expect the error to result in the RAID becoming desynchronised. In some cases it might be propagated to the application that tried to write.

Can you say what is correct about the returned error? That's what I'm still not understanding. Why is there an error and where is it coming from?

jamie.
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings b...@decadent.org.uk wrote:
>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>>> I run what I imagine is a fairly unusual disk setup on my laptop, consisting of:
>>>
>>> ssd - raid1 - dm-crypt - lvm - ext4
>>>
>>> I use the raid1 as a backup. The raid1 operates normally in degraded mode. For backups I then hot-add a usb hdd, let the raid1 sync, and then fail/remove the external hdd.
>>
>> Well, this is not expected to work. Possibly the hot-addition of a disk with different bio restrictions should be rejected. But I'm not sure, because it is safe to do that if there is no mounted filesystem or stacking device on top of the RAID.
>
> Hi, Ben. Can you explain why this is not expected to work? Which part exactly is not expected to work and why?

Adding another type of disk controller (USB storage versus whatever the SSD interface is) to a RAID that is already in use.

>> I would recommend using filesystem-level backup (e.g. dirvish or backuppc). Aside from this bug, if the SSD fails during a RAID resync you will be left with an inconsistent and therefore useless 'backup'.
>
> I appreciate your recommendation, but it doesn't really have anything to do with this bug report. Unless I am doing something that is *expressly* not supposed to work, then it should work, and if it doesn't then it's either a bug or a documentation failure (i.e. if this setup is not supposed to work then it should be clearly documented somewhere what exactly the problem is).

The normal state of a RAID set is that all disks are online. You have deliberately turned this on its head; the normal state of your RAID set is that one disk is missing. This is such a basic principle that most documentation won't mention it.

>> The block layer correctly returns an error after logging this message. If it's due to a read operation, the error should be propagated up to the application that tried to read. If it's due to a write operation, I would expect the error to result in the RAID becoming desynchronised. In some cases it might be propagated to the application that tried to write.
>
> Can you say what is correct about the returned error? That's what I'm still not understanding. Why is there an error and where is it coming from?

The error is that you changed the I/O capabilities of the RAID while it was already in use. But what I was describing as 'correct' was that an error code was returned, rather than the error condition only being logged. If the error condition is not properly propagated then it could lead to data loss.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings b...@decadent.org.uk wrote:
> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings b...@decadent.org.uk wrote:
>>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>>>> I run what I imagine is a fairly unusual disk setup on my laptop, consisting of:
>>>>
>>>> ssd - raid1 - dm-crypt - lvm - ext4
>>>>
>>>> I use the raid1 as a backup. The raid1 operates normally in degraded mode. For backups I then hot-add a usb hdd, let the raid1 sync, and then fail/remove the external hdd.
>>>
>>> Well, this is not expected to work. Possibly the hot-addition of a disk with different bio restrictions should be rejected. But I'm not sure, because it is safe to do that if there is no mounted filesystem or stacking device on top of the RAID.
>>
>> Hi, Ben. Can you explain why this is not expected to work? Which part exactly is not expected to work and why?
>
> Adding another type of disk controller (USB storage versus whatever the SSD interface is) to a RAID that is already in use.

Normally this practice is perfectly OK. If a filesystem is mounted directly from an md array, then adding devices to the array at any time is fine, even if the new devices have quite different characteristics than the old.

However if there is another layer in between md and the filesystem - such as dm - then there can be a problem. There is no mechanism in the kernel for md to tell dm that things have changed, so dm never changes its configuration to match any change in the config of the md device.

A filesystem always queries the config of the device as it prepares the request. As this is not an 'active' query (i.e. it just looks at variables, it doesn't call a function) there is no opportunity for dm to then query md.

There is a ->merge_bvec_fn which could be pushed into service. i.e. if md/raid1 defined some trivial merge_bvec_fn, then it would probably work. However the actual effect of this would probably be to cause every bio created by the filesystem to be just one PAGE in size, and this is guaranteed always to work. So it could be a significant performance hit for the common case.

We really need either:
 - The fs sends down arbitrarily large requests, and the lower layers split them up if/when needed
or
 - A mechanism for a block device to tell the layer above that something has changed.

But these are both fairly intrusive with unclear performance/complexity implications and no one has bothered.

NeilBrown
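The mechanism described in this message can be illustrated with a toy model. It is a sketch under assumptions, not kernel code: the classes `Queue` and `StackedDevice` and the functions `submit`, `split` and `pages_needed` are invented for this note, the device name `dm-0` is hypothetical, and the numbers 248 and 240 are taken from the logged message on the assumption that they are the bio size and the device limit, both in 512-byte sectors. The page size of 4096 bytes is also an assumption.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512   # assumed sector size
PAGE_BYTES = 4096    # assumed page size


@dataclass
class Queue:
    """A block device with a per-request size limit, in sectors."""
    name: str
    max_sectors: int


class StackedDevice:
    """Upper layer (think dm) that copies the lower limit once, at setup."""

    def __init__(self, name, lower):
        self.name = name
        self.lower = lower
        # Snapshot taken when the stack is built; never refreshed afterwards.
        self.max_sectors = lower.max_sectors


def submit(lower, sectors):
    """Mimic the size check that produces the logged message."""
    if sectors > lower.max_sectors:
        return f"bio too big device {lower.name} ({sectors} > {lower.max_sectors})"
    return "ok"


def split(sectors, limit):
    """First proposed fix: a lower layer splits an oversized request."""
    chunks = []
    while sectors > 0:
        n = min(sectors, limit)
        chunks.append(n)
        sectors -= n
    return chunks


def pages_needed(sectors):
    """Number of one-page bios needed to carry the same amount of data."""
    total = sectors * SECTOR_BYTES
    return (total + PAGE_BYTES - 1) // PAGE_BYTES


md0 = Queue("md0", 248)
dm = StackedDevice("dm-0", md0)   # stack built while md0 allowed 248 sectors
md0.max_sectors = 240             # hot-add of a member with a smaller limit

print(submit(md0, dm.max_sectors))             # request sized from the stale snapshot
print(split(dm.max_sectors, md0.max_sectors))  # what splitting would send instead
print(pages_needed(dm.max_sectors))            # bios needed if each were one page
```

In this model the upper layer keeps building requests from its stale snapshot, which is why the message repeats until the stack is rebuilt. The last line shows the cost of the one-page fallback: the same 248-sector request would become 31 separate bios.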
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On 05/01/2011 08:00 PM, Ben Hutchings wrote:
> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>> Hi, Ben. Can you explain why this is not expected to work? Which part exactly is not expected to work and why?
>
> Adding another type of disk controller (USB storage versus whatever the SSD interface is) to a RAID that is already in use.
> [...]
> The normal state of a RAID set is that all disks are online. You have deliberately turned this on its head; the normal state of your RAID set is that one disk is missing. This is such a basic principle that most documentation won't mention it.

This is somewhat worrisome to me. Consider a fileserver with non-hotswap disks. One disk fails in the morning, but the machine is in production use, and the admin's goals are:

 * minimize downtime,
 * reboot only during off-hours, and
 * minimize the amount of time that the array spends de-synced.

A responsible admin might reasonably expect to attach a disk via a well-tested USB or ieee1394 adapter, bring the array back into sync, and announce to the rest of the organization that there will be a scheduled reboot later in the evening.

Then, at the scheduled reboot, move the disk from the USB/ieee1394 adapter to the direct ATA interface on the machine.

If this sequence of operations is likely (or even possible) to cause data loss, it should be spelled out in BIG RED LETTERS someplace. I don't think any of the above steps seem unreasonable, and the goals the admin is attempting to meet are certainly commonplace.

> The error is that you changed the I/O capabilities of the RAID while it was already in use. But what I was describing as 'correct' was that an error code was returned, rather than the error condition only being logged. If the error condition is not properly propagated then it could lead to data loss.

How is an admin to know which I/O capabilities to check before adding a device to a RAID array? When is it acceptable to mix I/O capabilities? Can a RAID array which is not currently being used as a backing store for a filesystem be assembled of unlike disks? What if it is then (later) used as a backing store for a filesystem?

One of the advantages people tout for in-kernel software raid (over many H/W RAID implementations) is the ability to mix disks, so that you're not reliant on a single vendor during a failure. If this advantage doesn't extend across certain classes of disk, it would be good to be unambiguous about what can be mixed and what cannot.

Regards,

--dkg
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On Sun, 2011-05-01 at 20:42 -0400, Daniel Kahn Gillmor wrote:
> On 05/01/2011 08:00 PM, Ben Hutchings wrote:
>> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>>> Hi, Ben. Can you explain why this is not expected to work? Which part exactly is not expected to work and why?
>>
>> Adding another type of disk controller (USB storage versus whatever the SSD interface is) to a RAID that is already in use.
>> [...]
>> The normal state of a RAID set is that all disks are online. You have deliberately turned this on its head; the normal state of your RAID set is that one disk is missing. This is such a basic principle that most documentation won't mention it.
>
> This is somewhat worrisome to me. Consider a fileserver with non-hotswap disks. One disk fails in the morning, but the machine is in production use, and the admin's goals are:
>
>  * minimize downtime,
>  * reboot only during off-hours, and
>  * minimize the amount of time that the array spends de-synced.
>
> A responsible admin might reasonably expect to attach a disk via a well-tested USB or ieee1394 adapter, bring the array back into sync, and announce to the rest of the organization that there will be a scheduled reboot later in the evening.
>
> Then, at the scheduled reboot, move the disk from the USB/ieee1394 adapter to the direct ATA interface on the machine.
>
> If this sequence of operations is likely (or even possible) to cause data loss, it should be spelled out in BIG RED LETTERS someplace.

So far as I'm aware, the RAID may stop working, but without loss of data that's already on disk.

> I don't think any of the above steps seem unreasonable, and the goals the admin is attempting to meet are certainly commonplace.
>
>> The error is that you changed the I/O capabilities of the RAID while it was already in use. But what I was describing as 'correct' was that an error code was returned, rather than the error condition only being logged. If the error condition is not properly propagated then it could lead to data loss.
>
> How is an admin to know which I/O capabilities to check before adding a device to a RAID array? When is it acceptable to mix I/O capabilities? Can a RAID array which is not currently being used as a backing store for a filesystem be assembled of unlike disks? What if it is then (later) used as a backing store for a filesystem?
[...]

I think the answers are:
- Not easily
- When the RAID does not have another device on top
- Yes
- Yes

but Neil can correct me on this.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
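On the "Not easily" answer: one way an admin might at least inspect the limits in question is to read each block device's queue attributes from sysfs. The sketch below assumes the relevant attribute is `queue/max_sectors_kb` (the thread does not say which limit is involved, and `max_hw_sectors_kb` may be the more relevant one), and the function names and the device names in the usage note are hypothetical. It only reports where an upper layer advertises a larger limit than the layer beneath it; it cannot fix anything.

```python
from pathlib import Path


def queue_limit_kb(dev, sysfs_root="/sys/block", attr="max_sectors_kb"):
    """Read one request-size limit (in KiB) for a block device from sysfs."""
    path = Path(sysfs_root) / dev / "queue" / attr
    return int(path.read_text().strip())


def stale_layers(stack, sysfs_root="/sys/block", attr="max_sectors_kb"):
    """Find layers whose limit exceeds that of the layer directly below.

    `stack` lists device names from top (nearest the filesystem) to bottom.
    Returns (upper, lower) pairs where upper advertises a larger limit.
    """
    limits = [(dev, queue_limit_kb(dev, sysfs_root, attr)) for dev in stack]
    return [
        (upper, lower)
        for (upper, upper_kb), (lower, lower_kb) in zip(limits, limits[1:])
        if upper_kb > lower_kb
    ]
```

For a stack like the one in this report, a call such as `stale_layers(["dm-1", "dm-0", "md0"])` (device names assumed) would return the pairs worth a closer look, or an empty list if every layer's limit fits within the one below it.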
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On Mon, 02 May 2011 02:04:18 +0100, Ben Hutchings b...@decadent.org.uk wrote:
> On Sun, 2011-05-01 at 20:42 -0400, Daniel Kahn Gillmor wrote:
> So far as I'm aware, the RAID may stop working, but without loss of data that's already on disk.

What exactly does "RAID may stop working" mean? Do you mean that this bug will be triggered? The raid will refuse to do further syncs? Or do you mean something else?

>> How is an admin to know which I/O capabilities to check before adding a device to a RAID array? When is it acceptable to mix I/O capabilities? Can a RAID array which is not currently being used as a backing store for a filesystem be assembled of unlike disks? What if it is then (later) used as a backing store for a filesystem?
> [...]
>
> I think the answers are:
> - Not easily
> - When the RAID does not have another device on top

This is very upsetting to me, if it's true. It completely undermines all of my assumptions about how software raid works.

Are you really saying that md with mixed disks is not possible/supported when the md device has *any* other device on top of it? This is in fact a *very* common setup. *ALL* of my raid devices have other devices on top of them (lvm at least). In fact, the debian installer supports putting dm and/or lvm on top of md on mixed disks. If what you're saying is true then the debian installer is in big trouble.

jamie.
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
} -----Original Message-----
} From: linux-raid-ow...@vger.kernel.org [mailto:linux-raid-ow...@vger.kernel.org] On Behalf Of NeilBrown
} Sent: Sunday, May 01, 2011 8:22 PM
} To: Ben Hutchings
} Cc: Jameson Graef Rollins; 624...@bugs.debian.org; linux-r...@vger.kernel.org
} Subject: Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
}
} On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings b...@decadent.org.uk wrote:
}
} > On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
} > > On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings b...@decadent.org.uk wrote:
} > > > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
} > > > > I run what I imagine is a fairly unusual disk setup on my laptop, consisting of:
} > > > >
} > > > > ssd - raid1 - dm-crypt - lvm - ext4
} > > > >
} > > > > I use the raid1 as a backup. The raid1 operates normally in degraded mode. For backups I then hot-add a usb hdd, let the raid1 sync, and then fail/remove the external hdd.
} > > >
} > > > Well, this is not expected to work. Possibly the hot-addition of a disk with different bio restrictions should be rejected. But I'm not sure, because it is safe to do that if there is no mounted filesystem or stacking device on top of the RAID.
} > >
} > > Hi, Ben. Can you explain why this is not expected to work? Which part exactly is not expected to work and why?
} >
} > Adding another type of disk controller (USB storage versus whatever the SSD interface is) to a RAID that is already in use.
}
} Normally this practice is perfectly OK.
} If a filesystem is mounted directly from an md array, then adding devices to the array at any time is fine, even if the new devices have quite different characteristics than the old.
}
} However if there is another layer in between md and the filesystem - such as dm - then there can be a problem.
} There is no mechanism in the kernel for md to tell dm that things have changed, so dm never changes its configuration to match any change in the config of the md device.
}
} A filesystem always queries the config of the device as it prepares the request. As this is not an 'active' query (i.e. it just looks at variables, it doesn't call a function) there is no opportunity for dm to then query md.
}
} There is a ->merge_bvec_fn which could be pushed into service. i.e. if md/raid1 defined some trivial merge_bvec_fn, then it would probably work. However the actual effect of this would probably be to cause every bio created by the filesystem to be just one PAGE in size, and this is guaranteed always to work. So it could be a significant performance hit for the common case.
}
} We really need either:
}  - The fs sends down arbitrarily large requests, and the lower layers split them up if/when needed
} or
}  - A mechanism for a block device to tell the layer above that something has changed.
}
} But these are both fairly intrusive with unclear performance/complexity implications and no one has bothered.
}
} NeilBrown

Maybe mdadm should not allow a disk to be added if its characteristics are different enough to be an issue? And require the --force option if the admin really wants to do it anyhow. Oh, and a good error message explaining the issues and risks. :)

Guy
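The suggestion in this message, refusing a risky hot-add unless it is forced, could look roughly like the following. This is a purely hypothetical policy sketch: `check_hot_add` and its parameters are invented for this note, it is not how mdadm behaves, and the rule it encodes (risk exists only when the new member lowers the limit and something is stacked on the array) is an interpretation of the explanations given earlier in the thread.

```python
def check_hot_add(array_max_sectors, member_max_sectors,
                  has_stacked_device, force=False):
    """Decide whether a hot-add should proceed (hypothetical policy).

    Returns (allowed, message). Limits are per-request sizes in sectors.
    """
    would_shrink = member_max_sectors < array_max_sectors
    if not (would_shrink and has_stacked_device):
        # Either the limit does not drop, or nothing above md caches it.
        return True, "ok"

    warning = (
        f"new member limits requests to {member_max_sectors} sectors, "
        f"array currently allows {array_max_sectors}; layers stacked on the "
        "array may keep sending larger requests"
    )
    if force:
        return True, "forced: " + warning
    return False, "refused (use force to override): " + warning


if __name__ == "__main__":
    # Values from the logged message, assumed to be sizes in sectors.
    print(check_hot_add(248, 240, has_stacked_device=True))
    print(check_hot_add(248, 240, has_stacked_device=True, force=True))
    print(check_hot_add(248, 240, has_stacked_device=False))
```

The message text matters as much as the refusal: it names both limits so that the admin can see why the add is considered risky before deciding to override.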
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On 05/01/2011 08:22 PM, NeilBrown wrote:
> However if there is another layer in between md and the filesystem - such as dm - then there can be a problem. There is no mechanism in the kernel for md to tell dm that things have changed, so dm never changes its configuration to match any change in the config of the md device.
>
> A filesystem always queries the config of the device as it prepares the request. As this is not an 'active' query (i.e. it just looks at variables, it doesn't call a function) there is no opportunity for dm to then query md.

Thanks for this followup, Neil.

Just to clarify, it sounds like any one of the following situations on its own is *not* problematic from the kernel's perspective:

 0) having a RAID array that is more often in a de-synced state than in an online state.
 1) mixing various types of disk in a single RAID array (e.g. SSD and spinning metal)
 2) mixing various disk access channels within a single RAID array (e.g. USB and SATA)
 3) putting other block device layers (e.g. loopback, dm-crypt, dm (via lvm or otherwise)) above md and below a filesystem
 4) hot-adding a device to an active RAID array from which filesystems are mounted.

However, having any layers between md and the filesystem becomes problematic if the array is re-synced while the filesystem is online, because the intermediate layer can't communicate $SOMETHING (what specifically?) from md to the kernel's filesystem code.

As a workaround, would the following sequence of actions (perhaps impossible for any given machine's operational state) allow a RAID re-sync without the errors jrollins reports or requiring a reboot?

 a) unmount all filesystems which ultimately derive from the RAID array
 b) hot-add the device with mdadm
 c) re-mount the filesystems

or would something else need to be done with lvm (or cryptsetup, or the loopback device) between steps b and c?

Coming at it from another angle: is there a way that an admin can ensure that the RAID array can be re-synced without unmounting the filesystems other than limiting themselves to exactly the same models of hardware for all components in the storage chain?

Alternately, is there a way to manually inform a given mounted filesystem that it should change $SOMETHING (what?), so that an aware admin could keep filesystems online by issuing this instruction before a raid re-sync?

From a modular-kernel perspective: Is this specifically a problem with md itself, or would it also be the case with other block-device layering in the kernel? For example, suppose an admin has (without md) lvm over a bare disk, and a filesystem mounted from an LV. The admin then adds a second bare disk as a PV to the VG, and uses pvmove to transfer the physical extents of the active filesystem to the new disk, while mounted. Assuming that the new disk doesn't have the same characteristics (which characteristics?), does the fact that LVM sits between the underlying disk and the filesystem cause the same problem? What if dm-crypt sits between the disk and lvm? Between lvm and the filesystem?

What if the layering is disk-dm-md-fs instead of disk-md-dm-fs?

Sorry for all the questions without having much concrete to contribute at the moment. If these limitations are actually well-documented somewhere, I would be grateful for a pointer. As a systems administrator, I would be unhappy to be caught out by some as-yet-unknown constraints during a hardware failure. I'd like to at least know my constraints beforehand.

Regards,

--dkg
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> Package: linux-2.6
> Version: 2.6.38-3
> Severity: normal
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> As you can see from the kern.log snippet below, I am seeing frequent messages reporting "bio too big device md0 (248 240)".
>
> I run what I imagine is a fairly unusual disk setup on my laptop, consisting of:
>
> ssd - raid1 - dm-crypt - lvm - ext4
>
> I use the raid1 as a backup. The raid1 operates normally in degraded mode. For backups I then hot-add a usb hdd, let the raid1 sync, and then fail/remove the external hdd.

Well, this is not expected to work. Possibly the hot-addition of a disk with different bio restrictions should be rejected. But I'm not sure, because it is safe to do that if there is no mounted filesystem or stacking device on top of the RAID.

I would recommend using filesystem-level backup (e.g. dirvish or backuppc). Aside from this bug, if the SSD fails during a RAID resync you will be left with an inconsistent and therefore useless 'backup'.

> I started noticing these messages after my last sync. I have not rebooted since.
>
> I found a bug report on the launchpad that describes an almost identical situation:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/320638
>
> The reporter seemed to be concerned that there may be data loss happening. I have not yet noticed any, but of course I'm terrified that it's happening and I just haven't found it yet. Unfortunately the bug was closed with a "Won't Fix" without any resolution.
>
> Is this a kernel bug, or is there something I can do to remedy the situation? I haven't tried to reboot yet to see if the messages stop. I'm obviously most worried about data loss. Please advise!

The block layer correctly returns an error after logging this message. If it's due to a read operation, the error should be propagated up to the application that tried to read. If it's due to a write operation, I would expect the error to result in the RAID becoming desynchronised. In some cases it might be propagated to the application that tried to write.

If the error is somehow discarded then there *is* a kernel bug with the risk of data loss.

> I am starting to suspect that these messages are in fact associated with data loss on my system. I have witnessed these messages occur during write operations to the disk, and I have also started to see some strange behavior on my system. dhclient started acting weird after these messages appeared (not holding on to leases) and I started to notice database exceptions in my mail client.
>
> Interestingly, the messages seem to have gone away after reboot. I will watch closely to see if they return after my next raid1 sync.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
Package: linux-2.6
Version: 2.6.38-3
Severity: normal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

As you can see from the kern.log snippet below, I am seeing frequent messages reporting "bio too big device md0 (248 240)".

I run what I imagine is a fairly unusual disk setup on my laptop, consisting of:

ssd - raid1 - dm-crypt - lvm - ext4

I use the raid1 as a backup. The raid1 operates normally in degraded mode. For backups I then hot-add a usb hdd, let the raid1 sync, and then fail/remove the external hdd.

I started noticing these messages after my last sync. I have not rebooted since.

I found a bug report on the launchpad that describes an almost identical situation:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/320638

The reporter seemed to be concerned that there may be data loss happening. I have not yet noticed any, but of course I'm terrified that it's happening and I just haven't found it yet. Unfortunately the bug was closed with a "Won't Fix" without any resolution.

Is this a kernel bug, or is there something I can do to remedy the situation? I haven't tried to reboot yet to see if the messages stop. I'm obviously most worried about data loss. Please advise!

Thanks so much for any help.

jamie.
- -- Package-specific info:
** Version:
Linux version 2.6.38-2-amd64 (Debian 2.6.38-3) (b...@decadent.org.uk) (gcc version 4.4.5 (Debian 4.4.5-15) ) #1 SMP Thu Apr 7 04:28:07 UTC 2011

** Command line:
BOOT_IMAGE=/vmlinuz-2.6.38-2-amd64 root=/dev/mapper/servo-root ro vga=788

** Not tainted

** Kernel log:
[134465.496126] bio too big device md0 (248 240)
[134465.544976] bio too big device md0 (248 240)
[134465.626438] bio too big device md0 (248 240)
[134465.675884] bio too big device md0 (248 240)
[134465.752459] bio too big device md0 (248 240)
[134465.827410] bio too big device md0 (248 240)
[134466.087495] bio too big device md0 (248 240)
[134466.155538] bio too big device md0 (248 240)
[134466.225549] bio too big device md0 (248 240)
[134466.268505] bio too big device md0 (248 240)
[134466.397099] bio too big device md0 (248 240)
[134466.464110] bio too big device md0 (248 240)
[134466.501557] bio too big device md0 (248 240)
[134466.547847] bio too big device md0 (248 240)
[134466.636949] bio too big device md0 (248 240)
[134466.695790] bio too big device md0 (248 240)
[134466.748543] bio too big device md0 (248 240)
[134466.791067] bio too big device md0 (248 240)
[134466.822082] bio too big device md0 (248 240)
[134466.834387] bio too big device md0 (248 240)
[134466.884726] bio too big device md0 (248 240)
[134466.933843] bio too big device md0 (248 240)
[134466.982737] bio too big device md0 (248 240)
[134467.021168] bio too big device md0 (248 240)
[134467.093886] bio too big device md0 (248 240)
[134467.113183] bio too big device md0 (248 240)
[134467.133697] bio too big device md0 (248 240)
[134467.163391] bio too big device md0 (248 240)
[134467.238819] bio too big device md0 (248 240)
[134467.279655] bio too big device md0 (248 240)
[134467.337005] bio too big device md0 (248 240)
[134467.406347] bio too big device md0 (248 240)
[134467.462565] bio too big device md0 (248 240)
[134467.499770] bio too big device md0 (248 240)
[134467.544269] bio too big device md0 (248 240)
[134511.879575] bio too big device md0 (248 240)
[134511.903777] bio too big device md0 (248 240)
[135819.708128] bio too big device md0 (248 240)
[135833.674591] bio too big device md0 (248 240)
[135833.675175] bio too big device md0 (248 240)
[135833.679417] bio too big device md0 (248 240)
[135833.683757] bio too big device md0 (248 240)
[135833.687908] bio too big device md0 (248 240)
[135833.691984] bio too big device md0 (248 240)
[135833.696038] bio too big device md0 (248 240)
[135833.700465] bio too big device md0 (248 240)
[135833.705000] bio too big device md0 (248 240)
[135833.709328] bio too big device md0 (248 240)
[135833.713498] bio too big device md0 (248 240)
[135833.717687] bio too big device md0 (248 240)
[135833.721729] bio too big device md0 (248 240)
[135833.727046] bio too big device md0 (248 240)
[135833.732615] bio too big device md0 (248 240)
[135833.736938] bio too big device md0 (248 240)
[135835.924148] bio too big device md0 (248 240)
[135835.941912] bio too big device md0 (248 240)
[135835.942503] bio too big device md0 (248 240)
[135835.955810] bio too big device md0 (248 240)
[135836.007533] bio too big device md0 (248 240)
[135836.016057] bio too big device md0 (248 240)
[135836.020241] bio too big device md0 (248 240)
[135836.020257] bio too big device md0 (248 240)
[135836.028139] bio too big device md0 (248 240)
[135836.038644] bio too big device md0 (248 240)
[135836.039922] bio too big device md0 (248 240)
[135836.070426] bio too big device md0 (248 240)
[135836.102252] bio too big device md0 (248 240)
[135836.103499] bio too big device md0 (248 240)
[135836.104840] bio too big device
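To get an overview of a log like the one above, the entries can be counted and the two numbers converted to sizes. The sketch below rests on assumptions: that the first number is the size of the rejected bio and the second the device limit, both in 512-byte sectors, and that the kernel prints a ">" between them which this archive may have stripped, so the pattern accepts either form. The names `LINE_RE`, `summarize` and `sectors_to_kib` are invented for this note, and the sample reuses three lines from the report, including the truncated last one.

```python
import re
from collections import Counter

# Matches "[ts] bio too big device NAME (SIZE LIMIT)" with an optional ">".
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+bio too big device (?P<dev>\S+) "
    r"\((?P<size>\d+)(?: >)? (?P<limit>\d+)\)"
)


def summarize(log_text):
    """Count complete entries per (device, bio size, limit) triple."""
    counts = Counter()
    for m in LINE_RE.finditer(log_text):
        counts[(m["dev"], int(m["size"]), int(m["limit"]))] += 1
    return counts


def sectors_to_kib(sectors, sector_bytes=512):
    """Convert a sector count to KiB, assuming 512-byte sectors."""
    return sectors * sector_bytes // 1024


sample = """\
[134465.496126] bio too big device md0 (248 240)
[134465.544976] bio too big device md0 (248 240)
[135836.104840] bio too big device
"""

for (dev, size, limit), n in summarize(sample).items():
    print(f"{dev}: {n} entries, bio {sectors_to_kib(size)} KiB, "
          f"limit {sectors_to_kib(limit)} KiB")
```

Under these assumptions every complete entry in the report describes the same situation: a 124 KiB request arriving at a device that accepts at most 120 KiB. The truncated final line is skipped rather than guessed at.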
Bug#624343: linux-image-2.6.38-2-amd64: frequent message bio too big device md0 (248 240) in kern.log
I am starting to suspect that these messages are in fact associated with data loss on my system. I have witnessed these messages occur during write operations to the disk, and I have also started to see some strange behavior on my system. dhclient started acting weird after these messages appeared (not holding on to leases) and I started to notice database exceptions in my mail client.

Interestingly, the messages seem to have gone away after reboot. I will watch closely to see if they return after my next raid1 sync.

jamie.