[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2013-01-05 Thread ceg
** Also affects: mdadm (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: debian-installer (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: ubiquity (Ubuntu)
   Importance: Undecided
   Status: New

** Description changed:

+ Note: The bug is also present when hot-plugging USB, FireWire, etc. devices.
+ 
+ ---
+ 
  This is on an MSI Wind U100 and I've got the following stack running:
  HDD + SD card (USB card reader) - RAID1 - LUKS - LVM - Reiser
  
  Whenever I remove the HDD from the Raid1
   (mdadm /dev/md0 --fail /dev/sda2
    mdadm /dev/md0 --remove /dev/sda2)
  for powersaving reasons, I cannot run any apt related tools.
  
   sudo apt-get update
  [...]
  Hit http://de.archive.ubuntu.com intrepid-updates/multiverse Sources
  Reading package lists... Error!
  E: Read error - read (5 Input/output error)
  E: The package lists or status file could not be parsed or opened.
  
  Taking a look at the kernel log shows the following (and many more entries above):
   dmesg|tail
  [ 9479.330550] bio too big device md0 (248 > 240)
  [ 9479.331375] bio too big device md0 (248 > 240)
  [ 9479.332182] bio too big device md0 (248 > 240)
  [ 9611.980294] bio too big device md0 (248 > 240)
  [ 9742.929761] bio too big device md0 (248 > 240)
  [ 9852.932001] bio too big device md0 (248 > 240)
  [ 9852.935395] bio too big device md0 (248 > 240)
  [ 9852.938064] bio too big device md0 (248 > 240)
  [ 9853.081046] bio too big device md0 (248 > 240)
  [ 9853.081688] bio too big device md0 (248 > 240)
  
  $ sudo mdadm --detail /dev/md0
  /dev/md0:
          Version : 00.90
    Creation Time : Tue Jan 13 11:25:57 2009
       Raid Level : raid1
       Array Size : 3871552 (3.69 GiB 3.96 GB)
    Used Dev Size : 3871552 (3.69 GiB 3.96 GB)
     Raid Devices : 2
    Total Devices : 1
  Preferred Minor : 0
      Persistence : Superblock is persistent
  
    Intent Bitmap : Internal
  
      Update Time : Fri Jan 23 21:47:35 2009
            State : active, degraded
   Active Devices : 1
  Working Devices : 1
   Failed Devices : 0
    Spare Devices : 0
  
             UUID : 89863068:bc52a0c0:44a5346e:9d69deca (local to host m-twain)
           Events : 0.8767
  
      Number   Major   Minor   RaidDevice State
         0       0       0        0      removed
         1       8      17        1      active sync writemostly   /dev/sdb1
  
  $ sudo ubuntu-bug -p linux-meta
  dpkg-query: failed in buffer_read(fd): copy info file `/var/lib/dpkg/status': Input/output error
  dpkg-query: failed in buffer_read(fd): copy info file `/var/lib/dpkg/status': Input/output error
  [...]
  
  Will provide separate attachments.

** Bug watch added: Debian Bug tracker #624343
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=624343

** Also affects: linux via
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=624343
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/320638

Title:
  hot-add/remove in mixed HDD/USB/SD-card RAIDs - data corruption (bio
  too big device md0 (248 > 200))

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/320638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-07-24 Thread Jim Lieb
@Stephan,

The purpose of RAID is reliability; it is not a power-saving strategy.
It is true that there are bitmaps to minimize the bulk of the re-sync,
but that is an optimization and nothing more.  The re-sync code
schedules these writes so that there is minimal impact on overall
performance during the re-sync.  On a heavily used system, such as a
server, this can take hours.  It has been my experience that the disk
subsystem gets pounded to death during this time.

There are a number of issues wrt mixing devices in this manner.  An
HDD has access latencies in the msec range, but its read and write
speeds are the same.  An SSD has negligible access latency and read
performance in the HDD range, but its write speed is not merely
asymmetric; it is significantly slower.  The manufacturers are not
there *yet* to compete with HDDs.  Even in private, NDA discussions
they tend to be vague about these important details.  SDs, CFs, and
USB sticks do not even fit this category; they are low-cost, low-power
secondary storage.  Mixing HDD and SSD storage is an interesting idea.
However, the mix has problems.

Your comments about vulnerability are true.  What RAID *should* do and
what it actually does are two different things.  This is why Btrfs and
ZFS (among others) address these very issues.  However, you are
treating a degraded RAID as the normal case; in practice it is neither
normal nor wanted.  Case in point, no one uses RAID0 for anything
other than low-value, big-bit-bucket storage.  Yes, that is a
candidate for idempotent, large-dataset munching, but not for your
only copy of the photo album (or the AMEX transaction database).
RAID1 is an interesting idea, but most low-end arrays are, in fact,
RAID10 to get striping.  As I mentioned above, the re-build pounds the
disk, and the sooner the drive can come back to the array and start
the rebuild, the smaller the rebuild queue will be.  Striping improves
performance, but it makes your mix-and-match problematical.  RAID
arrays *really* want identical units.  You really don't even want to
mix sizes in the same speed range, because sectors/track are usually
different.  The mismatch results in asymmetric performance on the
array, making performance match the slowest unit.  These are the big
issues in a RAID subsystem design.  It is all about redundancy and
speed optimizations, given that every write transaction involves extra
writes over the single-disk case.  Your use case is not on the list
for I/O subsystem designers.  See below for what they are looking at.

I should address your issues about propagating size changes up and
down the stack.  The first issue is MD somehow notifying the upper
layers that there have been size changes.  This works the wrong way
around.  The application's demands determine the various sizings,
starting with the filesystem.  Case in point, a database specifies its
own page size, often some multiple of a predominant row size.  This in
turn determines the write size to the filesystem.  This is where ext4
extents come in, and where the current Linux penchant for a
one-size-fits-all 4k page gets in the way.  This, in turn, combines
with the array and underlying disk geometry to determine stripe size.
Tuning this is somewhat of a black art.  Change the use case to
streaming media files and all the sizes change.  In other words, the
sizing comes down from above, not up from the bottom.  Remember, once
written to storage, the sizing of the row/db-page/extent/stripe is
fixed until re-written.
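
To illustrate, stripe-aware tuning is applied from above at filesystem
creation time.  A minimal sketch, assuming a hypothetical array with
64 KiB chunks, two data disks, and 4 KiB filesystem blocks:

  # stride = chunk size / block size = 64 KiB / 4 KiB = 16 blocks
  # stripe-width = stride * number of data disks = 16 * 2 = 32 blocks
  mkfs.ext4 -b 4096 -E stride=16,stripe-width=32 /dev/md0

Change the chunk size or the number of disks, and these values have to
be recomputed by the administrator; nothing propagates up by itself.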

Your second suggestion does two things: first, it effectively disables
the caching, and second, it leaks private information across the layer
boundary, and for what purpose?  It is the upper layers that advise
the configuration of the lower, not the other way around.  And this is
a tuning issue.  The database will still work if I swap in, via LVM, a
new array of different geometry.  It will just run crappy if I don't
configure it to be at least a multiple/modulo equivalent of what it
replaces/augments.
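
For reference, such a swap is ordinary LVM practice and needs nothing
from the layers above (a sketch; the device and volume group names are
hypothetical):

  pvcreate /dev/md1          # prepare the new array as a physical volume
  vgextend vg0 /dev/md1      # add it to the existing volume group
  pvmove /dev/md0 /dev/md1   # migrate all extents, online
  vgreduce vg0 /dev/md0      # retire the old array

The filesystem on top keeps running throughout; only its performance
depends on how well the new geometry matches the old tuning.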

Both of your suggested solutions require new code and changes to the
APIs of multiple layers, something that has little chance of making it
into the mainline.  Since Ubuntu must track mainline, this means it
would be acceptable for inclusion in Ubuntu only after it has been
included in the mainline.  The idea is an interesting one, but it is
not a patch; it is new development.

Your final paragraph hits the point.  This is a fundamental design
issue raised by your corner case.  I hope the above explains why your
need is still a corner case, outside the scope of the design.  No, it
is not just exotic; it is indeed not what was intended, for the above
reasons.  Please don't take offense, but one could easily say that
this is using a vernier caliper as a pipe wrench.  Might I suggest an
alternative that the industry is moving toward?

Laptop power consumption is a real problem.  This is a real problem in
the data center as well. 

[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-07-07 Thread Stephan Diestelhorst
@Jim: Thanks for getting back to me on this one!

Your understanding of my purposes is correct. Let me address your
points one by one:

> You save little, if any, power because an array restore requires a
> complete disk copy, not an update of some number of out-of-date blocks.
> ...

No. First of all, there are write-intent bitmaps that reduce the amount
of synchronisation needed. Second, the SD card is in the RAID all the
time, with the write-mostly option. Hence I can just drop the HDD from
the RAID, spin it down and save power that way. When I'm on a power
supply again, resyncing copies only the changed data to the HDD.
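
Spelled out, that cycle looks roughly like this (a sketch using the
device names from the bug report; the hdparm spin-down step is my
assumption about how the disk gets parked, not something stated in
this thread):

  mdadm --grow /dev/md0 --bitmap=internal   # one-time: add a write-intent bitmap
  mdadm /dev/md0 --fail /dev/sda2 --remove /dev/sda2   # drop the HDD half
  hdparm -y /dev/sda                        # spin the disk down into standby
  # ... later, back on mains power:
  mdadm /dev/md0 --re-add /dev/sda2         # bitmap limits the resync to changed blocks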

> It can also leave your system vulnerable. I have heard of reiser
> filesystems failing w/ compromised raid arrays, which this is in your
> powersaving mode.

It must not fail. This is something everybody expects from the
following two things: RAID and the block device abstraction. Any sane
RAID implementation will guarantee the same semantics on the block
layer interface (with performance degradation, sure). The file system
in turn must rely only on the semantics provided by the block layer
interface. As these semantics remain the same, the FS has no way of
telling the difference and malfunctioning because of it. Any such
behaviour is clearly a bug in the RAID implementation or the FS.

> There is power management in the newer kernels to cycle down the hdd
> to conserve power but this takes careful tuning.

This is improving steadily, but it cannot save as much power as not
using the HDD at all.

> If you want backup with this mix of devices, use rsync to the usb
> stick.

I can do this copying, but I cannot remount the root FS on the fly to
the copy on the USB key / SD card.

> A usb stick is not the same as an SSD. ...

As far as the correctness and semantics of the entire FS stack are
concerned, that is exactly what it is: a block device. All the
differences you mention are well understood, but they may impact only
quantitative aspects of the system, such as performance, MTBF, etc.

After skimming through the kernel source once more, I feel that the
real problem lies in unclear specifications regarding the
'constantness' of max_sectors. MD assumes that it can adjust the value
it reports on _polling_ according to the characteristics of the
devices in the RAID. Some layer in the stack (LVM or LUKS, both based
on the same block layer abstraction, or even the FS) apparently cannot
cope with a variable max_sectors.
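
The mismatch is visible from userspace by walking the queue limits of
the whole stack (a sketch; the sysfs attribute reports KiB, and the
sdX/dm-N names are examples for the stack at hand):

  for d in sda sdb md0 dm-0 dm-1; do
      printf '%s: ' "$d"
      cat /sys/block/$d/queue/max_sectors_kb
  done
  # md0 drops to the card reader's limit once the HDD is removed,
  # while the dm devices above keep reporting the old value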

In addition to the possible solutions mentioned in previous comments,
I can think of several other ways to deal with the issue:
a) Provide a way for MD to _notify_ the upper layers of changing
characteristics of the block device. Then all layers would have the
responsibility to chain the notification up to their respective parent
layers. This may be nasty, as it requires knowledge of the reverse
connections.

b) Handle the issue more gracefully upon detection of faulty
accesses. Once the upper requesting layer receives a specific error
message for an access, it can initiate a reprobe of the actual
max_sectors value from the underlying block device. The contract would
then be that intermediate block layers do not cache this information
but rather ask their lower layers for up-to-date information.

Please note that this is a fundamental design issue, which has turned
up because of an (admittedly exotic) setup, rather than some
used-in-a-way-it-was-not-meant-to-be-used thing.



[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-07-06 Thread Jim Lieb
** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Jim Lieb (jim-lieb)



[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-06-30 Thread Jim Lieb
@Stephan, As I understand this, you are using an md raid1 to save power
and have a backup?  This is failure prone and dangerous to your data.
The MD layer is pretty resilient, but it is meant for RAID over like
disks; this is why the numbers are weird.  It expects a scsi/sata hdd
and works best if the disks are matched, i.e. the same disk model.
Mixing in a usb stick is not the same thing.

You save little, if any, power because an array restore requires a
complete disk copy, not an update of some number of out-of-date blocks.
I wouldn't be surprised if it consumes even more, since it is a
steady-state transfer load until the restore is complete.  This
restore, depending on subsystem traffic and disk size, can take a
significant amount of time.  It can also leave your system vulnerable.
I have heard of reiser filesystems failing w/ compromised raid arrays,
which this is in your powersaving mode.

There is power management in the newer kernels to cycle down the hdd
to conserve power but this takes careful tuning.  If you want backup
with this mix of devices, use rsync to the usb stick.  This will be
consistent and only write what is needed.  A usb stick is not the same
as an SSD.  The former is meant as a replacement for floppies, namely
lower-than-hdd transaction rates, removability, and expected limits to
lifetime.  The latter is meant for hdd-type applications.  I suggest
you reconsider your configuration.
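
Concretely, the rsync route amounts to a one-liner (a sketch; the
mount point and target directory are hypothetical):

  # archive mode, keep hard links, stay on one filesystem, mirror deletions
  rsync -aHx --delete / /media/usbstick/rootbackup/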

I have not invalidated this bug yet, pending your reply.



[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-06-02 Thread Andy Whitcroft
This is not a bug in the linux-meta package, moving to the linux
package.

** Package changed: linux-meta (Ubuntu) => linux (Ubuntu)

** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical



[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-06-02 Thread Leann Ogasawara
** Changed in: linux (Ubuntu)
   Status: New => Triaged



[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-01-26 Thread Stephan Diestelhorst
Mhmhm, I'm replying to myself. Again.
https://kerneltrap.org/mailarchive/linux-kernel/2007/4/10/75875

NeilBrown:
...
dm doesn't know that md/raid1 has just changed max_sectors and there
is no convenient way for it to find out.  So when the filesystem tries
to get the max_sectors for the dm device, it gets the value that dm set
up when it was created, which was somewhat larger than one page.
When the request gets down to the raid1 layer, it caused a problem.
...

This seems to be exactly the issue I see. For whatever reason, Reiser
queries dm for max_sectors and receives the value that was still valid
when the disk was around. (248 seems to be chosen for being just under
256 and divisible by 8; see this ancient thread:
http://lkml.indiana.edu/hypermail/linux/kernel/0303.1/0880.html )
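
Working through the numbers in the message (my reading of the kernel's
"bio too big device %s (%u > %u)" format, not something stated in this
thread): the first figure is the size of the rejected bio, 248 sectors
* 512 B = 124 KiB, built against the stale limit; the second is md0's
current limit after the HDD was dropped, 240 sectors = 120 KiB. The
two can be compared via sysfs (dm-0 as a hypothetical name for the
LUKS/LVM layer):

  cat /sys/block/md0/queue/max_sectors_kb   # 120 once only the card reader is left
  cat /sys/block/dm-0/queue/max_sectors_kb  # still 124: the cached, stale value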

Various solutions come to mind:
a) Gracefully handle the issue, i.e. split the request once it does not fit 
into the limit.
  a1) at the caller
  a2) at the callee
b) Let LVM query max_sectors before every (?) request it sends through to the 
device below.

I feel that something is really wrong here.

Please fix.



[Bug 320638] Re: Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))

2009-01-24 Thread Stephan Diestelhorst
** Summary changed:

- Raid1 HDD and SD card - bio too big device md0 (248 > 200)
+ Raid1 HDD and SD card - data corruption (bio too big device md0 (248 > 200))
