Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-03-11 Thread Mader, Alexander
On 27.02.2016 at 10:01, Mader, Alexander wrote:
> On Fri, 26 Feb 2016 23:45:01 +0100 "Mader, Alexander" wrote:
>> On Mon, 22 Feb 2016 18:19:58 + Chris Boot wrote:
>>> Adding "rootdelay=15" to the kernel command-line probably fixes this
>>> (can't test as we can't just reboot this production server at will) but
>>> this isn't a nice fix.
>> I just tried "rootdelay=15" as well as "rootdelay=30" to no avail. For
>> this I entered the extended boot options in GRUB2, used "e" to edit the
>> parameter section, and inserted as second line the rootdelay. Should I
>> rather edit the /etc/default/grub?
> 
> Hello,
> 
> I just tried booting with /etc/default/grub edited, but it does not
> help; presumably because root is not my problem but home. Is there
> something similar for partitions other than root? As you might have
> noticed, I am quite flexible with reboot experiments for testing something.
> 
> The entries in /etc/default/grub:
> 
> 8< - >8
> GRUB_DEFAULT=0
> GRUB_TIMEOUT=5
> GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
> GRUB_CMDLINE_LINUX_DEFAULT="quiet"
> GRUB_CMDLINE_LINUX="rootdelay=15"
> 8< - >8
> 
> After update-grub ...

On 10.03.2016 at 15:57, Samuel Thibault wrote:
> Samuel Thibault, on Thu 10 Mar 2016 14:45:20 +0100, wrote:
>> Dmitry Smirnov, on Mon 29 Feb 2016 18:46:10 +1100, wrote:
>>> I tried adding "rootdelay=4" to kernel boot parameters but it did
>>> not help.
>>
>> I had tried rootdelay=15, it didn't help me either.
>
> Ok, I had to increase that to 45 seconds to get it working automatically.

Hello,

I tried that 45s delay as well using the GRUB edit and it just added a
long delay before starting anything, but the manual assembly of md0 was
still necessary.

Best regards, Alexander.



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-03-10 Thread Samuel Thibault
Samuel Thibault, on Thu 10 Mar 2016 14:45:20 +0100, wrote:
> Dmitry Smirnov, on Mon 29 Feb 2016 18:46:10 +1100, wrote:
> > I tried adding "rootdelay=4" to kernel boot parameters but it did not help.
> 
> I had tried rootdelay=15, it didn't help me either.

Ok, I had to increase that to 45 seconds to get it working
automatically.  Leaving notes here for people with the same problem,
needing a workaround until this gets fixed.

More precisely, here is the boot log that I get with rootdelay=15
only:

[5.720053] floppy0: no floppy controllers found
[5.729329] work still pending

Here it pauses for about 10 seconds more, i.e. up to the rootdelay, then
this:

Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... 
Warning: starting local-top mdadm
Warning: MD_DEVS is all
Begin: Assembling all MD arrays ... mdadm: No devices listed in conf file were found.
Failure: failed to assemble all arrays.
done.

I.e. it failed to assemble. Then it pauses again for about 15 more seconds.

[   32.775174] scsi0 : ioc0: LSISAS1068E B3, FwRev=00192f00h, Ports=1, MaxQ=266, IRQ=16

At last scsi responds...

[   32.828520] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 0, phy 0, sas_addr 0x5000cca00758d704
[   32.848561] scsi 0:0:0:0: Direct-Access HITACHI  HUS151414VLS300  A48B PQ: 0 ANSI: 5
[   32.869051] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 1, phy 1, sas_addr 0x5000cca00759b0d4
[   32.889048] scsi 0:0:1:0: Direct-Access HITACHI  HUS151414VLS300  A48B PQ: 0 ANSI: 5
[   32.909548] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 4, phy 4, sas_addr 0x12210400
[   32.931826] scsi 0:0:2:0: Direct-Access ATA  Samsung SSD 850  2B6Q PQ: 0 ANSI: 5
[   32.951748] mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 5, phy 5, sas_addr 0x12210500
[   32.974070] scsi 0:0:3:0: Direct-Access ATA  Samsung SSD 850  2B6Q PQ: 0 ANSI: 5
[   33.003539] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   33.014259] sd 0:0:0:0: [sda] 286749480 512-byte logical blocks: (146 GB/136 GiB)
[   33.029511] sd 0:0:1:0: Attached scsi generic sg1 type 0
[   33.040297] sd 0:0:2:0: Attached scsi generic sg2 type 0
[   33.050973] sd 0:0:3:0: [sdd] 2000409264 512-byte logical blocks: (1.02 TB/953 GiB)
[   33.066336] sd 0:0:1:0: [sdb] 286749480 512-byte logical blocks: (146 GB/136 GiB)
[   33.081347] sd 0:0:2:0: [sdc] 2000409264 512-byte logical blocks: (1.02 TB/953 GiB)
[   33.096778] sd 0:0:3:0: Attached scsi generic sg3 type 0
[   33.107452] sd 0:0:0:0: [sda] Write Protect is off
[   33.120484] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[   33.137758] sd 0:0:1:0: [sdb] Write Protect is off
[   33.148356] sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[   33.165631] sd 0:0:3:0: [sdd] Write Protect is off
[   33.175250] sd 0:0:2:0: [sdc] Write Protect is off
[   33.185825] sd 0:0:3:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.203960] sd 0:0:2:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   33.229427]  sdd: sdd1
[   33.234218]  sdc: sdc1
[   33.246768] sd 0:0:3:0: [sdd] Attached SCSI disk
[   33.257655] sd 0:0:2:0: [sdc] Attached SCSI disk
[   33.281649]  sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 >
[   33.292320]  sdb: sdb1 sdb2 < sdb5 sdb6 sdb7 sdb8 sdb9 >
[   33.312635] sd 0:0:0:0: [sda] Attached SCSI disk
[   33.323737] sd 0:0:1:0: [sdb] Attached SCSI disk
[   33.333010] random: nonblocking pool is initialized
[   33.606845] device-mapper: uevent: version 1.0.3
[   33.616518] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-de...@redhat.com
done.
Begin: Running /[   33.640698] PM: Starting manual resume from disk
scripts/local-premount ... done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... 
done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.

etc.

When I set rootdelay to 45, scsi responds at 32s, and assembly is tried
at 45s and can now succeed.
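
The timing argument above can be checked against a saved console log. The following is a minimal sketch, not something posted in this thread: the function names `first_disk_time` and `disk_ready_before` are made up for illustration, and it treats the rootdelay as an absolute point on the kernel clock, which is only an approximation of what happens during boot.

```shell
#!/bin/sh
# Hypothetical helpers, not part of mdadm or initramfs-tools.

# Print the kernel timestamp (in seconds) of the first
# "Attached SCSI disk" line read from stdin, e.g. a saved boot log.
first_disk_time() {
    sed -n 's/^\[ *\([0-9][0-9.]*\)] .*Attached SCSI disk.*/\1/p' | head -n 1
}

# Succeed if the first disk showed up before the delay expired.
# $1: disk timestamp in seconds, $2: rootdelay in seconds.
disk_ready_before() {
    awk -v t="$1" -v d="$2" 'BEGIN { exit !(t + 0 < d + 0) }'
}
```

With the log above, the first disk is attached at about 33 s, so the check fails for a delay of 15 and succeeds for 45, which matches the observed behaviour.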

Samuel



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-03-10 Thread Samuel Thibault
Hello,

Dmitry Smirnov, on Mon 29 Feb 2016 18:46:10 +1100, wrote:
> I tried adding "rootdelay=4" to kernel boot parameters but it did not help.

I had tried rootdelay=15, it didn't help me either.

Samuel



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-28 Thread Dmitry Smirnov
I got hit by this regression too. :(

After recent update of one of my Jessie systems, new kernel 3.16.0-4 failed 
to assemble healthy mdadm RAID:


[    1.665134] md0: failed to create bitmap (-22)
mdadm: failed to RUN_ARRAY /dev/md/0: Invalid argument
mdadm: /dev/md/1 has been started with 2 drives.
mdadm: /dev/md/2 has been started with 2 drives.
[    1.970801] md0: failed to create bitmap (-22)
mdadm: failed to RUN_ARRAY /dev/md/0: Invalid argument
Gave up waiting for root device.


I booted successfully with older kernel 3.14-2 (untouched by update-initramfs 
on update), checked HDDs thoroughly and installed kernel 4.3.0-0.bpo.1-amd64
that failed to assemble md0 just like 3.16 did.

There are 2 HDDs with 3 partitions each and 3 mdadm devices combining 
corresponding partitions in RAID-1 mode; rootfs (ext4) is on md0.

: cat /proc/mdstat 
Personalities : [raid1] 
md2 : active (auto-read-only) raid1 sdb3[1] sda3[0]
  1920960 blocks super 1.2 [2/2] [UU]
  bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
  222525248 blocks super 1.2 [2/2] [UU]
  bitmap: 0/2 pages [0KB], 65536KB chunk

md0 : active raid1 sda1[0] sdb1[1]
  68390784 blocks super 1.2 [2/2] [UU]
  bitmap: 1/1 pages [4KB], 65536KB chunk


Booting into recovery mode fails with multiple messages

Begin: Running /scripts/local-block ... done.

followed by "Gave up waiting for root device".

I tried adding "rootdelay=4" to kernel boot parameters but it did not help.
This system never had any problems assembling mdadm RAIDs before...

I also booted system using old rescue disk and verified all mdadm devices 
with

echo -n "check" > /sys/block/mdN/md/sync_action

No errors were logged whatsoever, yet newly installed kernels fail to assemble 
the mdadm device with rootfs...
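
For more than one array, the check above can be wrapped in a loop. This is only a sketch: the function names and the optional directory argument are assumptions made for illustration (the argument exists so the loop can be tried against a scratch directory first). The `sync_action` and `mismatch_cnt` files are the md sysfs attributes, and writing to them needs root.

```shell
#!/bin/sh
# Hypothetical helpers, not part of mdadm.

# Request a redundancy check on every md array under $1 (default /sys/block).
check_all_md() {
    base=${1:-/sys/block}
    for action in "$base"/md*/md/sync_action; do
        [ -e "$action" ] || continue
        echo check > "$action"
        echo "requested check: ${action%/md/sync_action}"
    done
}

# Print "<array> <mismatch_cnt>" for every md array under $1.
md_mismatches() {
    base=${1:-/sys/block}
    for f in "$base"/md*/md/mismatch_cnt; do
        [ -e "$f" ] || continue
        dir=${f%/md/mismatch_cnt}
        echo "${dir##*/} $(cat "$f")"
    done
}
```

Once the checks have finished, `md_mismatches` lists the mismatch counter of each array.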

Please advise.

-- 
Cheers,
 Dmitry Smirnov.

---

I am easily satisfied with the very best.
-- Winston Churchill




Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-27 Thread Mader, Alexander
On Sat, 27 Feb 2016 10:01:21 +0100 "Mader, Alexander" 
wrote:
> On Fri, 26 Feb 2016 23:45:01 +0100 "Mader, Alexander" 
> wrote:
> > On Mon, 22 Feb 2016 18:19:58 + Chris Boot  wrote:
> > > Adding "rootdelay=15" to the kernel command-line probably fixes this
> > > (can't test as we can't just reboot this production server at will) but
> > > this isn't a nice fix.
> > I just tried "rootdelay=15" as well as "rootdelay=30" to no avail. For
> > this I entered the extended boot options in GRUB2, used "e" to edit the
> > parameter section, and inserted as second line the rootdelay. Should I
> > rather edit the /etc/default/grub?
> I just tried booting with /etc/default/grub edited, but it does not
> help; presumably because root is not my problem but home. Is there
> something similar for partitions other than root? As you might have
> noticed, I am quite flexible with reboot experiments for testing something.

Hello,

I tried to trick the mount sequence by adding "_netdev" to the home
volume. As a result the boot completed but the home directory was not
available, anyway. The 1:30min pause in the boot sequence was there as well.

The change in /etc/fstab:

8< - >8
# /home was on /dev/md0 during installation
UUID=8e677927-d719-48eb-815d-d16170bdd003 /home   xfs   defaults,_netdev   0   2
8< - >8

Best regards, Alexander.



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-27 Thread Mader, Alexander
On Fri, 26 Feb 2016 23:45:01 +0100 "Mader, Alexander" 
wrote:
> On Mon, 22 Feb 2016 18:19:58 + Chris Boot  wrote:
> > Adding "rootdelay=15" to the kernel command-line probably fixes this
> > (can't test as we can't just reboot this production server at will) but
> > this isn't a nice fix.
> I just tried "rootdelay=15" as well as "rootdelay=30" to no avail. For
> this I entered the extended boot options in GRUB2, used "e" to edit the
> parameter section, and inserted as second line the rootdelay. Should I
> rather edit the /etc/default/grub?

Hello,

I just tried booting with /etc/default/grub edited, but it does not
help; presumably because root is not my problem but home. Is there
something similar for partitions other than root? As you might have
noticed, I am quite flexible with reboot experiments for testing something.

The entries in /etc/default/grub:

8< - >8
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX="rootdelay=15"
8< - >8

After update-grub I have in /boot/grub/grub.cfg:

8< - >8
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-2c2b6225-6fd0-43af-9408-c4395b43c0f7' {
    load_video
    insmod gzio
    if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
    insmod part_msdos
    insmod ext2
    set root='hd0,msdos2'
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos2 --hint-efi=hd0,msdos2 --hint-baremetal=ahci0,msdos2 --hint='hd0,msdos2'  2c2b6225-6fd0-43af-9408-c4395b43c0f7
    else
      search --no-floppy --fs-uuid --set=root 2c2b6225-6fd0-43af-9408-c4395b43c0f7
    fi
    echo    'Linux 4.2.0-0.bpo.1-amd64 wird geladen …'
    linux   /boot/vmlinuz-4.2.0-0.bpo.1-amd64 root=UUID=2c2b6225-6fd0-43af-9408-c4395b43c0f7 ro rootdelay=15 quiet
    echo    'Initiale Ramdisk wird geladen …'
    initrd  /boot/initrd.img-4.2.0-0.bpo.1-amd64
}
8< - >8

The "rootdelay" occurs in every menuentry as expected and booting pauses
before accessing root.
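
Whether the parameter reached the running kernel can also be confirmed after boot. The following is a small sketch; the function name `get_rootdelay` is made up for illustration, and if the parameter appears several times it simply prints the last occurrence.

```shell
#!/bin/sh
# Hypothetical helper: print the rootdelay value (in seconds) found in a
# kernel command line read from stdin; print nothing if it is absent.
get_rootdelay() {
    tr ' ' '\n' | sed -n 's/^rootdelay=\([0-9][0-9]*\)$/\1/p' | tail -n 1
}

# On a running system:
#   get_rootdelay < /proc/cmdline
```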

Best regards, Alexander.



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-26 Thread Mader, Alexander
On Mon, 22 Feb 2016 18:19:58 + Chris Boot  wrote:
> Adding "rootdelay=15" to the kernel command-line probably fixes this
> (can't test as we can't just reboot this production server at will) but
> this isn't a nice fix.

Hello,

I just tried "rootdelay=15" as well as "rootdelay=30" to no avail. For
this I entered the extended boot options in GRUB2, used "e" to edit the
parameter section, and inserted as second line the rootdelay. Should I
rather edit the /etc/default/grub?

Best regards, Alexander.



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-22 Thread Dimitri John Ledkov
Thanks. Would you be able to check at all whether the RAID was rebuilt
on every boot? From past boot logs or some such.

Regards,

Dimitri.

On 22 February 2016 at 18:19, Chris Boot  wrote:
> Hi folks,
>
> I think that this bug is a regression due to the fix for #784070, which
> was fairly recently pushed to both unstable and *stable*. We have at
> least one client server that got bitten by this and failed to boot.
>
> I suspect the incremental assembly, which breaks booting from degraded
> arrays, was sufficient to start arrays on block devices that take a bit
> longer to show up than usual. With that now disabled, if the disks
> haven't shown up by the time mdadm runs in local-top, you're screwed.
>
> Adding "rootdelay=15" to the kernel command-line probably fixes this
> (can't test as we can't just reboot this production server at will) but
> this isn't a nice fix.
>
> It seems like #714155 is also somehow related, but hasn't seen any
> activity since July 2013.
>
> HTH,
> Chris
>
> --
> Chris Boot
> bo...@debian.org
> GPG: 8467 53CB 1921 3142 C56D  C918 F5C8 3C05 D9CE 
>



-- 
Regards,

Dimitri.



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-22 Thread Chris Boot
Hi folks,

I think that this bug is a regression due to the fix for #784070, which
was fairly recently pushed to both unstable and *stable*. We have at
least one client server that got bitten by this and failed to boot.

I suspect the incremental assembly, which breaks booting from degraded
arrays, was sufficient to start arrays on block devices that take a bit
longer to show up than usual. With that now disabled, if the disks
haven't shown up by the time mdadm runs in local-top, you're screwed.

Adding "rootdelay=15" to the kernel command-line probably fixes this
(can't test as we can't just reboot this production server at will) but
this isn't a nice fix.

It seems like #714155 is also somehow related, but hasn't seen any
activity since July 2013.

HTH,
Chris

-- 
Chris Boot
bo...@debian.org
GPG: 8467 53CB 1921 3142 C56D  C918 F5C8 3C05 D9CE 



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-20 Thread Alexander Mader
Package: mdadm
Version: 3.3.2-5+deb8u1
Followup-For: Bug #814036

Dear Maintainer,

first of all: thank you for providing and maintaining this package for Debian!

   * What led up to the situation?

After the update to the current version around January 23rd or so, the home
volume no longer assembles automatically. It is a RAID1 mirroring two disks of
the same make. S.M.A.R.T. does not report anything wrong with the disks.
During system start-up the root volume is checked correctly and then the
sequence pauses, waiting for the home volume to appear for fsck. After a pause
of 1:30 min the sequence completes by dropping into emergency mode. In the
emergency shell I issue:

   # mdadm --assemble /dev/md/0

which leads to: 

   mdadm: /dev/md/0 has been started with 2 drives
   [some time] systemd-fsck[some PID]: /sbin/fsck.xfs: XFS file system

after issuing:

   # systemctl default;exit

I get:

   Aufgelegt
   exit

and everything starts as expected.

Currently, I am using the 4.2.0-0.bpo.1-amd64 kernel. The issue occurs with the
standard kernel 3.16.0-4-amd64, too. Starting in single-user mode does not
help, either.

Please give me a hint on how to get back to a fully automatic start-up of the
system.

-- Package-specific info:
--- mdadm.conf
DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST 
MAILADDR root
ARRAY /dev/md/0 metadata=1.2 UUID=7e9e8d1d:6e7537df:aa9bb19b:70432a09 name=douglas:0

--- /etc/default/mdadm
INITRDSTART='all'
AUTOCHECK=true
START_DAEMON=true
DAEMON_OPTIONS="--syslog"
VERBOSE=false

--- /proc/mdstat:
Personalities : [raid1] 
md0 : active raid1 sdb1[0] sdc1[1]
  976759672 blocks super 1.2 [2/2] [UU]
  
unused devices: 

--- /proc/partitions:
major minor  #blocks  name

   8        0   87918264 sda
   8        1    8790016 sda1
   8        2   79126528 sda2
  11        0    1048575 sr0
   8       16  976762584 sdb
   8       17  976760832 sdb1
   8       32  976762584 sdc
   8       33  976760832 sdc1
   9        0  976759672 md0

--- LVM physical volumes:
LVM does not seem to be used.
--- mount output
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=497031,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,relatime,size=809984k,mode=755)
/dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
/dev/md0 on /home type xfs (rw,relatime,attr2,inode64,noquota)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
tmpfs on /run/user/119 type tmpfs (rw,nosuid,nodev,relatime,size=404992k,mode=700,uid=119,gid=126)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=404992k,mode=700,uid=1000,gid=1000)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

--- initrd.img-4.2.0-0.bpo.1-amd64:
22105 blocks
131be8e959b59b6dc48c289966125cd4  ./sbin/mdadm
f2ccfe3cb3445d98351a7f634182c610  ./conf/mdadm
599bbf3fe6093157a26863dcb59cdf5d  ./scripts/local-top/mdadm
d3be82c0f275d6c25b04d388baf9e836  ./etc/modprobe.d/mdadm.conf
aed4ea5a182660bf39e94a5d66112d54  ./etc/mdadm/mdadm.conf
eb6dfbaa9a7f5b3142dd8647572cd6d4  ./lib/modules/4.2.0-0.bpo.1-amd64/kernel/drivers/md/raid10.ko
6544487f7c02ef708d9ebc3047b09070  ./lib/modules/4.2.0-0.bpo.1-amd64/kernel/drivers/md/linear.ko
e2e9a82a86fab5deb195bb985a0697ef  

Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-08 Thread Ben Hutchings
Control: reassign -1 mdadm

On Sun, 2016-02-07 at 20:07 +0100, Samuel Thibault wrote:
> Package: initramfs-tools
> Version: 0.120
> Severity: important
> 
> Hello,
> 
> Our server failed to reboot this afternoon. The initrd was stuck trying to
> get the root device, running local-block in a loop before starting an
> emergency shell.  There, running mdadm -A --scan discovered everything
> and exiting the shell allowed the boot to proceed.  Earlier in the boot
> output there was no mention of mdadm being run.
> 
> My guess (we can't really afford retrying etc. as it's a production
> system) is that AIUI mdadm is called just once from local-top, but
> that's perhaps too early, the disks are not yet discovered because the
> controller is slow. local-block is then run repeatedly to try to get the
> block devices, but mdadm from local-top should be called repeatedly too
> to try to assemble the md too?

It has always been documented that local-top scripts will be run
exactly once.   We can't change that behaviour now.  It's up to the
mdadm package to retry whatever needs to be done in its local-block
script.
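
As a rough illustration of the retry idea, a generic helper could look like the sketch below. This is not the script shipped in the mdadm package: the function name `retry_cmd`, the attempt count and the pause are assumptions made for illustration. Since local-block is already run in a loop by the initramfs, as noted in the report, a real hook would probably need only one assembly attempt per invocation.

```shell
#!/bin/sh
# Hypothetical helper, not part of mdadm or initramfs-tools.
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry_cmd() {
    attempts=$1
    pause=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # Only sleep if another attempt will follow.
        [ "$n" -le "$attempts" ] && sleep "$pause"
    done
    return 1
}

# Example with assumed values: try assembly up to 10 times, 5 s apart.
#   retry_cmd 10 5 mdadm --assemble --scan
```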

Ben.

-- 
Ben Hutchings
In a hierarchy, every employee tends to rise to his level of incompetence.



Bug#814036: initramfs-tools: mdadm doesn't assemble disk

2016-02-07 Thread Samuel Thibault
Package: initramfs-tools
Version: 0.120
Severity: important

Hello,

Our server failed to reboot this afternoon. The initrd was stuck trying to
get the root device, running local-block in a loop before starting an
emergency shell.  There, running mdadm -A --scan discovered everything
and exiting the shell allowed the boot to proceed.  Earlier in the boot
output there was no mention of mdadm being run.

My guess (we can't really afford retrying etc. as it's a production
system) is that AIUI mdadm is called just once from local-top, but
that's perhaps too early, the disks are not yet discovered because the
controller is slow. local-block is then run repeatedly to try to get the
block devices, but mdadm from local-top should be called repeatedly too
to try to assemble the md too?

Samuel
# mdadm Debian configuration
#
# You can run 'dpkg-reconfigure mdadm' to modify the values in this file, if
# you want. You can also change the values here and changes will be preserved.
# Do note that only the values are preserved; the rest of the file is
# rewritten.
#

# INITRDSTART:
#   list of arrays (or 'all') to start automatically when the initial ramdisk
#   loads. This list *must* include the array holding your root filesystem. Use
#   'none' to prevent any array from being started from the initial ramdisk.
INITRDSTART='all'

# AUTOCHECK:
#   should mdadm run periodic redundancy checks over your arrays? See
#   /etc/cron.d/mdadm.
AUTOCHECK=true

# START_DAEMON:
#   should mdadm start the MD monitoring daemon during boot?
START_DAEMON=true

# DAEMON_OPTIONS:
#   additional options to pass to the daemon.
DAEMON_OPTIONS="--syslog"

# VERBOSE:
#   if this variable is set to true, mdadm will be a little more verbose e.g.
#   when creating the initramfs.
VERBOSE=false
ARRAY /dev/md/0 metadata=1.2 name=apollon:0 UUID=0251c1f9:87d10ee8:f35106f1:fc2c1ce0
ARRAY /dev/md/1 metadata=1.2 name=apollon:1 UUID=de5c51dd:dc646570:a791dde2:76b95a00
ARRAY /dev/md/2 metadata=1.2 name=apollon:2 UUID=b07a6bab:d8d016a7:0203809b:8313199b
ARRAY /dev/md/3 metadata=1.2 name=apollon:3 UUID=b8fe6506:06f4d3b1:cbbcb08c:73a2a178
ARRAY /dev/md/4 metadata=1.2 name=apollon:4 UUID=7d928fd5:e3f0d6ec:b81cd933:f056df30
ARRAY /dev/md10 metadata=1.2 name=apollon:10 UUID=b3f44d83:70a26fe7:8e43d4aa:3242f60c
MAILADDR root